OpenTelemetry Best Practices
Identify different applications
GraphQL Engine uses hasura as its service name by default. If you run several GraphQL Engine applications, configure a different service.name attribute for each one so you can identify and filter their metrics more easily.
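If you cannot set service.name separately in each GraphQL Engine application, one alternative is to stamp it in the Collector with the contrib resource processor. The sketch below assumes each application pushes OTLP/HTTP to its own receiver port. The receiver names, ports, service names (hasura-orders, hasura-billing), and the exporter endpoint are all hypothetical.

```yaml
# Sketch: give each GraphQL Engine application its own service.name.
# Assumes two applications push OTLP/HTTP to different ports (hypothetical).
receivers:
  otlp/orders:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
  otlp/billing:
    protocols:
      http:
        endpoint: 0.0.0.0:4319

processors:
  # Insert service.name, or overwrite it if it is already set.
  resource/orders:
    attributes:
      - key: service.name
        value: hasura-orders
        action: upsert
  resource/billing:
    attributes:
      - key: service.name
        value: hasura-billing
        action: upsert

exporters:
  otlphttp:
    endpoint: https://otlp.example.com  # hypothetical backend

service:
  pipelines:
    traces/orders:
      receivers: [otlp/orders]
      processors: [resource/orders]
      exporters: [otlphttp]
    traces/billing:
      receivers: [otlp/billing]
      processors: [resource/billing]
      exporters: [otlphttp]
```

Only traces pipelines are shown; metrics and logs pipelines would follow the same pattern.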
Use the OpenTelemetry Collector
Most observability services require the OpenTelemetry Collector as a proxy to push data. Using the OpenTelemetry Collector also helps you:
- Route and export data to multiple services.
- Optimize costs with sampling and filter processors.
- Transform data and calculate aggregate metrics.
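A minimal Collector configuration that receives OTLP data from GraphQL Engine and fans it out to two backends might look like the following sketch. The exporter instance names and endpoints are placeholders, not values for any particular vendor.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Group telemetry into batches to reduce the number of outgoing requests.
  batch: {}

exporters:
  # Placeholder backends; replace with your services' OTLP endpoints.
  otlphttp/backend-a:
    endpoint: https://otlp.backend-a.example.com
  otlp/backend-b:
    endpoint: otlp.backend-b.example.com:4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      # Each listed exporter receives its own copy of the traces.
      exporters: [otlphttp/backend-a, otlp/backend-b]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/backend-a]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/backend-b]
```

Routing is decided per pipeline: here traces go to both backends, while metrics and logs each go to one.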
Cost optimization
Sampling
The OpenTelemetry Collector supports many strategies for sampling spans and log records. See the OpenTelemetry Collector Contrib docs for more context.
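As one example of a sampling strategy, the contrib tail_sampling processor can keep every trace that contains an error plus a fixed share of all traces. The 10s wait and the 10% rate below are illustrative values, not recommendations.

```yaml
processors:
  tail_sampling:
    # Wait for the spans of a trace to arrive before deciding.
    decision_wait: 10s
    policies:
      # Keep every trace that has a span with an ERROR status.
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # Keep roughly 10% of traces regardless of status.
      - name: sample-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A trace is kept if any policy samples it, so the result is all error traces plus about 10% of the rest. Add tail_sampling to the processors list of your traces pipeline, and note that tail sampling only works when all spans of a trace reach the same Collector instance.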
Traces
You may not want to trace every operation; for example, you might only want to trace GraphQL requests or specific GraphQL operations. Configure the Filter Processor to drop the spans you don't need. A span is dropped if it matches any of the listed conditions.
processors:
  filter:
    error_mode: ignore
    traces:
      span:
        # Uncomment to also drop event trigger, scheduled trigger, and websocket spans:
        # - 'IsMatch(name, "Event trigger")'
        # - 'IsMatch(name, "Scheduled trigger")'
        # - 'IsMatch(name, "websocket")'
        - 'IsMatch(name, "/v1/version")'
        - 'IsMatch(name, "/v1/entitlement")'
        - 'IsMatch(name, "/v1alpha1/config")'
        # drop spans for an unused GraphQL operation
        - 'attributes["graphql.operation.name"] == "MyQuery"'
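To drop several GraphQL operations with a single condition, you can match the operation name against a regular expression with IsMatch instead of writing one equality check per operation. The operation names HealthCheck and IntrospectionQuery below are hypothetical placeholders.

```yaml
processors:
  filter:
    error_mode: ignore
    traces:
      span:
        # Drop spans whose GraphQL operation name is exactly one of the
        # (hypothetical) names below; ^ and $ anchor the whole string.
        # Spans without the attribute do not match.
        - 'IsMatch(attributes["graphql.operation.name"], "^(HealthCheck|IntrospectionQuery)$")'
```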
Logs
As with traces, in addition to disabling log types through environment variables, you can filter individual log records:
processors:
  filter:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["type"] == "query-log" and IsMatch(body["query"]["operationName"], "UnknownQuery")'
        - 'attributes["type"] == "http-log" and IsMatch(body["operation"]["query"]["operationName"], "UnknownQuery")'
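The trace and log snippets above both define a processor named filter, so they cannot sit side by side in one Collector file under that same key. One way to combine them is to use named instances and wire each into its pipeline, as in this sketch. The receiver and exporter settings, including the endpoint, are hypothetical.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Named instances let the trace and log filters live in one file.
  filter/traces:
    error_mode: ignore
    traces:
      span:
        - 'IsMatch(name, "/v1/version")'
  filter/logs:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["type"] == "query-log" and IsMatch(body["query"]["operationName"], "UnknownQuery")'
  batch: {}

exporters:
  otlphttp:
    endpoint: https://otlp.example.com  # hypothetical backend

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Filter before batching so dropped data is never buffered.
      processors: [filter/traces, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [filter/logs, batch]
      exporters: [otlphttp]
```

A processor only takes effect once it is listed in a pipeline's processors; defining it under processors alone does nothing.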
Check out more configuration examples here.
Monitoring
Logging
When OpenTelemetry runtime errors occur, GraphQL Engine prints them as logs of the unstructured type at the warn level, for example:
{
  "detail": "OTel exporter: Failed to deliver logs: Encountered retryable HTTP exception: ConnectionFailure ...",
  "level": "warn",
  "timestamp": "2024-06-19T06:39:51.704+0000",
  "type": "unstructured"
}
Metrics
GraphQL Engine exports metrics that count sent and dropped (failed) trace spans and log records. You can collect them via OpenTelemetry metrics or the native Prometheus exporter.
See the list of available metrics here.
Download and install the Grafana dashboard if you are using Prometheus.
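If you collect these metrics through the native Prometheus exporter, a scrape job might look like the following sketch. The /v1/metrics path, port 8080, the graphql-engine host name, and the bearer-token file are assumptions about a typical deployment; check the GraphQL Engine metrics documentation for your version.

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: graphql-engine
    # Assumed metrics path and port; verify against your deployment.
    metrics_path: /v1/metrics
    static_configs:
      - targets: ["graphql-engine:8080"]  # hypothetical host name
    # Only needed if the endpoint requires a bearer token (assumption).
    authorization:
      type: Bearer
      credentials_file: /etc/prometheus/graphql-engine-metrics-secret
```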