Version: v2.x

OpenTelemetry Best Practices

Identify different applications

GraphQL Engine uses the service name hasura by default. If you run several GraphQL Engine applications, configure a different service.name attribute for each one so that you can identify and filter metrics more easily.
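
As a minimal sketch, one way to do this is to set service.name as a resource attribute in each instance's OpenTelemetry metadata configuration. The layout below assumes the configuration accepts a resource_attributes list under exporter_otlp. The collector hostname and the service name hasura-orders are hypothetical, so check the field names against the metadata API reference for your version before relying on it.

```yaml
# Sketch of an OpenTelemetry metadata config for one GraphQL Engine instance.
# Assumption: exporter_otlp accepts a resource_attributes list of name/value pairs.
status: enabled
data_types:
  - traces
  - metrics
  - logs
exporter_otlp:
  protocol: http/protobuf
  # Hypothetical collector address; replace with your own endpoints.
  otlp_traces_endpoint: http://otel-collector:4318/v1/traces
  otlp_metrics_endpoint: http://otel-collector:4318/v1/metrics
  otlp_logs_endpoint: http://otel-collector:4318/v1/logs
  resource_attributes:
    # Hypothetical name; use a distinct value for each application.
    - name: service.name
      value: hasura-orders
```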

Use OpenTelemetry Collector

Most observability services require the OpenTelemetry Collector as a proxy to push data. Using the OpenTelemetry Collector also helps you:

  • Route and export data to multiple services.
  • Optimize the cost with sampling and filter processors.
  • Transform and calculate aggregate metrics.
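
The routing point above can be sketched as a Collector configuration that receives OTLP data over HTTP and forwards the same traces and metrics to two backends. The two exporter names and their endpoints are placeholders, not real services, and a production setup would normally add authentication headers.

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Batches telemetry before export to reduce outgoing requests.
  batch: {}

exporters:
  # Hypothetical backends; replace the endpoints with your vendors' OTLP URLs.
  otlphttp/backend_a:
    endpoint: https://otlp.backend-a.example.com
  otlphttp/backend_b:
    endpoint: https://otlp.backend-b.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/backend_a, otlphttp/backend_b]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/backend_a]
```

Each pipeline lists its own exporters, so a signal can go to one backend, several, or none.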

Cost optimization

Sampling

The OpenTelemetry Collector supports many strategies for sampling spans and log records. See the OpenTelemetry Collector Contrib docs for more context.
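
As one illustration, the probabilistic sampler processor from the Contrib distribution keeps a fixed percentage of traces. The 10 percent rate below is an arbitrary example value, and the otlp receiver and otlphttp exporter are assumed to be defined elsewhere in the same configuration.

```yaml
processors:
  probabilistic_sampler:
    # Keep roughly 10% of traces; tune this to your traffic and budget.
    sampling_percentage: 10

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlphttp]
```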

Trace

You may not want to trace every operation. For example, you may only want to trace GraphQL requests or specific GraphQL operations. The Filter Processor drops any span that matches at least one of its conditions, so list the spans you want to exclude:

processors:
  filter:
    error_mode: ignore
    traces:
      span:
        # - 'IsMatch(name, "Event trigger")'
        # - 'IsMatch(name, "Scheduled trigger")'
        # - 'IsMatch(name, "websocket")'
        - 'IsMatch(name, "/v1/version")'
        - 'IsMatch(name, "/v1/entitlement")'
        - 'IsMatch(name, "/v1alpha1/config")'
        # filter unused graphql operation
        - 'attributes["graphql.operation.name"] == "MyQuery"'

Logs

As with traces, in addition to disabling log types via environment variables, you can filter out individual log records:

processors:
  filter:
    error_mode: ignore
    logs:
      log_record:
        - 'attributes["type"] == "query-log" and IsMatch(body["query"]["operationName"], "UnknownQuery")'
        - 'attributes["type"] == "http-log" and IsMatch(body["operation"]["query"]["operationName"], "UnknownQuery")'
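
A filter processor only takes effect once it is listed in a pipeline. The fragment below is a sketch of how trace and log conditions might sit under one filter processor and be wired into both pipelines. It assumes an otlp receiver and an otlphttp exporter are defined elsewhere in the same Collector configuration.

```yaml
processors:
  filter:
    error_mode: ignore
    traces:
      span:
        - 'IsMatch(name, "/v1/version")'
    logs:
      log_record:
        - 'attributes["type"] == "query-log" and IsMatch(body["query"]["operationName"], "UnknownQuery")'
  batch: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Filter first so dropped spans are never batched or exported.
      processors: [filter, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [filter, batch]
      exporters: [otlphttp]
```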

Check out more configuration examples here.

Monitoring

Logging

If runtime errors occur in the OpenTelemetry exporter, GraphQL Engine prints them as logs of type unstructured at the warn level.

{
  "detail": "OTel exporter: Failed to deliver logs: Encountered retryable HTTP exception: ConnectionFailure ...",
  "level": "warn",
  "timestamp": "2024-06-19T06:39:51.704+0000",
  "type": "unstructured"
}

Metrics

GraphQL Engine exports metrics to monitor the number of sent and dropped (failed) trace spans or log records. You can collect them via OpenTelemetry metrics or the native Prometheus exporter.

See the list of available metrics here.

Download and install the Grafana dashboard if you are using Prometheus.
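
If you use the native Prometheus exporter, a scrape job along these lines could collect the metrics. This is a sketch under several assumptions: that the metrics are served at /v1/metrics and protected by a bearer secret, that GraphQL Engine is reachable at the hypothetical target hasura:8080, and that the 30 second interval suits your setup.

```yaml
scrape_configs:
  - job_name: hasura
    # Assumption: the native Prometheus exporter is served at this path.
    metrics_path: /v1/metrics
    scrape_interval: 30s
    authorization:
      type: Bearer
      # Placeholder; supply the metrics secret configured for your instance.
      credentials: <metrics-secret>
    static_configs:
      # Hypothetical host and port for the GraphQL Engine instance.
      - targets: ["hasura:8080"]
```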