GraphQL Observability with Hasura GraphQL Engine and Honeycomb

Observability, in the software world, means you can answer any question about what’s happening inside a system just by observing metrics from outside it, without having to modify the running deployment to support this.


Why an Observable System?

Without an observable system, it is difficult to understand what is going wrong unless you are already monitoring for known issues. At any point in time, you should be able to ask any arbitrary question about how your application works.

GraphQL Observability: In a GraphQL application, these are the important metrics and context to capture:

time of query and query execution time
actual query payload / query hash
response status codes of queries/mutations/subscriptions
GraphQL server version
ip_address from which the query originated

Specifically in the case of Hasura GraphQL Engine, you might also want to capture context like:

user_id of the user who made the query
role of the user
metadata of the query

With this external information available, you can ask meaningful questions in a production deployment to find what went wrong internally, or why your GraphQL backend is behaving the way it is. For example, if you see anomalies in query execution time for a particular query hash, you can try to identify what is wrong with the query (maybe there is a database bottleneck that you need to optimise).
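The kind of question described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Honeycomb or Hasura compute anything: the event dictionaries, the sample values and the 1-second cut-off are assumptions made for the example, and only the field names echo the metrics listed earlier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events; field names and values are assumptions for this sketch.
events = [
    {"query_hash": "a1", "query_execution_time": 0.02},
    {"query_hash": "a1", "query_execution_time": 0.03},
    {"query_hash": "b2", "query_execution_time": 1.50},
    {"query_hash": "b2", "query_execution_time": 2.50},
]

def mean_time_by_hash(events):
    """Group events by query hash and return the mean execution time of each."""
    times = defaultdict(list)
    for event in events:
        times[event["query_hash"]].append(event["query_execution_time"])
    return {query_hash: mean(values) for query_hash, values in times.items()}

averages = mean_time_by_hash(events)
# Query hashes whose mean time exceeds an arbitrary 1-second cut-off.
slow_hashes = sorted(h for h, avg in averages.items() if avg > 1.0)
print(slow_hashes)
```

Grouping by query hash rather than by raw query text keeps the comparison cheap and avoids storing full payloads when only the hash is logged.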

Having this information available for monitoring might still not be enough. To make it actionable, you can set up alerts, for example triggers on requests/min or errors/min with thresholds, and get notified when the server is behaving differently.
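To make the errors/min idea concrete, here is a minimal Python sketch of a threshold check. It is not Honeycomb’s trigger mechanism; the `(timestamp, status)` pairs, the helper names and the threshold value are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, HTTP status) pairs; values are assumptions.
requests = [
    ("2018-08-01T10:00:05", 200),
    ("2018-08-01T10:00:40", 500),
    ("2018-08-01T10:00:59", 400),
    ("2018-08-01T10:01:10", 200),
]

def errors_per_minute(requests):
    """Count non-200 responses in each one-minute bucket."""
    counts = Counter()
    for timestamp, status in requests:
        minute = datetime.fromisoformat(timestamp).replace(second=0, microsecond=0)
        if status != 200:
            counts[minute] += 1
    return counts

def minutes_over_threshold(counts, threshold):
    """Return the minutes whose error count reaches the threshold, in order."""
    return sorted(minute for minute, n in counts.items() if n >= threshold)

ERRORS_PER_MINUTE_THRESHOLD = 2  # arbitrary example threshold
alerts = minutes_over_threshold(errors_per_minute(requests), ERRORS_PER_MINUTE_THRESHOLD)
print([minute.isoformat() for minute in alerts])
```

In practice the threshold would be tuned to the normal error rate of the application rather than fixed up front.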

By the end of this tutorial, you should be able to set up Hasura GraphQL Engine with an observability system, Honeycomb in this case, which will help you understand, optimise and control your GraphQL server. Here’s a preview of what you will get at the end.

Honeycomb Dashboard with GraphQL Engine Logs

You can also create a visual view of your logs based on filters, for example to monitor error codes across a given time interval.

Error Code Capturing across 2 day time interval

Set up Hasura GraphQL Engine on GKE

This tutorial assumes that you have Hasura GraphQL Engine installed on Google Kubernetes Engine. Instructions for deployment are available here.

Set up Honeycomb Agent on Kubernetes

Sign up on Honeycomb to start the Kubernetes agent setup. Honeycomb uses a Kubernetes secret to store the API key, which is available on Honeycomb after signing up.

$ kubectl create secret generic -n kube-system honeycomb-writekey \
    --from-literal=key=<your-key>

We will use Kubernetes DaemonSets to automatically deploy the Honeycomb Agent on all nodes.

Once the DaemonSets extension is enabled, create a DaemonSet using the following command to deploy the agent:

$ kubectl apply -f https://honeycomb.io/download/kubernetes/logs/quickstart.yaml

Note: in production clusters, remove mountpath configuration for `minikube-varlibdockercontainers`

Sending cluster events and resource metrics

Execute the following command to start sending cluster events and resource metrics from Kubernetes.

$ kubectl create -f https://honeycomb.io/download/kubernetes/metrics/honeycomb-heapster.yaml

Sending cluster state metrics

Deploy kube-state-metrics and the Honeycomb kube-state-metrics adapter to collect more fine-grained data.

Start kube-state-metrics

$ kubectl create -f https://honeycomb.io/download/kubernetes/metrics/kube-state-metrics.yaml

Start the Honeycomb adapter

$ kubectl create -f https://honeycomb.io/download/kubernetes/metrics/honeycomb-state-metrics-adapter.yaml

Check the status of your new pods:

$ kubectl get pods -n kube-system

Head over to the Honeycomb UI and query the kubernetes-state-metrics dataset to verify that everything is working.

Sending GraphQL Engine logs to Honeycomb

Verify graphql-engine logs by executing the following command.

$ kubectl logs -l app=hasura-graphql-engine -c graphql-engine

Here -c is specified because the hasura-graphql-engine pod has multiple containers running.

Apply the following ConfigMap.

echo '
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: honeycomb-agent-config
  namespace: kube-system
data:
  config.yaml: |-
    watchers:
      - dataset: kubernetes-logs
        labelSelector: app=hasura-graphql-engine
        containerName: graphql-engine
        parser: json
' | kubectl apply -f -

After updating the configmap, restart the agent pods:

$ kubectl delete pods -l k8s-app=honeycomb-agent -n kube-system

In the Honeycomb UI, go to the kubernetes-logs dataset and open the Schema tab.

Scroll down to the options, check `Automatically unpack nested JSON`, and set the level of nesting to `5`.

Unpack nested JSON to get detailed parsing of GraphQL Engine logs
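To give a feel for what unpacking nested JSON means, here is a rough Python sketch that flattens a nested log line into dotted column names up to a maximum depth. It is an assumption about the general idea, not Honeycomb’s actual implementation; the `unpack` helper, the way deeper levels are kept as a JSON string, and the sample log line are all hypothetical.

```python
import json

def unpack(obj, max_depth, prefix="", depth=1):
    """Flatten nested dicts into dotted keys, descending at most max_depth levels.

    Anything nested deeper than max_depth is kept as a JSON string.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            if depth < max_depth:
                flat.update(unpack(value, max_depth, name, depth + 1))
            else:
                flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

# Hypothetical log line; the field values are made up for illustration.
line = '{"type": "http-log", "detail": {"status": 200, "query_hash": "abc"}}'
columns = unpack(json.loads(line), max_depth=5)
print(columns)
```

With a nesting level that is too low, fields such as the status would stay buried inside a single string column and could not be filtered on individually.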

You can now get a detailed report on the logs and apply filters depending on your requirements.

Fields such as query_execution_time, query_hash, response_size and status code are available for inspection.

Dashboard Filters

Now start applying filters to get granular details. Let’s apply a filter to get all queries which resulted in an error. The filter would be detail.status != 200. Here’s what the dashboard looks like.
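For intuition, the same filter can be reproduced over raw JSON log lines in Python. This is a sketch under assumptions: the sample lines and the `failed_requests` helper are hypothetical, and only the detail.status field comes from the filter above.

```python
import json

# Hypothetical log lines; only the detail.status field is taken from the text.
log_lines = [
    '{"detail": {"status": 200, "query_hash": "a1"}}',
    '{"detail": {"status": 400, "query_hash": "b2"}}',
    '{"detail": {"status": 500, "query_hash": "c3"}}',
]

def failed_requests(lines):
    """Yield parsed events whose detail.status is anything other than 200."""
    for line in lines:
        event = json.loads(line)
        if event.get("detail", {}).get("status") != 200:
            yield event

errors = list(failed_requests(log_lines))
print([e["detail"]["query_hash"] for e in errors])
```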

Now, let’s get a performance overview by filtering/sorting queries using the query_execution_time column. Here’s what a typical filter/sort would look like:
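Outside the dashboard, the equivalent of sorting by this column is a simple descending sort. The sketch below assumes hypothetical events that have already been parsed into dictionaries; the `slowest` helper and the sample values are made up for illustration.

```python
# Hypothetical events; the values are made up for illustration.
events = [
    {"query_hash": "a1", "query_execution_time": 0.12},
    {"query_hash": "b2", "query_execution_time": 2.40},
    {"query_hash": "c3", "query_execution_time": 0.85},
]

def slowest(events, limit):
    """Return up to `limit` events, slowest first."""
    ranked = sorted(events, key=lambda e: e["query_execution_time"], reverse=True)
    return ranked[:limit]

top = slowest(events, limit=2)
print([e["query_hash"] for e in top])
```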

Alerts for Anomalies

Now that we have a holistic view of our GraphQL queries, let’s set up an alert to get notified via email when something is wrong.

We can apply a filter for query_execution_time ≥ 2 to get notified about slow queries. This is just an example; you can tweak the number of seconds to a value that suits your application’s requirements.

Now the focus can be on application development, while this observability setup notifies us about what’s going wrong in the unpredictable parts of our GraphQL server.

For further reference, see Honeycomb’s Kubernetes integration docs.