Observability & Performance Tuning
Subscription Execution
For serving subscription requests, Hasura optimizes the subscription execution to ensure it is as fast as possible while not overloading the database with concurrent queries.
To achieve this, Hasura uses a combination of the following techniques:
- Grouping same queries into "cohorts" : Hasura groups subscriptions with the same set of query and session variables into a single cohort. The subscribers in a cohort are updated simultaneously.
- Diff-checking: On receiving response from the database, Hasura checks the diff between the old and new values and sends the response only to the subscribers whose values have changed.
- Multiplexing: Hasura groups similar "parameterized" subscriptions together and additionally splits them into
batches for efficient performance on the database. The batch size can be configured using the
HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE
environment variable. You can read more about multiplexing here. For example, in the image below, we are grouping the three subscriptions into a single multiplexed query.

For more details on how Hasura executes subscriptions, refer to the live query or streaming subscription documentation.
Observability
Hasura exposes a set of Prometheus Metrics that can be used to monitor the subscriptions system and help diagnose performance issues.
Active Subscriptions
Current number of active subscriptions, representing the subscription load on the server.
Name | hasura_active_subscriptions |
Type | Gauge |
Labels | subscription_kind : streaming | live-query, operation_name , parameterized_query_hash |
Active Subscription Pollers
Current number of active subscription pollers. A subscription poller multiplexes similar subscriptions together. The value of this metric should be proportional to the number of uniquely parameterized subscriptions (i.e., subscriptions with the same selection set, but with different input arguments and session variables are multiplexed on the same poller). If this metric is high then it may be an indication that there are too many uniquely parameterized subscriptions which could be optimized for better performance.
Name | hasura_active_subscription_pollers |
Type | Gauge |
Labels | subscription_kind : streaming | live-query |
Active Subscription Pollers in Error State
Current number of active subscription pollers that are in the error state. A subscription poller multiplexes similar subscriptions together. A non-zero value of this metric indicates that there are runtime errors in atleast one of the subscription pollers that are running in Hasura. In most of the cases, runtime errors in subscriptions are caused due to the changes at the data model layer and fixing the issue at the data model layer should automatically fix the runtime errors.
Name | hasura_active_subscription_pollers_in_error_state |
Type | Gauge |
Labels | subscription_kind : streaming | live-query |
Subscription Total Time
The time taken to complete running of one subscription poller.
A subscription poller multiplexes similar subscriptions together. This subscription poller runs every 1 second by default and queries the database with the multiplexed query to fetch the latest data. In a polling instance, the poller not only queries the database but does other operations like splitting similar queries into batches (by default 100) before fetching their data from the database, etc. This metric is the total time taken to complete all the operations in a single poll.
In a single poll, the subscription poller splits the different variables for the multiplexed query into batches (by
default 100) and executes the batches. We use the hasura_subscription_db_execution_time_seconds
metric to observe the
time taken for each batch to execute on the database. This means for a single poll there can be multiple values for
hasura_subscription_db_execution_time_seconds
metric.
Let's look at an example to understand these metrics better:
Say we have 650 subscriptions with the same selection set but different input arguments. These 650 subscriptions will be grouped to form one multiplexed query. A single poller would be created to run this multiplexed query. This poller will run every 1 second.
The default batch size in hasura is 100, so the 650 subscriptions will be split into 7 batches for execution during a single poll.
During a single poll:
- Batch 1:
hasura_subscription_db_execution_time_seconds
= 0.002 seconds - Batch 2:
hasura_subscription_db_execution_time_seconds
= 0.001 seconds - Batch 3:
hasura_subscription_db_execution_time_seconds
= 0.003 seconds - Batch 4:
hasura_subscription_db_execution_time_seconds
= 0.001 seconds - Batch 5:
hasura_subscription_db_execution_time_seconds
= 0.002 seconds - Batch 6:
hasura_subscription_db_execution_time_seconds
= 0.001 seconds - Batch 7:
hasura_subscription_db_execution_time_seconds
= 0.002 seconds
The hasura_subscription_total_time_seconds
would be sum of all the database execution times shown in the batches, plus
some extra process time for other tasks the poller does during a single poll. In this case, it would be approximately
0.013 seconds.
Name | hasura_subscription_total_time_seconds |
Type | Histogram Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
Labels | subscription_kind : streaming | live-query, operation_name , parameterized_query_hash |
Subscription Database Execution Time
The time taken to run the subscription's multiplexed query in the database for a single batch.
A subscription poller multiplexes similar subscriptions together. During every run (every 1 second by default), the poller splits the different variables for the multiplexed query into batches (by default 100) and execute the batches. This metric observes the time taken for each batch to execute on the database.
If this metric is high, then it may be an indication that the database is not performing as expected, you should consider investigating the subscription query and see if indexes can help improve performance.
Name | hasura_subscription_db_execution_time_seconds |
Type | Histogram Buckets: 0.000001, 0.0001, 0.01, 0.1, 0.3, 1, 3, 10, 30, 100 |
Labels | subscription_kind : streaming | live-query, operation_name , parameterized_query_hash |
Golden Signals for subscriptions
You can perform Golden Signals-based system monitoring with Hasura's exported metrics. The following are the golden signals for subscriptions:
Latency
The latency of a subscription is defined as the time taken to complete one fetch cycle for the subscription. To monitor
latency, you can monitor the hasura_subscription_total_time_seconds
metric.
If the value of this metric is high, then it may be an indication that the multiplexed query is taking longer to execute
in the database, verify this with
hasura_subscription_db_execution_time_seconds
metric. If the value of this metric is high as well, you can do the following:
- Check if any database index can help improve the performance of the query, analyzing the GraphQL query will show the generated multiplexed query.
- Avoid querying unnecessary fields that translate to joins or function calls in the GraphQL query.
- Consider adding more read replicas to the database and running subscriptions on them.
Traffic
The traffic of a subscription is defined as the number of subscriptions that are active at any given point of time. To
monitor traffic, you can monitor the hasura_active_subscriptions
metric.
If the value of this metric is high (and above your established baseline), then you may want to consider increasing the number of Hasura instances to handle the load.
Errors
Errors in subscriptions can be monitored using the following metrics
hasura_graphql_requests_total{type="subscription",response_status="error"}
: Total number of errors that happen before the subscription is started (i.e. validation, parsing and authorization errors).hasura_active_subscription_pollers_in_error_state
: Number of subscription pollers that are in error state.
If the value of the hasura_active_subscription_pollers_in_error_state
is non-zero, you should consider investigating the livequery-poller-log
logs (this will include the error message) to
debug the failing subscription.
Saturation
Saturation is the threshold until which the subscriptions can run smoothly; once this threshold is crossed, you may see performance issues, and abrupt disconnections of the connected subscribers.
To monitor the saturation for subscriptions, you can monitor the following:
- CPU and memory usage of Hasura instances.
- For postgres backends, you can monitor the
hasura_postgres_connections
metric to see the number of connections opened by Hasura with the database. - P99 of the
hasura_subscription_total_time_seconds
metric.
If the number of database connections is high, you can consider increasing the max_connections
of the database.
You can also consider scaling the Hasura instances horizontally and vertically to handle the load.
Tuning subscription performance
Hasura GraphQL Engine is designed to scale to handle millions of concurrent subscriptions. However, due to misconfigurations or inefficient queries, the performance of subscriptions can be impacted. This section describes how to analyze and tune the performance of subscriptions.
Analyze
Using the Analyze
button on GraphiQL API Explorer of the Hasura Console, you can see the generated
multiplexed SQL and its query plan. The query plan can reveal the bottlenecks in query execution and appropriate indexes
can be added to improve performance.
In addition to these, simplifying the subscription to avoid unnecessary joins or avoiding fetching fields which are not going to change can also help improve performance.
Performance tuning
The parameters governing the performance of subscriptions in terms of throughput, latency and resource utilization are:
HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_REFETCH_INTERVAL
- Time interval between Hasura multiplexed queries to the DB for fetching changes in subscriptions data, default = 1 sec
- Increasing this reduces the frequency of queries to the DB, thereby reducing its load and improving throughput while increasing the latency of updates to the clients.
HASURA_GRAPHQL_LIVE_QUERIES_MULTIPLEXED_BATCH_SIZE
- Number of similar subscriptions batched into a single SQL query to the DB, default = 100
- Increasing this reduces the number of SQL statements fired to the DB, thereby reducing its load and improving throughput while increasing individual SQL execution times and latency.
- You should reduce this value if the execution time of the SQL generated by Hasura after multiplexing is more than the refetch interval.
max_connections
of the source- Max number of connections Hasura opens with the DB, default = 50 (configurable via update source metadata API)
- Increasing this increases the number of connections Hasura can open with the DB to execute queries, thereby improving concurrency but adding load to the database. A very high number of concurrent connections can result in poor DB performance.
The default values offer a reasonable trade-off between resource utilization and performance for most use cases. For scenarios with heavy queries and a high number of active subscriptions, you need to benchmark the setup and these parameters need to be iterated upon to achieve optimal performance.