Hasura GraphQL on Your Hydra Data Warehouse
Developers around the globe have trusted Hasura’s GraphQL Engine. Hasura provides an extremely lightweight, high performance product that gives you instant realtime GraphQL APIs on a Postgres database.
Hydra is the open source data warehouse, built from the ground up on Postgres. With Hydra’s columnar tables, Postgres can now execute analytic reads 30X faster and with 4X better data compression than standard row-based tables. Together with Hasura, Hydra now supports: Instant GraphQL & REST APIs, Declarative Role Based Authorization, Advanced Security and Performance with rate limiting, allow lists, and caching. Get your open source data warehouse at hydras.io. Email [email protected] with the subject line “Hasura” to bypass the Hydra sign-up waitlist.
We'll cover:
- Instant GraphQL APIs on Hydra using Hasura
- Authentication and Authorization
- Run GraphQL on a Columnar Table
- Add Caching to GraphQL API over Hydra
HYDRA: The Open Source Data Warehouse
A Source of Truth Built on Postgres
A data warehouse aggregates data from different sources into a single, central, consistent data store to support data analysis, data mining, artificial intelligence (AI), and machine learning. Data warehouses are referred to as an organization’s source of truth because comprehensive business problems can be solved when querying and analyzing the full data set across multiple dimensions. While data warehouses have traditionally been deployed in enterprise, early-stage teams can also see business intelligence (BI) advantages to aggregating multiple data sources, such as tracking a prospects journey across marketing campaigns, sales opportunities, product engagement, and retention. Hydra brings the source of truth advantages to every organization by offering an open source data warehouse built on Postgres.
As Hydra is built on Postgres, developers can use the full Postgres ecosystem with their data warehouse. All existing Postgres tools, database extensions, language drivers, data pipelines, and ORMs just work.
Instant GraphQL APIs on Hydra using Hasura
Hasura is an open source GraphQL engine that gives you instant GraphQL and REST APIs over multiple data sources. Postgres compatible DBs are natively supported and Hydra being a Postgres compatible data warehouse, can be connected with Hasura as a data source to get GraphQL APIs instantly. Hydra supports both row and columnar tables and the instant GraphQL APIs work out of the box for both.
Apart from the instant CRUD APIs, Hasura comes with built-in support for Authorization, Triggers on Database events and the ability to extend your GraphQL API with custom schema via Remote Schemas and Actions.
Finally, you can also join the data across different data sources. Imagine joining data between two data warehouses on Hydra! This is possible via GraphQL Joins that lets you combine data from different database tables and servers in the same GraphQL query.
Let’s look at a quick demo to connect Hydra with Hasura GraphQL Engine.
RUN HYDRA AND HASURA USING DOCKER COMPOSE
Copy the following docker-compose.yaml to your local machine for setting up the environment.
version: '3.6'
services:
postgres:
image: postgres:14
restart: always
volumes:
- db_data:/var/lib/postgresql/data
environment:
POSTGRES_PASSWORD: postgrespassword
ports:
- "5432:5432"
hydra:
platform: linux/amd64
image: ghcr.io/hydrasdb/hydra
restart: always
environment:
PGPASSWORD_SUPERUSER: hydra
ports:
- "6432:5432"
graphql-engine:
image: hasura/graphql-engine:v2.11.1
ports:
- "8080:8080"
depends_on:
- "hydra"
restart: always
environment:
## postgres database to store Hasura metadata
HASURA_GRAPHQL_METADATA_DATABASE_URL: postgres://postgres:postgrespassword@postgres:5432/postgres
## this env var can be used to add the above postgres database to Hasura as a data source. this can be removed/updated based on your needs
PG_DATABASE_URL: postgres://postgres:hydra@hydra:5432/postgres
## enable the console served by server
HASURA_GRAPHQL_ENABLE_CONSOLE: "true" # set to "false" to disable console
## enable debugging mode. It is recommended to disable this in production
HASURA_GRAPHQL_DEV_MODE: "true"
HASURA_GRAPHQL_ENABLED_LOG_TYPES: startup, http-log, webhook-log, websocket-log, query-log
volumes:
db_data:
Now run docker compose up -d
to start Hasura and Hydra along with an isolated Postgres instance for handling Hasura metadata.
CONNECT HYDRA WITH HASURA
Head to Hasura Console to establish the Hydra data source as a connection.
Click on “Connect Database" in the end. Once the data source is added successfully, you can start creating tables and inserting rows to play around with the API.
EXPLORE GRAPHQL CRUD ON HYDRA
We can start by creating a table on Hasura Console. The console has a UI to create tables or if you prefer raw SQL, head to the SQL
page on the Data tab and execute the following to create the table:
CREATE TABLE public.message (
id bigserial NOT NULL,
username text NOT NULL,
text text NOT NULL,
"timestamp" timestamp with time zone DEFAULT now() NOT NULL
);
This will create a table called message
with id
, username
, text
and timestamp
columns, a basic realtime chat app schema used for ingesting and consuming large amounts of data.
Head to the API Explorer tab to see the generated CRUD APIs for the above table:
Now, let’s insert some data using GraphQL mutations:
mutation insertMessage {
insert_message_one(object: {
text: "Hello Hydra!",
timestamp: "now()",
username: "praveenweb"
}) {
id
}
}
Note that, we have an auto-generated GraphQL mutation that now inserts data into Hydra.
Let’s fetch the inserted data via a GraphQL query.
query fetchMessages {
message {
id
username
text
timestamp
}
}
This will give back a response that looks like:
{
"data": {
"message": [
{
"id": 1,
"username": "praveenweb",
"text": "Hello Hydra!",
"timestamp": "2022-09-06T17:29:16.724629+00:00"
}
]
}
}
We have now tried both GraphQL queries and mutations! What about realtime stream?
Just substitute the query with subscription and add _stream
to the GraphQL field and you can now start streaming realtime data! 🤯
Here’s a sample subscription stream that you can try out:
subscription fetchMessages {
message_stream(cursor: {initial_value: {timestamp: "2022-09-01"}, ordering: ASC}, batch_size: 1) {
id
username
text
timestamp
}
}
This would stream all the rows from the above cursor timestamp value, to the client. Learn more about how backpressure and scaling is handled when a large influx of data comes in realtime.
Authentication and Authorization
Hasura supports Authentication via JWT and custom webhooks. In order to expose the GraphQL API to a public facing client, you need to start defining Authorization rules to be able to restrict access to data. With a data warehouse use case, different administrators can access different data sets in an app. It is important to be able to define role based access control (i.e RBAC) to allow / restrict data access for these large amounts of data.
Hasura supports a declarative Authorization layer over Hydra DB tables and you can define row level and column level permissions for the same.
For example, in the above realtime streaming app use case, if we want to restrict access for users to the messages stream to access only their own data, we could set up a role based permission for select
access on table message
.
Head to the “Permissions” tab of the messages
table and apply a row level check like where the X-Hasura-User-Id
would come in from a JWT custom claim. Add the following custom check
{"username": {"_eq": "X-Hasura-User-Id"}}
Moving on to column level permissions gives you the ability to hide data from selected columns to be available on the API.
The next time you make a request with a JWT token containing the X-Hasura-User-Id
values in the claims, the permission rules will be applied automatically and only the authorized data will be available for access on the client. Read more about Authorization with Hasura.
Run GraphQL on a Columnar Table
INSERT SAMPLE DATA WITH PGBENCH
After setting up tables and AuthZ rules for the API, let’s setup some performance benchmarks to test Hydra’s performance with a Hasura GraphQL API.
We will make use of pgbench
to generate 5 million records quickly, to perform a small benchmark. Replace the last argument -s to insert more records.
pgbench -h localhost -p 6432 -U postgres postgres -i -s 50
This will generate tables named pgbench_accounts
among others. Head to the Console -> Data -> public schema and track the table pgbench_accounts
.
CREATE COLUMNAR TABLE AND INSERT RECORDS
Once we have the records for testing, let’s make use of columnar storage in Hydra to reduce the storage size and optimize for performance. Head to Console -> Data -> SQL tab to execute the following statements:
CREATE TABLE pgbench_accounts_columnar
(LIKE pgbench_accounts)
USING COLUMNAR;
and follow it up with the inserts.
INSERT INTO pgbench_accounts_columnar
SELECT * FROM pgbench_accounts;
Hasura can now be used to configure Authorization rules on these columnar tables as well.
Once the columnar inserts are done, you can verify the compression ratio by executing the following command:
SELECT
pg_total_relation_size('pgbench_accounts')::numeric /
pg_total_relation_size('pgbench_accounts_columnar')
AS compression_ratio;
These records are simple data and the compression ratio turns out to be ~50x.
QUERY USING GRAPHQL API
The tables are already tracked by Hasura, including the one with columnar storage, we can try out the GraphQL APIs to query the data.
You can execute the following GraphQL query to fetch all the records.
query {
pgbench_accounts_columnar {
aid
bid
abalance
}
}
The pgbench_accounts_columnar
table will have columnar storage enabled and would result in smaller storage size and faster performance for large amounts of data, i.e. millions of records.
This is just the beginning of what you can do with the data warehouse now. Using a columnar storage already results in faster performance for large data sets. In case you are fetching smaller data sets, you could make use of GraphQL Query Response Caching with Hasura to speed up queries.
Add Caching to GraphQL API over Hydra
Hasura Cloud supports query response caching that gives faster response times by automatically caching authenticated data. Going back to the message query that we used earlier to fetch the latest message stream of a user; We can quickly add an @cached
directive to the query to start caching data, thus reducing latency and a roundtrip to Hydra.
For example:
query fetchMessages @cached {
message {
id
username
text
timestamp
}
}
This will cache the response for the first time, and any subsequent hits will retrieve data from the cache that Hasura Cloud maintains.
There’s a whole bunch of security configurations that can be added to your API and you can monitor them on Hasura Cloud.
Combining Hasura GraphQL API with Hydra enables a clean data access layer over a large data warehouse, aggregating data from multiple sources with a secure Authorization model and a performant API at both the database layer and at the API layer with caching.
Break the Lock!
At Hydra, we believe that an organization’s source of truth must be built on an open source foundation. Hydra users have the control to create a pull request, the visibility to flag issues or review Hydra code, and is accessible to every developer to use freely and without corporate permission on their local environment.
Open source is the standard that drives our product development and decisions. People can build amazing things when you give them equal footing with the best tools available. We don’t believe your source of truth should sit behind proprietary software, do you?