Consolidated data access in Hasura DDN with data connectors
The launch of Hasura v3 in beta marks a significant milestone in our journey toward providing the most robust, flexible, and efficient framework for building APIs on all your data sources. As we began work on v3, we sat down and discussed the principles we wanted the new version to adhere to. One of those principles was to consolidate access to data sources behind a single, uniform interface.
In this blog post, I’ll walk through how we implement this principle using data connectors. But first, a little history…
Hasura v1 and v2 data sources
The first version of Hasura focused exclusively on PostgreSQL – if you provided a PostgreSQL data source, you got a GraphQL endpoint with a standardized schema. As work on v1 progressed, we added several features – actions, remote schemas, event triggers, and new ways to query and mutate data – but the focus stayed on PostgreSQL.
Version 2 of Hasura was a significant refactor of the original product and enabled several new data sources – various new flavors of PostgreSQL, MSSQL, MySQL (alpha), and BigQuery. The new architecture of v2 allowed us to add new backends for new data sources, but we realized that adding each one was taking too long. Build times were slowing down because we were compiling more and more Haskell modules, with an increasingly complex dependency graph.
So in 2021, we started investigating a more lightweight architecture for data sources, embracing more of a microservices mindset. We initially called this “dynamic backends,” and it worked with the existing v2 architecture by providing a new “generic” data source type. Instead of talking directly to a data source, the generic data source would delegate execution to a service. We decided on a protocol for communication between graphql-engine and these new services and wrote a specification. “Dynamic backends” became “GDC,” or “GraphQL Data Connectors,” and we were quickly able to build several new data sources using our new approach: Oracle, MySQL, MongoDB, MariaDB, Snowflake, ClickHouse, and Redshift, along with several others.
So when we began to discuss v3 principles, having seen such success with the GDC architecture in v2, it was natural to consolidate access to data sources entirely behind a similar protocol.
In the end, we decided not to use GDC exactly as it existed then, but instead to revisit the protocol in light of new design decisions coming in v3. We created the new NDC specification, informed by the needs of the v3 project. And as of this week, I’m happy to say, we have finally released the first stable version of the specification, tagged 0.1.0.
NDC stands for native data connectors, in contrast to GDC (GraphQL data connectors). As of v3, NDC is not tied to Hasura as its only client (it is useful more generally), nor is Hasura tied to GraphQL as its only transport mechanism, so it made sense to generalize the name. Native indicates that we want connectors to expose the native capabilities of the target data source, pushing down execution as much as possible, and imposing as little structure as possible from the protocol itself.
So now let’s look at this protocol, why it’s been such a game changer for adding new data sources, and where it’s heading.
NDC in Hasura v3
We want to make it as easy as possible for developers to add their own data source types, whether those developers work for Hasura or not.
In Hasura v3, developers enable access to their data sources via connectors. These connectors can be chosen from the Hasura Connector Hub, and customized. Alternatively, developers can build bespoke connectors for unique data sources and APIs.
To facilitate the development of connectors, we’ve written the NDC specification to clarify what is expected from connector authors, and how connectors can be expected to work and be used. Alongside the specification, we provide tools for connector authors: an automated test and benchmark runner, a reference implementation, and SDKs for several popular languages.
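To make the protocol concrete, here’s a minimal TypeScript sketch that probes a running connector over plain HTTP. The /capabilities and /schema endpoint paths come from the NDC specification; the connector URL and the response handling are placeholders for illustration.

```typescript
// Probe a running NDC connector over plain HTTP. The URL below is a
// placeholder; /capabilities and /schema are endpoints defined by the
// NDC specification.
const CONNECTOR_URL = "http://localhost:8080"; // hypothetical deployment

async function probeConnector(): Promise<void> {
  // /capabilities reports which optional protocol features the connector supports
  const capabilities = await fetch(`${CONNECTOR_URL}/capabilities`).then((r) => r.json());
  console.log("capabilities:", JSON.stringify(capabilities, null, 2));

  // /schema describes the collections, functions, and types the data source exposes
  const schema = await fetch(`${CONNECTOR_URL}/schema`).then((r) => r.json());
  console.log("collections:", schema.collections.map((c: { name: string }) => c.name));
}

probeConnector().catch(console.error);
```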
The NDC automated test runner, ndc-test, makes it possible to build a test suite for a connector, completely automatically, with tests inferred from the structure of your data source. Those same tests can be used as the basis for a benchmark suite, and developers can augment the generated tests with their own custom tests.
The NDC reference implementation provides a simple example of every specified NDC behavior, and can be used to check a connector implementation by cross-referencing its behavior against the reference. We also use it internally at Hasura to verify that newly proposed features make sense and are implementable by connector developers.
Finally, the language SDKs make it easier to build connectors by implementing many of the boring details automatically. Observability, packaging, error handling, and the details of managing an HTTP server are all handled by the SDK, so the developer can focus on the interesting work of interacting with the data source.
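As a rough sketch of that division of labor: a connector implementation boils down to an object implementing the protocol’s core operations, which an SDK entrypoint wraps in a fully instrumented HTTP server. The interface and the start() entrypoint below are illustrative stand-ins, not the actual SDK API.

```typescript
// Illustrative sketch only: the interface and the start() entrypoint
// referenced below are stand-ins for whatever the real SDK exposes,
// not its actual API. The shape mirrors the core NDC endpoints.

interface Connector {
  getCapabilities(): unknown;                // answers GET /capabilities
  getSchema(): Promise<unknown>;             // answers GET /schema
  query(request: unknown): Promise<unknown>; // answers POST /query
}

const myConnector: Connector = {
  getCapabilities() {
    // Advertise only the features the data source can actually push down
    return { version: "0.1.0", capabilities: { query: {}, mutation: {} } };
  },
  async getSchema() {
    // Describe collections and types by introspecting the data source
    return { scalar_types: {}, object_types: {}, collections: [], functions: [], procedures: [] };
  },
  async query(request) {
    // Translate the NDC query into the source's native query language
    throw new Error("not implemented yet");
  },
};

// A hypothetical SDK entrypoint would wrap this object in an HTTP server,
// with observability, error handling, and packaging already taken care of:
// start(myConnector);
```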
In addition to the specification and tools, we’ve provided tutorials, guides, and examples of connectors, so that connector developers can learn in the style that best suits them. In particular, the Let’s Build a Connector course works through constructing a connector to a SQLite database, in small steps, one connector feature at a time.
NDC benefits
When we started work on this new microservices architecture, we already expected many of the benefits below, while others only became apparent over time. Some are simply benefits of a microservices architecture in general; others are more specific to NDC:
NDC supports progressive enhancement
Shortly after we agreed on the principle that all data access should be consolidated behind NDC microservices, we landed on a corollary principle: a connector is just a URL. This is little more than the definition of a URL itself, saying that the service is uniquely identifiable by its URL, but the principle highlights an important detail for consumers of connectors: the simplest way to consume a connector is to provide its URL to Hasura.
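To illustrate the point, here’s a sketch of querying a running connector directly, with no Hasura in the loop. The request shape follows the /query endpoint of the NDC specification; the URL, collection, and column names are made up.

```typescript
// "A connector is just a URL": query a running connector directly over
// HTTP. The URL, collection name, and column names are hypothetical;
// the request body follows the NDC specification's /query endpoint.
const CONNECTOR_URL = "http://localhost:8080";

async function listAlbumTitles(): Promise<void> {
  const response = await fetch(`${CONNECTOR_URL}/query`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      collection: "albums", // hypothetical collection
      query: {
        fields: {
          title: { type: "column", column: "title" }, // select a single column
        },
        limit: 10,
      },
      arguments: {},
      collection_relationships: {},
    }),
  });
  // The response is a list of row sets; take the rows of the first one
  const rowSets = await response.json();
  console.log(rowSets[0].rows);
}

listAlbumTitles().catch(console.error);
```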
Now, the question becomes: how do you get hold of a connector URL? There are various ways, but the simplest is to already have one in hand, because you deployed the service yourself. But for connectors from the Hasura Connector Hub, we need to provide a way for users to obtain a URL from the name of a connector. That is, a user needs to be able to go from a name like “Postgres connector,” plus some configuration like a connection string, to a working connector and its URL. Hasura provides conveniences like this in our CLI and console UI, which are integrated with the Connector Hub.
Once a connector is created and managed by Hasura, it can support more features. For example, the CLI can enable a “watch mode,” which reconfigures a connector based on changes to configuration files on disk, to facilitate rapid development. And by standardizing things like deployment for hub connectors, we can offer users improved experiences for configuration management and observability.
So the NDC architecture enables progressive enhancement. For connectors that are early in development, or company-internal, a URL might suffice. But as those connectors mature, or become publicly available, they can implement the additional specifications required for inclusion in the Connector Hub, and users benefit from the additional features.
Support for non-cloud-native data sources
The earliest versions of this microservice architecture enabled access to other data sources available in the cloud, such as cloud-hosted databases. But there’s no reason why we can’t provide data access across network boundaries. An NDC microservice encapsulates a data source behind a protocol served over HTTP, so it’s simple to move it next to the data source, whether that source is a cloud-hosted database, a database or API hosted on an internal network, or even a bunch of flat files on disk.
Users can now extend the product without sending us pull requests and waiting for reviews.
Resource allocation and execution mobility
Because NDC encapsulates data sources, you can access data sources that require significant memory or compute, and manage those resources independently of the resources required by Hasura’s core technology. You can also move those expensive resources to whatever location makes the most sense.
A microservices architecture allows for load-balanced access to data sources and independent scaling of those sources. Whether the solution to a resource problem is to split data up or to deploy more capable machines, you can adapt by deploying an appropriate topology of connectors.
Independent configuration
Data sources can be configured independently of each other – and independently of the product itself – providing greater flexibility and customization. Once a connector is configured and running, it is accessible via its URL, and it doesn’t need to be reconfigured or redeployed unless its configuration changes.
For example, just because you change something about your API layer doesn’t mean you need to reconfigure your running PostgreSQL connector.
Language independence for developers
In the previous architecture, adding a new backend to Hasura meant writing a PR against our Haskell server repository. If you didn’t know Haskell, you would have had to learn it, or else you’d have been out of luck. We love Haskell at Hasura, but we also want to enable connector developers as much as possible, so this approach wasn’t a great long-term solution.
Now, developers are no longer tied to a single language and can work independently of the main codebase, thanks to the microservices architecture. So if you need to access data on your mainframe using COBOL, go ahead!
Improved build times
Another issue with the consolidated build was its ever-increasing build time. Moving to a microservices architecture has improved build times, in turn accelerating our own development process.
Separation of concerns
From a team-building perspective, team architecture often closely mirrors software architecture. If your software architecture has poorly defined abstractions and boundaries, teams tend to have a poor sense of code ownership and of their respective responsibilities. A monolithic architecture discourages any form of official code ownership.
On the other hand, a microservices architecture leads to a much healthier team dynamic – responsibility can be clearly defined so autonomy can increase. The complexity gets pushed to the areas between software components, which need to be specified clearly in the form of agreed interfaces.
This is how NDC improves team structure for us – it specifies the interfaces between teams creating data sources and the teams consuming them. Those teams can each work independently and with maximum autonomy, further improving development velocity.
Packaging of database drivers
Database drivers like libpq and ODBC can be packaged away with their connectors, so these drivers no longer incur a dependency on the main product. Again, this simplifies your build, because you no longer need to package dependencies for every sort of data source, even the ones you aren’t using.
Adapter connectors
You can write adapter connectors that add logging, tracing, or performance monitoring, provide access to proprietary data formats or services, or support multi-region and failover deployments. And because adapters are just connectors, you can write or extend them yourself, gaining the flexibility to tailor the product to your specific needs.
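As a sketch of the idea, here’s a minimal pass-through adapter in TypeScript: it forwards every request to an upstream connector (at a placeholder URL) and logs latency around each call. Because a connector is just an HTTP service behind a URL, the adapter is itself a valid connector.

```typescript
// Minimal logging adapter: forwards every request to an upstream
// connector and records latency. The upstream URL and the listening
// port are placeholders for illustration.
import * as http from "node:http";

const UPSTREAM = "http://localhost:8080"; // the wrapped connector

const server = http.createServer(async (req, res) => {
  const started = Date.now();

  // Buffer the incoming request body so it can be forwarded as-is
  const chunks: Buffer[] = [];
  for await (const chunk of req) chunks.push(chunk as Buffer);

  // Forward the request unchanged to the upstream connector
  const upstream = await fetch(`${UPSTREAM}${req.url}`, {
    method: req.method,
    headers: { "content-type": req.headers["content-type"] ?? "application/json" },
    body: ["POST", "PUT"].includes(req.method ?? "") ? Buffer.concat(chunks) : undefined,
  });
  const body = Buffer.from(await upstream.arrayBuffer());

  // The adapter's added value: structured logging around every call
  console.log(`${req.method} ${req.url} -> ${upstream.status} in ${Date.now() - started}ms`);

  res.writeHead(upstream.status, {
    "content-type": upstream.headers.get("content-type") ?? "application/json",
  });
  res.end(body);
});

server.listen(8081, () => console.log("adapter listening on :8081"));
```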
What’s next?
Now that we’ve released the first version of the NDC specification, we’re committing to a few initiatives to make it as useful as possible:
- Build more connectors! We aim to build as many connectors as possible, for as many different types of data sources as possible. So far, we have a number of different connectors available, but we will be working to increase this number as quickly as we can, so please get in touch and let us know what you would like to see.
- Improve development tooling. We want to provide the best possible tools for connector authors. In addition to improving the test and benchmark runners, we want to provide SDKs for more languages. Again, let us know what you would like to see.
- Research new capabilities, in public. NDC enables advanced connector features through capabilities, and we have several planned new capabilities to allow access to more data in richer ways. We’ll be sharing details of these proposed capabilities in RFCs on the specification repository, for public comment and review.
- Better learning resources. We will be increasing the amount of documentation and video training content available for connector authors.
Conclusion
The introduction of data connectors in Hasura v2, and their exclusive use in v3, represents a significant leap forward in our architecture. It not only addresses many limitations of our previous architecture but also brings a host of ancillary benefits that enhance Hasura’s flexibility, scalability, and efficiency.