Fewer pipelines, more shipping: Renaissance’s journey to Hasura with Snowflake
Renaissance is a leading K-12 educational software company. Founded in 1986, the company offers popular solutions like Freckle, Schoolzilla, and more to students and teachers in over 100 countries. Renaissance products are deployed in more than 50% of U.S. schools. Their unwavering mission: Accelerate learning for children and adults across all ability levels and backgrounds.
Renaissance has standardized on the Snowflake Data Cloud for their analytics and reporting needs. Since Snowflake aggregates data from across Renaissance's full suite of products, it contains powerful insights across the whole user journey. Renaissance recently decided to surface some of these rich insights from Snowflake in a new application, tentatively referred to as the Teacher Journey, which gives teachers insights and recommendations for their students from across all of Renaissance's products.
After attempting a few DIY options, Renaissance settled on the Hasura-Snowflake connector as their preferred solution for pulling the required data from Snowflake into the app via APIs. “Hasura eliminates so much of the work of building and maintaining APIs,” said Jesse Matlock, one of the core Renaissance engineers on the Teacher Journey team. He estimates that without Hasura, they would need a few extra full-time engineers to build and maintain the API application.
This blog describes how Matlock and his team evolved their application architecture and the steps they took along the way to simplify and improve their data pipeline.
Snowflake at Renaissance
Matlock and the Teacher Journey team use Snowflake to pull together data from Renaissance products like Freckle, an app where students practice math and English Language Arts (ELA). Each Renaissance product has its own data store. “Renaissance has acquired best-of-breed products, and not all of those products write to the same backend data store. Having an analytic data warehouse allows us to have a view into student learning while allowing our product teams to continue to innovate and develop quickly,” Matlock noted. The different Renaissance educational products sync their data into Snowflake daily.
All this aggregated data from Renaissance's full suite of apps contained a wealth of information and insights. Renaissance wanted to enrich the experience for their end users – teachers, students, and others – by pulling some of those insights into their consumer apps, leading to the development of the Teacher Journey.
“Snowflake is very good at building reports and aggregating data on a cadence,” said Matlock. “It's very cost-effective to just run it a few times a day to build reports that feed into the business, but it’s not as cost-effective to have an API hit it 100,000 times a day with very small queries.”
To tackle this challenge, Matlock's team initially decided on a pipeline that would funnel the required data from Snowflake to PostgreSQL. PostgreSQL would serve as the operational data store for the app. They decided to use Hasura to accelerate API development on PostgreSQL.
The first data pipeline: Snowflake to PostgreSQL
The Renaissance team used a combination of lambdas, SNS topics, and SQS queues to push the required data from Snowflake into PostgreSQL. “It worked, but there were bugs with the data integrity and lots of nuances,” said Matlock.
Their initial data pipeline used a Snowflake feature that copies data into Amazon S3 buckets. The tables in Snowflake are huge – 300 million records and gigabytes of data. Dropping the data into S3 is fast; the problems start with moving it to PostgreSQL. “It's slow because there's so much data,” explained Matlock. “You’re limited on RDS instance size and how many connections it can handle. We had massive instances that we would spin up when we were consuming data.”
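Simplified into a single script, that first pipeline looked roughly like the sketch below. This is an illustrative reconstruction, not Renaissance's actual code: the table, stage, and bucket names are hypothetical, and the real pipeline was driven by Lambdas, SNS topics, and SQS queues rather than one linear job.

```python
# Illustrative sketch of the original Snowflake -> S3 -> PostgreSQL sync.
# Names (STUDENT_RESULTS, @unload_stage, the S3 bucket) are hypothetical.
import boto3
import psycopg2
import snowflake.connector

# 1. Unload the Snowflake table to S3 -- fast even for very large tables.
sf = snowflake.connector.connect(
    account="example_account",   # hypothetical credentials
    user="sync_user",
    password="***",
    warehouse="SYNC_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
sf.cursor().execute(
    """
    COPY INTO @unload_stage/student_results/
    FROM STUDENT_RESULTS
    FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE)
    OVERWRITE = TRUE
    """
)

# 2. Load the unloaded files into PostgreSQL -- the slow, connection-hungry step
#    that required the oversized RDS instances.
s3 = boto3.client("s3")
pg = psycopg2.connect("postgresql://app:***@rds-host:5432/teacher_journey")
with pg, pg.cursor() as cur:
    cur.execute("TRUNCATE student_results")
    listing = s3.list_objects_v2(Bucket="unload-bucket", Prefix="student_results/")
    for obj in listing.get("Contents", []):
        body = s3.get_object(Bucket="unload-bucket", Key=obj["Key"])["Body"]
        cur.copy_expert("COPY student_results FROM STDIN WITH (FORMAT csv)", body)
```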
Large RDS instances are also pricey. “It costs us $6,000-7,000 a month just to run them for a few hours a day,” Matlock noted. “The instances had to be that big to get sync times down to an hour or two instead of eight,” he added. Minimizing that window is important because the new app needs the freshest data possible, and the data in Snowflake already lags a bit behind live production data.
Data mapping hiccups
Mapping data from Snowflake to PostgreSQL took some effort. Data comes into Snowflake from multiple software products, and each is structured and typed differently. Snowflake handles variations gracefully, so the discrepancies were not evident until the data got to PostgreSQL.
Take UUIDs, for example. “One app may use a UUID, but not be strict on how it is formatted,” said Matlock. “It could be a UUID or a string or a number. In migrating that data to PostgreSQL, we would set the ID field to UUID, and halfway through syncing we realized this product uses a string. Then we would have to go into PostgreSQL and make that work by storing all UUIDs as a string.”
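The workaround amounts to defensive coercion at load time. The sketch below is a hypothetical helper, not Renaissance's code, showing the kind of normalization that makes mixed ID types safe to store as text in PostgreSQL.

```python
import uuid

def normalize_id(raw) -> str:
    """Coerce an ID that may arrive as a UUID, a string, or a number into a
    single text representation for a PostgreSQL text column.

    Hypothetical helper illustrating the workaround described above: rather
    than typing the column as UUID, store every ID as a string."""
    if isinstance(raw, uuid.UUID):
        return str(raw)
    text = str(raw).strip()
    try:
        # Canonicalize values that really are UUIDs, whatever their formatting.
        return str(uuid.UUID(text))
    except ValueError:
        # Plain strings and numeric IDs pass through unchanged.
        return text

assert normalize_id("6F9619FF-8B86-D011-B42D-00C04FC964FF") == \
    "6f9619ff-8b86-d011-b42d-00c04fc964ff"
assert normalize_id(12345) == "12345"
```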
Constant pipeline issues
The data pipeline was becoming unwieldy because it moved so much data from one system to the next and required strict schedule coordination. The team was spending a lot of time managing and debugging this pipeline.
In short, running the pipeline took a long time, and it was expensive. It was also hard to maintain. “We realized it's going to cost us more money to build and maintain this pipeline than to find a cost-effective way to leverage Snowflake directly,” said Matlock.
Rapidly building APIs on PostgreSQL with Hasura
Matlock and his team were already using Hasura on PostgreSQL in this architecture, and they had experienced its benefits for accelerated GraphQL API development. Matlock first heard about Hasura from a coworker who was dabbling with it for a personal project. The coworker suggested using Hasura for a different project they were both working on, and it worked really well.
“We had previously handwritten a GraphQL API using Express and some Node libraries,” said Matlock. That handcrafted API performed well enough, but it had been a lot of work to create. “Hasura is hands-off. I don't have to make a lot of changes. And I don't have to make new endpoints. Hasura does it all for you,” said Matlock.
When they decided to explore ways to directly connect the app to Snowflake, the then newly launched Hasura connector for Snowflake was an obvious first choice.
Hasura + Snowflake = Goodbye unwieldy pipelines
Using the Hasura Snowflake connector means no more unwieldy pipelines; instead, Snowflake is now the operational store for the new application.
“We transitioned to using the Snowflake connector in the last month. We've gotten rid of this big pipeline of copying things from Snowflake to PostgreSQL,” Matlock explained. “Now we just hit Snowflake directly.”
From a Hasura standpoint, moving to the Snowflake connector took no time at all. “It's super easy,” said Matlock. “We pointed from PostgreSQL to Snowflake and added some config files.”
The smooth migration and simpler architecture were a great start. However, the team needed to make sure two other criteria were met, since the new app would be used by thousands of teachers multiple times per day:
- Avoid ballooning costs from frequent queries to Snowflake
- Keep end-to-end latencies low for the end user
The Hasura caching feature helped with both of these requirements. When teachers log in each morning, their results come from Snowflake and are cached in Hasura. If they log in again at lunch, the app goes straight to the cached results in Hasura.
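In practice, enabling this is a matter of adding Hasura's `@cached` directive to a query. The snippet below is a minimal sketch of what such a request could look like; the endpoint, table, and field names are hypothetical, not Renaissance's actual schema.

```python
import requests

HASURA_URL = "https://hasura.example.com/v1/graphql"  # hypothetical endpoint

# The @cached directive asks Hasura to serve repeat requests from its cache
# instead of re-querying Snowflake. Table and field names are hypothetical.
QUERY = """
query TeacherDashboard($teacherId: String!) @cached(ttl: 3600) {
  student_insights(where: {teacher_id: {_eq: $teacherId}}) {
    student_id
    subject
    recommendation
  }
}
"""

resp = requests.post(
    HASURA_URL,
    json={"query": QUERY, "variables": {"teacherId": "t-123"}},
    headers={"Authorization": "Bearer <teacher-jwt>"},  # JWT auth, per the team's setup
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["data"]["student_insights"])
```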
On the performance side, one of the big questions going into this transition was whether Snowflake could serve a live API call quickly, since ad hoc Snowflake queries aren't necessarily fast. Matlock says, “Snowflake's data structure isn't built to handle those sorts of queries. We heavily structured the data and partitioned it in Snowflake so that when you request it by a specific user ID, the query returns very quickly.”
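One common way to get that behavior in Snowflake is to pre-shape the data into a narrow, report-style table clustered on the lookup key, so point queries prune most micro-partitions. The sketch below shows that general pattern; the table and column names are hypothetical, and this is not necessarily the team's exact approach.

```python
# Minimal sketch: pre-aggregate and cluster data in Snowflake so per-teacher
# lookups return quickly. Names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Build a narrow, report-shaped table on the daily cadence Snowflake already runs.
cur.execute("""
    CREATE OR REPLACE TABLE TEACHER_INSIGHTS
    CLUSTER BY (TEACHER_ID) AS
    SELECT teacher_id, student_id, subject, recommendation, updated_at
    FROM RAW_PRODUCT_EVENTS
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY teacher_id, student_id, subject
        ORDER BY updated_at DESC
    ) = 1
""")

# With the table clustered on TEACHER_ID, a point lookup like the one the API
# issues scans only a small slice of the table and comes back fast.
cur.execute(
    "SELECT * FROM TEACHER_INSIGHTS WHERE teacher_id = %s",
    ("t-123",),
)
print(cur.fetchall())
```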
Combined with Hasura caching and query plan optimization, the new architecture achieves latencies in the 300-350 ms range, compared to 200-250 ms for a query from Hasura to PostgreSQL in the old setup. The difference isn't noticeable to end users.
Less engineering, more app development with Hasura
The Snowflake connector simplified their pipeline, and using Hasura saves Matlock’s team a lot of time.
Making changes to Hasura is also very simple. Matlock reported they spend just a few hours each month on these changes. “We just go into the config files and tweak the permissions or change a few tables here and there,” he said.
After changing the configuration files, the team applies them with a CI/CD pipeline that uses Hasura's command line tool. “Hasura is hands-off, low code. I don't have to spend a lot of time building and testing our endpoints,” said Matlock.
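In a CI/CD job, that apply step can be as small as the sketch below. It assumes the Hasura project directory and CLI are available on the runner; the environment variable names and directory layout are hypothetical, not Renaissance's actual pipeline.

```python
# Minimal sketch of a CI/CD step that applies Hasura metadata with the CLI.
import os
import subprocess

endpoint = os.environ["HASURA_ENDPOINT"]          # e.g. the Hasura instance URL
admin_secret = os.environ["HASURA_ADMIN_SECRET"]

subprocess.run(
    [
        "hasura", "metadata", "apply",
        "--endpoint", endpoint,
        "--admin-secret", admin_secret,
    ],
    check=True,
    cwd="hasura",  # the directory holding config.yaml and metadata/
)
```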
The team uses advanced Hasura features, including authentication via JWT and row-level permissions. They’ve also started experimenting with Hasura observability.
Matlock estimates that time savings from Hasura are equivalent to the work of a few full-time backend software engineers. “If we didn’t use Hasura, we'd have to hire an engineer or allocate one of our engineers to be in charge of building out features in the API code – things that Hasura comes with off the shelf like pagination, caching, and all these things that we would have to build ourselves. Hasura just does it for us,” said Matlock.
Looking to the future
The Teacher Journey team is looking at combining Snowflake and PostgreSQL as part of their roadmap. They anticipate a growing need to serve fresher, near real-time data in this application, such as displaying results from a test a classroom has just taken.
Frequently hitting Snowflake with small queries to get the latest data will become less cost-effective than keeping some of the less complex data in an alternate database like PostgreSQL. Since Hasura integrates with multiple databases, the team can keep the architecture simple by making data from both databases available under the same GraphQL endpoint.
In addition to multiple databases, the team is also looking at implementing a few Remote Schemas to interact with other internal tools and services at Renaissance. This will allow the GraphQL API to serve data from more than just the databases.