
Improving Query Performance with Hasura’s Response Cache

Talk

Transcript

Phil

00:00

Hi everyone, my name's Phil. I am the server engineering lead here at Hasura. It's a pleasure to be able to give this presentation today. I'm going to talk about Hasura's Response Cache, which is a tool you can use to improve the performance of your queries, and something the server team's been working on and improving for a while now. First of all, here's a quick look at what I'm going to talk about. I'll talk a bit about why you might need caching in your application, how caching is implemented in Hasura, and why Hasura is a good place in your stack to implement it. Then I'd like to go through a few of the newer features related to caching. And finally, I'm going to talk about a few different setups for building a custom cache in Hasura. Actually, as we'll see, caching in Hasura is less of a single feature and more of a toolkit, and we can build some more advanced caching setups by piecing together the tools from that toolkit, which I'm going to try and demonstrate at the end.

Phil

1:00

Let's start by looking at why we might want to use response caching in the first place, and why it's such a natural fit for Hasura. Let's say we have a query like this, and maybe it runs a little too slowly. Now, there are many reasons why a query might run slowly, of course. Maybe it just selects a lot of data: here we're selecting a hundred rows, and the data fetching and serialization for those hundred rows just takes time. Or maybe we're not fetching that much data, but the database still has to scan through a lot of data because the query uses features like aggregations across a lot of rows. Maybe we even need to do something like ordering data by an aggregated measure, which is going to push a lot of costly work to the database, and that just takes time as well.

Phil

1:45

In this query here, I'm fetching the most expensive albums by cost in a database containing musical artists and albums, so I'm ordering the rows by the total cost. That total cost is itself computed using an aggregation, and the limit of a hundred rows is applied after the sorting, so the database still has to scan through potentially all of the data in order to do that ordering, and each row needs to incorporate an aggregate, so it has a lot of work to do. There are other reasons why a query might be slow: here I'm using an action in Hasura to pull additional artist information from the MusicBrainz API, and I've used an action relationship in Hasura to relate that information back to the rows in my own database. In general, we don't have a lot of control over how long it might take to go and fetch data from an external API like this one.
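
The exact query isn't reproduced in the transcript, but a sketch of that sort of query, assuming a Chinook-style schema where `albums` relates to `tracks` and the cost is the sum of track prices (all table and field names here are illustrative), might look like this:

```graphql
query MostExpensiveAlbums {
  albums(
    order_by: { tracks_aggregate: { sum: { unit_price: desc_nulls_last } } }
    limit: 100
  ) {
    title
    tracks_aggregate {
      aggregate {
        sum { unit_price }
      }
    }
  }
}
```

Because the `order_by` depends on an aggregate over every album's tracks, the database has to compute that sum for every row before the `limit: 100` can be applied.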

Phil

2:38

We don't have the tools for optimizing database queries in this general case, so maybe it's not surprising that it could be slow. And of course, this can compound with other effects in real-life applications and lead to naturally slow queries, so caching becomes a benefit. And even if a query is not slow by itself, we might still want to avoid fetching the data each time, because even a very quick query might need to run several times when a large number of users are all querying the same data. Again, caching becomes beneficial. One solution for these problems is just to cache the responses. This is a feature that Hasura has had since version 1.3.2, so it was available in the very first release of Hasura Cloud. And the approach we take is very simple: if you want Hasura to cache query responses for you, you can simply add a directive, this cache directive, on any of your queries.

Phil

3:33

Caching, of course, is a trade-off between performance and data freshness. Every time we go to the cache instead of doing a fresh lookup in the database or from a data source, there's always a risk that the data we get back from the cache is going to be stale. The cache directive has an argument called TTL, and that determines, if you like, how happy you are to take that risk of getting stale data back. TTL stands for time to live, and it's an estimate for the maximum amount of time that the response is going to be stored in the cache before we go and force another lookup from the source. So if I say ttl: 60, that's 60 seconds, which means that for 60 seconds after I initially cache this response, I'll ideally get back that same response, but after 60 seconds, I'm going to be forced to go back to the database and do a new fetch.
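
As a concrete sketch (the field names are illustrative, but the `@cached` directive and its `ttl` argument are the actual Hasura syntax):

```graphql
query AlbumTitles @cached(ttl: 60) {
  albums(limit: 100) {
    title
  }
}
```

For 60 seconds after the first execution, repeat requests for this query should be served from the cache rather than from the database.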

Phil

4:28

What happens when you add the directive to your query? To a first approximation, the approach is very, very simple. We first compute a hash of the query tree, that's the AST that Hasura parses, and then we use that computed hash as a key in the cache to look up the response. If it's there in the cache, we return that response. If it's not, we send the query to the database or data source as usual, and store what we get back, the response, in the cache with the same key, that's the hash. This is the first reason why caching in Hasura makes so much sense. There's no need to think about what inputs might affect the construction of an appropriate cache key: Hasura already has the AST, and it already knows when two queries need different cache keys and when they don't, because all of the relevant data is right there in the tree.
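
Not shown in the talk, but the lookup-or-execute flow described here can be sketched in a few lines of TypeScript (purely illustrative; Hasura's actual implementation is server-side and not structured like this):

```typescript
type CacheKey = string;

// An in-memory stand-in for the response cache.
const responseCache = new Map<CacheKey, string>();

// Stand-in for a structural hash over the query tree.
function hashAst(ast: unknown): CacheKey {
  return JSON.stringify(ast);
}

async function executeWithCache(
  queryAst: unknown,                  // the parsed query AST
  runQuery: () => Promise<string>,    // executes against the data source
): Promise<string> {
  const key = hashAst(queryAst);
  const hit = responseCache.get(key);
  if (hit !== undefined) return hit;  // cache hit: return the stored response
  const response = await runQuery();  // cache miss: run the query as usual
  responseCache.set(key, response);   // store the response under the same key
  return response;
}
```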

Phil

5:17

And all the irrelevant data, so for example small syntactic differences, use of fragments, whitespace, has all been normalized away by the time we get to the AST that we're going to hash, so we don't need to worry about those. Since the initial release of response caching in 1.3.2, we've made a lot of progress working on this on the server team. We can cache responses for almost any query you can think of building in Hasura. We've added basic support for integration with client-side caching, as well as adding some APIs for both cache metrics and cache invalidation, so I'm going to go over each of these in turn. First, let's talk about how we've been able to support caching for all the different sorts of queries that you can write in Hasura. That means queries that involve select permissions and multiple database sources, so for example Postgres, also MS SQL, BigQuery, any of the other data sources we support. Also remote schemas, remote joins, actions and action relationships.

Phil

6:20

Anything you can query in Hasura can now be cached. How does that work? Let's look back at this outline of the approach that we use. I said the first step was to parse and hash the query AST so that we can use the hash as a cache key. Well, that's not entirely accurate, that's an approximation. In fact, what we do is to hash the elaborated query AST. We have this process called elaboration, and that involves taking the syntax of the query that the user provided and expanding it with all of the permission data attached to any of the tables involved. So two queries will only get mapped onto the same cache key, and therefore give the same response back, if the elaborated query ASTs are equal. That is to say, if after taking into account the permissions on the roles for any two users making a query, the queries are still identical, then we can share the responses in the cache.

Phil

7:18

And in fact, that's also an approximation. That's not quite right either, because we also need to take into account relevant session variables in our hash computation. What does relevant mean here? Well, we don't want to just create a hash of all the session variables, because the chances are that some of them won't be used in any given query. So a session variable is relevant to a query whenever it gets mentioned in select permissions for one of the tables in the query, whenever it gets used by remote schema permissions in a preset, or whenever it gets forwarded to a remote schema or an action because of the configuration set up on those data sources. Whenever we determine a session variable is relevant, we make sure to include it in the computation of the cache key. So that's another reason why Hasura is a great place to implement caching for your queries.
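
Continuing the earlier sketch, the key derivation might conceptually look like this (again just an illustration of the idea, not Hasura's real code; `relevantVariables` stands in for the set derived from permission metadata):

```typescript
function cacheKey(
  elaboratedAst: unknown,
  sessionVariables: Record<string, string>,
  relevantVariables: Set<string>,  // derived from permissions, presets, forwarding config
): string {
  // Only the session variables the query actually depends on feed the key,
  // so users who differ only in irrelevant variables can share cache entries.
  const relevant = Object.fromEntries(
    Object.entries(sessionVariables).filter(([name]) => relevantVariables.has(name)),
  );
  return JSON.stringify({ ast: elaboratedAst, session: relevant });
}
```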

Phil

8:09

All of the role-based permission metadata is already declared, so we know when a session variable needs to get taken into account for the purposes of caching. Alternatively, if you wanted to implement this as a layer on top of Hasura, or on your own, you'd either need to recreate the permission logic in that layer, or you'd need to include all the session variables in your cache key, whether or not it made sense to include any given one. And the end result of all of this is that Hasura can now take any query that you can express and apply caching rules just the same as for those simple query examples that I wrote up at the start. All you have to do is apply that cache directive on your query.

Phil

8:55

The second big improvement in caching that we've added is better integration and support for client-side caching, that is, caching that takes place in the end user's browser or in various intermediate locations on the internet, such as CDNs. This is a simple change to the approach I outlined. Here's the outline again: right at the end of the process, once we're about to return the response, whether it's from the cache or from a data source directly, we just need to attach any caching-related HTTP headers to the response. In particular, we set the Cache-Control header based on the number of seconds that we expect the cache entry for the response to continue existing in our own cache. So we may have set a TTL of 60, but if it's already been sitting in the cache for 30 seconds, we'll return a Cache-Control header with a max-age of 30 seconds.
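
That arithmetic is simple enough to write down as a sketch (illustrative only):

```typescript
// The header advertises only the entry's *remaining* lifetime in our cache.
function cacheControlValue(ttlSeconds: number, cachedAtMs: number): string {
  const ageSeconds = Math.floor((Date.now() - cachedAtMs) / 1000);
  const remaining = Math.max(0, ttlSeconds - ageSeconds);
  return `max-age=${remaining}`;
}

// An entry cached 30 seconds ago with ttl: 60 yields "max-age=30".
```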

Phil

9:43

This means that the browser or CDN continues serving that response from its own cache until we would no longer have been serving it from our cache on the server. Now, you might be asking what the point of this is, because GraphQL requests are usually served via HTTP POST requests, even for queries, and it's unlikely a CDN or a browser is going to respect a Cache-Control header for a POST request, although technically, I think, the spec doesn't prohibit that. But even so, it's still useful to send this header, for a couple of reasons. One, there are some extensions that allow us to run GraphQL queries over GET requests, and then Cache-Control headers do apply. And also, as of recently, we can now turn GraphQL queries into GET endpoints via a new feature called the REST endpoints feature that we added earlier this year.

Phil

10:35

I'm not going to go into detail on the REST endpoints feature right now. But the idea is that you can take the queries that you've built during development, using Hasura and GraphiQL and all the wonderful tooling for rapid development that we have, and once it's ready, freeze it into a more production-ready REST API. At that point, queries can be turned into GET endpoints instead of POST endpoints, which makes them a natural fit for client-side caching. Automating it in this way makes sure that any client-side caches are naturally going to expire the data at approximately the same time that our own server-side caches are going to expire. So we're able to line up all of the expiries in all of these different caches, in the browser, at the CDN and on the server, and increase performance that way.
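
As a hypothetical example (the project URL and the endpoint name `albums` are made up; on recent Hasura versions, REST endpoints are served under `/api/rest/`), a frozen query becomes a plain GET that intermediate caches can work with:

```typescript
async function fetchAlbumsViaRest(): Promise<void> {
  // A GET request, so browsers and CDNs can honor the Cache-Control header.
  const res = await fetch("https://my-project.hasura.app/api/rest/albums", {
    headers: { "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET! },
  });
  console.log(res.headers.get("cache-control")); // e.g. "max-age=30"
  console.log(await res.json());
}
```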

Phil

11:28

Okay, so speaking of performance, how can we measure the performance of the cache itself? This is something we needed to solve for ourselves in development, for our own debugging purposes, and we added this cache metrics API in order to solve it, and it's available for general use. So here's a new endpoint, pro cache metrics, and if you curl this endpoint on your Hasura Cloud server, you'll get back a collection of metric information as [inaudible 00:11:55], so I've pasted some of the output on the right there. You can see that there's a collection of opaque identifiers with some data for each one. For each one we see cache hit and miss numbers, and those are the numbers of requests in each of those buckets which hit and missed the cache respectively. So hopefully we're seeing higher hit numbers and lower miss numbers in each of these buckets, but what are the hashes?
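
A sketch of what that call might look like (the path `/pro/cache/metrics` follows the endpoint name as spoken in the talk, and the response shape in the comment is illustrative, not an exact schema):

```typescript
async function fetchCacheMetrics(): Promise<void> {
  const res = await fetch("https://my-project.hasura.app/pro/cache/metrics", {
    headers: { "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET! },
  });
  const metrics = await res.json();
  // Roughly: { "<query-family-hash>": { "hit": 120, "miss": 240 }, ... }
  console.log(metrics);
}
```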

Phil

12:19

They're actually not the hashes that I was talking about earlier, the hashes that we use to compute the cache keys themselves, because there are just far too many of those; we have potentially very many cache entries, and each one gets its own hash. Instead, these are what we call the query family hashes, and these are a coarser-grained hash of roughly the same inputs as the cache key. For example, queries that have the same AST but different variables substituted into the AST end up in the same bucket: those won't get a different query family hash, even though they would get different cache keys. If you want to find out the performance of any given query, you can get these hashes from the response headers of the query. So here, for example, I'm picking out the family hash from the Chrome network tab and then correlating that hash with the output of the metrics API.

Phil

13:13

In this case the performance of the cache actually isn't so great, because it looks like I'm getting about twice as many cache misses as hits, unfortunately. There are a few reasons why that might be the case, which we'll get into briefly later, but for now, this tool gives you an impression at least of what's going on inside the cache for any given query. The last feature we've worked on to improve caching is one more API, and this is for cache invalidation. We have three new APIs under this new endpoint, pro cache clear: you can clear the entire cache, you can remove individual response caches, or you can remove entire families of queries at once using the query family hashes that I just showed. Both of these hashes can be found in the response headers, just like I was showing. You can use this during testing, or you could use it for some more advanced automation use cases of the cache, and we'll see a couple of those in a minute.
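
A hedged sketch of those three calls (the path `/pro/cache/clear` follows the endpoint name in the talk, and the `key` and `family` query parameter names are assumptions here; check the current docs for the exact interface):

```typescript
const CLEAR = "https://my-project.hasura.app/pro/cache/clear";
const HEADERS = { "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET! };

async function clearCaches(): Promise<void> {
  // 1. Clear the entire cache.
  await fetch(CLEAR, { method: "POST", headers: HEADERS });
  // 2. Remove a single cached response by its cache key hash.
  await fetch(`${CLEAR}?key=<cache-key-hash>`, { method: "POST", headers: HEADERS });
  // 3. Remove every cached entry in a query family at once.
  await fetch(`${CLEAR}?family=<query-family-hash>`, { method: "POST", headers: HEADERS });
}
```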

Phil

14:14

And finally, on the subject of invalidation, and this is actually a brand-new feature, so hopefully it gets released by the time you're watching this: we can refresh an entry in the cache by adding another argument to the cache directive. What this means is that regardless of whether or not the cache currently contains an entry for a given cache key, we're going to go back to the data source and refresh the data in the cache anyway. Where this comes in useful is where we want, again, to automate the construction of the cache, and that's something we're going to look at next. We'll have a look at a few different examples of ways in which we can use all of these different features to automate the construction of the cache. This is sort of an advanced feature, you definitely don't need to use this, but it can be useful in some cases. The first case is where we want to anticipate bursts in the incoming traffic, and we want to make sure that the cache is ready to handle requests without incurring too many round trips to the database during a given burst.
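
That argument is `refresh` (field names below are illustrative; the directive and its arguments match Hasura's syntax):

```graphql
query AlbumTitles @cached(ttl: 60, refresh: true) {
  albums(limit: 100) {
    title
  }
}
```

Running this query always goes back to the data source and overwrites the cache entry, resetting its TTL, whether or not a cached response already existed.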

Phil

15:10

In this case, we can use Hasura to set up a scheduled trigger right before the expected burst, let's say a minute before the burst, and that trigger is going to delegate to a serverless function, which you can host on whichever hosting provider you prefer. It's going to determine the list of queries that need to be cached, and then execute those queries as normal, just passing control back to Hasura, but making sure to set that cache directive. We also want to make sure, when we do that, to set the refresh option, so that the data is getting refreshed in the cache and is as fresh as possible. The queries are going to come back into Hasura, and the responses go into the cache, ready for when the burst in traffic happens. And then hopefully, during the burst, a majority of that traffic ends up being served by the cache instead of going to the database.
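
A minimal sketch of such a handler (the project URL, query text, and TTL are all illustrative assumptions):

```typescript
const GRAPHQL = "https://my-project.hasura.app/v1/graphql";

// Queries we expect the burst to hit; refresh: true repopulates the cache.
const QUERIES_TO_WARM = [
  `query FrontPage @cached(ttl: 120, refresh: true) {
     albums(limit: 100) { title }
   }`,
];

export async function prewarm(): Promise<void> {
  for (const query of QUERIES_TO_WARM) {
    // Executing through Hasura as normal is what fills the cache.
    await fetch(GRAPHQL, {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET!,
      },
      body: JSON.stringify({ query }),
    });
  }
}
```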

Phil

16:00

Another case for automation is where we want to invalidate cache entries based on updates that are still happening in the database. Maybe in this case we have a read-heavy workload, but we still want to make sure the responses are as up to date as possible and respect any recent writes. What we can do in this case is to set up an event trigger to watch for those data updates. That event trigger will fire on inserts, updates and deletes, and again just delegate to a serverless function that we can host. But this time, the serverless function can use the cache invalidation endpoints that I showed to remove any out-of-date entries from the cache. And then, for each of those queries, the next instance of that query is going to have to fetch new data from the data source, forcing us to refresh that data on the next read.
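
A sketch of that handler, reusing the invalidation call from earlier (the event payload fields follow Hasura's event trigger format, but `familiesReadingFrom` is a hypothetical application-level mapping you would have to maintain yourself):

```typescript
// Hypothetical: map a table name to the query-family hashes that read from it.
function familiesReadingFrom(table: string): string[] {
  const mapping: Record<string, string[]> = {
    albums: ["<family-hash-1>", "<family-hash-2>"],
  };
  return mapping[table] ?? [];
}

export async function onTableEvent(payload: {
  table: { schema: string; name: string };
}): Promise<void> {
  const families = familiesReadingFrom(payload.table.name);
  // Clear every cached family that might now hold stale data.
  await Promise.all(
    families.map((family) =>
      fetch(`https://my-project.hasura.app/pro/cache/clear?family=${family}`, {
        method: "POST",
        headers: { "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET! },
      }),
    ),
  );
}
```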

Phil

16:47

And the third case for automation is where we have a set of crucial queries whose responses need to be kept up to date in the cache. You might imagine, let's say, a news website, and the queries fetching the data for the front page, which is going to be hit very often. In this case, we can set up a cron trigger in Hasura which fires every minute, let's say, and again delegates to a serverless function that we host, which is going to refresh the data in the cache. In this case, I wrote out a little code to show how we might implement this inside the serverless function itself. First of all, we need to determine which queries we want to run, and here I'm using the same musical database running example from earlier. We need to determine which queries we want to run because we might want to, for example, pre-fetch data for multiple roles.

Phil

17:40

Here, for example, I'm using this musical database, but I have a role for artists who can only fetch their own album data, and I want to pre-cache query responses for a variety of different artists, as if those artists had run the queries providing their own session headers. And of course, even for this first step, we can cache the response. There's no need to go to the database every time to fetch the list of artist IDs that we want to cache data for; we can just fetch that from the cache as well. Next, we need to run the queries themselves, and we'll store the responses in the cache. Again, we make sure to include the cache directive, but this time we also include the refresh option, because as I was saying before, we want to force an update in the cache so that the data we're storing is as fresh as possible.

Phil

18:40

And then here it is, stitched together as a little Express server that I deployed on [inaudible 00:18:45] for testing. I have a single endpoint called pre-fetch, and the cron trigger is going to invoke that for me. We get all the benefits, as usual, of using Hasura cron triggers, so if there's a failure, if I return a status 500 at the end here, the cron triggers are going to re-queue that pre-fetch task for me. So there's a lot of benefits to piecing this together out of existing Hasura tools. First, this runs the first query to determine the set of artists and their session variables that I want to pre-fetch data for, and then it fans out with Promise.all and issues one query for each of those session variable sets.
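
The actual code from the slides isn't reproduced in this transcript, but a sketch along the lines described (the endpoint URL, role name, session header, and queries are all illustrative assumptions) might look like this:

```typescript
import express from "express";

const app = express();
const GRAPHQL = "https://my-project.hasura.app/v1/graphql";
const ADMIN = { "x-hasura-admin-secret": process.env.HASURA_ADMIN_SECRET! };

async function runQuery(query: string, headers: Record<string, string>) {
  const res = await fetch(GRAPHQL, {
    method: "POST",
    headers: { "content-type": "application/json", ...headers },
    body: JSON.stringify({ query }),
  });
  if (!res.ok) throw new Error(`query failed with status ${res.status}`);
  return res.json();
}

app.post("/pre-fetch", async (_req, res) => {
  try {
    // First query: the list of artist ids to pre-fetch for (itself cacheable).
    const result = await runQuery(
      `query ArtistIds @cached(ttl: 300) { artists { id } }`,
      ADMIN,
    );
    // Fan out with Promise.all: one refreshing query per artist's session headers.
    await Promise.all(
      result.data.artists.map((artist: { id: number }) =>
        runQuery(
          `query MyAlbums @cached(ttl: 60, refresh: true) { albums { title } }`,
          { ...ADMIN, "x-hasura-role": "artist", "x-hasura-artist-id": String(artist.id) },
        ),
      ),
    );
    res.status(200).send("ok");
  } catch (err) {
    // A 500 response makes the cron trigger re-queue the pre-fetch task.
    res.status(500).send(String(err));
  }
});

app.listen(3000);
```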

Phil

19:30

Here I'm using the role artist, and headers with the artist ID, one for each of the rows from the first result. Finally, I want to talk about one possible future feature that we have in mind for caching in particular, which is optimization of the queries that use the cache. Normally in Hasura, any optimizations are automatic; there's no need to tune anything by hand. But caching is different, because first of all you have to opt into caching with the directive, and then you have to tune it using the TTL parameter on each query, and maybe you even need to set up one of these custom caching workflows and automations if you have a particular use case in mind. Eventually, we'd like to make caching more like any other optimization in Hasura, where if there's something which is certainly an optimization, then we'd just like to be able to apply that for you, instead of it being opt-in, and even tune it automatically.

Phil

20:20

But with caching, this is pretty difficult in general, because it's hard to tell when caching something is an optimization at all, or whether it was cached in the best way. That being said, there are certainly still some cases where we can spot inefficiencies and improve things automatically. For example, depending on the way in which you make use of session variables, you might end up with a cache key that's very fine-grained, and that results in a lot of cache misses, which is not what we want. Here, for example, I'm using the user ID as a session variable to fetch user data. But if we look closely, actually I'm only fetching team-level information via a relationship, and nothing else. So it would be better if we could change the cache key: if we can rewrite the query so that we introduce a separate auxiliary session variable at the team level, then any two users on the same team can use the same cache key.
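
A hypothetical sketch of that rewrite (the table names, fields, and the `x-hasura-team-id` variable are all invented for illustration; the filtering by session variable happens in the tables' select permissions, not in the query text):

```graphql
# Before: rows are filtered by x-hasura-user-id, so every user
# produces a distinct cache key even though only team data is read.
query TeamInfo @cached(ttl: 60) {
  users {
    team {
      name
      plan
    }
  }
}

# After: querying teams directly, filtered by an auxiliary
# x-hasura-team-id variable, lets all users on a team share one entry.
query TeamInfoShared @cached(ttl: 60) {
  teams {
    name
    plan
  }
}
```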

Phil

21:12

And you can do this sort of optimization today by hand, but it would be better if we could just spot these sorts of issues and let you know as a user, or even fix them for you. And here's another example: suppose we're fetching article data from an article database, using a similar user ID as a session variable, but we have a join table, let's say, to figure out which articles each user can see. Now, the user ID again is very fine-grained: every user has their own cache entries, so we don't get any sharing in the cache. It would be better if we could create another auxiliary session variable which represents all of the articles that a user can fetch. If we do that, then any two users with the same join table entries get to use the same cache key.

Phil

21:55

But the question is, when is this actually an optimization? Because computing that new session variable takes time and requires storage. So hopefully what we win in terms of better cache performance pays for the small upfront cost of computing that additional session variable, but it's not so clear in general, and we need to do some work to figure that out. That's something we've been thinking about, and something that's important on our roadmap at the moment. But hopefully that gives a sense of where we are with caching, how far we've come since the original release, and a bit of where we're heading with new work. I'd just like to say a big thanks to everyone on the Hasura server engineering team, who've worked so hard on each of these features over the past several months. And thanks, everyone, for watching. I think we're going to do questions after the presentation, so thanks very much.

Tyler

22:48

Bill maintained that the best part of a virtual event is the question and answer time. Phil, I'm going to start with one that I have. Tanmai, two talks ago, sort of blew our minds with the notion of cross-database joins. How do cross-database joins and caching play together?

Phil

23:12

Yeah, the cross-database joins are really exciting. Both the features represent a lot of work from the server engineering team; we're very excited about both of them. The nice answer is, from the caching point of view, as I said in the talk: we can cache basically any query that you can express in Hasura now, and when the PR is finalized, that's going to include generalized joins.

Tyler

23:40

That's super, super cool. You and I were talking about... A lot of the questions that have come in sort of fall into the category of "can I X with caching?" or "can I Y with caching?" And you and I were talking a little bit about the notion of caching as a tool set, if you will, as a foundation on which things can be built. Would you mind just sharing your thinking on that a little bit?

Phil

24:11

Yeah, so the default with Hasura is that features tend to be batteries-included. Caching is a little bit different, like I hinted at in the talk, in that it's more of a toolkit. Right now, we're sort of concentrating on the individual pieces that you can piece together into your own caching workflow, which may be a bit more advanced than the standard one. You can just turn on the cache directive and you get a sensible set of defaults for caching, but you might want to do one of these workflows with event triggers or scheduled triggers, and that's possible too; you can just assemble it out of these pieces that Hasura provides. But we also want to be a little bit careful, because generalized joins is another one of these features, like caching, that just cuts across so many aspects of the product, and with so many of these features, there's just these intersections of features everywhere.

Phil

25:02

We need to just make sure that we're semantically precise about how these features intersect with each other. In that sense, we want to be sort of careful and deliberate, but the nice thing is that, like I said, the defaults with caching for any type of query you can express are sensible defaults, and you can also piece together more interesting things if you want to try and do that as well. Hopefully, as we move forward, we'll provide a few more out-of-the-box advanced options as well.

Tyler

25:30

Right, absolutely right. That's part of the future and the roadmap, and the what's-next that you were kind of hinting at and describing: moving beyond the framework and into the implementation. A couple of just really quick specific ones. I thought this was interesting: with the same query, but with fewer fields in it to retrieve, will that hit the cache as of right now? Will it hit the result of the previous query?

Phil

26:00

No, I mean, unfortunately not. The downside is that we've tried to keep the foundation of caching very, very basic. It's this basic outline that I went over: we just look at a variant of the query AST and use that as a key, elaborating where need be, and that means there are some optimizations like that that we can't do right now. It would be nice to share the cache key for two of those things and hopefully reuse part of the response in the cache, but we can't. The other side of that is that hopefully, by the time you're productionizing your application, and I mentioned the REST endpoints feature as one aspect of that, we're locking down those queries in a productionized environment, and it becomes less of a concern there.

Tyler

26:51

Yeah, absolutely. There's been a question that's come up three times, actually, in sort of different forms, which I'm going to combine and restate, which is effectively: where is the cache? Is it in RAM? Is it Redis? Is it Memcache? Is it pluggable? What can you share with us about that?

Phil

27:10

Yeah, so the implementation in Hasura Cloud is an abstract implementation. The nice thing is that, from the user's point of view, you don't need to know anything about the details of how the cache is implemented. You can just set a TTL on that cache directive, and it behaves like an LRU cache, as you would expect. If you are doing an on-prem deployment, an enterprise deployment, you can bring your own cache and substitute your own cache implementation by bringing a ready-configured Redis instance. So it is pluggable in that sense, but only in the enterprise on-prem deployments that we have.

Tyler

27:48

Gotcha. I think I know the answer, but I have to ask in the hope that I'm correct. There's a workshop tomorrow, before Day Two, about adding a new backend to Hasura. In this case, I think it's SQLite, and so let's say I'm in the position where I'm Haskell-capable and adding a backend. Do I get caching for free?

Phil

28:13

Yeah, the workshop's very cool. That's related to the generalized joins work; what goes along with that is this idea of different pluggable backends that we've integrated into the IR, the intermediate representation of Hasura. The nice thing is where caching sits: of all these layers, you have the parsed AST, you move down to the backend implementations, and that forks off to a remote schema or an action, or various database sources of different types. Now SQLite, if you want to add that, the nice thing is that caching sits right at the top of all of that, above everything else, so if you choose to add SQLite as a backend, caching just comes for free. In terms of these intersections of features, fortunately, that's something that's far away enough that we don't have to worry; those features just [inaudible 00:29:00] for free.

Tyler

29:02

Totally, totally understand. And that's incredibly exciting. The work that is happening in server engineering at Hasura is incredibly compelling, and not only what's happening now, but what's happening in the future is incredibly exciting as well. Phil, thank you so much for sharing your time with us.

Phil

29:22

Yeah, no problem. Thanks very much.


End of transcript

Description

Behind the scenes, Hasura already provides many optimizations to improve query performance for scalability. Recently, we've been improving support for query response caching, which is an opt-in performance improvement for your queries at the expense of consistency/staleness. Query response caching is not a feature we have discussed widely, so this talk will start with an overview of the feature. We'll also look at the trade-offs, and a set of techniques for getting the most out of the response cache for your own production systems. Finally, we'll look at some of the possible future directions in which we'll be taking this feature over the next few months.
