Architecture & Authorization For A Complex Multi-Tenant SaaS Platform With Hasura | Prefect | by Zachary Hughes @HasuraCon'20
Zachary Hughes spoke last year at HasuraCon'20 on how Prefect's complex multi-tenant architecture and authorization is built with Hasura.
As HasuraCon'21 is almost here, we thought it would be the perfect time to take a look back on all the amazing talks by our amazing speakers from 2020!
Here's Zachary's talk in video form:
And below is a text transcription of the talk for those who prefer reading! (The text is slightly paraphrased and edited here and there for clarity's sake.)
Hey folks, thanks for coming. Today we're going to talk a little bit about architecture and authorization modeling for a complex multi-tenant SaaS platform. And why using Hasura to achieve this is a Hasura-fire way.
A bit about me
So for those of you who didn't hit “leave meeting” as soon as you heard that awful pun, let's get some introductions out of the way. My name is Zachary Hughes. I'm based in Washington DC, and I'm a cloud engineer at a company called Prefect. Here's a little bit of contact information about myself. And in case you want to reach me.
The Agenda for today
What are we going to be talking about today? We're going to start by talking a little bit about Prefect. I know it's HasuraCon and not PrefectCon, but context is helpful to understanding what Prefect is trying to do helps explain why we do what we do with Hasura. Once we've gotten that out of the way, we'll talk a little bit about the architecture and how Hasura fits in with Prefect's overall design.
Following that, we'll talk a little bit about using Hasura, both internally and externally. So when people hit our API versus how we develop with it, and at once that foundation is set, we'll talk a little bit about Auth, that's gonna include roles, multi-tenancy, permissions, all that good stuff.
A little bit about Prefect, let's dive in.
At a really high-level Prefect is just responsible for making sure that workflow is executed properly. For any of y'all in the audience who might be fans of Hitchhiker's Guide to the Galaxy, the name is 100% inspired by that wonderful roving researcher Ford Prefect. It's a little bit of an homage.
Prefect at the moment has two main components: Prefect Core, which is workflow management, that takes your code and transforms it into a robust distributed pipeline that we call flow, and Prefect Cloud, which is the layer of abstraction and orchestration on top of that.
This is an API as well as a UI that provides additional orchestration, observability, and auth and IAM data pipeline. Basically, all that stuff that you don't really have to worry about. We call that stuff negative engineering.
Diving a little bit more deeply in Prefect Cloud, as we mentioned, it's a UI and an API that lets us manage scheduling your flows, which are those pipelines we discussed, helps you monitor and maintain the health of flows, and persist information about flow execution history.
This includes logs and flow states and flow states are basically just indicators of the history of the flow: when it started running, whether it failed or succeeded, any sort of information like that. And the great thing is, a significant portion of that is covered by Hasura.
Given that, let's talk a little bit about the architecture and how sort of fits into it.
Here's a high-level diagram of Prefect's architecture, it's a little bit pared down, but I don't want to bore you with the details. As you can see most of our compute is located in a GKE cluster. So let's start with the beginning.
We have an inbound request with the JWT in Header. That seems a little bit specific, but it's going to be important for our auth. Once the request hits upon the server, the Apollo server then talks to our auth service, verifies the token, it populates a role in a new JWT.
Once that new JWT has reached the Apollo server, it can take one of two paths. If the inbound request was in a query, you send it directly to Hasura. Or if it's a mutation, it takes a little bit of a detour. This detour takes it to Python business logic. It's bundled with an ORM. We'll get into the details of that a little bit later.
But the important part to consider right now is that once everything is said and done, this middleware posts any request to Hasura, and then everything ends up in Cloud SQL on the back end.
Now digging a bit more into JWTs that we discussed earlier, it's important to consider that there are two pieces or three pieces of content that are really important here.
We have a user ID, which indicates a specific user associated with this token, we have a tenant ID, which indicates the team this token belongs to. And we have a role, which indicates the user's role in Hasura.
Example, before & after auth service
So interestingly enough, when we first get our token, it doesn't have a role, we dynamically populate that. So if you notice, in this example, we have a tenant ID and a user ID, which we would expect, but we don't have a role.
However, once the Apollo server gets that auth that it does have a role. Seems like sort of a detail, but it's going to be important as far as how we manage dynamic roles.
Let's move on to how we actually use Hasura.
Consuming the Prefect API is pretty easy, and really anytime we consume the Prefect API, we consume the Hasura API. So as we discussed earlier, queries were sent directly to Hasura after the JWT is auth'd. Mutations take that detour that we discussed which goes to the Python business logic we've defined. And then any interaction between that business logic and Hasura is done via the ORM.
As far as humans go, everything is consolidated in a column, so the user has a seamless experience. Everyone just hits API.prefect.io and they're good to go. And then is half mean-half serious: and the, you know, I have tried each of these three approaches, they're each as painful as they sound. And we're getting a little bit more into the detail. But it's been really delightful letting Hasura handle all of this stuff.
Speaking of the ORM, let's give it a little bit of attention, because it is super nice, and it helps improve the productivity of our internal dev experience. The ORM is just a lightweight layer in front of Hasura. It's implemented in Python, the actual serialization logic there's identic, and it lets us interact with Hasura in a convenient manner.
Rather than having to open up the session, post everything, every time you want to talk to Hasura, we just use our ORM. Another nice thing about this is that the ORM operates within a token context that's populated by the JWT we've been discussing. This sounds like another implementation detail, but it's actually really nice because by using the ORM in conjunction with the token context, it means that every time we send something to Hasura using the ORM, we have pre-populated tenant IDs and user IDs for the roles and the header. This lets us sort of delegate our permissions work to Hasura without having to handle anything else on the front end.
Moving over, here are just some samples of what this actual ORM looks like. Our query would look fairly simple and lightweight: models.Flow.where we define our WHERE clause. Let's say something like name is equal to fool, and then we just get it. Similarly, if you wanted to insert something, we'd say models.Flow, define our actual flow, name through here are the tasks inside the flow, yada, yada. And then we'd go ahead and insert it.
So with that foundation layer, let's talk a little bit about auth.
Our auth model has three main components. We have users, who represent individual consumers and cloud - sounds a little bit silly to define that. But we also have a user role. We have memberships, which link a user's role to a team. And then we have teams, which are effectively tenants, many groups of people working with the same flows, have access to the same resources, like flows, projects, all that good stuff.
Users and tenants both have unique IDs, we use UUIDs. And these IDs are used to enforce role-based permissions. They both make liberal use of Hasura relationships. This is partially in order to just help improve the querying experience. It's also partially to help improve data integrity. And it's worth noting that users and tenants are many to many of us model, users can belong to one or more team tenant and switch it as well. And tenants can have one or more user.
So with that out of the way, I know it's a lot of text, let's take a little bit of a look at an example. Let's check out our flow.tenant ID. Our flow table has a certain ID column, which maps to the ID column or a tenant. This helps us enforce our row permissions, which are here where we check to make sure that our tenant ID is equal to the X-Hasura-tenant ID. That x-Hasura-tenant ID is part of what's passed through by our ORM or populated directly if it was passed through by a query.
So roles, these are important. Users have access to four primary roles.
We have a login role, which is sort of invisible, allows users to view information necessary to find and select their membership. Once they've found and selected their membership, we have three remaining roles.
We have a read-only user, just sort of what it says on the tin. It allows the user to view everything in a given tenant, but not update or insert anything. It's sort of an observer/auditor-type role.
We have users who are sort of day-to-day folks using the system. This gives the user everything they need to do in order to operate in that tenant. That means they can view things, they can kick off and create flows, create projects, update states, whatever you'd like.
And then we have kind of admins who give the user the ability to actively manage everything in that tenant. This includes things like billing. It also includes managing things like who actually belongs to a tenant. If you have a bad actor, then admins have the ability to remove them.
We also have a fifth role called runner, which is less crucial to how users actually interact with the system, but it's important for our orchestration, we're just gonna leave that off for the time being.
As an example, let's take a look at this. You can see right off the bat that our permissions are actually fairly granular, and they're not an all standard. Certain things have permissions for everything. Certain things have no permissions.
This is our membership table. So let's see who can do what. A user has permission to delete a membership that matches their user ID. This makes a fair amount of sense, right, because a user should be able to remove themselves from a tenant if they would like. But they shouldn't be able to actually remove someone else.
On the other hand, a tenant admin who is responsible for managing the tenant as a whole has permission to remove any membership that matches their tenant ID, which gives them a bit more power. This is a really nice, clean illustration of how you can actually give tenant and user permissions, and use roles in conjunction with them to help that intersection be a little bit more manageable.
So with all this in mind, switching tenants is actually super easy. And a key part of what we do as far as multi-tenancy goes. With this pattern in place that we've discussed, switching tenants is just as easy as issuing a new JWT. You issue a new JWT, that populates the new context, which defines the user ID, the tenant ID, and the role. And then once that's all said and done, you can move on to another tenant.
No need to write custom permissions & auth with Hasura
So something I want to call out here is using roles and role-based permissions lets us delegate permissions in Hasura which sounds sort of obvious, but it's something I know I take for granted since I've been working with Hasura for long enough. I've written custom permissions before as well as custom auth, and it is so painful. I would officially like to never write one again. I am sure there are folks out there who are great at it. And I'm sure there are folks out there who might even enjoy it. It's not really where I'd like to focus my attention.
Views can help us
But there is one gotcha. What if you want to allow folks with the user role to view other users in their tenant. It's a little bit tricky. In this case, the pattern we've approached with is views. So we can define a view and map memberships and roles and permissions do them, the same way we would any other table. So by defining views alongside your tables, you can define an entirely separate set of permissions to fit your needs. In this case, we want users to be able to see email, first, last name of other users, but no more. And by using views and defining a separate set of permissions there, it allows us to take the union of what they can do.
So I'd like to leave with a couple of parting thoughts. The big thing is between its GraphQL capabilities and flexible permission schema, Hasura roughly halves the amount of time it would take us to add and expose a data model. I cannot say just how nice it is to be able to implement a feature, create our new model, define how users will query for it, and have that be the easiest part of our system. We basically define our permissions, we make sure that our tenant guards are in place, and we're good to go.
And that about wraps it up. I really appreciate your attendance. I'll be sticking around to answer any questions and if you want to contact me, do so at [email protected]