The pros and cons of updatable views in PostgreSQL
I recently started looking at a class of bug reports in which non-admin users of Hasura were unable to use certain newer features from Postgres.
For example, if the schema contains a table with a
GENERATED AS IDENTITY column (a more modern, SQL-compliant equivalent of a SERIAL typed column), then admins would be able to insert data into the table, but non-admin users would not. A similar issue arises with generated columns in the Postgres 12 release.
The root cause turned out to be the way we were using “updatable views” to enforce Hasura permissions, a relatively recent addition to Postgres. So, what are updatable views, how were they related to the problem here and how did we fix it?
In this post, I will cover the following:
- What are updatable views
- How Hasura used updatable views
- Key problems with updatable views (from a Hasura point of view)
- Moving away from updatable views
I hope that blogpost serves as an introduction to updatable views and its caveats for advanced Postgres users!
Views are a great tool for controlling data access in
SELECT queries. This can allow us to separate the data representation in the table itself from the API provided to users.
As a working example, we can create a table of articles, but a view which only returns the articles which as set as
CREATE TABLE articles ( id integer NOT NULL PRIMARY KEY , title text NOT NULL , description text , public bool NOT NULL DEFAULT 'false' ); CREATE VIEW public_articles AS SELECT * FROM articles WHERE public;
But what about when we need to insert or update data?
Postgres 9.1 introduced
INSTEAD OF triggers on views, otherwise known as updatable views, to solve this problem. When creating the view, we can specify a trigger function to run on attempted insert operations on the view. Instead of simply failing as usual, Postgres will delegate to the trigger to decide how to insert the data. This way, we can preserve any invariants in our data representation while providing a complete but well-encapsulated API.
For example, perhaps we should only be able to insert articles marked as
public into our
CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN IF NEW.public THEN INSERT INTO articles (id, title, description, public) VALUES ( NEW.id , NEW.title , NEW.description , NEW.public ); RETURN NEW; ELSE RAISE check_violation using message = 'article must be public'; END IF; END $$; CREATE TRIGGER public_article_on_insert_trigger INSTEAD OF INSERT ON public_articles FOR EACH ROW EXECUTE FUNCTION public_article_on_insert();
Updatable views for permission checking in Hasura
In Hasura, we previously used updatable views to implement permission checks for non-admin users.
The naïve approach to check Hasura permissions would be to add a trigger on the target table and rollback the current transaction if the desired constraint did not hold. However, we don’t want to add triggers to our users’ schemas in general, or any changes at all all in fact. So instead, we would create an updatable view inside the Hasura schema (hdb_catalog) and use a trigger there to check the constraint. Clever! And this avoided polluting the user's schema.
Caveat 1: Updatable views don't have default column values
In the case of tables with generated or default values, we needed to do a little more work. To illustrate the problem, consider changing the definition of the table to make the
CREATE TABLE articles ( id integer NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY , title text NOT NULL , description text , public bool NOT NULL DEFAULT 'false' );
Our existing insert trigger won't work, even if we try to use
DEFAULT for the
> INSERT INTO public_articles (id, title, public) values (DEFAULT, 'The pros and cons of updatable views', true); ERROR: cannot insert into column "id" DETAIL: Column "id" is an identity column defined as GENERATED ALWAYS.
But we can make it work by omitting the explicit
id, and using the default value instead inside the trigger itself:
CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN IF NEW.public THEN INSERT INTO articles (title, description, public) VALUES ( NEW.title , NEW.description , NEW.public ); RETURN NEW; ELSE RAISE check_violation using message = 'article must be public'; END IF; END $$;
However, the Hasura compiler has to handle not just one specific table, but any table the user provides. The question then becomes: How will Hasura distinguish this sort of case based on the metadata available. We need to fetch the default values from the table metadata and modify the data on insert if necessary.
Using the Postgres metadata available in the
information_schema tables, we could gather the necessary data to do this and simply join it with the inserted row in the new trigger.
However, there is another problem. The Postgres metadata does not track default values for certain types of columns, such as columns with
GENERATED AS IDENTITY types or computed columns. And this is the root cause of the problem in the introduction. We are unable to provide default values if we use the updatable view trick for these sorts of columns!
Caveat 2: ON CONFLICT does not make sense on updatable views
Views don't have unique constraints (or any constraints, for that matter).
ON CONFLICT clauses require unique constraints during an
INSERT operation, and hence these can no longer work with updatable views.
We used to handle upsert operations using this strategy, by employing
ON CONFLICT clauses in our
INSERT statements, but again, it requires a little more work.
If we add a uniqueness constraint to our table, for example:
CREATE TABLE articles ( id integer NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY , title text NOT NULL UNIQUE , description text , public bool NOT NULL DEFAULT 'false' );
then we may wish to support upsert operations based on violations of that uniqueness constraint. We could modify our hand-written trigger as follows:
CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN IF NEW.public THEN INSERT INTO articles (title, description, public) VALUES ( NEW.title , NEW.description , NEW.public ) ON CONFLICT ON CONSTRAINT articles_title_key DO UPDATE SET public = NEW.public, description = NEW.description; RETURN NEW; ELSE RAISE check_violation using message = 'article must be public'; END IF; END $$;
This is simple enough in the case of a single hand-written trigger function, but again, consider that Hasura must handle any such combination of
ON CONFLICT clauses generically for any table and for any role.
The implementation in Hasura would generate one view per user role, each with its own insert trigger, and then pass the
ON CONFLICT portions of the
INSERT statement to the trigger indirectly through a transaction-scoped runtime parameter.
This became a problem, because the SQL generation logic inside Hasura became very complicated and also required runtime parameters to be passed around. This became a key blocker for supporting Postgres flavours like Citus, YugaByte, Cockroach!
Solution: Cutting out the complexity
This is all getting a bit complex! The updatable view technique was a nice way to work around the need for a trigger on the original table, but the need to provide special case handling for default values and the use of global variables both seem like indications that something is not quite right with the design here. Let’s simplify things!
Instead of checking permissions inside a trigger, we could have the insert CTE rollback its own transaction in the case of a permission failure.
To do this, we need a helper function which can raise an exception inside an expression context:
CREATE OR REPLACE FUNCTION hdb_catalog.check_violation(msg text) RETURNS bool AS $$ BEGIN RAISE check_violation USING message=msg; END; $$ LANGUAGE plpgsql;
With this simple function, we can simply modify the
RETURNING clause of the insert CTE to add a
CASE expression which will check the condition after the insert tentatively completes, and throws an exception if necessary. For example:
INSERT INTO articles (title, public) VALUES ('The pros and cons of updatable views', 'true') RETURNING *, CASE WHEN public THEN NULL ELSE hdb_catalog.check_violation('article must be public') END;
It’s important that the function call is evaluated lazily inside the
CASE expression here, or we would always throw an exception!
By making this change, we can simplify the code quite a bit - no more updatable views or triggers, and the same code path can be used for all users. Admins simply have a condition expression which always evaluates to
true, granting access in every case.
So, as we’ve seen, updatable views can be a powerful tool in specific cases where we want to encapsulate a data representation behind a different API. However, in Hasura’s case, where we need to deal with any possible schema at all, it’s not quite the best fit, and it was simpler to use more basic tools at the cost of a little more code generation.