The pros and cons of updatable views in PostgreSQL

I recently started looking at a class of bug reports in which non-admin users of Hasura were unable to use certain newer features from Postgres.

For example, if the schema contains a table with a GENERATED AS IDENTITY column (a more modern, SQL-compliant equivalent of a SERIAL typed column), then admins would be able to insert data into the table, but non-admin users would not. A similar issue arises with generated columns in the Postgres 12 release.

The root cause turned out to be the way we were using “updatable views” to enforce Hasura permissions, a relatively recent addition to Postgres. So, what are updatable views, how were they related to the problem here and how did we fix it?

In this post, I will cover the following:

What are updatable views
How Hasura used updatable views
Key problems with updatable views (from a Hasura point of view)
Moving away from updatable views

I hope that blogpost serves as an introduction to updatable views and its caveats for advanced Postgres users!

Updatable views

Views are a great tool for controlling data access in SELECT queries. This can allow us to separate the data representation in the table itself from the API provided to users.

As a working example, we can create a table of articles, but a view which only returns the articles which as set as public:

CREATE TABLE articles
  ( id integer NOT NULL PRIMARY KEY
  , title text NOT NULL
  , description text
  , public bool NOT NULL DEFAULT 'false'
  );

CREATE VIEW public_articles AS 
  SELECT * FROM articles
  WHERE public;

But what about when we need to insert or update data?

Postgres 9.1 introduced INSTEAD OF triggers on views, otherwise known as updatable views, to solve this problem. When creating the view, we can specify a trigger function to run on attempted insert operations on the view. Instead of simply failing as usual, Postgres will delegate to the trigger to decide how to insert the data. This way, we can preserve any invariants in our data representation while providing a complete but well-encapsulated API.

For example, perhaps we should only be able to insert articles marked as public into our public_articles view:

CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$	
  BEGIN	
    IF NEW.public THEN	
      INSERT INTO articles (id, title, description, public) VALUES 
        ( NEW.id
        , NEW.title
        , NEW.description
        , NEW.public
        );
      RETURN NEW;
    ELSE
      RAISE check_violation using message = 'article must be public'; 	
    END IF;	
  END $$;
  
CREATE TRIGGER public_article_on_insert_trigger 
  INSTEAD OF INSERT ON public_articles 
  FOR EACH ROW EXECUTE FUNCTION public_article_on_insert();

Updatable views for permission checking in Hasura

In Hasura, we previously used updatable views to implement permission checks for non-admin users.

The naïve approach to check Hasura permissions would be to add a trigger on the target table and rollback the current transaction if the desired constraint did not hold. However, we don’t want to add triggers to our users’ schemas in general, or any changes at all all in fact. So instead, we would create an updatable view inside the Hasura schema (hdb_catalog) and use a trigger there to check the constraint. Clever! And this avoided polluting the user's schema.

Caveat 1: Updatable views don't have default column values

In the case of tables with generated or default values, we needed to do a little more work. To illustrate the problem, consider changing the definition of the table to make the id column GENERATED ALWAYS:

CREATE TABLE articles
  ( id integer NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY
  , title text NOT NULL
  , description text
  , public bool NOT NULL DEFAULT 'false'
  );

Our existing insert trigger won't work, even if we try to use DEFAULT for the id:

> INSERT INTO public_articles (id, title, public) values (DEFAULT, 'The pros and cons of updatable views', true);

ERROR:  cannot insert into column "id"
DETAIL:  Column "id" is an identity column defined as GENERATED ALWAYS.

But we can make it work by omitting the explicit id, and using the default value instead inside the trigger itself:

CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$	
  BEGIN	
    IF NEW.public THEN	
      INSERT INTO articles (title, description, public) VALUES 
        ( NEW.title
        , NEW.description
        , NEW.public
        );
      RETURN NEW;
    ELSE
      RAISE check_violation using message = 'article must be public'; 	
    END IF;	
  END $$;

However, the Hasura compiler has to handle not just one specific table, but any table the user provides. The question then becomes: How will Hasura distinguish this sort of case based on the metadata available. We need to fetch the default values from the table metadata and modify the data on insert if necessary.

Using the Postgres metadata available in the information_schema tables, we could gather the necessary data to do this and simply join it with the inserted row in the new trigger.

However, there is another problem. The Postgres metadata does not track default values for certain types of columns, such as columns with GENERATED AS IDENTITY types or computed columns. And this is the root cause of the problem in the introduction. We are unable to provide default values if we use the updatable view trick for these sorts of columns!

Caveat 2: ON CONFLICT does not make sense on updatable views

Views don't have unique constraints (or any constraints, for that matter). ON CONFLICT clauses require unique constraints during an INSERT operation, and hence these can no longer work with updatable views.

We used to handle upsert operations using this strategy, by employing ON CONFLICT clauses in our INSERT statements, but again, it requires a little more work.

If we add a uniqueness constraint to our table, for example:

CREATE TABLE articles
  ( id integer NOT NULL GENERATED ALWAYS AS IDENTITY PRIMARY KEY
  , title text NOT NULL UNIQUE
  , description text
  , public bool NOT NULL DEFAULT 'false'
  );

then we may wish to support upsert operations based on violations of that uniqueness constraint. We could modify our hand-written trigger as follows:

CREATE OR REPLACE FUNCTION public_article_on_insert() RETURNS trigger LANGUAGE plpgsql AS $$	
  BEGIN	
    IF NEW.public THEN	
      INSERT INTO articles (title, description, public) VALUES 
        ( NEW.title
        , NEW.description
        , NEW.public
        ) ON CONFLICT ON CONSTRAINT articles_title_key DO 
          UPDATE SET 
            public = NEW.public, 
            description = NEW.description;
      RETURN NEW;
    ELSE
      RAISE check_violation using message = 'article must be public'; 	
    END IF;	
  END $$;

This is simple enough in the case of a single hand-written trigger function, but again, consider that Hasura must handle any such combination of ON CONFLICT clauses generically for any table and for any role.

The implementation in Hasura would generate one view per user role, each with its own insert trigger, and then pass the ON CONFLICT portions of the INSERT statement to the trigger indirectly through a transaction-scoped runtime parameter.

This became a problem, because the SQL generation logic inside Hasura became very complicated and also required runtime parameters to be passed around. This became a key blocker for supporting Postgres flavours like Citus, YugaByte, Cockroach!

Solution: Cutting out the complexity

This is all getting a bit complex! The updatable view technique was a nice way to work around the need for a trigger on the original table, but the need to provide special case handling for default values and the use of global variables both seem like indications that something is not quite right with the design here. Let’s simplify things!

Instead of checking permissions inside a trigger, we could have the insert CTE rollback its own transaction in the case of a permission failure.

To do this, we need a helper function which can raise an exception inside an expression context:

CREATE OR REPLACE FUNCTION hdb_catalog.check_violation(msg text) RETURNS bool AS 
$$
  BEGIN
    RAISE check_violation USING message=msg;
  END;
$$ LANGUAGE plpgsql;

With this simple function, we can simply modify the RETURNING clause of the insert CTE to add a CASE expression which will check the condition after the insert tentatively completes, and throws an exception if necessary. For example:

INSERT INTO articles (title, public) VALUES
    ('The pros and cons of updatable views', 'true')
  RETURNING 
    *, 
    CASE 
      WHEN public 
      THEN NULL 
      ELSE hdb_catalog.check_violation('article must be public')
    END;

It’s important that the function call is evaluated lazily inside the CASE expression here, or we would always throw an exception!

By making this change, we can simplify the code quite a bit - no more updatable views or triggers, and the same code path can be used for all users. Admins simply have a condition expression which always evaluates to true, granting access in every case.

Conclusion

So, as we’ve seen, updatable views can be a powerful tool in specific cases where we want to encapsulate a data representation behind a different API. However, in Hasura’s case, where we need to deal with any possible schema at all, it’s not quite the best fit, and it was simpler to use more basic tools at the cost of a little more code generation.