Things you wish you knew about PostgreSQL for building an app
I posted this question on Twitter, Hacker News, and Reddit and got some interesting responses. In this post, I'm collating those discussions to help frontend and full-stack developers work better with PostgreSQL for their app.
As an app grows in size and user base, the database typically becomes a bottleneck for various reasons. And I'm not even talking about resource consumption, memory, connection limits, and the like, which will haunt admins at some point. I'm going to talk about things that are more fundamental at the data modelling layer, along with some smaller nuances of Postgres, many of which apply to any SQL database.
Note that this is not an exhaustive list; it is purely based on the experiences of various users running Postgres in production, including myself. Alright, let's dive in.
Data types
Using the right data type for a column is arguably the most important decision when designing data models, even before you start using Postgres.
Choosing string type
A common mistake is to choose a restrictive data type that doesn't scale in the future. For example, a column containing string data can be one of three types in PostgreSQL: varchar(n), character(n), or text. With the former two, you are restricted in the number of characters you can store. Choosing the text type is usually harmless and accommodates more characters in the future. And if need be, you can add a check constraint to restrict data input.
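A minimal sketch of this approach (table and column names are illustrative): start with text plus a CHECK constraint, which can later be swapped without rewriting the column.

```sql
-- text with a CHECK constraint: flexible storage, bounded where it matters.
CREATE TABLE authors (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL CHECK (char_length(name) <= 100)
);

-- Loosening the limit later is just a constraint swap, not a column rewrite.
-- Postgres auto-names the column constraint authors_name_check:
ALTER TABLE authors DROP CONSTRAINT authors_name_check;
ALTER TABLE authors ADD CONSTRAINT authors_name_check CHECK (char_length(name) <= 500);
```

With varchar(100), growing the limit would instead require an ALTER COLUMN ... TYPE.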
Primary key dilemma
The never-ending debate: what is the right primary key type (unless, of course, you really have a naturally unique column to mark as primary)? The id column can hold auto-incrementing integers (preferably bigint), and this seems to be the most widely used solution.
The collision problem: To be honest, before these discussions on Reddit, I had always used auto-incrementing integers in my demos, with an occasional mention that they might not scale well. This was purely because of issues I have faced in the past. When you are migrating, importing/exporting data via CSV, or interacting with multiple databases or data sources, there is a small possibility of collisions between id values. So I tend to use uuid as the primary key. But there's a possibility of collision there too, albeit an extremely rare one.
Using anything other than auto-incrementing integers for the id column should be justified. Most apps can stick to this philosophy.
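Both options side by side, as a sketch (table names are illustrative):

```sql
-- Auto-incrementing bigint, the common default:
CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
);

-- Or a uuid primary key; gen_random_uuid() is built in from Postgres 13
-- (on earlier versions, enable the pgcrypto extension first):
CREATE TABLE events (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid()
);
```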
And hey, you can also create your own data types.
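For instance, a simple enum type keeps a status column honest at the database level (type and table names are illustrative):

```sql
-- A custom enum type; invalid values are rejected by Postgres itself.
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered');

CREATE TABLE shipments (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status order_status NOT NULL DEFAULT 'pending'
);
```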
Postgres has supported the JSON type since v9.2 (and the more efficient, indexable jsonb since v9.4), along with many operators and functions for manipulating JSON data. The -> operator returns a JSON object field as JSON, while ->> returns it as text.
Does this mean I can use this type for my NoSQL use cases? Yes! With support for a wide variety of operations, this is a good reason to stay in the relational DB ecosystem.
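A quick sketch of the operators in action (table and column names are illustrative):

```sql
CREATE TABLE profiles (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    data jsonb NOT NULL
);

INSERT INTO profiles (data)
VALUES ('{"name": "Ada", "address": {"city": "London"}}');

SELECT data -> 'address'         AS address_json,  -- jsonb: {"city": "London"}
       data ->> 'name'           AS name_text,     -- text: Ada
       data #>> '{address,city}' AS city_text      -- path extraction as text: London
FROM profiles;
```

Choosing jsonb over json also unlocks GIN indexing and containment queries with the @> operator.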
Most columns should be NOT NULL
Unless required, most columns in your database should not be nullable. Apart from ensuring data consistency and reliability, this avoids the many issues null values cause in the frontend. Especially in an app that is not strongly typed, null values require extra conditional checks and complicate rendering. You really don't want the red screen of death!
created_at and updated_at columns
For tables that store any form of dynamic data, it is always better to have two timestamptz columns, created_at and updated_at, each with a default value of now(). Knowing when a record was created or modified is very useful later when going over analytics or reporting. Audit logs might be required, and timestamps are key.
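A sketch of the pattern (table names are illustrative): defaults handle created_at, and a small trigger keeps updated_at fresh, since Postgres has no built-in "on update" default.

```sql
CREATE TABLE posts (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

-- Refresh updated_at on every row update:
CREATE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER posts_set_updated_at
    BEFORE UPDATE ON posts
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```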
Make use of schemas in PostgreSQL
PostgreSQL supports multiple schemas. We typically end up using just the public schema for most use cases, but multiple schemas are a nice way to namespace your different application use cases, since joining data and establishing relationships across schemas is fairly easy.
I tend to use schemas for reporting views for analytics and keep my primary table schema (public) noiseless.
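That setup can be sketched like this (schema, view, and table names are illustrative):

```sql
-- A separate schema keeps reporting artifacts out of public:
CREATE SCHEMA reporting;

-- Views in reporting can still join freely against public tables:
CREATE VIEW reporting.daily_signups AS
SELECT created_at::date AS day,
       count(*)         AS signups
FROM public.users
GROUP BY 1;
```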
Make use of extensions in PostgreSQL
PostgreSQL is pretty extensible. Extensions can be loaded into a database, after which they behave like built-in features. Some of the popular extensions are PostGIS, TimescaleDB, pgmemcache, and pg_cron. Each is built for a specific use case and saves a lot of time for what it is intended to do.
Extension development is driven by the community, and there's something written for most use cases.
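Enabling one is a single statement, provided the extension's packages are installed on the server:

```sql
-- See what the server has available:
SELECT name, default_version FROM pg_available_extensions ORDER BY name;

-- Then enable what you need in this database:
CREATE EXTENSION IF NOT EXISTS pg_trgm;   -- trigram similarity search
CREATE EXTENSION IF NOT EXISTS postgis;   -- only if PostGIS is installed on the host
```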
Indexes
Adding indexes is probably the first thing to do to improve the performance of your queries. Postgres automatically creates indexes for primary keys and unique constraints. Indexes are not created automatically on foreign keys, but experience shows they are typically needed there in most cases.
Be judicious with indexes, since too many of them carry a performance overhead on writes.
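A sketch of the foreign-key case (table names are illustrative): the constraint alone does not index the referencing column, so add one explicitly.

```sql
CREATE TABLE orders (
    id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    user_id bigint NOT NULL REFERENCES users (id)
);

-- Keep joins and "WHERE user_id = ?" lookups fast:
CREATE INDEX orders_user_id_idx ON orders (user_id);
```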
Database Users and Sessions
Default configurations of postgres comes with superuser
postgres. But it is recommended to create new user with specific
GRANT types to allow access to certain databases and operations.
For example, your custom code server that is connecting to Postgres purely for querying and inserting data, just needs the GRANTs for SELECT and DML statements like INSERT, UPDATE and DELETE. For DDL statements, there can be a separate user with
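A least-privilege sketch of such an application role (role, password, and database names are illustrative):

```sql
CREATE ROLE app_user LOGIN PASSWORD 'change-me';

GRANT CONNECT ON DATABASE myapp TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;

-- Cover tables created later by the owning (migration) role:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
```

DDL stays with the owner or a dedicated migration role, so a compromised app credential cannot alter the schema.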
Connection strings
Generating a connection string with the right host value is often where things go wrong. libpq-based libraries accept a URI format that makes it easy to connect. Connection strings are of the format:
postgresql://user:password@localhost:5432/dbname
The right value in place of localhost will vary based on the OS being used and the way Postgres is installed and set up. For example, when connecting from Docker on macOS, this changes to host.docker.internal, whereas localhost works for a native macOS installation.
If you are using an SSL-secured installation of Postgres, you need to add the ?sslmode=require parameter to the connection string.
Views
Views can be leveraged to transform data so that queries are easier on the client. It is recommended not to do data transformations or multiple round-trip queries on the client when the job could be done with one view.
Views are powerful: the underlying query executes every time you select from the view, so the data is always current, and this will likely be more performant than client-side processing for most use cases.
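A sketch of replacing several client-side queries with one view (table, column, and view names are illustrative):

```sql
CREATE VIEW order_summaries AS
SELECT o.id,
       u.name       AS customer,
       count(i.id)  AS item_count,
       sum(i.price) AS total
FROM orders o
JOIN users u        ON u.id = o.user_id
JOIN order_items i  ON i.order_id = o.id
GROUP BY o.id, u.name;

-- The client now runs a single, simple query:
SELECT * FROM order_summaries WHERE id = 42;
```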
Version control the schema
As you keep developing your app, the schema changes incrementally. You will want a log of what changed and when, not just for auditing but also for the ability to roll back.
Database schema migrations are key to version control and rollback the schema in case things go wrong with a specific feature deployment.
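Migration tools (Flyway, golang-migrate, sqitch, and others) generally boil down to paired SQL files checked into version control. A sketch, with hypothetical file names:

```sql
-- 002_add_phone_to_users.up.sql — applied when deploying the feature:
ALTER TABLE users ADD COLUMN phone text;

-- 002_add_phone_to_users.down.sql — applied to roll the change back:
ALTER TABLE users DROP COLUMN phone;
```

The tool records which numbered migrations have run, so every environment converges on the same schema.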
Transactional DDL
There are cases where you might want to execute both DDL and DML statements in a single transaction. Postgres supports this because its DDL is transactional. Consider the following use case:
postgres=# BEGIN;
BEGIN
postgres=# create index on table (status);
CREATE INDEX
postgres=# analyze table;
ANALYZE
postgres=# explain select * from table where status = 'FOO';
                                     QUERY PLAN
------------------------------------------------------------------------------------
...some output
postgres=# rollback;
ROLLBACK
This lets you perform complex updates, inserts, deletes, then perform selects to verify the data matches expectations, then commit or rollback before the changes are visible to users.
Join the discussion
There's more to running Postgres in production, and plenty of advanced things to take care of that we all wish we had known earlier. But as I mentioned, this is not an exhaustive list; it will be updated as I get more input from users and from my own experience.