GraphQL and UUID type on Postgres
Hasura GraphQL Engine has implicit support for Postgres UUID type, which means that UUID values can be provided as strings.
- Exposing information: Very frequently, the primary id is exposed to the user (REST urls, anyone?). If the primary key is easily visible to the user, this means that anyone can find out your number of users by just looking at the primary key on a freshly created user. This leaks information about the size of your userbase.
- Enumeration of entries: Another problem is that it’s very easy to enumerate all the entries of your table with a primary key. A malicious user or a bot could easily scrape identities of all your users, or spam your API with organised actions on all your users. A UUID is universally random, and is practically impossible for external parties to predict. To be very clear though, UUIDs are not a security mechanism, the actual solution to this problem is to put up stronger access control. UUIDs just provide more protection in this case.
- Independent Generation: The biggest problem with sequential ids is that the client doesn’t know the id of an object being inserted without talking to the database. This is even more of a problem when inserting multiple related objects with foreign keys, and creates inelegant multi step inserts. A UUID on the other hand, can be generated directly by the client making inserts very simple, and solves the foreign key problem quite nicely too.
- Random distribution: The random distribution of the UUIDs can prevent disk hotspots, where a large number of objects are stored in a particular disk block, causing high usage on specific areas of the disk. This works very well for databases distributed over several nodes.
- Size: The downside is that UUIDs are four times as big as a serial, and if you have size constraints, they can be a problem.
- Hard to debug: Harder to remember, hard to debug, a small but important QoL issue. It’s easier to test/debug when you have simpler ids that you can remember compared to long complicated UUIDs.
- Lack of easy ordering: The primary key cannot be used to determine the order of insertion, another column with timestamps or a sequence becomes necessary. This also means that clustered indexes on UUID columns are very costly in terms of performance.
- Performance: And also, random UUIDs mean scattered indexes, and this can cause performance issues/high disk usage with high insert rates. (Databases have sequential UUID generation functions to deal with this, but this eliminates the advantage of being able to generate the UUID on the client)
Related reading