Using Postgres for Cron
Why even ?
Any medium-sized application will have tasks that need to be performed at certain times of day or even ad-hoc times. This, typically, introduces a different infrastructure component to the application, e.g. in Rails you might use Sidekiq or with Python you might use Celery and of course there is the system level cron. Although these are typically seen as job queues, they also have a time-based scheduling component. But there are other ways to achieve time-based task scheduling as well !
Of course, we could use the aforementioned system cron but the utility of it is really limited by virtue of it being at OS level. And a crontab can’t even express a single instance in time, they are always recurring (although there are extensions which can express this). So, is the only option to introduce these new components into your application? Or can we do something more judicious?
Yes, we can use our database and application server to build a flexible and robust cron system ourselves with a few lines of code!
Postgres is great for eventing
First, let’s talk about Postgres. Postgres is great and it’s actually sufficient for typical eventing use-cases. This is not just because we implemented a scalable event-delivery mechanism over Postgres, but also widely validated in the community. Here are few resources which talk Postgres for job queue like cases: https://layerci.com/blog/postgres-is-the-answer/ ,
https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/
Apart from the transactional guarantees which traditional RDBMSs provide, Postgres also comes with nifty features like SELECT FOR UPDATE SKIP LOCKED
which makes it very apt for building eventing systems. In fact, quite recently a new product called MessageDB was launched and it aims to solve the entire gamut of event and messaging capabilities with Postgres: https://blog.eventide-project.org/articles/announcing-message-db/ .
Now that we agree that Postgres is great for eventing (if you are not, the rest of the content might still convince you...so read on), we will see how we can make our cron service over Postgres.
How cron can be modeled using events
Let’s imagine that an instance of time is modeled as a row with a timestamp column. This means a cron schedule like * * * * *
(run something every minute) can be modeled as a table with rows that has a timestamp for every minute. So that’s an infinite table but it’s easy to cap the timestamp to some max value (say one day ahead or one year ahead, you can choose).
Now, let’s see how we can make use of this table.
In our application code, suppose we run a thread which does the following simple activity every minute:
- Fetches all rows that have timestamp <= current_time
- Delivers these event for processing and marks the row as “done”
What did we just achieve? Well, a cron system of course!
We can even run multiple threads and using Postgres tricks like SELECT FOR UPDATE SKIP LOCKED
parallely process them for horizontal scale. Obviously, we also need to keep generating new events for future: just have another thread that inserts rows with timestamps till next x hours (or any configurable value).
Advantages of event-based cron
Now, why would you ever do this when you have Sidekiq or Celery or system cron? Let’s see some of the added benefits of having event-based cron on Postgres:
- Ability to see future events and pause/cancel them if you want.
- Ability to create one-off events with a set timestamp (not possible in standard cron).
- Scale as needed by simply running more delivery threads.
- Store additional metadata inside a row for adding context, marking event as `dead`, etc
All these benefits are worth it!
Conclusion
Postgres is great for eventing and cron can be nicely modeled with events! It follows that Postgres is great as cron. We built Hasura Scheduled Triggers following this model and you can read more about the implementation in the docs or try it out on Hasura Cloud.