# Comparison of Managed PostgreSQL Providers: AWS RDS vs Google Cloud SQL vs Azure PostgreSQL
Hey there! My name is Gunar. I’m a Software Development consultant from London. I’ve been a happy user of Hasura for many years now, and I was glad to collaborate with Hasura on this blog post. We’ll be talking about managed Postgres cloud providers.
Hasura gives us instant realtime GraphQL & REST APIs on new and existing Postgres databases. While building new applications with Hasura, I have often had to decide which managed cloud provider to pick. This post is meant to ease that decision-making process: I compare some of the popular providers along the vectors I have used for my clients.
By managed providers, I mean services that perform all maintenance tasks for us. These services allow us to have a reliable database in the cloud without having to worry about applying updates, managing servers, or keeping a big infra team just to maintain our database layer.
We’re going to be looking at six cloud providers:
- AWS
- GCloud
- Azure
- DigitalOcean
- Heroku
- ElephantSQL
## Table of Contents
- Overall Comparison
- High Availability
- Auto-scaling
- Postgres Versions
- Postgres Extensions
- Automated Backups
- Monitoring and Metrics
- Security
- Costs
- Conclusion: Which provider should I pick?
## Overall Comparison

I thought I’d start with the comparison table right off the bat and let you get a general overview before diving into the details. As you might expect, AWS is setting the bar, with GCloud and Azure tied for a close second. Then we have slightly simpler solutions like DigitalOcean, Heroku, and ElephantSQL, not too far behind either!
Databases are complex, multi-faceted creatures, so we’ll be looking at multiple characteristics: auto-scaling, monitoring and metrics, costs, versions, extensions, backups, high availability, and security. If you really want to get a good picture of these services, I’d recommend diving in and reading each section.
It’s hard to comment on price directly, as different providers offer different benefits. However, the costs section will give us an idea of how much a single-server database might set us back with each of these providers.
Generally speaking:
- If you’re an indie hacker or small startup, a simpler solution like DigitalOcean, Heroku, or ElephantSQL might be a good fit. These providers have simpler cost structures and their offerings come in pre-defined packages.
- If you’re an enterprise company, you’ll want the stability and guarantees of big players such as AWS, GCloud, and Azure.
A quick reminder: Hasura and Hasura Cloud work with all of these providers.
## High Availability

High Availability speaks to data redundancy, availability, and reliability. All providers offer this capability, through read-only followers and automatic failover.
As you can see below, most providers make the transition in the background, so you don’t have to tell the application layer to point to a new Postgres URL. All, that is, except Heroku. Heroku automatically updates the DATABASE_URL environment variable for your Dynos (which triggers a restart), but this obviously won’t work if you have services hosted on non-Heroku servers.
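This is also why reading the connection string from the environment at startup, rather than hard-coding it, matters: after a Heroku failover, the dyno restart picks up the new DATABASE_URL automatically. A minimal sketch using only the Python standard library (the example URL is made up):

```python
import os
from urllib.parse import urlsplit

def connection_params(url: str) -> dict:
    """Split a postgres:// URL into the parts a database driver needs."""
    parts = urlsplit(url)
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,  # Postgres default port
        "user": parts.username,
        "password": parts.password,
        "dbname": parts.path.lstrip("/"),
    }

# Read the URL at startup; after a failover that rewrites DATABASE_URL,
# a process restart picks up the new value with no code changes.
params = connection_params(
    os.environ.get("DATABASE_URL", "postgres://app:secret@db.example.com:5432/mydb")
)
```

The same pattern works outside Heroku too; the difference is that non-Heroku services won’t be restarted for you when the variable changes.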
Please note that Google Cloud SQL does NOT trigger a failover event during maintenance. Both the primary and failover instances go down for maintenance.
Also, a quick shout out to the Azure team. They’ve taken High Availability one step further in simplifying and hiding the complexity away. Even the Single Server plan stores “three copies of data within a region to ensure data redundancy, availability, and reliability.” So you get High Availability for the price of a single server, and that’s pretty cool.
## Auto-scaling

Auto-scaling is a relatively new capability of managed databases. Currently, AWS and Azure support auto-scaling of both storage and servers, while Google Cloud supports auto-scaling storage only.
- Auto-scaling storage means the provider will automatically provision more disk space as required by the database.
- Auto-scaling servers means the provider will automatically scale up the database, either vertically or horizontally. The most common operation is spinning up more read-only replicas, but beefing up the primary instance is possible too.
Auto-scaling is opt-in, but make sure it’s properly configured: automatically scaling capacity also automatically scales costs. And it’s usually not possible to scale storage back down afterwards to match your actual usage and needs.
## Postgres Versions

Postgres is an actively maintained open-source project. As such, there are multiple major versions that are stable. For simple applications, versions 10 or 11 might be enough; however, you might have a legacy system that depends on 9.6, or you might need the newest features included in versions 12 and 13 (e.g. incremental sorting).
On this aspect, I was a bit disappointed by Azure. It’s the year 2021 and Azure still doesn’t support versions 12 and 13. This not only prevents us from using those sleek, cost-saving new features, but also suggests they’ll be slow to roll out future major versions as they’re released.
Hasura GraphQL Engine supports Postgres versions 9.5 and above, and Hasura Actions are supported from Postgres 10 and above.
## Postgres Extensions

Postgres extensions allow your database to do more. Examples of popular extensions are pgcrypto (for cryptographic functions like hashing) and PostGIS (for spherically-aware geographical calculations).
When using a managed Postgres provider, you mostly don’t have to worry about the availability of extensions. Most providers offer a wide range of extensions which most likely include the ones you’ll be needing. That being said, if you still want to check for specific extensions please follow the links below.
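If you do want to verify a specific extension on a candidate instance, you can also ask Postgres itself: the `pg_available_extensions` view lists everything the server can install. The query below is real; the helper function and the sample list of available extensions are my own illustration:

```python
# Run this query against a provider's instance (e.g. via psql) to see
# which extensions the server offers:
AVAILABLE_EXTENSIONS_SQL = "SELECT name FROM pg_available_extensions;"

def missing_extensions(required, available):
    """Return the required extensions the server cannot install."""
    return sorted(set(required) - set(available))

# Hypothetical query result from some managed instance:
available = ["pgcrypto", "postgis", "uuid-ossp", "hstore"]
print(missing_extensions(["pgcrypto", "timescaledb"], available))
```

An empty result means the provider covers everything your application needs; anything returned is a reason to check that provider’s documentation (or pick another).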
## Automated Backups

Managing your own backups can be a stressful and thankless job. It’s easy to make mistakes, forget to check your backups, or perhaps even overwrite them. Also, you have to make sure the backup server doesn’t run out of disk space.
This is where cloud providers shine. You can rely on the big cloud providers’ infrastructure to keep database backups for you. You’ll be able to sleep at night knowing that your data is secured with the same infrastructure as NASA’s.
All providers offer automated backup solutions, including point-in-time restore. By using a combination of routine snapshots and incremental logs, providers are able to recover data as it was at a specific date and time. I’ve also found Google’s approach quite interesting and worth sharing: “Google Cloud SQL backups are incremental. When the oldest backup is deleted, the size of the next oldest backup increases so that a full backup still exists.”
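To make Google’s description concrete, here’s a toy model of an incremental backup chain. This is my own illustration, not Google’s implementation: assume each backup stores only the blocks changed since the previous one, so deleting the oldest backup must fold its blocks into the next oldest, which then becomes the new full backup:

```python
def restore(backups):
    """Replay the chain oldest-to-newest to reconstruct current state."""
    state = {}
    for backup in backups:
        state.update(backup)
    return state

def delete_oldest(backups):
    """backups[0] is the oldest (full) backup. Merge it into backups[1]
    so the chain still starts with a full backup; newer blocks win."""
    if len(backups) < 2:
        return backups
    merged = {**backups[0], **backups[1]}  # backups[1] overrides overlaps
    return [merged] + backups[2:]

# A full backup followed by two incremental ones:
chain = [{"a": 1, "b": 1}, {"b": 2}, {"c": 3}]
before = restore(chain)
chain = delete_oldest(chain)     # oldest is absorbed, not lost
assert restore(chain) == before  # restorable state is unchanged
```

Note that the merged backup is larger than the incremental one it replaced, which is exactly the size increase Google’s documentation describes.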
## Monitoring and Metrics

Observability is crucial. Would you be able to drive a car blindfolded? All providers analyzed here offer similar monitoring capabilities. Common metrics across these providers are network throughput, number of connections, I/O reads/writes, disk space, CPU, and memory. The only outlier is Heroku, which requires you to choose and install an addon.
* Requires you to install an addon.
## Security

Security is multi-faceted. You can’t just generically say “provider X is secure.” Creating a secure infrastructure is a mix of technical and human systems that must work harmoniously.
So I’ve looked at some specific, objective metrics:
- Encryption in transit is expected, of course; it’s the year 2021 and HTTPS/SSL is ubiquitous.
- Encryption at rest means the data is never stored as cleartext (i.e. unencrypted) on any server. The obvious benefit: if someone were to physically enter the datacenter and steal the HDDs (or an ill-intentioned employee were to do so), they wouldn’t be able to access your data. That scenario is very unlikely, though, so in practice this is mostly a box to check in order to be compliant with standards like SOC 2.
- SOC 2 “is an auditing procedure that ensures your service providers securely manage your data to protect the interests of your organization and the privacy of its clients.” It’s a minimum requirement for pretty much any kind of enterprise operations.
* ElephantSQL says data “can be encrypted […] at rest.” (emphasis mine) so it seems that it’s not the default option and that you have to explicitly request it.
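On the client side, encryption in transit is something worth enforcing rather than assuming. With libpq-style connection URLs this is controlled by the `sslmode` parameter; here’s a small sketch that appends `sslmode=require` to a URL when it isn’t already set (the example URL is made up):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

def require_ssl(url: str) -> str:
    """Append sslmode=require to a Postgres URL unless sslmode is set."""
    parts = urlsplit(url)
    if "sslmode" not in parse_qs(parts.query):
        sep = "&" if parts.query else ""
        parts = parts._replace(query=parts.query + sep + "sslmode=require")
    return urlunsplit(parts)

print(require_ssl("postgres://app:secret@db.example.com/mydb"))
```

Note that `sslmode=require` only guarantees encryption; if you also want the client to verify the server’s certificate, libpq’s stricter `verify-ca` or `verify-full` modes are the ones to use.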
## Costs

As mentioned in the intro, it’s tough to compare costs, as you’re often comparing apples to oranges. Different providers offer different benefits. Some providers let you be very precise about the specs you rent, like picking exactly how many vCPUs you want, while others sell packages that won’t even let you choose the disk size.
My suggestion: use the other parameters of this research to narrow down your choices, then estimate costs for each remaining provider for your specific use case.
With that caveat, I’ve picked a fixed spec for comparison across providers and did my best to match it:
- 8GB RAM
- SSD storage
- 2 vCPU (dedicated)
- Servers in East US
- Single availability zone (no follower)
With a managed cloud database, there are multiple items you’ll be paying for:
- Compute time
- Network data egress
- Data storage space
- Backup storage space
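Turning those line items into a monthly figure is simple arithmetic once you have a provider’s unit prices. A sketch with entirely made-up rates (check each provider’s pricing page for real numbers):

```python
def monthly_cost(hours, compute_rate, storage_gb, storage_rate,
                 egress_gb, egress_rate, backup_gb, backup_rate):
    """Sum the four billed items for one month."""
    return (hours * compute_rate        # compute time
            + storage_gb * storage_rate  # data storage space
            + egress_gb * egress_rate    # network data egress
            + backup_gb * backup_rate)   # backup storage space

# Hypothetical rates, NOT any provider's real pricing:
estimate = monthly_cost(
    hours=730, compute_rate=0.10,
    storage_gb=100, storage_rate=0.12,
    egress_gb=50, egress_rate=0.09,
    backup_gb=100, backup_rate=0.02,
)
print(f"${estimate:.2f}/month")
```

The point of the exercise is less the total and more seeing which line item dominates for your workload: an egress-heavy API and a storage-heavy analytics database will rank the same providers very differently.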
There are basically two groups: the “big three” (AWS, GCloud, and Azure) charge you per unit consumed, whereas DigitalOcean, Heroku, and ElephantSQL offer packages with limited customization.
* 7.5GB instead of 8GB RAM (db.m1.large)
† 10GB instead of 8GB RAM
‡ Generic Internet egress (i.e. any region)
§ Included (it’s a package)
## Conclusion: Which provider should I pick?

We’ve analyzed six providers from multiple angles. They are all pretty competitive and overall exhibit the same features, though sometimes with different names and conceptual models.
All things equal, there are only three major differences that I’ve noticed:
- You might need a Postgres version that a given provider does not offer
- You might expect your database load to vary greatly and a given provider might not offer auto-scaling
- You might need high availability, and hence failovers that keep a static connection endpoint so that applications can reconnect automatically
Of course, at the end of the day choosing a cloud provider is also an economic question. How do you get the most bang for your buck? As you’ve seen in the “costs” section above, estimating costs generically is imprecise as sometimes you’d be comparing apples to oranges. So I suggest using the three bullet-points above to narrow down your choices, then running cost estimates for your specific application with each of the providers.
That’s all for today. Let me know on Twitter or in the comments below if you have any questions, suggestions, or feedback! Looking forward to hearing from you.