Over the past ten years, we collaborated with many PostgreSQL users. From users who run their own Postgres deployments, we often hear the question, “I can put Postgres in a container and deploy it. Could you help me with taking backups and HA?"
We hear this question because we also built several managed PostgreSQL solutions (including Heroku Postgres and Citus) over the years. And we saw that there was a huge difference between running PostgreSQL for yourself and running it for other people.
At Ubicloud, our managed cloud service enables us to run PostgreSQL for others. We open sourced our implementation, so we thought this was a good opportunity to talk about some of those differences. This blog post summarizes features commonly available in a managed service. It also describes interesting differences between running Postgres for yourself and others across four product areas - provisioning, backup/restore, HA, and security.
When you’re running PostgreSQL yourself, you’re primarily concerned about backups and HA. Users who’ve been running Postgres for a few years also think about version upgrades.
If you’re thinking about running Postgres for others, AWS, Azure, and Google had managed solutions for years and their features define the bar for customer expectations. The following diagram shows the most common features and also how Ubicloud’s architecture models the dependencies between these features.
First, you need to start with provisioning a database. Then you need to introduce DNS, certificates, backup / restore, internal APIs, UI, security, and so forth. These features are also all interconnected. For example, in order to have a highly available PostgreSQL database, you first need to have monitoring and you need to orchestrate failovers. Or, to have read replicas, first, you need to build backups.
This is also quite a simplistic view of what you need, but these are the must-have pieces. Since a blog post that covers all of these features would be quite lengthy, today, we’ll focus on four interesting product areas.
When you’re managing your own Postgres deployments, provisioning looks easy. You provision a VM, install the PostgreSQL binaries, and you give the connection string back to your users.
If you’re managing Postgres for others, things start to get more complicated. First, on top of installing PostgreSQL binaries, you usually have to install PostgreSQL extensions. PostgreSQL has a rich extension ecosystem and most users rely on various extensions. Thus, you also need to install at least the most commonly used extensions (around 80), so that each user can pick whatever they need. The problem with installing extensions is that some of the extensions are not packaged for you and you need to compile them. In order to compile these extensions, you also need to install their dependencies.
Technically, you can give connections back to the customers at this point, except that you also need to secure the database connection. To do that, you need to create a server certificate so that the user who connects to the PostgreSQL database can validate that they are connecting to the correct place.
The problem with the server certificate is that someone needs to sign it. Usually you would want that certificate to be signed by someone who is trusted globally like DigiCert. But that means you need to send an external request; and the provider at some point (usually in minutes but sometimes it can be hours) signs your certificate and returns it back to you. This time lag is unfortunately not acceptable for users. So most of the time you would sign the certificate yourself. This means you need to create a certificate authority (CA) and share it with the users so they can validate the certificate chain. Ideally, you would also create a different CA for each database.
Then, you also need to configure PostgreSQL settings. Users would request databases of different sizes with different CPU, memory, and disk combinations. If you’d like PostgreSQL to run efficiently, then you need to configure some PostgreSQL settings based on the underlying hardware configuration.
Finally, you also need to create DNS records. This way, the connection string doesn’t have an IP address but a more memorable DNS name. DNS record’s benefit isn’t just representation. In some cases you would need to perform a failover (for example if HA is enabled and the primary node is down). It is crucial for users to be able to continue the same connection string even if the underlying VM changes. Otherwise, failovers would require action from the user and this would defeat the purpose of managed PostgreSQL service.
As you see, we started with a very basic goal, but quickly realized there are many extra steps we needed to perform. An additional problem here is that we introduced too many extra steps and now provisioning takes about 10 minutes. Usually users would want to have their database much faster.
Luckily, we can apply three optimizations. First, we don’t need to download extensions and their dependencies and build them each time. Instead, we can create an OS image that has all the binaries baked in. This would reduce provisioning time from 10 to 3-4 minutes, which is a good improvement but is still quite high.
The next thing we can do is parallelize the remaining steps. Once we switch to using a baked OS image, provisioning reduces down to four steps:
None of the other steps depend on certificate creation, so we can separate out the certificate creation and run it in parallel with the other steps. Once the parallel steps complete, we can return the connection string back to the user. This reduces the database provisioning time to 2-2.5 minutes.
Most of the time, 2-2.5 minutes is good enough. As a final optimization, you can also create a pool of already provisioned databases and configure the pool for different instance sizes. When a provisioning request comes, you just need to configure the DNS record and then give the connection string to your user.
If you do this, then your provisioning times come down to 20-30 seconds - and this is in fact what most of the public cloud providers do. They optimize all provisioning steps and provision from an available pool of databases.
Our goal here is being able to restore the database to any minute in the past. For this, you need to take a full backup of your database and record incremental changes happening in your database. This feature is called point-in-time-restore (PITR) and comes in handy when you want to have other features, such as forking your Postgres database.
Full backup is easy, you can use any backup tool to take full backup of the system. Normally, recording all incremental changes would be difficult. Fortunately, most databases including Postgres already have a concept called write-ahead log files (WAL), which records all incremental changes in the database. WAL’s primary motivation is crash recovery, but it also makes PITR much easier.
So, the process looks like the following. For backups, we take daily full backups and also store all WAL files. Then, if the user wants to restore the database, we use the most recent full backup before the target date. From that target date, we apply all WAL files one by one until we reach the target time. For example, if the user wants to restore March 10th at 14:22 PST, we create a database from March 10 00:00 PST and apply about 14 hours of WAL files to reach the target time.
This works well on paper, but comes with two problems in PostgreSQL.
The first issue is “low activity”. If the user isn’t actively working on the database, PostgreSQL won’t generate a WAL file each minute. WAL files are by default 16 MB and PostgreSQL waits for the 16 MB to fill up. So, your restore granularity could be much longer, and you may not be able to restore to a particular minute in time.
You can overcome this problem by setting the archive_timeout and forcing Postgres to generate a new WAL file every minute. With this configuration, Postgres would create a new WAL file when it hits the 1-minute or 16 MB threshold, whichever comes first.
The second issue with backup/restore is “no activity”. In this case, PostgreSQL wouldn’t create a new file even if the archive_timeout is set. As a solution, you can generate artificial write activity by calling pg_current_xact_id().
HA has a simple workflow. You provision a primary and standby database and then run regular health checks on both databases. At Ubicloud, we use a simple SELECT 1; to reason about the database’s health; and then trigger a failover if we deem the database to be unhealthy.
In case of a primary failover, the standby becomes the new primary and we provision a new standby. In some cases, you might want to recover the old primary back and add it as a standby. At Ubicloud, we prefer to bring up a fresh standby every time. In our opinion, trying to reuse primary introduces too much complexity for little benefit.
Problems? HA has many pitfalls but one is more important than the others - fencing the primary. If the primary is really down, then it’s easy, as it is already fenced. However, sometimes the primary may still be healthy and the problem could be the network connectivity between the control plane and the database. Or maybe, the primary is really down, but once you do a failover, it comes back online.
The risk in this case is that some customers connect to the new primary, and some to the old one; they each write some data, which means you lose some of the written data. For this reason, fencing the primary is critical. It’s one of the steps where there is no wiggle room for error.
Further, since we can’t failover before fencing the primary, any second we spend here directly affects the down time, so we need to be fast. For this reason, we apply all fencing solutions in parallel. First, if the server is accessible, we kill the PostgreSQL database and ensure that it can’t come back up. Second, we detach the network interface, so no new connections can arrive to or exit from the VM. Third, we deprovision the VM entirely.
There is also one non-solution - updating the DNS records. You might think that if you update the DNS records to point to a new IP address, then no one can connect to the old one. Unfortunately, DNS has TTL value settings and within that time window, customers can still connect to the old primary. Even if you set a low enough TTL, some clients don’t honor it.
All considered, updating DNS isn’t a reliable way to achieve fencing. We still do this at Ubicloud, not for fencing, but to provide a good user experience to our customers.
Did you know that with one simple trick you can drop to the OS from PostgreSQL and managed service providers hate that? The trick is COPY table_name from COMMAND. COPY is a useful Postgres command to copy data between files and tables. However, COPY also has the option to run COMMAND. So, you can run OS commands like the following.
With these commands, you’re basically seeing the files in your Postgres directory. If the PostgreSQL user isn’t isolated enough, you could run COPY and read the files that you shouldn’t have access to or install binaries on the OS.
There are two caveats. First, you need to have superuser access on the PostgreSQL database for this and most managed Postgres services don’t give this level of access (at Ubicloud however, we give you superuser access).
Second, and more importantly, you run the command as a “postgres” OS user. This way, as a managed service provider, you can isolate the OS user to only accessing database files at the OS level. Then, to add an additional level of security, you can further isolate each database by placing them on a separate VM. Even better, you can isolate each server’s network from others. This way, VMs won’t be able to communicate with each other.
Naturally, if you’re running PostgreSQL for yourself, none of these security considerations matter. You typically already control the underlying machine and the network. When you’re running PostgreSQL for others, you need to use well-established boundaries for security isolation.
Our team built managed PostgreSQL solutions in different companies over the past ten years. In that time, we observed that there were important differences between running PostgreSQL for yourself and managing it for others. In this blog post, we described some of those differences.
To our knowledge, this is the first time a cloud provider is sharing those differences and their high level approach to building a managed Postgres service. If you have any questions or comments about our approach, please drop us an email at info@ubicloud.com. We’d be happy to chat.