Database in Docker

Thou shalt not run a prod database inside a container

For stateless application services, containers are a near-perfect development and operations solution. But for a service that carries persistent state, the database, things are not so simple. As a developer, I really like Docker, and I believe that Docker and Kubernetes will be the standard way to develop and deploy software in the future. But as a DBA, I think running a database inside a container is an operations nightmare. Whether production databases should be put into containers is still a controversial question.

But the truth becomes clearer with more debate and more practice. In this article, I will show you why putting a production database into Docker is a bad idea.

What problems does Docker solve?

Let’s start by looking at Docker’s self-description:

Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.

The words Docker uses to describe itself include: lightweight, standardized, portable, cost-saving, increased efficiency, automation, integration, and efficient operations. These claims are largely fair: Docker does make development and operations much easier in most cases. That is why so many companies are so eager to Dockerize their software and services.

But sometimes this enthusiasm goes to the other extreme: containerize all software, even the production database. Containers were originally designed for stateless applications, where any temporary data produced by the application inside the container is considered part of that container: create a container to run a service, and destroy it when you are done. That is the typical use case.

These applications have no state of their own; the state usually lives in a database outside the container. That is the classic architecture and usage, and it matches Docker's philosophy. But when we put the database itself inside a container, things change. A database is a stateful service, and in order to keep that state alive no matter how containers come and go, the database container has to punch a hole through to the underlying operating system: the data volume.
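
As a minimal sketch of what that looks like with the official postgres image (the image tag and volume name here are illustrative assumptions):

docker volume create pgdata
docker run -d --name pg -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data postgres:15
# The container can now be destroyed and recreated at will,
# but only on the host that holds the pgdata volume.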

Such a container is no longer an object that can be created, destroyed, moved, or transferred at will; it is bound to the underlying environment. As a result, many advantages of the typical container use case no longer apply to database containers.

Reliability

Getting software up and running and making it run reliably are two very different things.

The database is the core of almost every information system. It is a CRITICAL service in the literal sense: when it fails, things die. If an application goes down, you can bring it back up later. But if your database is wrecked and you have no backup, it is gone for good.

It is similar to everyday office software: if Word, Excel, or PowerPoint crashes, restarting it is no big deal, but if a critical document is corrupted or lost, that is a real mess. The database plays the same role for many companies: if it is deleted and there is no backup, the company may be finished. Reliability is the most important attribute of a database. Reliability is the ability of a system to keep working correctly in adversity (hardware faults, software faults, human error), that is, to keep functioning correctly and deliver the desired level of performance. Note that reliability differs from availability: reliability means fault tolerance and resilience, while availability is usually measured in a number of nines, a percentage expressing the expected fraction of time the system is up. Reliability is hard to measure; it can only be demonstrated by long periods of correct operation, or falsified by a failure. Reliability is therefore a safety property, and it is not as intuitive or measurable as performance or maintainability.

Safety matters, yet many people tend to ignore this most important attribute. They only become aware of it after someone gets killed, robbed, falls ill, has a car accident, or drops a database. Only then do they regret it.

Notice that Docker’s self-description does not contain words like “reliable”, even though that is the most important attribute for a database. Docker knows what it is built for.

Additional failure points

Compared with bare metal, putting a database inside Docker does not reduce the probability of hardware failures, software errors, or human errors. On the contrary, the overall reliability of the system decreases because of the additional components, additional complexity, and additional failure points. The database container has to be bound to a specific machine through its data volume, so the exposure to hardware failure stays exactly the same.

Standardized deployment may seem great at first glance, but as far as software bugs go there is no essential difference between a script and a Dockerfile. Most software failures stem from poor application design, which Docker cannot help with. The same goes for human error.

Additional components introduce additional failure points. Introducing Docker exposes us not only to problems in Docker itself, but also to problems at the junctions between Docker and the existing components. When a failure occurs, it may be a problem in the database, a problem in Docker, a problem in the interaction between the database and Docker, or a problem in the interaction between Docker and the OS.

Take a look at the issue list of the official PostgreSQL Docker image: https://github.com/docker-library/postgres/issues. It is a long list, and it shows the many extra problems you inherit when you put a database into Docker. The simplest example: what happens to the database if the dockerd daemon crashes? It goes down with it. A more subtle example is running two PostgreSQL instances on the same data directory (two containers on one volume, or one inside and one outside a container): without proper fencing your data gets toasted, and that is a problem that simply cannot happen on bare metal.
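
For instance, nothing in Docker itself prevents you from pointing two containers at the same data directory. A sketch of how easily the hazard arises (names are assumptions; do not run this against real data):

docker volume create pgdata
docker run -d --name pg1 -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:15
# A second postmaster on the same files: across container boundaries PostgreSQL's
# lock-file and shared-memory checks may not see the first instance, so without
# external fencing this can corrupt the cluster.
docker run -d --name pg2 -e POSTGRES_PASSWORD=secret -v pgdata:/var/lib/postgresql/data postgres:15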

Reliability Proof and community knowledge

As mentioned earlier, there is no good way to measure reliability. The only thing that counts is a proven track record: running correctly and continuously for a long time. Deploying databases on bare metal is the traditional way, proven by decades of continuous operation. Docker is a revolution for DevOps, but it is still young; five years is far too short for something as critical as a production database. There are not enough lab rats yet.

Besides long-term operation, there is another way to “increase” reliability: failure. Failures are extremely valuable experience; they turn uncertainties into certainties and unknown problems into known problems. Failure experience is the most valuable asset of operators: it is the form operational knowledge takes, and it is how the community accumulates knowledge. For PostgreSQL, most community experience assumes a bare-metal deployment, and every variety of failure has been explored by many people over decades. If you run into a database problem, you are very likely to find that other community members have already been through a similar situation, along with a corresponding solution. But if you search for the same problem with the additional keyword “docker”, you will find far less useful information. This means that when something nasty happens, the probability of successfully rescuing the data is much lower, and the time required to recover is much longer.

Another subtle point: companies and individuals are reluctant to share failure experience unless there is a special reason. For companies, a failure report may damage the company’s reputation, expose sensitive information, or expose how shabby the infrastructure really is. For individuals, failure experience is close to the most valuable part of their expertise; once shared, that value diminishes. Ops/DBA people are not as open as developers. That is precisely why the Kubernetes operator pattern exists: it tries to codify operational experience so that it can accumulate. But for now that remains naive, since few people are willing to share, and you can find rubbish everywhere. The official PostgreSQL Docker image, for example, lacks much of the tooling, tuning, and configuration needed to work efficiently as a real-world database.

Tooling

A database requires a lot of tooling to maintain: operations scripts, deployment, backup, archiving, failover, major/minor version upgrades, plugin installation, connection pooling, performance analysis, monitoring, tuning, inspection, repair, and so on. Most of these tools are also designed with bare-metal deployments in mind. These tools are critical too; without adequate testing we cannot really put them to use. Getting something up and running and keeping it running steadily for a long time are completely different levels of reliability.

Plugins are the typical example. PostgreSQL has many useful extensions, such as PostGIS. On bare metal, installing one is just a matter of typing ‘yum install’ and then ‘CREATE EXTENSION postgis’. Doing it the Docker way, you have to modify the Dockerfile, build a new image, push it to the server, and then restart the database container. No doubt that is much more complicated.
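
A rough comparison of the two workflows (package names, image tag, and registry below are assumptions, not exact commands):

# Bare metal: one package install, one SQL statement
sudo yum install -y postgis33_15
psql -d mydb -c 'CREATE EXTENSION postgis;'

# Docker: bake the extension into a new image, ship it, then restart the container
cat > Dockerfile <<'EOF'
FROM postgres:15
RUN apt-get update && apt-get install -y --no-install-recommends postgresql-15-postgis-3
EOF
docker build -t registry.example.com/mypg:15-postgis .
docker push registry.example.com/mypg:15-postgis
docker stop pg && docker rm pg
docker run -d --name pg -v pgdata:/var/lib/postgresql/data registry.example.com/mypg:15-postgis
docker exec -it pg psql -U postgres -d mydb -c 'CREATE EXTENSION postgis;'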

Similar problems apply to CLI tools and system commands. In theory they can be run on the host, but you cannot be sure the execution and its results mean exactly the same thing. And when an emergency happens and you need a tool that is not included in the container, with no Internet access and no yum repository at hand, you have to go through Dockerfile → build image → restart container. That is really insane.

When it comes to monitoring, Docker makes things harder. There are many subtle differences between monitoring inside a container and monitoring on bare metal. For example, on bare metal the CPU time spent in the different modes always sums to 100%, but that assumption does not always hold inside a container. On traditional bare metal, node-level metrics are an important part of the database metrics, and monitoring gets much worse when the database container is co-deployed with applications. Of course, if you use Docker as if it were a VM, many things are still likely to work, but then we lose the real value of using Docker.

Performance

Performance is another point people care a lot about. From the performance perspective, the basic principle of database deployment is: the closer to the hardware, the better. Every additional layer of isolation and abstraction is bad for database performance. More isolation means more overhead, even if it is just one extra memcpy in the kernel.

For performance-hungry scenarios, some databases bypass the operating system’s page management and operate on the disk directly, and some even use FPGAs or GPUs to speed up query processing. Docker, being a lightweight container, does not hurt performance much, and the impact may be negligible for performance-insensitive scenarios. But the extra abstraction layer will only ever make performance worse, never better.

Isolation

Docker provides process-level isolation. Databases do value isolation, but not this kind of isolation. Database performance is critical, so the typical deployment gives the database a whole physical machine exclusively, with only a few necessary tools alongside it and no other applications. Even when using Docker, we would still give it a whole physical machine.

Therefore, the isolation Docker provides is useful for cloud database vendors overselling multi-tenant capacity. For other cases, it does not add much when deploying a database.

Scaling out

For stateless applications, containers make scaling out incredibly simple: you can schedule them on whatever node you like. But this does not apply to databases or other stateful applications. You cannot create or destroy database instances as freely as app servers. To create a new replica, for example, you have to pull the data from the primary whether you are using Docker or not; copying several TB in production can take hours, and it still requires manual intervention, inspection, and verification. So what is the essential difference between running a ready-made make-replica script and running docker run? The time goes into copying data to the new replica either way.
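
Either way, the heavy lifting is the same data copy. A minimal bare-metal sketch (host, user, and paths are assumptions):

# Stream a full copy of the primary and write the replication settings (-R)
pg_basebackup -h primary.example.com -U replicator -D /pg/data -X stream -R -c fast
# Start the new standby, then verify it catches up before taking traffic
pg_ctl -D /pg/data start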

Maintainability

Most of the cost of software is spent in the operations phase rather than development: fixing vulnerabilities, keeping the system up and running, handling failures, upgrading versions, migrating, repaying technical debt, and so on. Maintainability matters a great deal for the work and life quality of operators, and this is the most pleasing part of Docker: infrastructure as code. Docker’s greatest value arguably lies in its ability to turn operational experience into reusable code and accumulate it in an easy way, rather than leaving a bunch of install/setup documents and scripts scattered everywhere. From this perspective, Docker has done a great job, especially for stateless applications whose logic is constantly changing. Docker and Kubernetes let us easily deploy, scale in and out, release, and perform rolling upgrades, letting Devs do Ops work and letting Ops live the life of a DBA (plausibly).

But can these conclusions be applied to databases? Once initialized, a database does not require frequent environment changes; it may run continuously for years without major change. DBAs typically accumulate a lot of maintenance scripts; configuring an environment with one command is not much slower than the Docker way, and the number of environments that need to be configured and initialized is relatively small, so the convenience of containers for environment setup does not bring a significant advantage. For daily operations, a database container cannot be destroyed, recreated, restarted, or migrated as freely as an application container. Many operations have to go through docker exec: in fact they may still run exactly the same scripts, only with more cumbersome steps.

Docker likes to claim that upgrading software is easy. That is true for minor versions: simply bump the version in the Dockerfile, rebuild the image, and restart the database container. But when we need a major version upgrade, this is the way to do a binary upgrade in Docker: https://github.com/tianon/docker-postgres-upgrade, while I can achieve the same with a few lines of bash on bare metal.
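
For reference, a rough sketch of the bare-metal equivalent with pg_upgrade (versions and paths are assumptions; back up first and run --check before the real upgrade):

/usr/pgsql-13/bin/initdb -D /pg/13/data        # initialize the new cluster
pg_ctl -D /pg/12/data stop                     # both clusters must be stopped
/usr/pgsql-13/bin/pg_upgrade \
  --old-bindir=/usr/pgsql-12/bin --new-bindir=/usr/pgsql-13/bin \
  --old-datadir=/pg/12/data --new-datadir=/pg/13/data \
  --check                                      # dry run; rerun without --check to upgrade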

It also takes more effort to use some existing tools through docker exec. For example, docker exec mixes stdout and stderr, which breaks a lot of tools that rely on pipes. If you want to perform an ETL transfer of a table to another instance, the traditional way is:

psql <src-url> -c 'COPY tbl TO STDOUT' |\
psql <dst-url> -c 'COPY tbl FROM STDIN'

With Docker, things get more complicated:

docker exec -it srcpg gosu postgres bash -c "psql -c \"COPY tbl TO STDOUT\" 2>/dev/null" |\
docker exec -i dstpg gosu postgres psql -c 'COPY tbl FROM STDIN;'

And if you want to take a base backup from a Postgres instance running inside a container, without PostgreSQL installed on the host machine, you have to run the command with a lot of extra wrapping:

docker exec -i postgres_pg_1 gosu postgres bash -c 'pg_basebackup -Xf -Ft -c fast -D - 2>/dev/null' | tar -xC /tmp/backup/basebackup

In fact, it is not Docker that improves the daily operations experience, but tools such as Ansible. Containers may be faster at spinning up a database environment, but such tasks are rare. So if the database container cannot be scheduled as freely as an app server, cannot scale out quickly, and brings no more convenience than ordinary scripts for initial setup, daily operations, and emergency troubleshooting, why should we put the production database into Docker?

Perhaps it is because a rough image-based solution is still better than a blind setup done without a DBA. Container and orchestration technology is very valuable for operations: it fills the gap between software and service, and its vision is to modularize operational experience and capability. Docker and Kubernetes are likely to become the standard way of packaging and orchestration in the future, evolving into something like a “data center distributed cluster operating system”, the underlying infrastructure of all software, the universal runtime. Once those major uncertainties have been eliminated, we can put our applications and our valuable databases inside it. But for now, at least for production databases, that is just a nice vision.

Summary

Finally, I must emphasize that the discussion above applies only to production databases. For databases in development environments, or applications in production, I fully support using Docker. But when it comes to production databases, if the data really matters, we should ask ourselves three questions before diving in:

  • Am I willing to be a lab rat?
  • Can I handle the problems?
  • Can I take the consequences?

Any technical decision is a trade-off. For putting a production database into a container, the critical trade-off is sacrificing reliability in exchange for maintainability.

There are scenarios where data reliability is not so important, or where other considerations dominate: for cloud service vendors, for example, putting databases inside containers is a great fit. Container isolation, high resource utilization, and management convenience all match the requirements of that scenario. But in most cases, reliability has the highest priority, and sacrificing reliability in exchange for maintainability is not advisable.

Postgres Development Convention

This is a PostgreSQL convention I wrote for a previous company. It could be a useful reference when dealing with databases. For now it is just an index; detailed information will be translated when I have time.

0x00 Background

Nothing can be accomplished without norms or standards.

PostgreSQL is a powerful database. But keeping it powerful all the time requires teamwork between Dev, Ops, and DBA.

0x01 Naming Convention

Naming is the beginning of everything.

[REQUIRED] Common naming rule

[REQUIRED] Database naming rule

[REQUIRED] Role naming rule

[REQUIRED] Schema naming rule

[OPTIONAL] Table naming rule

[OPTIONAL] Index naming rule

[OPTIONAL] Function naming rule

[OPTIONAL] Column naming rule

[OPTIONAL] Variable and Parameter naming rule

0x02 Designing Convention

Suum cuique. (To each his own.)

[REQUIRED] Character Encoding must be UTF-8

[REQUIRED] Capacity Planning

[REQUIRED] Do not abuse stored procedures

[REQUIRED] Separation of storage and calculation

[REQUIRED] Primary key and IDENTITY

[REQUIRED] Beware of foreign key

[REQUIRED] Beware of trigger

[REQUIRED] Avoid wide tables

[REQUIRED] Add default value to column

[REQUIRED] Handle nullable with caution

[REQUIRED] Unique constraint should be forced by database

[REQUIRED] Beware of integer overflow

[REQUIRED] Use Timestamp without timezone, force timezone to UTC

[REQUIRED] DROP obsolete functions in a timely manner

[OPTIONAL] Data Type of primary key

[OPTIONAL] Use proper data types

[OPTIONAL] Use ENUM for stable and small valuespace fields

[OPTIONAL] Choose right text types.

[OPTIONAL] Choose right numeric types

[OPTIONAL] Function format

[OPTIONAL] Design for evolvability

[OPTIONAL] Choose right norm level

[OPTIONAL] Embrace new database version

[OPTIONAL] Use radical feature with caution

0x03 Indexing Convention

Wer Ordnung hält, ist nur zu faul zum Suchen. (Whoever keeps order is just too lazy to search.)

[REQUIRED] OLTP queries must have corresponding index

[REQUIRED] Never build index on wide field

[REQUIRED] Explicit with null ordering

[REQUIRED] Handle KNN problem with GiST index

[OPTIONAL] Make use of function index

[OPTIONAL] Make use of partial index

[OPTIONAL] Make use of BRIN index

[OPTIONAL] Beware of selectivity

0x04 Querying Convention

The limits of my language mean the limits of my world.

[REQUIRED] Separation of read and write

[REQUIRED] Separation of fast and slow

[REQUIRED] Set timeout for queries

[REQUIRED] Beware of replication lag

[REQUIRED] Use connection pooling

[REQUIRED] Changing connection state is forbidden

[REQUIRED] Have retry mechanism for aborted transaction

[REQUIRED] Have reconnecting mechanism

[REQUIRED] Executing DDL from production application code is forbidden

[REQUIRED] Using explicit schema names

[REQUIRED] Using explicit table names when joins are involved

[REQUIRED] Full table scan is forbidden in OLTP systems

[REQUIRED] Staying idle in transaction for a long time is forbidden

[REQUIRED] Close cursors

[REQUIRED] Beware of NULL

[REQUIRED] Beware of null input on aggregation

[REQUIRED] Beware of holes in serial space

[OPTIONAL] Use prepared statement for repeat queries

[OPTIONAL] Use right isolation levels

[OPTIONAL] Do not test existence with count

[OPTIONAL] Use returning clause

[OPTIONAL] Use upsert

[OPTIONAL] Use advisory locks to avoid contention

[OPTIONAL] Optimize IN operator

[OPTIONAL] Do not use left-fuzzy (leading wildcard) search

[OPTIONAL] Use array instead of temporary table

0x05 Deploying Convention

[REQUIRED] Follow the deploy procedure

[REQUIRED] Deploy request format

[REQUIRED] Deployment review rules

[REQUIRED] Deployment time window

0x06 Operation Convention

[REQUIRED] Take care of backups

[REQUIRED] Take care of ages

[REQUIRED] Take care of bloats

[REQUIRED] Take care of replication lags

[REQUIRED] Take care of resource consumption

[REQUIRED] Minimal privilege principle

[REQUIRED] CREATE INDEX CONCURRENTLY

[REQUIRED] Warm up before taking real traffic

[REQUIRED] Doing schema migration with caution

[OPTIONAL] Split batch operation

[OPTIONAL] Speed up bulk load