This is the blog section. It has two categories: News and Releases.
Files in these directories will be listed in reverse chronological order.
One-Line Installation
Run this on the meta node:

/bin/bash -c "$(curl -fsSL https://pigsty.cc/install)"
MetaDB Provisioning

After bootstrap, you can now use a PostgreSQL database on the meta node as the inventory, instead of a static YAML file.
Add Loki & Promtail as optional logging collectors

Now you can view, query, and search postgres|pgbouncer|patroni logs in the Grafana UI (PG Instance Log).
Pigsty CLI/GUI (beta)
Manage your pigsty deployment with a much more human-friendly command-line interface.
- Fix `connection reset by peer` entries in postgres logs caused by HAProxy health checks.
- Fix `Connect Reset Exception` entries in patroni logs caused by HAProxy health checks.
- Set `log_min_duration_statement = 1s` for `dbuser_monitor` to get rid of monitoring query logs.
- Fix `pgbouncer-create-user` not handling md5 passwords properly.
- Fix `Makefile` entries.
- Set `node_disable_swap` to false and remove the corresponding entries from `node_sysctl_params`.
- `grafana_plugin`: the default `install` behavior now downloads plugins from the pigsty CDN if they do not exist locally.
- `repo_url_packages`: now downloads rpm packages via the pigsty CDN to accelerate installation.
- `proxy_env.no_proxy`: now adds the pigsty CDN to the no-proxy sites.
- `grafana_customize`: set to false by default; enabling it installs the pigsty pro UI.
- `node_admin_pk_current`: add the current user's ~/.ssh/id_rsa.pub to the admin public keys.
- `loki_clean`: whether to clean up existing loki data during init.
- `loki_data_dir`: default data dir for the loki logging service.
- `promtail_enabled`: enable the promtail logging agent service?
- `promtail_clean`: remove existing promtail status during init?
- `promtail_port`: port used by promtail, 9080 by default.
- `promtail_status_file`: location of the promtail status file.
- `promtail_send_url`: endpoint of the loki service which receives log data.
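Taken together, the new logging entries look roughly like this (a sketch: apart from promtail_port: 9080, every value below is an assumed default, and the endpoint follows Loki's standard push API path):

loki_clean: false                                            # assumption: do not wipe loki data on init
loki_data_dir: /data/loki                                    # assumption: default loki data dir
promtail_enabled: true                                       # run promtail on cluster members
promtail_clean: false                                        # assumption: keep promtail status on init
promtail_port: 9080                                          # stated default
promtail_status_file: /var/log/positions.yaml                # assumption: status file location
promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push  # assumption: loki on the meta node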
Pigsty is now in RC status with guaranteed API stability.

Roles `vip` and `haproxy` have been merged into `service`.
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100 # default load balance weight (instance level)
# - service - #
pg_services: # how to expose postgres service in cluster?
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
pg_services_extra: [] # extra services to be added
# - haproxy - #
haproxy_enabled: true # enable haproxy among every cluster members
haproxy_reload: true # reload haproxy after config
haproxy_policy: roundrobin # roundrobin, leastconn
haproxy_admin_auth_enabled: false # enable authentication for haproxy admin?
haproxy_admin_username: admin # default haproxy admin username
haproxy_admin_password: admin # default haproxy admin password
haproxy_exporter_port: 9101 # default admin/exporter port
haproxy_client_timeout: 3h # client side connection timeout
haproxy_server_timeout: 3h # server side connection timeout
# - vip - #
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
# vip_address: 127.0.0.1 # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24 # virtual ip address cidr mask (l2 only)
# vip_interface: eth0 # virtual ip network interface (l2 only)
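As a usage sketch (the member IP 10.10.10.11 and the dbuser_meta credentials are assumptions borrowed from examples elsewhere in these notes), each service is just a stable host:port pair routed by haproxy:

# primary service (5433): read-write, should land on the primary
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.11:5433/meta -tAc 'SELECT pg_is_in_recovery()'  # expect: f
# replica service (5434): read-only, lands on a replica when one is alive
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.11:5434/meta -tAc 'SELECT pg_is_in_recovery()'  # expect: t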
New Options
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
pg_reload: true # reload postgres after hba changes
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
Remove Options
haproxy_check_port # covered by service options
haproxy_primary_port
haproxy_replica_port
haproxy_backend_port
haproxy_weight
haproxy_weight_fallback
vip_enabled # replace by vip_mode
`pg_services` and `pg_services_extra` define the services in a cluster. A service has some mandatory fields:

- `name`: the service's name
- `src_port`: which port to listen on and expose the service
- `selector`: which instances belong to this service

# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
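For contrast, a minimal definition that supplies only the mandatory fields and inherits every default shown above (the standby service and port 5435 are hypothetical):

pg_services_extra:
  - name: standby                          # hypothetical service: {{ pg_cluster }}-standby
    src_port: 5435                         # mandatory: port to listen on
    selector: "[? pg_role == `replica` ]"  # mandatory: which members serve it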
Add additional locale support: `lc_ctype` and `lc_collate`. This is mainly because of `pg_trgm`'s weird behavior on i18n characters.
pg_databases:
- name: meta # name is the only required field for a database
# owner: postgres # optional, database owner
# template: template1 # optional, template1 by default
# encoding: UTF8 # optional, UTF8 by default , must same as template database, leave blank to set to db default
# locale: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_collate: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_ctype: C # optional, C by default , must same as template database, leave blank to set to db default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
# tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
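Once provisioned, the declared state can be checked from the cluster (a sketch, assuming the meta database defined above):

psql meta -c '\dx postgis'                        # extension created in schema public
psql meta -tAc 'SHOW enable_partitionwise_join'   # ALTER DATABASE parameter: on
psql -tAc "SELECT datconnlimit FROM pg_database WHERE datname = 'meta'"  # -1: no limit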
Monitor Only Deployment
Database/User Management
New playbooks `pgsql-createdb.yml` and `pgsql-createuser.yml` to manage users and databases on running clusters.

New Options
prometheus_sd_target: batch # batch|single
exporter_install: none # none|yum|binary
exporter_repo_url: '' # add to yum repo if set
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes' # default opts for node_exporter
pg_exporter_url: '' # optional, overwrite default pg_exporter target
pgbouncer_exporter_url: '' # optional, overwrite default pgbouncer_exporter target
Remove Options
exporter_binary_install: false # covered by exporter_install
Structure Changes
pg_default_roles # refer to pg_users
pg_users # refer to pg_users
pg_databases # refer to pg_databases
Rename Options
pg_default_privilegs -> pg_default_privileges # fix typo
Monitoring Provisioning Enhancement
Haproxy Enhancement
Security Enhancement
Software Update
Prometheus 2.25 / Grafana 7.4 / Consul 1.9.3 / Node Exporter 1.1 / PG Exporter 0.3.2
New Config Entries
service_registry: consul # none | consul | etcd | both
prometheus_options: '--storage.tsdb.retention=30d' # prometheus cli opts
prometheus_sd_method: consul # Prometheus service discovery method:static|consul
prometheus_sd_interval: 2s # Prometheus service discovery refresh interval
pg_offline_query: false # set to true to allow offline queries on this instance
node_exporter_enabled: true # enabling Node Exporter
pg_exporter_enabled: true # enabling PG Exporter
pgbouncer_exporter_enabled: true # enabling Pgbouncer Exporter
exporter_binary_install: false # install Node/PG Exporter via copy binary
dcs_disable_purge: false # force dcs_exists_action = abort to avoid dcs purge
pg_disable_purge: false # force pg_exists_action = abort to avoid pg purge
haproxy_weight: 100 # relative lb weight for backend instance
haproxy_weight_fallback: 1 # primary server weight in replica service group
Obsolete Config Entries
prometheus_metrics_path # duplicate with exporter_metrics_path
prometheus_retention # covered by `prometheus_options`
Database provisioning interface enhancement #33

The old interface:

pg_databases: # create a business database 'meta'
- name: meta
schemas: [meta] # create extra schema named 'meta'
extensions: [{name: postgis}] # create extra extension postgis
parameters: # overwrite database meta's default search_path
search_path: public, monitor
The new interface:

pg_databases:
- name: meta # name is the only required field for a database
owner: postgres # optional, database owner
template: template1 # optional, template1 by default
encoding: UTF8 # optional, UTF8 by default
locale: C # optional, C by default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
New fields: `template`, `encoding`, `locale`, `allowconn`, `tablespace`, `connlimit`. A new `revokeconn` option revokes connect privileges on the database from public, and a new `comment` field attaches a comment string to the database.

You can create new databases on running postgres clusters with the `pgsql-createdb.yml` playbook. Pass `pg_database` to the playbook:

./pgsql-createdb.yml -e pg_database=<your_new_database_name>
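Putting it together, the workflow is declare-then-apply (a sketch: the database name myapp and its owner are hypothetical):

# 1. declare the new database in pg_databases of the cluster definition
pg_databases:
  - name: myapp                # hypothetical business database
    owner: dbuser_meta         # reusing a user defined elsewhere in these notes

# 2. apply it to the running cluster
./pgsql-createdb.yml -e pg_database=myapp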
User provisioning interface enhancement #34
The old interface:

pg_users:
- username: test # example production user have read-write access
password: test # example user's password
options: LOGIN # extra options
groups: [ dbrole_readwrite ] # dbrole_admin|dbrole_readwrite|dbrole_readonly
comment: default test user for production usage
pgbouncer: true # add to pgbouncer
The new interface:

pg_users:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
- The `username` field is renamed to `name`.
- The `groups` field is renamed to `roles`.
- `options` is now split into separate configuration entries: `login`, `superuser`, `createdb`, `createrole`, `inherit`, `replication`, `bypassrls`, `connlimit`.
- New `expire_at` and `expire_in` options.
- The `pgbouncer` option for users is now `false` by default.

You can create new users on running postgres clusters with the `pgsql-createuser.yml` playbook. Define the user in `pg_users`, then pass `pg_user` to the playbook:

./pgsql-createuser.yml -e pg_user=<your_new_user_name>
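The same declare-then-apply flow works for users (a sketch: dbuser_myapp and its password are hypothetical):

# 1. declare the new user in pg_users of the cluster definition
pg_users:
  - name: dbuser_myapp         # hypothetical business user
    password: DBUser.MyApp     # hypothetical password
    roles: [dbrole_readwrite]
    pgbouncer: true            # opt in explicitly, since it now defaults to false

# 2. apply it to the running cluster
./pgsql-createuser.yml -e pg_user=dbuser_myapp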
- Merge "Fix name of dashboard" #1; fix PG Overview dashboard typo.
- Fix default primary instance to `pg-test-1` of cluster `pg-test` in the sandbox environment.
- Fix obsolete comments.
Pigsty now has an Official Site 🎉 !

Add `pg-cluster-replication` to the default dashboards.

Now you can customize your database content with pigsty!
pg_users:
- username: test
password: test
comment: default test user
groups: [ dbrole_readwrite ] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pg_databases: # create a business database 'test'
- name: test
extensions: [{name: postgis}] # create extra extension postgis
parameters: # overwrite database meta's default search_path
search_path: public,monitor
pg-init-template.sql will be used as the default template1 database init script; pg-init-business.sql will be used as the default business database init script.

You can now customize the default role system, schemas, extensions, and privileges with variables:
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_admin # system admin user
pg_admin_password: DBUser.Admin # system admin password
# - default roles - #
pg_default_roles:
- username: dbrole_readonly # sample user:
options: NOLOGIN # role can not login
comment: role for readonly access # comment string
- username: dbrole_readwrite # sample user: one object for each user
options: NOLOGIN
comment: role for read-write access
groups: [ dbrole_readonly ] # read-write includes read-only access
- username: dbrole_admin # sample user: one object for each user
options: NOLOGIN BYPASSRLS # admin can bypass row level security
comment: role for object creation
groups: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# NOTE: replicator, monitor, admin password are overwritten by separated config entry
- username: postgres # reset dbsu password to NULL (if dbsu is not postgres)
options: SUPERUSER LOGIN
comment: system superuser
- username: replicator
options: REPLICATION LOGIN
groups: [pg_monitor, dbrole_readonly]
comment: system replicator
- username: dbuser_monitor
options: LOGIN CONNECTION LIMIT 10
comment: system monitor user
groups: [pg_monitor, dbrole_readonly]
- username: dbuser_admin
options: LOGIN BYPASSRLS
comment: system admin user
groups: [dbrole_admin]
- username: dbuser_stats
password: DBUser.Stats
options: LOGIN
comment: business read-only user for statistics
groups: [dbrole_readonly]
# object created by dbsu and admin will have their privileges properly set
pg_default_privilegs:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
- GRANT USAGE ON TYPES TO dbrole_admin
# schemas
pg_default_schemas: [monitor]
# extension
pg_default_extensions:
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
# postgres host-based authentication rules
pg_hba_rules:
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read-write access (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readwrite md5
- host all +dbrole_readwrite 127.0.0.1/32 md5
- title: allow read-only user (stats, personal) password directly access
role: replica
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
pg_hba_rules_extra: []
# pgbouncer host-based authentication rules
pgbouncer_hba_rules:
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
pgbouncer_hba_rules_extra: []
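Both rule sets come with an _extra companion, so cluster-specific rules can be layered on top without touching the global defaults. A sketch (the title and CIDR are hypothetical):

pg_hba_rules_extra:
  - title: allow office network password access
    role: common
    rules:
      - host all all 192.168.100.0/24 md5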
The second public beta (v0.4.0) of pigsty is available now! 🎉

The skim version of the monitoring system consists of 10 essential dashboards:
pg_exporter 0.3.1
The first public beta (v0.3.0) of pigsty is available now! 🎉

The skim version of the monitoring system consists of 8 essential dashboards:
- conf/all.yml is used by default
- infra.yml to provision meta node(s) and infrastructure
- initdb.yml to provision database clusters
- ins-add.yml to add new instances to a database cluster
- ins-del.yml to remove instances from a database cluster

For stateless application services, the container is a perfect development and operations solution. However, for a service with persistent state, the database, things are not that simple. As a developer, I really like Docker and believe that Docker and Kubernetes will be the standard way to develop and deploy software in the future. But as a DBA, I think the database in a container is an operations nightmare. Whether production databases should be placed in containers is still a controversial issue, but the truth gets clearer with more debate and more practice. In this post, I will show you why putting a production database into Docker is a bad idea.
Let's get started by looking at Docker's self-description:
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.
The words Docker uses to describe itself include: lightweight, standardized, portable, cost savings, increased efficiency, automation, integration, and efficient operations. So many benefits, and there is no problem with these claims: Docker does make development and operations much easier in most cases. That is why so many companies are so eager to Dockerize their software and services.
But sometimes this enthusiasm goes to the other extreme: containerize all software, EVEN A PRODUCTION DATABASE. Containers were originally designed for stateless applications, and temporary data generated by an application inside a container is considered part of that container: create a service with a container, and destroy it when you are done. That is the typical use case.
These apps themselves have no state; the state is usually stored in a database outside the container. That is the classic architecture and usage, and the philosophy of Docker. But when we put the database itself inside a container, things are different. A database is a stateful service, and in order to keep that state no matter how containers come and go, the database container needs to drill a hole down to the underlying operating system: the data volume.
Such a container is no longer an object that can be created, destroyed, shipped, or transferred at will; it is bound to the underlying environment, so many advantages of the typical container use case no longer apply to database containers.
Getting software up and running and making it run reliably are two different things.
The database is the core of almost all information systems, a CRITICAL service in the whole IT system. Here CRITICAL can be read literally: DEAD WHEN FAILURE HAPPENS. If an application goes down, you can pull it back up later. But if your database is scrapped and you have no backup, it is dead for good.
This is similar to our daily software experience: Word/Excel/PPT can crash, and pulling them back up is no big deal. But if a critical document is corrupted or lost, that is a real mess. Databases are similar for many companies: if the database is deleted and there is no backup, the company may go down with it. Reliability is the most important attribute of a database: the ability of the system to function correctly in adversity (hardware failure, software failure, human error) and keep delivering the desired level of performance. Note that reliability differs from availability. Availability can be measured by a number of nines, a percentage representing the expected ratio of system uptime. Reliability means fault tolerance and resilience, and it is hard to measure: it can only be proved by continuous correct operation, or falsified by failure. Reliability is therefore a safety property, not as intuitive or measurable as performance and maintainability.
Safety matters, but many people tend to ignore this most important attribute. They only become aware of it after getting killed, robbed, sick, into a car accident, or having a database dropped. Only then do people feel regret.
Notice that Docker's self-description contains no words like "reliable", even though that is the most important attribute for a database. Docker does know what it is capable of.
Compared with bare metal, putting a database inside Docker does not reduce the probability of hardware failures, software errors, or human errors. Instead, **the overall reliability of the system decreases** due to additional components, additional complexity, and additional failure points. The database container still needs to be bound to a specific machine via its data volume, so the exposure to hardware failure remains the same.
Standardized deployment may seem great at first glance, but as far as software bugs are concerned, there is no essential difference between scripts and a Dockerfile. Software errors are mainly caused by poor application design, which Docker cannot help with; the same goes for human errors.
Additional components introduce additional failure points, and introducing Docker drags us not only into problems with Docker itself, but also into problems at the junctions between Docker and every existing component. So when a failure occurs, it may be a problem with the database, a problem with Docker, a problem in the interaction between the database and Docker, or a problem in the interaction between Docker and the OS.
Take a look at the issue list of the official PostgreSQL Docker image: https://github.com/docker-library/postgres/issues. It is a long list, and there are tons of additional problems when you put a database into Docker. The simplest example: what happens to the database if the dockerd daemon crashes? It breaks and goes out of service. A subtler example is running two PostgreSQL instances on the same data directory (two containers on one volume, or one inside and one outside a container): without proper fencing your data will be toast, and these are problems that would never happen on bare metal.
As mentioned earlier, reliability is not easy to measure; the only way for a system to prove it is to run correctly and continuously for a long time. Deploying databases on bare metal is the traditional way of doing things, proven by several decades of continuous operation. Docker is a revolution for DevOps, but it is still too young: five years is far too short for something as critical as a production database. There are not enough lab rats yet.
Besides long-term operation, there is another way to "increase" reliability: failure. Failure is valuable experience; it turns uncertainties into certainties and unknown problems into known ones. Failure experience is the most valuable asset of operators: it is the form operational knowledge takes, and it is how the community accumulates knowledge. For PostgreSQL, most community experience is based on the assumption of bare-metal deployment, and all sorts of failures have been explored by many people over decades. If you run into a database problem, you are very likely to find that other community members have already been through a similar situation, along with corresponding solutions. But search the same problem with the additional keyword "docker" and you will find far less useful information. That means when something nasty happens, the probability of successfully rescuing your data is much lower, and recovery takes much longer.
Another subtle thing is that companies and individuals are reluctant to share failure experience unless there is a special reason. For companies, a failure report may undermine their reputation, expose sensitive information, or expose how shabby their infrastructure is. For individuals, failure experience is almost the most valuable part of their expertise: once shared, its value diminishes. Ops/DBA culture is not as open as Dev culture. That is the very reason Docker and Kubernetes operators exist: an attempt to codify operational experience so it can accumulate. But that is still naive for now, since few people are willing to share, and you can find rubbish everywhere. The official PostgreSQL Docker image, for example, lacks tons of the tooling, tuning, and settings needed to work efficiently as a real-world database.
A database requires lots of tools to maintain: operation scripts, deployment, backup, archiving, failover, major/minor version upgrades, plugin installation, connection pooling, performance analysis, monitoring, tuning, inspection, repair, and so on. Most of these tools are also designed for bare-metal deployment, and they are critical too: without adequate testing, they cannot really be put into use. Getting a thing up and running and keeping it running steadily for a long time are completely different levels of reliability.
Plugins are a typical example. PostgreSQL has lots of useful plugins, such as PostGIS. To install a plugin on bare metal, all you need is `yum install` followed by `create extension postgis`. Doing it the Docker way, you need to modify the Dockerfile, build a new image, push it to the server, and then restart the database container. No doubt that is much more complicated.
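In concrete commands, the contrast looks roughly like this (a sketch: the package name postgis31_13 assumes PostgreSQL 13 on CentOS, and the image/container names are hypothetical):

# bare metal: install, create, done; no restart required
yum install -y postgis31_13
psql -c 'CREATE EXTENSION postgis;'

# docker: rebuild the image and bounce the container
echo 'RUN apt-get update && apt-get install -y postgresql-13-postgis-3' >> Dockerfile
docker build -t mypg:13-postgis .
docker push mypg:13-postgis
docker stop pg && docker rm pg
docker run -d --name pg -v /pg/data:/var/lib/postgresql/data mypg:13-postgis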
Similar problems arise with CLI tools and system commands. In theory they can run on the host, but you cannot be sure their execution and results mean exactly the same thing. And when an emergency happens and you need a tool that is not included in the container, with no Internet access or yum repository at hand, you have to go through Dockerfile → build image → restart container. That is insane.
When it comes to monitoring, Docker makes things harder. There are many subtle differences between monitoring in containers and on bare metal. For example, on bare metal the CPU time spent in different modes always sums to 100%, but that assumption does not always hold inside a container. On traditional bare metal, node-level metrics are an important part of database monitoring, and monitoring gets much worse when database containers are deployed mixed with applications. Of course, if you use Docker like a VM, many things may still work, but then you lose the real value of using Docker.
Performance is another point people care about a lot. From the performance perspective, the basic principle of database deployment is: the closer to the hardware, the better. Every additional isolation and abstraction layer is bad for database performance; more isolation means more overhead, even if it is just one extra memcpy in the kernel.
For performance-seeking scenarios, some databases bypass the operating system's page management to operate the disk directly, and some even use FPGAs or GPUs to speed up query processing. Docker, as a lightweight container, does not hurt performance much, and for performance-insensitive scenarios the impact may not be significant. But the extra abstraction layer will only ever make performance worse, never better.
Docker provides process-level isolation. Databases value isolation, but not this kind of isolation. Database performance is critical, so the typical deployment takes a whole physical machine exclusively: besides some necessary tooling, there are no other applications on it. Even when using Docker, we would still give it a whole physical machine.
Therefore, the isolation provided by Docker is useful for multi-tenant overselling by cloud database vendors, but for other cases it does not make much sense for deploying databases.
For stateless applications, containers make scale-out incredibly simple: it does not matter which node they run on, and you can schedule them at will. But this does not apply to databases and other stateful applications. You cannot create and destroy database instances as freely as app servers: to create a new replica, for example, you have to pull the data from the primary whether you are using Docker or not, and copying several TB in production may take hours, still requiring manual intervention, inspection, and verification. So what is the essential difference between running a ready-made `make-replica` script and running `docker run`? The time is spent on making the new replica either way.
Most software cost is spent in the operation phase rather than the development phase: fixing vulnerabilities, keeping the system up and running, handling failures, upgrading versions, migrating, repaying tech debt, and so on. Maintainability is therefore very important for operators' quality of work and life, and this is the most pleasing part of Docker: infrastructure as code. We can say Docker's greatest value lies in its ability to deposit operational experience into reusable code, accumulating it in an easy way rather than scattering install/setup documents and scripts everywhere. From this perspective, I think Docker has done a great job, especially for stateless applications whose logic is constantly changing. Docker and Kubernetes allow us to easily deploy, scale out and in, release, and roll out upgrades, so that Dev can also work as Ops, and Ops can also live a DBA's life (plausibly).
But can these conclusions be applied to databases? Once initialized, a database does not require frequent environment changes; it may run continuously for years without big changes. DBAs typically accumulate lots of maintenance scripts, one-command environment setup is not much slower than the Docker way, and the number of environments that need configuring and initializing is relatively small, so the container's convenience in environment configuration brings no significant advantage. For daily operations, a database container cannot be destroyed, recreated, and migrated as freely as an application container, and many operations have to go through `docker exec`: in fact, you may still be running the exact same scripts, but the steps have become much more cumbersome.
Docker advocates like to say that upgrading software with Docker is easy. That is true for minor versions: simply modify the version in the Dockerfile, rebuild the image, and restart the database container. However, when you need a major version upgrade, this is what a binary upgrade in Docker looks like: https://github.com/tianon/docker-postgres-upgrade, while I can achieve the same thing in a few lines of bash script.
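For reference, the bare-metal equivalent is roughly this (a simplified sketch assuming a 12 → 13 upgrade with packages under /usr/pgsql-* and data dirs under /pg; take a backup first, and run pg_upgrade as the postgres user):

/usr/pgsql-13/bin/initdb -D /pg/data13            # init the new data directory
systemctl stop postgresql-12                      # stop the old cluster
/usr/pgsql-13/bin/pg_upgrade \
  -b /usr/pgsql-12/bin -B /usr/pgsql-13/bin \
  -d /pg/data12 -D /pg/data13 --link              # upgrade in place via hard links
systemctl start postgresql-13                     # bring up the new cluster
vacuumdb --all --analyze-in-stages                # rebuild planner statistics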
Also, it takes more effort to use some existing tools through docker exec. For example, `docker exec` mixes stdout and stderr, which breaks a lot of tools that rely on pipes. For instance, if you want to perform an ETL that transfers a table to another instance, the traditional way is:
psql <src-url> -c 'COPY tbl TO STDOUT' |\
psql <dst-url> -c 'COPY tbl FROM STDIN'
With Docker, things are more complicated:
docker exec -it srcpg gosu postgres bash -c "psql -c \"COPY tbl TO STDOUT\" 2>/dev/null" |\
docker exec -i dstpg gosu postgres psql -c 'COPY tbl FROM STDIN;'
And if you want to take a basebackup from postgres inside a container, without PostgreSQL installed on the host machine, you have to run the command with a lot of extra wrapping:
docker exec -i postgres_pg_1 gosu postgres bash -c 'pg_basebackup -Xf -Ft -c fast -D - 2>/dev/null' | tar -xC /tmp/backup/basebackup
In fact, it is not Docker that elevates the daily operations experience, but tools such as ansible. Containers may be faster at building a database environment, but such tasks are rare. So if a database container cannot be scheduled as freely as an app server, cannot scale quickly, and brings no more convenience than ordinary scripts for initial setup, daily operations, and emergency troubleshooting, why should we put a production database into Docker?
I think maybe it is because a rough image-based solution is still better than setting things up blindly without a DBA. Container technology and orchestration technology are very valuable for operations: they fill the gap between software and services, and their vision is to modularize operational experience and capability. Docker and Kubernetes may well become the standard way of package management and orchestration in the future, evolving into something like a "datacenter distributed-cluster operating system", the underlying infrastructure of all software, the universal runtime. Once those major uncertainties are eliminated, we can put our applications, and our valuable databases, inside it. But for now, at least for production databases, that is just a good vision.
Finally, I must emphasize that the above discussion is limited to production databases. For databases in development environments, or for applications in production, I fully support using Docker. But when it comes to production databases, if the data really matters, we should ask ourselves three questions before diving in.
Any technical decision is a trade-off. For putting a production database into a container, the critical trade-off is **sacrificing reliability in exchange for maintainability**.
There are scenarios where data reliability is not that important, or where other considerations dominate: for cloud service vendors, for example, putting databases inside containers is a great fit, since container isolation, high resource utilization, and management convenience all match their requirements. But in most cases, reliability has the highest priority, and sacrificing reliability in exchange for maintainability is not advisable.
This is a PostgreSQL convention I wrote for a previous company. It could be a useful reference when dealing with databases. For now it is just an index; detailed information will be translated when I have time.
Nothing can be accomplished without norms or standards.
PostgreSQL is a powerful database. But keeping it powerful all the time is a team effort of Dev, Ops & DBA.
Naming is the beginning of everything.
[REQUIRED] Common naming rule
[REQUIRED] Database naming rule
[REQUIRED] Role naming rule
[REQUIRED] Schema naming rule
[OPTIONAL] Table naming rule
[OPTIONAL] Index naming rule
[OPTIONAL] Function naming rule
[OPTIONAL] Column naming rule
[OPTIONAL] Variable and Parameter naming rule
Suum cuique. (To each his own.)
[REQUIRED] Character Encoding must be UTF-8
[REQUIRED] Capacity Planning
[REQUIRED] Do not abuse stored procedures
[REQUIRED] Separation of storage and calculation
[REQUIRED] Primary key and IDENTITY
[REQUIRED] Beware of foreign key
[REQUIRED] Beware of trigger
[REQUIRED] Avoid wide tables
[REQUIRED] Add default value to column
[REQUIRED] Handle nullable with caution
[REQUIRED] Unique constraint should be forced by database
[REQUIRED] Beware of integer overflow
[REQUIRED] Use Timestamp without timezone, force timezone to UTC
[REQUIRED] DROP obsolete function in time
[OPTIONAL] Data Type of primary key
[OPTIONAL] Use proper data types
[OPTIONAL] Use ENUM for stable fields with a small value space
[OPTIONAL] Choose right text types
[OPTIONAL] Choose right numeric types
[OPTIONAL] Function format
[OPTIONAL] Design for evolvability
[OPTIONAL] Choose right norm level
[OPTIONAL] Embrace new database version
[OPTIONAL] Use radical feature with caution
Wer Ordnung hält, ist nur zu faul zum Suchen. (Those who keep order are just too lazy to search.)
[REQUIRED] OLTP queries must have corresponding index
[REQUIRED] Never build index on wide field
[REQUIRED] Explicit with null ordering
[REQUIRED] Handle KNN problem with GiST index
[OPTIONAL] Make use of function index
[OPTIONAL] Make use of partial index
[OPTIONAL] Make use of BRIN index
[OPTIONAL] Beware of selectivity
The limits of my language mean the limits of my world.
[REQUIRED] Separation of read and write
[REQUIRED] Separation of fast and slow
[REQUIRED] Set timeout for queries
[REQUIRED] Beware of replication lag
[REQUIRED] Use connection pooling
[REQUIRED] Changing connection state is forbidden
[REQUIRED] Have retry mechanism for aborted transaction
[REQUIRED] Have reconnecting mechanism
[REQUIRED] Execute DDL in production application code is forbidden
[REQUIRED] Using explicit schema names
[REQUIRED] Using explicit table names when involve join
[REQUIRED] Full table scan is forbidden in OLTP systems
[REQUIRED] Idle in Transaction for a long time is forbidden
[REQUIRED] Close cursors
[REQUIRED] Beware of NULL
[REQUIRED] Beware of null input on aggregation
[REQUIRED] Beware of holes in serial space
[OPTIONAL] Use prepared statement for repeat queries
[OPTIONAL] Use right isolation levels
[OPTIONAL] Do not tell existence by count
[OPTIONAL] Use returning clause
[OPTIONAL] Use upsert
[OPTIONAL] Use advisory locks to avoid contention
[OPTIONAL] Optimize IN operator
[OPTIONAL] Do not use left fuzzy search
[OPTIONAL] Use array instead of temporary table
[REQUIRED] Follow the deploy procedure
[REQUIRED] Deploy request format
[REQUIRED] Deployment review rules
[REQUIRED] Deployment time window
[REQUIRED] Take care of backups
[REQUIRED] Take care of ages
[REQUIRED] Take care of bloats
[REQUIRED] Take care of replication lags
[REQUIRED] Take care of resource consumption
[REQUIRED] Minimal privilege principle
[REQUIRED] CREATE INDEX CONCURRENTLY
[REQUIRED] Warming up before taking real traffic
[REQUIRED] Doing schema migration with caution
[OPTIONAL] Split batch operation
[OPTIONAL] Speed up bulk load