This is the blog section. It has two categories: News and Releases.
Files in these directories will be listed in reverse chronological order.
One-Line Installation
Run this on the meta node:

/bin/bash -c "$(curl -fsSL https://pigsty.cc/install)"
MetaDB Provisioning

After bootstrap, you can now use a PostgreSQL database on the meta node as the inventory, instead of a static YAML file.
Add Loki & Promtail as optional logging collectors

Now you can view, query, and search postgres|pgbouncer|patroni logs in the Grafana UI (PG Instance Log).
Pigsty CLI/GUI (beta)
Manage your pigsty deployment with a much more human-friendly command-line interface.
- Fix `connection reset by peer` entries in postgres logs caused by HAProxy health checks.
- Fix `Connect Reset Exception` entries in patroni logs caused by HAProxy health checks.
- Set `log_min_duration_statement = 1s` for `dbuser_monitor` to get rid of monitoring query logs.
- Fix `pgbouncer-create-user` not handling md5 passwords properly.
- Fix `Makefile` entries.
- Set `node_disable_swap` to false and remove the corresponding entries from `node_sysctl_params`.
- `grafana_plugin`: the default `install` behavior now downloads plugins from the pigsty CDN if they do not exist locally.
- `repo_url_packages`: now downloads rpm packages via the pigsty CDN to accelerate installation.
- `proxy_env.no_proxy`: now adds the pigsty CDN to the no-proxy sites.
- `grafana_customize`: set to false by default; enabling it installs the pigsty pro UI.
- `node_admin_pk_current`: add the current user's ~/.ssh/id_rsa.pub to the admin public keys.
- `loki_clean`: whether to clean up existing loki data during init.
- `loki_data_dir`: default data dir for the loki logging service.
- `promtail_enabled`: enable the promtail logging agent service?
- `promtail_clean`: remove existing promtail status during init?
- `promtail_port`: port used by promtail, 9080 by default.
- `promtail_status_file`: location of the promtail status file.
- `promtail_send_url`: endpoint of the loki service which receives log data.
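Taken together, the new logging entries look roughly like this (a sketch: apart from promtail_port: 9080, every value below is an assumed default, and the endpoint follows Loki's standard push API path):

loki_clean: false                                            # assumption: do not wipe loki data on init
loki_data_dir: /data/loki                                    # assumption: default loki data dir
promtail_enabled: true                                       # run promtail on cluster members
promtail_clean: false                                        # assumption: keep promtail status on init
promtail_port: 9080                                          # stated default
promtail_status_file: /var/log/positions.yaml                # assumption: status file location
promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push  # assumption: loki on the meta node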
Pigsty is now in RC status with guaranteed API stability.

Roles `vip` and `haproxy` have been merged into `service`.
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100 # default load balance weight (instance level)
# - service - #
pg_services: # how to expose postgres service in cluster?
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
pg_services_extra: [] # extra services to be added
# - haproxy - #
haproxy_enabled: true # enable haproxy among every cluster members
haproxy_reload: true # reload haproxy after config
haproxy_policy: roundrobin # roundrobin, leastconn
haproxy_admin_auth_enabled: false # enable authentication for haproxy admin?
haproxy_admin_username: admin # default haproxy admin username
haproxy_admin_password: admin # default haproxy admin password
haproxy_exporter_port: 9101 # default admin/exporter port
haproxy_client_timeout: 3h # client side connection timeout
haproxy_server_timeout: 3h # server side connection timeout
# - vip - #
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
# vip_address: 127.0.0.1 # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24 # virtual ip address cidr mask (l2 only)
# vip_interface: eth0 # virtual ip network interface (l2 only)
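As a usage sketch (the member IP 10.10.10.11 and the dbuser_meta credentials are assumptions borrowed from examples elsewhere in these notes), each service is just a stable host:port pair routed by haproxy:

# primary service (5433): read-write, should land on the primary
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.11:5433/meta -tAc 'SELECT pg_is_in_recovery()'  # expect: f
# replica service (5434): read-only, lands on a replica when one is alive
psql postgres://dbuser_meta:DBUser.Meta@10.10.10.11:5434/meta -tAc 'SELECT pg_is_in_recovery()'  # expect: t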
New Options
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
pg_reload: true # reload postgres after hba changes
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
Remove Options
haproxy_check_port # covered by service options
haproxy_primary_port
haproxy_replica_port
haproxy_backend_port
haproxy_weight
haproxy_weight_fallback
vip_enabled # replace by vip_mode
`pg_services` and `pg_services_extra` define the services in a cluster. A service has some mandatory fields:

- `name`: the service's name
- `src_port`: which port to listen on and expose the service
- `selector`: which instances belong to this service

# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
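For contrast, a minimal definition that supplies only the mandatory fields and inherits every default shown above (the standby service and port 5435 are hypothetical):

pg_services_extra:
  - name: standby                          # hypothetical service: {{ pg_cluster }}-standby
    src_port: 5435                         # mandatory: port to listen on
    selector: "[? pg_role == `replica` ]"  # mandatory: which members serve it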
Add additional locale support: `lc_ctype` and `lc_collate`. This is mainly because of `pg_trgm`'s weird behavior on i18n characters.
pg_databases:
- name: meta # name is the only required field for a database
# owner: postgres # optional, database owner
# template: template1 # optional, template1 by default
# encoding: UTF8 # optional, UTF8 by default , must same as template database, leave blank to set to db default
# locale: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_collate: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_ctype: C # optional, C by default , must same as template database, leave blank to set to db default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
# tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
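Once provisioned, the declared state can be checked from the cluster (a sketch, assuming the meta database defined above):

psql meta -c '\dx postgis'                        # extension created in schema public
psql meta -tAc 'SHOW enable_partitionwise_join'   # ALTER DATABASE parameter: on
psql -tAc "SELECT datconnlimit FROM pg_database WHERE datname = 'meta'"  # -1: no limit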
Monitor Only Deployment
Database/User Management
New playbooks `pgsql-createdb.yml` and `pgsql-createuser.yml` to manage users and databases on running clusters.

New Options
prometheus_sd_target: batch # batch|single
exporter_install: none # none|yum|binary
exporter_repo_url: '' # add to yum repo if set
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes' # default opts for node_exporter
pg_exporter_url: '' # optional, overwrite default pg_exporter target
pgbouncer_exporter_url: '' # optional, overwrite default pgbouncer_exporter target
Remove Options
exporter_binary_install: false # covered by exporter_install
Structure Changes
pg_default_roles # refer to pg_users
pg_users # refer to pg_users
pg_databases # refer to pg_databases
Rename Options
pg_default_privilegs -> pg_default_privileges # fix typo
Monitoring Provisioning Enhancement
Haproxy Enhancement
Security Enhancement
Software Update
Prometheus 2.25 / Grafana 7.4 / Consul 1.9.3 / Node Exporter 1.1 / PG Exporter 0.3.2
New Config Entries
service_registry: consul # none | consul | etcd | both
prometheus_options: '--storage.tsdb.retention=30d' # prometheus cli opts
prometheus_sd_method: consul # Prometheus service discovery method:static|consul
prometheus_sd_interval: 2s # Prometheus service discovery refresh interval
pg_offline_query: false # set to true to allow offline queries on this instance
node_exporter_enabled: true # enabling Node Exporter
pg_exporter_enabled: true # enabling PG Exporter
pgbouncer_exporter_enabled: true # enabling Pgbouncer Exporter
exporter_binary_install: false # install Node/PG Exporter via copy binary
dcs_disable_purge: false # force dcs_exists_action = abort to avoid dcs purge
pg_disable_purge: false # force pg_exists_action = abort to avoid pg purge
haproxy_weight: 100 # relative lb weight for backend instance
haproxy_weight_fallback: 1 # primary server weight in replica service group
Obsolete Config Entries
prometheus_metrics_path # duplicate with exporter_metrics_path
prometheus_retention # covered by `prometheus_options`
Database provisioning interface enhancement #33

The old interface:

pg_databases: # create a business database 'meta'
- name: meta
schemas: [meta] # create extra schema named 'meta'
extensions: [{name: postgis}] # create extra extension postgis
parameters: # overwrite database meta's default search_path
search_path: public, monitor
The new interface:

pg_databases:
- name: meta # name is the only required field for a database
owner: postgres # optional, database owner
template: template1 # optional, template1 by default
encoding: UTF8 # optional, UTF8 by default
locale: C # optional, C by default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
New fields: `template`, `encoding`, `locale`, `allowconn`, `tablespace`, `connlimit`. A new `revokeconn` option revokes connect privileges on the database from public, and a new `comment` field attaches a comment string to the database.

You can create new databases on running postgres clusters with the `pgsql-createdb.yml` playbook. Pass `pg_database` to the playbook:

./pgsql-createdb.yml -e pg_database=<your_new_database_name>
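Putting it together, the workflow is declare-then-apply (a sketch: the database name myapp and its owner are hypothetical):

# 1. declare the new database in pg_databases of the cluster definition
pg_databases:
  - name: myapp                # hypothetical business database
    owner: dbuser_meta         # reusing a user defined elsewhere in these notes

# 2. apply it to the running cluster
./pgsql-createdb.yml -e pg_database=myapp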
User provisioning interface enhancement #34
The old interface:

pg_users:
- username: test # example production user have read-write access
password: test # example user's password
options: LOGIN # extra options
groups: [ dbrole_readwrite ] # dbrole_admin|dbrole_readwrite|dbrole_readonly
comment: default test user for production usage
pgbouncer: true # add to pgbouncer
The new interface:

pg_users:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
- The `username` field is renamed to `name`.
- The `groups` field is renamed to `roles`.
- `options` is now split into separate configuration entries: `login`, `superuser`, `createdb`, `createrole`, `inherit`, `replication`, `bypassrls`, `connlimit`.
- New `expire_at` and `expire_in` options.
- The `pgbouncer` option for users is now `false` by default.

You can create new users on running postgres clusters with the `pgsql-createuser.yml` playbook. Define the user in `pg_users`, then pass `pg_user` to the playbook:

./pgsql-createuser.yml -e pg_user=<your_new_user_name>
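The same declare-then-apply flow works for users (a sketch: dbuser_myapp and its password are hypothetical):

# 1. declare the new user in pg_users of the cluster definition
pg_users:
  - name: dbuser_myapp         # hypothetical business user
    password: DBUser.MyApp     # hypothetical password
    roles: [dbrole_readwrite]
    pgbouncer: true            # opt in explicitly, since it now defaults to false

# 2. apply it to the running cluster
./pgsql-createuser.yml -e pg_user=dbuser_myapp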
- Merge "Fix name of dashboard" #1; fix PG Overview dashboard typo.
- Fix default primary instance to `pg-test-1` of cluster `pg-test` in the sandbox environment.
- Fix obsolete comments.
Pigsty now has an Official Site 🎉 !

Add `pg-cluster-replication` to the default dashboards.

Now you can customize your database content with pigsty!
pg_users:
- username: test
password: test
comment: default test user
groups: [ dbrole_readwrite ] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pg_databases: # create a business database 'test'
- name: test
extensions: [{name: postgis}] # create extra extension postgis
parameters: # overwrite database meta's default search_path
search_path: public,monitor
pg-init-template.sql will be used as the default template1 database init script; pg-init-business.sql will be used as the default business database init script.

You can now customize the default role system, schemas, extensions, and privileges with variables:
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_admin # system admin user
pg_admin_password: DBUser.Admin # system admin password
# - default roles - #
pg_default_roles:
- username: dbrole_readonly # sample user:
options: NOLOGIN # role can not login
comment: role for readonly access # comment string
- username: dbrole_readwrite # sample user: one object for each user
options: NOLOGIN
comment: role for read-write access
groups: [ dbrole_readonly ] # read-write includes read-only access
- username: dbrole_admin # sample user: one object for each user
options: NOLOGIN BYPASSRLS # admin can bypass row level security
comment: role for object creation
groups: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# NOTE: replicator, monitor, admin password are overwritten by separated config entry
- username: postgres # reset dbsu password to NULL (if dbsu is not postgres)
options: SUPERUSER LOGIN
comment: system superuser
- username: replicator
options: REPLICATION LOGIN
groups: [pg_monitor, dbrole_readonly]
comment: system replicator
- username: dbuser_monitor
options: LOGIN CONNECTION LIMIT 10
comment: system monitor user
groups: [pg_monitor, dbrole_readonly]
- username: dbuser_admin
options: LOGIN BYPASSRLS
comment: system admin user
groups: [dbrole_admin]
- username: dbuser_stats
password: DBUser.Stats
options: LOGIN
comment: business read-only user for statistics
groups: [dbrole_readonly]
# object created by dbsu and admin will have their privileges properly set
pg_default_privilegs:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
- GRANT USAGE ON TYPES TO dbrole_admin
# schemas
pg_default_schemas: [monitor]
# extension
pg_default_extensions:
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
# postgres host-based authentication rules
pg_hba_rules:
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read-write access (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readwrite md5
- host all +dbrole_readwrite 127.0.0.1/32 md5
- title: allow read-only user (stats, personal) password directly access
role: replica
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
pg_hba_rules_extra: []
# pgbouncer host-based authentication rules
pgbouncer_hba_rules:
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
pgbouncer_hba_rules_extra: []
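Both rule sets come with an _extra companion, so cluster-specific rules can be layered on top without touching the global defaults. A sketch (the title and CIDR are hypothetical):

pg_hba_rules_extra:
  - title: allow office network password access
    role: common
    rules:
      - host all all 192.168.100.0/24 md5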
The second public beta (v0.4.0) of pigsty is available now! 🎉

The skim version of the monitoring system consists of 10 essential dashboards:
pg_exporter 0.3.1
The first public beta (v0.3.0) of pigsty is available now! 🎉

The skim version of the monitoring system consists of 8 essential dashboards:
- conf/all.yml is used by default
- infra.yml to provision meta node(s) and infrastructure
- initdb.yml to provision database clusters
- ins-add.yml to add new instances to a database cluster
- ins-del.yml to remove instances from a database cluster

For stateless application services, the container is a perfect development and operations solution. However, for a service with persistent state, the database, things are not that simple. As a developer, I really like Docker and believe that Docker and Kubernetes will be the standard way to develop and deploy software in the future. But as a DBA, I think the database in a container is an operations nightmare. Whether production databases should be placed in containers is still a controversial issue, but the truth gets clearer with more debate and more practice. In this post, I will show you why putting a production database into Docker is a bad idea.
Let's get started by looking at Docker's self-description:
Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure so you can deliver software quickly. With Docker, you can manage your infrastructure in the same ways you manage your applications. By taking advantage of Docker’s methodologies for shipping, testing, and deploying code quickly, you can significantly reduce the delay between writing code and running it in production.
The words Docker uses to describe itself include: lightweight, standardized, portable, cost savings, increased efficiency, automation, integration, and efficient operations. So many benefits, and there is no problem with these claims: Docker does make development and operations much easier in most cases. That is why so many companies are so eager to Dockerize their software and services.
But sometimes this enthusiasm goes to the other extreme: containerize all software, EVEN A PRODUCTION DATABASE. Containers were originally designed for stateless applications, and temporary data generated by an application inside a container is considered part of that container: create a service with a container, and destroy it when you are done. That is the typical use case.
These apps themselves have no state; the state is usually stored in a database outside the container. That is the classic architecture and usage, and the philosophy of Docker. But when we put the database itself inside a container, things are different. A database is a stateful service, and in order to keep that state no matter how containers come and go, the database container needs to drill a hole down to the underlying operating system: the data volume.
Such a container is no longer an object that can be created, destroyed, shipped, or transferred at will; it is bound to the underlying environment, so many advantages of the typical container use case no longer apply to database containers.
Getting software up and running and making it run reliably are two different things.
The database is the core of almost all information systems, a CRITICAL service in the whole IT system. Here CRITICAL can be read literally: DEAD WHEN FAILURE HAPPENS. If an application goes down, you can pull it back up later. But if your database is scrapped and you have no backup, it is dead for good.
This is similar to our daily software experience: Word/Excel/PPT can crash, and pulling them back up is no big deal. But if a critical document is corrupted or lost, that is a real mess. Databases are similar for many companies: if the database is deleted and there is no backup, the company may go down with it. Reliability is the most important attribute of a database: the ability of the system to function correctly in adversity (hardware failure, software failure, human error) and keep delivering the desired level of performance. Note that reliability differs from availability. Availability can be measured by a number of nines, a percentage representing the expected ratio of system uptime. Reliability means fault tolerance and resilience, and it is hard to measure: it can only be proved by continuous correct operation, or falsified by failure. Reliability is therefore a safety property, not as intuitive or measurable as performance and maintainability.
Safety matters, but many people tend to ignore this most important attribute. They only become aware of it after getting killed, robbed, sick, into a car accident, or having a database dropped. Only then do people feel regret.
Notice that Docker's self-description contains no words like "reliable", even though that is the most important attribute for a database. Docker does know what it is capable of.
Compared with bare metal, putting a database inside Docker does not reduce the probability of hardware failures, software errors, or human errors. Instead, **the overall reliability of the system decreases** due to additional components, additional complexity, and additional failure points. The database container still needs to be bound to a specific machine via its data volume, so the exposure to hardware failure remains the same.
Standardized deployment may seem great at first glance, but as far as software bugs are concerned, there is no essential difference between scripts and a Dockerfile. Software errors are mainly caused by poor application design, which Docker cannot help with; the same goes for human errors.
Additional components introduce additional failure points, and introducing Docker drags us not only into problems with Docker itself, but also into problems at the junctions between Docker and every existing component. So when a failure occurs, it may be a problem with the database, a problem with Docker, a problem in the interaction between the database and Docker, or a problem in the interaction between Docker and the OS.
Take a look at the issue list of the official PostgreSQL Docker image: https://github.com/docker-library/postgres/issues. It is a long list, and there are tons of additional problems when you put a database into Docker. The simplest example: what happens to the database if the dockerd daemon crashes? It breaks and goes out of service. A subtler example is running two PostgreSQL instances on the same data directory (two containers on one volume, or one inside and one outside a container): without proper fencing your data will be toast, and these are problems that would never happen on bare metal.
As mentioned earlier, reliability is not easy to measure; the only way for a system to prove it is to run correctly and continuously for a long time. Deploying databases on bare metal is the traditional way of doing things, proven by several decades of continuous operation. Docker is a revolution for DevOps, but it is still too young: five years is far too short for something as critical as a production database. There are not enough lab rats yet.
Besides long-term operation, there is another way to "increase" reliability: failure. Failure is valuable experience; it turns uncertainties into certainties and unknown problems into known ones. Failure experience is the most valuable asset of operators: it is the form operational knowledge takes, and it is how the community accumulates knowledge. For PostgreSQL, most community experience is based on the assumption of bare-metal deployment, and all sorts of failures have been explored by many people over decades. If you run into a database problem, you are very likely to find that other community members have already been through a similar situation, along with corresponding solutions. But search the same problem with the additional keyword "docker" and you will find far less useful information. That means when something nasty happens, the probability of successfully rescuing your data is much lower, and recovery takes much longer.
Another subtle thing is that companies and individuals are reluctant to share failure experience unless there is a special reason. For companies, a failure report may undermine their reputation, expose sensitive information, or expose how shabby their infrastructure is. For individuals, failure experience is almost the most valuable part of their expertise: once shared, its value diminishes. Ops/DBA culture is not as open as Dev culture. That is the very reason Docker and Kubernetes operators exist: an attempt to codify operational experience so it can accumulate. But that is still naive for now, since few people are willing to share, and you can find rubbish everywhere. The official PostgreSQL Docker image, for example, lacks tons of the tooling, tuning, and settings needed to work efficiently as a real-world database.
A database requires lots of tools to maintain: operation scripts, deployment, backup, archiving, failover, major/minor version upgrades, plugin installation, connection pooling, performance analysis, monitoring, tuning, inspection, repair, and so on. Most of these tools are also designed for bare-metal deployment, and they are critical too: without adequate testing, they cannot really be put into use. Getting a thing up and running and keeping it running steadily for a long time are completely different levels of reliability.
Plugins are a typical example. PostgreSQL has lots of useful plugins, such as PostGIS. To install a plugin on bare metal, all you need is `yum install` followed by `create extension postgis`. Doing it the Docker way, you need to modify the Dockerfile, build a new image, push it to the server, and then restart the database container. No doubt that is much more complicated.
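In concrete commands, the contrast looks roughly like this (a sketch: the package name postgis31_13 assumes PostgreSQL 13 on CentOS, and the image/container names are hypothetical):

# bare metal: install, create, done; no restart required
yum install -y postgis31_13
psql -c 'CREATE EXTENSION postgis;'

# docker: rebuild the image and bounce the container
echo 'RUN apt-get update && apt-get install -y postgresql-13-postgis-3' >> Dockerfile
docker build -t mypg:13-postgis .
docker push mypg:13-postgis
docker stop pg && docker rm pg
docker run -d --name pg -v /pg/data:/var/lib/postgresql/data mypg:13-postgis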
Similar problems arise with CLI tools and system commands. In theory they can run on the host, but you cannot be sure their execution and results mean exactly the same thing. And when an emergency happens and you need a tool that is not included in the container, with no Internet access or yum repository at hand, you have to go through Dockerfile → build image → restart container. That is insane.
When it comes to monitoring, Docker makes things harder. There are many subtle differences between monitoring in containers and on bare metal. For example, on bare metal the CPU time spent in different modes always sums to 100%, but that assumption does not always hold inside a container. On traditional bare metal, node-level metrics are an important part of database monitoring, and monitoring gets much worse when database containers are deployed mixed with applications. Of course, if you use Docker like a VM, many things may still work, but then you lose the real value of using Docker.
Performance is another point people care about a lot. From the performance perspective, the basic principle of database deployment is: the closer to the hardware, the better. Every additional isolation and abstraction layer is bad for database performance; more isolation means more overhead, even if it is just one extra memcpy in the kernel.
For performance-seeking scenarios, some databases bypass the operating system's page management to operate the disk directly, and some even use FPGAs or GPUs to speed up query processing. Docker, as a lightweight container, does not hurt performance much, and for performance-insensitive scenarios the impact may not be significant. But the extra abstraction layer will only ever make performance worse, never better.
Docker provides process-level isolation. Databases value isolation, but not this kind of isolation. Database performance is critical, so the typical deployment takes a whole physical machine exclusively: besides some necessary tooling, there are no other applications on it. Even when using Docker, we would still give it a whole physical machine.
Therefore, the isolation provided by Docker is useful for multi-tenant overselling by cloud database vendors, but for other cases it does not make much sense for deploying databases.
For stateless applications, containers make scale-out incredibly simple: it does not matter which node they run on, and you can schedule them at will. But this does not apply to databases and other stateful applications. You cannot create and destroy database instances as freely as app servers: to create a new replica, for example, you have to pull the data from the primary whether you are using Docker or not, and copying several TB in production may take hours, still requiring manual intervention, inspection, and verification. So what is the essential difference between running a ready-made `make-replica` script and running `docker run`? The time is spent on making the new replica either way.
Most software cost is spent in the operation phase rather than the development phase: fixing vulnerabilities, keeping the system up and running, handling failures, upgrading versions, migrating, repaying tech debt, and so on. Maintainability is therefore very important for operators' quality of work and life, and this is the most pleasing part of Docker: infrastructure as code. We can say Docker's greatest value lies in its ability to deposit operational experience into reusable code, accumulating it in an easy way rather than scattering install/setup documents and scripts everywhere. From this perspective, I think Docker has done a great job, especially for stateless applications whose logic is constantly changing. Docker and Kubernetes allow us to easily deploy, scale out and in, release, and roll out upgrades, so that Dev can also work as Ops, and Ops can also live a DBA's life (plausibly).
But can these conclusions be applied to databases? Once initialized, a database does not require frequent environment changes; it may run continuously for years without big changes. DBAs typically accumulate lots of maintenance scripts, one-command environment setup is not much slower than the Docker way, and the number of environments that need configuring and initializing is relatively small, so the container's convenience in environment configuration brings no significant advantage. For daily operations, a database container cannot be destroyed, recreated, and migrated as freely as an application container, and many operations have to go through `docker exec`: in fact, you may still be running the exact same scripts, but the steps have become much more cumbersome.
Docker advocates like to say that upgrading software with Docker is easy. That is true for minor versions: simply modify the version in the Dockerfile, rebuild the image, and restart the database container. However, when you need a major version upgrade, this is what a binary upgrade in Docker looks like: https://github.com/tianon/docker-postgres-upgrade, while I can achieve the same thing in a few lines of bash script.
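For reference, the bare-metal equivalent is roughly this (a simplified sketch assuming a 12 → 13 upgrade with packages under /usr/pgsql-* and data dirs under /pg; take a backup first, and run pg_upgrade as the postgres user):

/usr/pgsql-13/bin/initdb -D /pg/data13            # init the new data directory
systemctl stop postgresql-12                      # stop the old cluster
/usr/pgsql-13/bin/pg_upgrade \
  -b /usr/pgsql-12/bin -B /usr/pgsql-13/bin \
  -d /pg/data12 -D /pg/data13 --link              # upgrade in place via hard links
systemctl start postgresql-13                     # bring up the new cluster
vacuumdb --all --analyze-in-stages                # rebuild planner statistics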
Also, it takes more effort to use some existing tools through docker exec. For example, `docker exec` mixes stdout and stderr, which breaks a lot of tools that rely on pipes. For instance, if you want to perform an ETL that transfers a table to another instance, the traditional way is:
psql <src-url> -c 'COPY tbl TO STDOUT' |\
psql <dst-url> -c 'COPY tbl FROM STDIN'
With Docker, things are more complicated:
docker exec -it srcpg gosu postgres bash -c "psql -c \"COPY tbl TO STDOUT\" 2>/dev/null" |\
docker exec -i dstpg gosu postgres psql -c 'COPY tbl FROM STDIN;'
And if you want to take a basebackup from postgres inside a container, without PostgreSQL installed on the host machine, you have to run the command with a lot of extra wrapping:
docker exec -i postgres_pg_1 gosu postgres bash -c 'pg_basebackup -Xf -Ft -c fast -D - 2>/dev/null' | tar -xC /tmp/backup/basebackup
In fact, it is not Docker that elevates the daily operations experience, but tools such as ansible. Containers may be faster at building a database environment, but such tasks are rare. So if a database container cannot be scheduled as freely as an app server, cannot scale quickly, and brings no more convenience than ordinary scripts for initial setup, daily operations, and emergency troubleshooting, why should we put a production database into Docker?
I think maybe it is because a rough image-based solution is still better than setting things up blindly without a DBA. Container technology and orchestration technology are very valuable for operations: they fill the gap between software and services, and their vision is to modularize operational experience and capability. Docker and Kubernetes may well become the standard way of package management and orchestration in the future, evolving into something like a "datacenter distributed-cluster operating system", the underlying infrastructure of all software, the universal runtime. Once those major uncertainties are eliminated, we can put our applications, and our valuable databases, inside it. But for now, at least for production databases, that is just a good vision.
Finally, I must emphasize that the above discussion is limited to production databases. For databases in development environments, or for applications in production, I fully support using Docker. But when it comes to production databases, if the data really matters, we should ask ourselves three questions before diving in.
Any technical decision is a trade-off. For putting a production database into a container, the critical trade-off is **sacrificing reliability in exchange for maintainability**.
There are scenarios where data reliability is not that important, or where other considerations dominate: for cloud service vendors, for example, putting databases inside containers is a great fit, since container isolation, high resource utilization, and management convenience all match their requirements. But in most cases, reliability has the highest priority, and sacrificing reliability in exchange for maintainability is not advisable.
This is a PostgreSQL convention I wrote for a previous company. It could be a useful reference when dealing with databases. For now it is just an index; detailed information will be translated when I have time.
Nothing can be accomplished without norms or standards.
PostgreSQL is a powerful database. But keeping it powerful all the time is a team effort of Dev, Ops & DBA.
Naming is the beginning of everything.
[REQUIRED] Common naming rule
[REQUIRED] Database naming rule
[REQUIRED] Role naming rule
[REQUIRED] Schema naming rule
[OPTIONAL] Table naming rule
[OPTIONAL] Index naming rule
[OPTIONAL] Function naming rule
[OPTIONAL] Column naming rule
[OPTIONAL] Variable and Parameter naming rule
Suum cuique. (To each his own.)
[REQUIRED] Character Encoding must be UTF-8
[REQUIRED] Capacity Planning
[REQUIRED] Do not abuse stored procedures
[REQUIRED] Separation of storage and calculation
[REQUIRED] Primary key and IDENTITY
[REQUIRED] Beware of foreign key
[REQUIRED] Beware of trigger
[REQUIRED] Avoid wide tables
[REQUIRED] Add default value to column
[REQUIRED] Handle nullable with caution
[REQUIRED] Unique constraint should be forced by database
[REQUIRED] Beware of integer overflow
[REQUIRED] Use Timestamp without timezone, force timezone to UTC
[REQUIRED] DROP obsolete function in time
[OPTIONAL] Data Type of primary key
[OPTIONAL] Use proper data types
[OPTIONAL] Use ENUM for stable fields with a small value space
[OPTIONAL] Choose right text types
[OPTIONAL] Choose right numeric types
[OPTIONAL] Function format
[OPTIONAL] Design for evolvability
[OPTIONAL] Choose right norm level
[OPTIONAL] Embrace new database version
[OPTIONAL] Use radical feature with caution
Wer Ordnung hält, ist nur zu faul zum Suchen. (Those who keep order are just too lazy to search.)
[REQUIRED] OLTP queries must have corresponding index
[REQUIRED] Never build index on wide field
[REQUIRED] Explicit with null ordering
[REQUIRED] Handle KNN problem with GiST index
[OPTIONAL] Make use of function index
[OPTIONAL] Make use of partial index
[OPTIONAL] Make use of BRIN index
[OPTIONAL] Beware of selectivity
The limits of my language mean the limits of my world.
[REQUIRED] Separation of read and write
[REQUIRED] Separation of fast and slow
[REQUIRED] Set timeout for queries
[REQUIRED] Beware of replication lag
[REQUIRED] Use connection pooling
[REQUIRED] Changing connection state is forbidden
[REQUIRED] Have retry mechanism for aborted transaction
[REQUIRED] Have reconnecting mechanism
[REQUIRED] Execute DDL in production application code is forbidden
[REQUIRED] Using explicit schema names
[REQUIRED] Using explicit table names when involve join
[REQUIRED] Full table scan is forbidden in OLTP systems
[REQUIRED] Idle in Transaction for a long time is forbidden
[REQUIRED] Close cursors
[REQUIRED] Beware of NULL
[REQUIRED] Beware of null input on aggregation
[REQUIRED] Beware of holes in serial space
[OPTIONAL] Use prepared statement for repeat queries
[OPTIONAL] Use right isolation levels
[OPTIONAL] Do not tell existence by count
[OPTIONAL] Use returning clause
[OPTIONAL] Use upsert
[OPTIONAL] Use advisory locks to avoid contention
[OPTIONAL] Optimize IN operator
[OPTIONAL] Do not use left fuzzy search
[OPTIONAL] Use array instead of temporary table
[REQUIRED] Follow the deploy procedure
[REQUIRED] Deploy request format
[REQUIRED] Deployment review rules
[REQUIRED] Deployment time window
[REQUIRED] Take care of backups
[REQUIRED] Take care of ages
[REQUIRED] Take care of bloats
[REQUIRED] Take care of replication lags
[REQUIRED] Take care of resource consumption
[REQUIRED] Minimal privilege principle
[REQUIRED] CREATE INDEX CONCURRENTLY
[REQUIRED] Warming up before taking real traffic
[REQUIRED] Doing schema migration with caution
[OPTIONAL] Split batch operation
[OPTIONAL] Speed up bulk load