Welcome to the Pigsty documentation (v0.9)

Pigsty is short for PostgreSQL In Graphic STYle, that is, "Postgres in a graphical style".

The word pigsty literally means a pen for pigs; it is read as "Pig Style" (/ˈpɪɡˌstaɪ/).

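Pigsty provides an industry-leading open-source PostgreSQL monitoring system together with an out-of-the-box provisioning solution for highly available databases. It can be used to monitor, deploy, and manage large production-grade highly available database clusters, and equally to spin up a single-node database environment for testing and demos.

Pigsty is built on the open-source ecosystem and designed for monitoring and managing large database clusters. It has evolved over a long series of iterations and has been proven in real production environments. Pigsty aims to give users deep observability and a smooth database experience, to lower the barrier to using and managing PostgreSQL, and to let everyone enjoy working with databases.

Pigsty is open source under the Apache 2.0 license and free for commercial use. It may not, however, be repackaged as your own product, and the prominent-notice obligations must be observed.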

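1 - Overview

A quick look at the problems Pigsty solves, the technology it uses, and the scenarios it fits.

What is Pigsty?

Pigsty is a monitoring system

You can’t manage what you don’t measure.

A monitoring system measures the state of a system and is the foundation of operations and management work.

PostgreSQL is the best open-source relational database in the world, yet its ecosystem lacks a monitoring system that is good enough.

Pigsty aims to solve this problem by delivering the best PostgreSQL monitoring system.

Compared with similar products, Pigsty is far ahead in metric coverage and in the richness of its dashboards. See the comparison with similar products for details.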

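Pigsty is a provisioning solution

Better to teach a man to fish than to give him a fish.

Pigsty is also the highly available database cluster provisioning solution with the lowest barrier to entry.

A provisioning solution is not a database but a database factory. The user submits an order to the factory, and the provisioning system automatically creates the corresponding database cluster according to the contents of the order.

Pigsty defines database clusters through declarative configuration and creates them automatically with idempotent, pre-built playbooks, offering an experience close to that of a private cloud.

Database clusters created by Pigsty are distributed and highly available. As long as any instance in the cluster is alive, the cluster can serve both read-write and read-only traffic. Every instance in a cluster is interchangeable in use: any instance can provide the full read-write service through the built-in load balancer, which gives the experience of a distributed database. Clusters detect failures and perform primary/replica failover automatically. Ordinary failures heal themselves within seconds to tens of seconds, and read-only traffic is unaffected in the meantime.

Pigsty uses a simple, mature, and stable deployment model on physical or virtual machines and installs with a single command, so deployment is genuinely foolproof. The same solution works for local development, shared testing, and production, so it suits learning, development, and testing as well as large-scale production use.

In addition, Pigsty's monitoring system can be deployed independently of the provisioning solution. See Monitor-only deployment.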

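Pigsty is open-source software

Pigsty is open source under the Apache 2.0 license and free to use; optional commercial support is also available.

Pigsty's monitoring system and provisioning solution are built mostly from open-source components, and PostgreSQL itself is the most advanced open-source relational database in the world. Built on the open-source ecosystem, Pigsty gives back to the open-source community. It greatly lowers the barrier to using and managing PostgreSQL, so that more people can benefit from PostgreSQL and enjoy working with databases.

Pigsty was started because the author had to manage a large fleet of PostgreSQL clusters and, after surveying every open-source and commercial monitoring solution on the market, found none that was "good enough". In the spirit of "if nobody else will, do it yourself", the Pigsty monitoring system was designed and developed. A monitoring system needs something to monitor before it can be released and demonstrated, so the Pigsty provisioning solution was developed along the way.

Pigsty packages production-grade, mature deployment practices into the project, including primary/replica replication, failover, traffic proxying, connection pooling, service discovery, and a basic privilege system, and provides a sandbox environment for demos and testing. The sandbox configuration needs only minor changes to be used for a production deployment, so users can fully explore Pigsty's features on their own laptops. It truly works out of the box.

What's next?

Getting Started

Browse

Hands-on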

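2 - Getting Started

How to bring up Pigsty quickly

Preparation

Installing Pigsty requires one machine node with at least 1 core and 2 GB of memory, running a Linux kernel with the CentOS 7 distribution on an x86_64 processor. In production this node serves as the meta node (admin node): it issues control commands, collects monitoring data, and runs scheduled jobs.

Installation

Installation requires root privileges. Run the following commands as a user with sudo privileges (or as root) to complete the installation:

curl -fsSL https://pigsty.cc/pigsty.tgz | gzip -d | tar -xC ~; cd ~/pigsty  # download the source
make config    # configure the environment
make install   # install the software

With the offline installation package, the whole installation takes about 10 to 15 minutes.

./configure detects the environment automatically. If the node has more than one IP address, specify a primary one. In the sandbox the IP address is fixed at 10.10.10.10. In addition, if the offline package /tmp/pkg.tgz does not exist, the program asks whether to download it from the network.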

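Sandbox

To run Pigsty on your own computer, you can use virtual machine software directly or use the Pigsty sandbox. The sandbox is a local demo/test/development environment that runs on local VirtualBox virtual machines managed by Vagrant. Both are cross-platform and run on macOS, Windows, and Linux.

Taking macOS as an example, run the following commands in order in a local terminal to bring up the sandbox.

make deps   # install homebrew, then use it to install vagrant and virtualbox (requires a reboot)
make dns    # write static domain names to the local /etc/hosts (sudo password required)
make start  # bring up a single meta node with Vagrant (start4 brings up 4 nodes)
make demo   # install with the single-node demo configuration (demo4 for the 4-node demo)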

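Usage

After installation, users can reach the Pigsty monitoring system directly through ports on the node.

For example, the Pigsty monitoring system listens on port 3000 by default, and the default admin username and password are both admin.

In the sandbox, users can reach Pigsty's services through the default local domain names written by make dns, for example http://g.pigsty. The services Pigsty exposes are listed in the following table:

Service        Domain               Address             Description
Grafana        http://pigsty        10.10.10.10:3000    Home page of the Pigsty monitoring system
Consul         http://c.pigsty      10.10.10.10:8500    Metadata store; shows the state of all nodes and services in the cluster
Prometheus     http://p.pigsty      10.10.10.10:9090    Time-series database for monitoring; query metrics, define rules, handle alerts
Alertmanager   http://a.pigsty      10.10.10.10:9093    Browse, handle, and silence alerts
Haproxy        http://h.pigsty      10.10.10.10:80      Browse load balancer status; manage and control traffic
Yum Repo       http://yum.pigsty    10.10.10.10:80      Local Yum repository containing all offline packages

When deploying on ordinary machines, replace the IP address here (10.10.10.10) with the IP of your own node.

Accessing services directly by IP address is convenient, but a better practice is to assign a domain name to each service through nginx_upstream and access services by domain name. The Nginx bundled with Pigsty proxies all web access on port 80 by default.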

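Deployment

Once Pigsty is installed, this machine acts as Pigsty's meta node. From it, users can issue control commands and deploy new PG clusters. Deploying a new database cluster takes three steps:

  1. Bring the machine nodes used for deployment under management

    The current user must be able to ssh from the current node to the target nodes without a password and have passwordless sudo there.

  2. Define the database cluster (in the configuration file or the graphical interface)

  3. Run the database cluster deployment playbook

    If the sandbox was started with make start4 and make demo4, no configuration is needed and the command below can be run directly.

    ./pgsql.yml -l pg-test    # initialize the pg-test database cluster


See the Deployment chapter for more information.

FAQ

For common problems during installation and use, see the FAQ.

What's next?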

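2.1 - FAQ

Frequently asked questions for getting started with Pigsty

Download


Where can I download the source package?

The Pigsty source package pigsty.tgz can be downloaded from several places: the Pigsty website, the Pigsty CDN, and Github.

  • The Pigsty website is the fastest and most up-to-date download location and the default one. It offers only the latest version, not historical versions.
  • Github Release is the most authoritative and complete download location and includes all historical versions.
  • The Pigsty CDN is mainly for downloading historical versions and the offline package.
https://pigsty.cc/pigsty.tgz                                          # website, latest
https://github.com/Vonng/pigsty/releases/download/v0.9/pigsty.tgz     # Github
http://pigsty-1304147732.cos.accelerate.myqcloud.com/v0.9/pigsty.tgz  # CDN

Where can I download the offline installation package?

By default users do not need to worry about this: if configure finds that the offline package is missing, it prompts the user to download it. If the installation has to run in an environment without Internet access, however, the package must be downloaded separately and uploaded to the target server.

The offline package pkg.tgz can be downloaded from Github Release or from the CDN (provided for mainland China).

https://github.com/Vonng/pigsty/releases/download/v0.9/pkg.tgz     # Github
http://pigsty-1304147732.cos.accelerate.myqcloud.com/v0.9/pkg.tgz  # CDN (China)

Place it at /tmp/pkg.tgz on the machine being installed and it is picked up automatically during installation. By default the offline package is extracted to /www/pigsty.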


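Can I skip the offline package?

The offline package contains packages collected from various Yum repositories and Github Releases. Users can also choose not to use the pre-built offline package and download directly from the original upstream sources instead. When the operating system is not CentOS 7.8, this usually resolves most missing or mismatched dependency problems. Skipping the offline package is simple: answer n (no) when make config prompts.


Errors when installing yum packages

The default offline package is built on CentOS 7.8. If problems occur, delete the offending rpm packages under /www/pigsty together with the marker file /www/pigsty/repo_complete, then run make repo-download to download the dependency packages that match the current operating system version.


Some packages download too slowly

Pigsty uses yum mirrors inside China wherever possible, but a few packages are still affected by the GFW and download slowly, for example software fetched directly from Github. The options are:

  1. Pigsty provides an offline package that bundles all software and dependencies in advance. make config prompts to download it automatically.

  2. Specify a proxy server with proxy_env and download through it, or use a server outside the firewall.

  3. For software downloaded directly by URL, the Pigsty CDN provides mirrors (same file name, different prefix), for example:

    http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/pg_exporter-0.3.2-1.el7.x86_64.rpm


Slow first start of the Vagrant sandbox

The Pigsty sandbox uses CentOS 7 virtual machines by default. On first start, Vagrant downloads the CentOS/7 box image, which is fairly large. (You can of course download a CentOS 7 ISO yourself and install it in a virtual machine.) A proxy may speed up the download; fortunately it is only needed the first time.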


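Versions


What branches does the Pigsty source have?

Besides regular semantic version numbers, Pigsty has three main branches: Default, Pro, and Beta. pigsty.tgz is the standard open-source edition, pigsty-beta.tgz is the beta edition, and pigsty-pro.tgz is the professional edition. Ordinary users should use the default pigsty.tgz; the professional edition is not currently available for public download.


Do I need to wait for 1.0 GA?

Pigsty has been used in real-world production since 0.3, so it does not become generally available only at 1.0. However, several changes are planned for 1.0 (for example a redefinition of the monitoring metrics and PG14 support), and Pigsty will not provide upgrade support for versions before v1.0. Whether to use it in production now depends on your own situation.


What is the GUI tool for editing Pigsty configuration files?

It is a separate command-line tool, pigsty-cli, currently in beta. It will be officially released together with Pigsty v1.0.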


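Environment


Installation environment

Installing Pigsty requires at least one machine node with at least 1 core and 2 GB of memory, running a Linux kernel with the CentOS 7 distribution on an x86_64 processor.

In production, higher-spec machines are recommended, with multiple meta nodes deployed for disaster-recovery redundancy. There the meta nodes act as admin nodes that issue control commands, manage and deploy database clusters, collect monitoring data, run scheduled jobs, and so on.


Operating system requirements

Pigsty strongly recommends installing meta nodes and database nodes on CentOS 7.8, to avoid wasting effort on needless problems.

Pigsty's default development, test, and deployment environments are all based on CentOS 7.8, and CentOS 7.6 has also been thoroughly validated. Other CentOS 7.x releases and their equivalents, RHEL 7 and Oracle Linux 7, should work in theory but have not been tested or validated.

When the monitor-only mode is used to monitor existing PostgreSQL clusters, other Linux distributions can be used, because the monitoring components are Go binaries that are compatible with a wide range of Linux distributions. This is not officially supported, however.

Support for other operating systems may later be provided in the form of container images.

Why not Docker and Kubernetes?

Docker is very effective against environment-compatibility headaches, but databases are not among the best use cases for containers. Docker and Kubernetes also have a learning curve of their own. In keeping with the goal of lowering the barrier, Pigsty deploys on bare metal.

Pigsty was nevertheless designed from the start with containerization and cloud deployment in mind, as reflected in the declarative implementation of its configuration. It can be migrated to a cloud-native solution without many changes. When the time is right, a rewrite as a Kubernetes Operator will be considered.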


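Integration


Can existing PG instances be monitored?

External databases that were not created by the Pigsty provisioning solution can be monitored with a monitor-only deployment; see the documentation for details. Note that deploying Pigsty requires ssh and sudo access on the target machine, so cloud vendors' RDS usually cannot be supported. ECS-hosted cloud databases such as MyBase for PostgreSQL, however, can be brought under monitoring.


What can be done about cloud RDS that cannot be monitored?

Pigsty does not officially support monitoring pure RDS at present, because a monitoring system without machine metrics is only half a product. As a workaround, users can deploy PG Exporter locally and connect to RDS remotely, have Prometheus scrape the local exporter through local static service discovery, and configure the labels by hand.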


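Monitoring system


Dashboards in the monitoring system do not match the documentation?

Why are there only 10 dashboards in the monitoring system? Because the open-source edition of Pigsty provides only these dashboards, which are more than enough.


Why does the PG Instance Log panel have no data?

Log collection is currently a beta feature and needs an extra installation step. Running make logging installs loki and promtail, after which the panel becomes usable. loki is still a fairly new log collection solution, and not everyone is willing to adopt it.


How much data does the monitoring system produce?

That depends on the complexity of your database workload. As a reference point, 200 production database instances produce about 16 GB of monitoring data per day. Pigsty keeps 30 days of monitoring data by default, which can be changed through a parameter.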
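
The reference figure above can be turned into a rough capacity estimate. The sketch below assumes, purely for illustration, that data volume scales linearly with the number of instances and with the retention period; the function name and parameters are hypothetical and not part of Pigsty. Real volumes depend on workload, so treat the result as an order-of-magnitude guide only.

```python
def estimate_storage_gb(instances: int,
                        retention_days: int = 30,
                        ref_gb_per_day: float = 16.0,
                        ref_instances: int = 200) -> float:
    """Scale the reference figure (16 GB/day for 200 instances) linearly."""
    gb_per_instance_per_day = ref_gb_per_day / ref_instances
    return gb_per_instance_per_day * instances * retention_days


if __name__ == "__main__":
    # 200 instances kept for the default 30 days: roughly 480 GB
    print(round(estimate_storage_gb(200), 1))
    # A small 10-instance environment: roughly 24 GB
    print(round(estimate_storage_gb(10), 1))
```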


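Architecture


What does Pigsty install?

See System Architecture for details.

Pigsty is a database solution with a complete runtime. On a local machine it can serve as an environment for development, testing, and data analysis. In production it can be used to deploy, manage, and monitor large PostgreSQL clusters.


How does Pigsty keep databases highly available?

Patroni 2.0 serves as the HA agent, Consul as the DCS, and Haproxy as the default traffic distributor. Members of a Pigsty database cluster are interchangeable in use: as long as any instance in the cluster is alive, read-write and read-only traffic keep working.

The availability of the DCS itself is guaranteed by multi-node consensus, so in production it is recommended to deploy 3 to 5 meta nodes or to use an external DCS cluster.


Pigsty discussion group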

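3 - Concepts

Things to know when using Pigsty

Logically, Pigsty consists of two parts: the monitoring system and the provisioning solution.

The monitoring system monitors PostgreSQL database clusters, and the provisioning solution creates them. Before studying either, reading Naming Principles and Overall Architecture helps build an intuitive picture of the overall design.

The two can be used independently: users can monitor existing PostgreSQL clusters and instances with the Pigsty monitoring system without using the provisioning solution. See Monitor-only deployment.

Monitoring system

You can’t manage what you don’t measure.

A monitoring system measures the state of a system and is the foundation of operations and management work. Pigsty provides the best open-source PostgreSQL monitoring system.

Physically, Pigsty's monitoring system has two parts:

  • Server side: deployed on the meta nodes, including the time-series database Prometheus, the dashboard Grafana, the alert manager Alertmanager, the service discovery component Consul, and other services.
  • Client side: deployed on the database nodes, including NodeExporter, PgExporter, and Haproxy. These are scraped passively by Prometheus.

The core concepts of the Pigsty monitoring system are as follows:

Provisioning solution

Better to teach a man to fish than to give him a fish.

A provisioning solution is a system that delivers database services and monitoring systems to users. It is not a database but a database factory: the user submits a configuration to the provisioning system, which then creates the required database clusters in the environment to the requested specification, much like creating resources by submitting YAML files to Kubernetes.

In terms of deployment, Pigsty's provisioning solution has two parts:

  • Infrastructure (Infra): deployed on the meta nodes; key services such as monitoring infrastructure, DNS, NTP, DCS, and the local repository.
  • Database cluster (PgSQL): deployed on the database nodes; provides database services to the outside, cluster by cluster.

The provisioning solution deploys onto two kinds of nodes:

  • Meta node (Meta): hosts the infrastructure and runs control logic. Every Pigsty deployment needs at least one meta node, which can double as an ordinary node.
  • Database node (Node): hosts database clusters/instances. Pigsty uses exclusive deployment, with a one-to-one mapping between nodes and database instances.

The concepts related to the provisioning solution are as follows: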

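3.1 - Naming Principles

Introduces the entity naming principles Pigsty uses by default

A name must be something that can be spoken, and what is spoken must be something that can be carried out.

Concepts and their names matter a great deal; a naming style reflects the engineer's understanding of the system architecture. Ill-defined concepts cause confusion in communication, and arbitrarily chosen names create unexpected extra burden, so names need to be designed with care. This article introduces the entities in Pigsty and the principles their names follow.

Conclusion

Pigsty has four core kinds of entities: Cluster, Service, Instance, and Node.

  • A Cluster is the basic autonomous unit. Its unique identifier is assigned by the user, expresses business meaning, and serves as the top-level namespace.
  • At the hardware level a cluster contains a set of Nodes, that is, physical machines or virtual machines (or Pods), each uniquely identified by IP.
  • At the software level a cluster contains a set of Instances, that is, software servers, each uniquely identified by IP:Port.
  • At the service level a cluster contains a set of Services, that is, reachable domain names and endpoints, each uniquely identified by domain name.
  • A cluster name may be any name that satisfies DNS naming rules and contains no dot ([a-zA-Z0-9-]+).
  • A node name uses the cluster name as prefix, followed by - and an integer sequence number (allocation from 0 is recommended, consistent with k8s).
  • Because Pigsty uses exclusive deployment, nodes and instances map one to one, so instance names can match node names, in the form ${cluster}-${seq}.
  • A service name also uses the cluster name as prefix, followed by - and the service's specific role, such as primary, replica, offline, or delayed.

Figure: entity-naming.png

Taking the figure above as an example, the test database cluster is named "pg-test". It consists of three database server instances, one primary and two replicas, deployed on the cluster's three nodes. The pg-test cluster exposes two services: the read-write service pg-test-primary and the read-only replica service pg-test-replica.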

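Entities

Postgres cluster management involves the following entity concepts:

Cluster

A cluster is the basic autonomous business unit, which means it can provide service to the outside as an organized whole. It is similar to the Deployment concept in k8s. Note that cluster here is a software-level concept and should not be confused with a PG Cluster (a database cluster in the PostgreSQL sense: the data directory of a single PG instance containing multiple PG databases) or a Node Cluster (a cluster of machines).

A cluster is one of the basic units of management, an organizational unit that brings various resources together. For example, a PG cluster may include:

  • Three physical machine nodes.
  • One primary instance, providing the read-write database service.
  • Two replica instances, providing the read-only replica service.
  • Two exposed services: the read-write service and the read-only replica service.

Each cluster has a unique identifier defined by the user according to business needs. This example defines a database cluster named pg-test.

Node

A node is an abstraction of hardware resources and usually refers to a working machine, whether a physical machine (bare metal), a virtual machine (vm), or a Pod in k8s. Note that in k8s a Node is the abstraction of hardware resources, but in day-to-day management it is the k8s Pod, not the k8s Node, that corresponds more closely to the node concept here. In short, the key properties of a node are:

  • A node is an abstraction of hardware resources and can run a set of software services.
  • A node can use its IP address as a unique identifier.

Although the lan_ip address can serve as the unique node identifier, for ease of management a node should also have a human-readable, meaningful name as its hostname, which serves as another commonly used unique identifier.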

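Service

A service is a named abstraction of a software service (for example Postgres or Redis). A service can be implemented in many ways, but its key elements are:

  • An addressable service name used to provide access from outside, for example:
    • A DNS domain name (pg-test-primary)
    • An Nginx/Haproxy endpoint
  • A traffic routing and load-balancing mechanism that decides which instance handles a request, for example:
    • DNS L7: DNS resolution records
    • HTTP Proxy: Nginx/Ingress L7: Nginx upstream configuration
    • TCP Proxy: Haproxy L4: Haproxy backend configuration
    • Kubernetes: Ingress: Pod selector

A database cluster usually contains a primary and replicas, which provide the read-write service (primary) and the read-only replica service (replica) respectively.

Instance

An instance refers to one specific database server. It can be a single process, a group of processes that share fate, or several closely related containers in one Pod. The key elements of an instance are:

  • It can be uniquely identified by IP:Port.
  • It can handle requests.

For example, a Postgres process, the dedicated Pgbouncer connection pool serving it, the PgExporter monitoring component, the high-availability component, and the management agent can be regarded as a whole that provides service and treated as one database instance.

An instance belongs to a cluster, and each instance has its own unique identifier within the cluster.

Instances are resolved by services: an instance provides the ability to be addressed, and a service resolves request traffic to a specific group of instances.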

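Naming rules

An object can have many groups of tags and metadata (Metadata/Annotation), but usually only one name.

Managing databases and software is like keeping pets: both need care and attention, and naming is an important part of it. Careless names (for example XÆA-12, NULL, or 史珍香) are likely to introduce unnecessary trouble (extra complexity), while well-designed names can bring pleasant surprises.

In general, naming should follow a few principles:

  • Simple, direct, and human-readable: names are for people, so they should be easy to remember and use.

  • Reflect function and characteristics: a name should reflect the key characteristics of the object.

  • Unique: within its namespace and category, a name should be unique and uniquely addressable.

  • Do not stuff too much unrelated information into a name: embedding lots of important metadata in a name is tempting but painful to maintain. A counterexample: pg:user:profile:10.11.12.13:5432:replica:13

Cluster naming

A cluster name acts much like a namespace. All resources that belong to the cluster use it.

For the form of cluster names, following the DNS standard RFC1034 is recommended, to avoid traps for later changes. If you one day move to the cloud and find the old names are not supported, renaming everything is very costly.

A better approach, in my view, is an even stricter rule: cluster names should not contain a dot and should use only lowercase letters, digits, and the hyphen -. All objects in the cluster can then use this name as a prefix in all sorts of places without worrying about breaking some constraint. The cluster naming rule is therefore:

cluster_name := [a-z][a-z0-9-]*

The reason for insisting on no dots in cluster names is that a once popular naming style, such as com.foo.bar, uses dot-separated hierarchies. It is concise and quick, but the user-supplied name can have arbitrarily many levels, an uncontrolled number. If the cluster has to interact with external systems that impose naming constraints, such names cause trouble. The most obvious example is the Pod in K8s, whose naming rules do not allow a dot.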
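
The cluster naming rule above can be checked mechanically. The following is a minimal sketch that applies the stated pattern `[a-z][a-z0-9-]*` to a whole name; the helper name `is_valid_cluster_name` is hypothetical and not part of Pigsty.

```python
import re

# The rule from the text: a lowercase letter first, then lowercase
# letters, digits, or hyphens. No dots, no uppercase.
CLUSTER_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the whole name matches the cluster naming rule."""
    return CLUSTER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["pg-test", "pg-user-fin", "com.foo.bar", "PG-Test", ""]:
        print(repr(candidate), is_valid_cluster_name(candidate))
```

Using `fullmatch` rather than `match` matters here: a prefix match would accept `com.foo.bar` because `com` alone satisfies the pattern.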

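For the content of cluster names, a two- or three-part name separated by - is recommended, for example:

<cluster type>-<business>-<business line>

For example, pg-test-tt denotes the test cluster under the tt business line, of type pg; pg-user-fin denotes the user service under the fin business line.

Node naming

Node names should follow the same rule as k8s Pods, that is:

<cluster_name>-<seq>

A node's name is determined during cluster resource allocation. Each node is assigned a sequence number ${seq}, an auto-incrementing integer starting from 0. This matches the naming rule of StatefulSet in k8s, which makes uniform management on and off the cloud possible.

For example, if the cluster pg-test has three nodes, they can be named:

pg-test-1, pg-test-2, pg-test-3

Node names remain unchanged for the whole lifetime of the cluster, which simplifies monitoring and management.

Instance naming

Databases are usually deployed exclusively, with one instance occupying a whole machine node. PG instances and nodes map one to one, so the node identifier can simply be used as the instance identifier. For example, the PG instance on node pg-test-1 is named pg-test-1, and so on.

Exclusive deployment has a big advantage: one node, one instance, which minimizes management complexity. The need for co-location usually comes from pressure on resource utilization, but virtual machines or cloud platforms address that effectively. With the vm or pod abstraction, even a small redis instance (1 core, 1 GB) can have an exclusive node environment.

By convention, node 0 (Pod 0) of each cluster acts as the default primary, because it is the first node allocated during initialization.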

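Service naming

Generally a database provides two basic services: primary, the read-write service, and replica, the read-only replica service.

Services can then use a simple naming rule:

<cluster_name>-<service_name>

For example, the pg-test cluster here has two services: the read-write service pg-test-primary and the read-only replica service pg-test-replica.

A popular instance/node naming rule is <cluster_name>-<service_role>-<sequence>, which embeds the primary/replica role in the instance name. This has pros and cons. The advantage is that it is obvious at a glance which instance/node is the primary and which are replicas. The drawback is that once a failover happens, instance and node names must be adjusted to stay consistent, which creates extra maintenance work. Moreover, services and node instances are relatively independent concepts. This embedding style distorts the relationship by tying each instance to exactly one service, an assumption that may not hold in complex scenarios. For example, a cluster may divide services in several different ways, and the divisions may well overlap:

  • Readable replicas (resolves to all instances, including the primary)
  • Synchronous replicas (resolves to standbys using synchronous commit)
  • Delayed replicas and backup instances (resolves to specific instances)

Therefore, do not embed the service role in the instance name; maintain the list of target instances in the service instead. Names are not all-powerful, so do not embed more information than necessary in object names.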

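3.2 - System Architecture

Introduces Pigsty's system architecture

Architecturally, a Pigsty deployment has two parts:

  • Infrastructure (Infra): deployed on the meta nodes; basic services such as monitoring, DNS, NTP, DCS, and the Yum repository.
  • Database cluster (PgSQL): deployed on the database nodes; provides database services to the outside, cluster by cluster.

The nodes used for deployment (physical machines, virtual machines, Pods) also come in two kinds:

  • Meta node (Meta): hosts the infrastructure and runs control logic. Every Pigsty deployment needs at least one meta node.
  • Database node (Node): hosts database clusters/instances, with a one-to-one mapping between nodes and database instances.

Sandbox example

Taking the four-node sandbox environment shipped with Pigsty as an example, the distribution of components across nodes is shown in the figure below:

Figure: nodes and components in the Pigsty sandbox

The sandbox consists of one meta node and four database nodes (the meta node is also reused as a database node) and hosts one set of infrastructure and two database clusters. meta is the meta node; it hosts the infrastructure components and doubles as an ordinary database node running the single-primary database cluster pg-meta. node-1, node-2, and node-3 are ordinary database nodes running the database cluster pg-test.

Infrastructure

Every Pigsty deployment needs some infrastructure for the whole system to work.

Infrastructure is usually the responsibility of a professional operations team or a cloud vendor, but Pigsty, as an out-of-the-box solution, integrates the basic infrastructure into the provisioning solution.

  • DNS infrastructure: Dnsmasq (some requests are forwarded to Consul DNS)
  • Time infrastructure: NTP
  • Monitoring infrastructure: Prometheus
  • Alerting infrastructure: Alertmanager
  • Visualization infrastructure: Grafana
  • Local repository infrastructure: Yum/Nginx
  • Distributed configuration store: etcd/consul
  • Pigsty infrastructure: the metadata database MetaDB, the management component Ansible, scheduled jobs, and other advanced feature components.

Infrastructure is deployed on meta nodes. An environment contains one or more meta nodes for this purpose.

Except for the distributed configuration store (DCS), all infrastructure components are deployed as replicas; if there are multiple meta nodes, the DCS (etcd/consul) on them jointly acts as the DCS server.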

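Meta node

In each environment Pigsty needs at least one meta node, which acts as the control center for the whole environment. The meta node handles management work: storing state, managing configuration, launching tasks, collecting metrics, and so on. The infrastructure components of the whole environment, Nginx, Grafana, Prometheus, Alertmanager, NTP, the DNS nameserver, and the DCS, are all deployed on the meta node.

The meta node also hosts the metadata store (Consul or Etcd); users can alternatively use an existing external DCS cluster. If the DCS is deployed on meta nodes, three meta nodes are recommended in production to ensure the availability of the DCS service. Infrastructure components other than the DCS are deployed as peer replicas on all meta nodes. At least 1 meta node is required, 3 are recommended, and no more than 5 are advised.

The services running on a meta node are as follows:

Component       Port   Default domain   Description
Grafana         3000   g.pigsty         Graphical interface of the Pigsty monitoring system
Prometheus      9090   p.pigsty         Time-series database for monitoring
AlertManager    9093   a.pigsty         Alert aggregation and management component
Consul          8500   c.pigsty         Distributed configuration management and service discovery
Consul DNS      8600   -                DNS service provided by Consul
Nginx           80     pigsty           Entry proxy for all services
Yum Repo        80     yum.pigsty       Local Yum repository
Haproxy Index   80     h.pigsty         Access proxy for all Haproxy admin pages
NTP             123    n.pigsty         NTP time server used across the environment
Dnsmasq         53     -                DNS resolution server used across the environment

The infrastructure architecture deployed on the meta node is shown in the figure below:

The main interactions are as follows:

  • Dnsmasq provides DNS resolution within the environment (optional; an existing nameserver can be used instead).

    Some DNS resolution is delegated to Consul DNS.

  • Nginx exposes all web services and routes them by domain name.

  • The Yum Repo is Nginx's default server and lets all nodes in the environment install software from the offline packages.

  • Grafana hosts the Pigsty monitoring system and visualizes data from Prometheus and the CMDB.

  • Prometheus is the time-series database for monitoring.

    • By default Prometheus obtains all exporters to scrape from Consul and associates identity information with them.
    • Prometheus pulls metrics from the exporters, precomputes and processes them, and stores them in its own TSDB.
    • Prometheus evaluates alerting rules and sends alert events to Alertmanager.
  • Consul Server stores the DCS state, reaches consensus, and serves metadata queries.

  • The NTP service synchronizes time across all nodes in the environment (an external NTP service can be used instead).

  • Pigsty-related components:

    • Ansible, for running playbooks and issuing control commands
    • MetaDB, for supporting various advanced features (itself a standard database cluster)
    • Scheduled job controller (backup, cleanup, statistics, inspection; advanced features not yet included)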

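Database cluster

Production databases are organized by cluster. A cluster is a logical entity made up of a group of database instances linked by primary/replica replication. Each database cluster is a self-organizing business service unit consisting of at least one database instance.

The cluster is the basic business service unit. The diagram below shows the replication topology in the sandbox environment: pg-meta-1 alone forms the database cluster pg-meta, while pg-test-1, pg-test-2, and pg-test-3 together form another logical cluster, pg-test.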

pg-meta-1
(primary)

pg-test-1 -------------> pg-test-2
(primary)      |         (replica)
               |
               ^-------> pg-test-3
                         (replica)

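The figure below rearranges the components of the pg-test cluster from the database cluster's point of view.

Figure: the architecture from the logical viewpoint of a database cluster (standard access scheme)

Pigsty is a database provisioning solution that creates highly available database clusters on demand. As long as any instance in the cluster is alive, the cluster can serve full read-write and read-only traffic. Pigsty performs failover automatically, and read-only traffic is unaffected; the impact on read-write traffic depends on configuration and load and is typically in the range of seconds to tens of seconds.

In Pigsty every database instance is interchangeable in use and exposes database services in a NodePort-like manner. By default, port 5433 on any instance reaches the primary and port 5434 on any instance reaches the replicas. Users can also mix different access methods flexibly; see Database Access for details.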
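
Because every instance exposes the same service ports, a client only needs a host and a service name to build a connection string. The sketch below illustrates the idea; the port numbers come from the database node service table in this section, while the function name, the user `dbuser`, and the database `postgres` are placeholders chosen for illustration.

```python
# Service ports exposed by Haproxy on every database node
# (taken from the node service table in this section).
SERVICE_PORTS = {
    "primary": 5433,  # read-write, via the primary's connection pool
    "replica": 5434,  # read-only, via the replicas' connection pools
    "default": 5436,  # direct connection to the primary
    "offline": 5438,  # direct connection to the offline instance
}


def service_url(host: str, service: str,
                user: str = "dbuser", dbname: str = "postgres") -> str:
    """Build a connection URL for a cluster service on any member node."""
    port = SERVICE_PORTS[service]
    return f"postgres://{user}@{host}:{port}/{dbname}"


if __name__ == "__main__":
    # Any member of pg-test works as the entry point for either service.
    print(service_url("10.10.10.11", "primary"))
    print(service_url("10.10.10.13", "replica"))
```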

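Database node

A database node runs a database instance. In Pigsty, database instances are always deployed exclusively: each node hosts exactly one database instance, so nodes and instances can share unique identifiers (IP address and instance name).

The services running on a typical database node are as follows:

Component            Port   Description
Postgres             5432   Postgres database service
Pgbouncer            6432   Pgbouncer connection pool service
Patroni              8008   Patroni high-availability component
Consul               8500   Local agent of Consul, the distributed configuration management and service discovery component
Haproxy Primary      5433   Proxy for the cluster read-write service (primary connection pool)
Haproxy Replica      5434   Proxy for the cluster read-only service (replica connection pools)
Haproxy Default      5436   Direct connection to the cluster primary (for administration and DDL/DML changes)
Haproxy Offline      5438   Cluster offline read service (direct connection to the offline instance, for ETL and interactive queries)
Haproxy <Service>    543x   Additional user-defined services are assigned ports in sequence
Haproxy Admin        9101   Haproxy metrics and admin page
PG Exporter          9630   Postgres metrics exporter
PGBouncer Exporter   9631   Pgbouncer metrics exporter
Node Exporter        9100   Machine node metrics exporter
Consul DNS           8600   DNS service provided by Consul
vip-manager          -      Binds the VIP to the cluster primary

The main interactions are as follows:

  • vip-manager queries Consul for the cluster primary and binds the cluster's dedicated L2 VIP to the primary node (default access scheme).

  • Haproxy is the entry point for database traffic and exposes services on different ports (543x).

    • Haproxy port 9101 exposes Haproxy's internal metrics and provides an admin interface for traffic control.
    • Haproxy port 5433 points to the cluster primary's connection pool on port 6432 by default.
    • Haproxy port 5434 points to the cluster replicas' connection pools on port 6432 by default.
    • Haproxy port 5436 points directly to the cluster primary's port 5432 by default.
    • Haproxy port 5438 points directly to the cluster offline instance's port 5432 by default.
  • Pgbouncer pools database connections, absorbs the impact of failures, and exposes additional metrics.

    • Production services (high-frequency, non-interactive; 5433/5434) must go through Pgbouncer.

    • Direct services (administration and ETL; 5436/5438) must bypass Pgbouncer and connect directly.

  • Postgres provides the actual database service and forms primary/replica clusters through streaming replication.

  • Patroni supervises the Postgres service and handles leader election and switchover, health checks, and configuration management.

    • Patroni uses Consul to reach consensus as the basis for cluster leader election.
  • Consul Agent distributes configuration, accepts service registration, performs service discovery, and answers DNS queries.

    • Every process that listens on a port is registered in Consul.
  • PG Exporter, PGB Exporter, and Node Exporter expose the metrics of the database, the connection pool, and the node respectively.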

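Interaction between nodes and meta nodes

Taking an environment of a single meta node and a single database node as an example, the architecture is shown in the figure below:

Figure: a single meta node and a single database node

The interactions between the meta node and the database node mainly include:

  • Domain names of database clusters/nodes are resolved by the nameserver on the meta node.

  • Software installation on database nodes uses the Yum Repo on the meta node.

  • Metrics of database clusters/nodes are collected by Prometheus on the meta node.

  • Pigsty manages database nodes from the meta node:

    cluster creation, scaling out and in, changes to users, services, and HBA; log collection, garbage cleanup, backup, inspection, and so on.

  • Consul on a database node synchronizes locally registered services to the DCS on the meta node and proxies state reads and writes.

  • Database nodes synchronize time from the meta node (or another NTP server).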

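3.3 - Monitoring System

Concepts related to the Pigsty monitoring system

3.3.1 - Observability

From raw information to global insight

One of the most important questions in system management is observability. The figure below shows the observability of Postgres.

Original figure: https://pgstats.dev/

PostgreSQL offers rich observation interfaces, including system catalogs, statistics views, and helper functions. All of this is information users can observe, and everything listed here is collected by Pigsty. Through careful design, Pigsty turns obscure metric data into insight that people can readily understand.

Observability

The classic monitoring model has three important kinds of information:

  • Metrics: additive, atomic logical units of measurement that can be updated and aggregated over time periods.
  • Logs: records and descriptions of discrete events.
  • Traces: metadata bound to a single request.

Pigsty focuses on metrics and will later add collection, processing, and presentation of logs, but it does not collect database traces.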

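Metrics

Let us use a concrete example to show how metrics are obtained and what is produced from them.

pg_stat_statements is an official Postgres statistics extension that exposes detailed statistics for each class of query executed in the database.

Figure: raw data view of pg_stat_statements

The raw metric data provided by pg_stat_statements is presented as a table. Each class of query is assigned a query ID, followed by the number of calls, total time, maximum, minimum, and mean time per call, the standard deviation of response time, the average number of rows returned per call, and the time spent on block IO. (PG13 adds finer-grained metrics such as planning time, execution time, and the number of WAL records generated.)

These system views and system information functions are the original source of metric data in Pigsty. Reading such tables directly is dazzling and makes it easy to lose focus. The metrics need to be turned into insight, that is, presented as intuitive charts.

Figure: the resulting dashboards after processing; partial screenshot of the PG Cluster Query dashboard

The tabular data goes through a series of processing steps and ends up as a number of dashboard panels. The most basic processing is highlighting and coloring the raw data in the table, which already gives a practical improvement: slow queries stand out at a glance. But that is a minor trick. More importantly, the raw view can only show a snapshot of the current moment, whereas with Pigsty users can look back at any moment or time range and gain deeper performance insight.

The figure above is the cluster-level query dashboard (PG Cluster Query), where users can see an overview of all queries in the whole cluster, including the QPS and RT of each class of query, the ranking by mean response time, and the share of total time consumed.

When a specific class of query is of interest, clicking its query ID jumps to the query detail page (PG Query Detail), shown below, which displays the query statement and some core metrics.

Figure: details of a single class of query; screenshot of the PG Query Detail dashboard

The figure above records a slow-query optimization in a real production environment. In the Realtime Response Time panel in the middle of the right side, a sudden change is visible: the query's mean response time dropped from seven or eight seconds to seven or eight milliseconds. We located this slow query and added a suitable index, and the effect of the optimization showed up on the chart immediately, giving real-time feedback.

This is the core problem Pigsty sets out to solve: From observability to insight.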

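Logs

Besides metrics there is another important kind of observation data: logs, which record and describe discrete events.

If metrics are passive observations of the database system, logs are information actively reported by the database system and its surrounding components.

Pigsty does not yet mine database logs, but later versions will integrate pgbadger and mtail, introduce infrastructure for unified log collection, analysis, and processing, and add log-related database metrics.

Users can analyze PostgreSQL logs with open-source components on their own.

Traces

PostgreSQL supports DTrace, and users can also use sampling probes to analyze performance bottlenecks during query execution. Such data is only needed in certain specific scenarios and is of limited practical use, so Pigsty does not collect trace data for databases.

What's next?

Metrics alone are not enough; the information also has to be organized into a system. Read Monitoring Hierarchy to learn more.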

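3.3.2 - Monitoring Hierarchy

Introduces the hierarchy in the Pigsty monitoring system

As described in Naming Principles, objects in Pigsty fall into several levels: cluster, service, instance, and node.

Monitoring system hierarchy

The Pigsty monitoring system has more levels. Besides instance and cluster, the two most common ones, the system is organized at other levels too. From top to bottom there are 7 levels: overview, shard, cluster, service, instance, database, and object.

Figure: Pigsty's dashboards are divided into 7 logical levels and 5 implementation levels

Logical levels

Production databases are usually organized by cluster. The cluster is the basic business service unit and the most important monitoring level.

A cluster is made up of a group of database instances linked by primary/replica replication, and the instance is the most basic monitoring level.

Multiple database clusters together make up a real-world production environment. Monitoring at the Overview level describes the environment as a whole.

Multiple database clusters that serve the same business through horizontal partitioning are called shards. Shard-level monitoring helps locate problems such as data distribution and skew.

Service is the level between cluster and instance. Services are usually closely tied to resources such as DNS, domain names, VIPs, and NodePorts.

Database is a sub-instance object. A database cluster/instance may host several databases at once, and database-level monitoring looks at the activity within a single database.

Object refers to entities inside a database, including tables, indexes, sequences, functions, queries, connection pools, and so on. Object-level monitoring looks at the statistics of these objects and is closely tied to the business.

Simplified levels

Just as the 7-layer OSI network model is simplified in practice to the five-layer TCP/IP model, these seven levels are simplified, with cluster and instance as the boundaries, to five: Overview, Cluster, Service, Instance, and Database.

The final division is then very simple: everything above the cluster level belongs to Overview, everything below the instance level counts as Database, and what lies between cluster and instance is Service.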

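Naming rules

Once the levels are settled, the most important question is naming:

  1. There must be a way to identify and refer to the components at each level of the system.

  2. The naming scheme should reasonably reflect the hierarchical relationships among the entities in the system.

  3. The naming scheme should be generated automatically by rule. Only then can the system run without manual maintenance through cluster scale-out, scale-in, and failover.

With the levels of the system clarified, each entity in the system can be given a name.

For the basic naming rules Pigsty follows, see the Naming Principles section.

Pigsty uses an independent name management mechanism, and entity naming is self-contained.

To integrate with external systems, users can use this naming scheme directly or adopt their own through an adapter.

Cluster naming

Pigsty cluster names are specified by the user and match the regular expression [a-z0-9][a-z0-9-]*, for example pg-test and pg-meta.

Node naming

Pigsty nodes belong to clusters. A node name has two parts, the cluster name and the node number, joined by -.

The form is ${pg_cluster}-${pg_seq}, for example pg-meta-1 and pg-test-2.

Formally, the node number is a natural number of reasonable length (0 included), unique within the cluster; every node has its own number.

The number can be explicitly specified and assigned by the user, usually starting from 0 or 1. Once assigned, it does not change during the cluster's lifetime.

Instance naming

Pigsty instances belong to clusters and are deployed exclusively, one per node.

Because instances and nodes map one to one, instance names match node names.

Service naming

Pigsty services belong to clusters. A service name has two parts, the cluster name and the role, joined by -.

The form is ${pg_cluster}-${pg_role}, for example pg-meta-primary and pg-test-replica.

The options for pg_role include primary|replica|offline|delayed.

primary is a special role: every cluster must define exactly one instance with pg_role = primary as the primary.

Other roles are largely user-defined; replica|offline|delayed are roles predefined by Pigsty.

What's next?

With the monitoring levels in place, monitored objects need to be given identities before they can be managed.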

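3.3.3 - Identity Management

How Pigsty manages the identity of monitored objects

Every instance has an identity. Identity information is metadata associated with an instance and used to identify it.

Figure: identity information carried by the Postgres service when Consul service discovery is used

Identity parameters

Identity parameters are unique identifiers that every cluster and instance must define.

Name       Variable      Type                         Description
Cluster    pg_cluster    Core identity parameter      Cluster name, the top-level namespace for resources in the cluster
Role       pg_role       Core identity parameter      Instance role: primary, replica, offline, …
Seq        pg_seq        Core identity parameter      Instance sequence number, a positive integer, unique within the cluster
Instance   pg_instance   Derived identity parameter   ${pg_cluster}-${pg_seq}
Service    pg_service    Derived identity parameter   ${pg_cluster}-${pg_role}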
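
The relationship between core and derived identity parameters can be sketched in a few lines. The class below is illustrative only (the `Identity` type is hypothetical, not part of Pigsty); it simply applies the two derivation rules from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Core identity parameters of one instance."""
    pg_cluster: str
    pg_role: str
    pg_seq: int

    @property
    def pg_instance(self) -> str:
        # Derived: ${pg_cluster}-${pg_seq}
        return f"{self.pg_cluster}-{self.pg_seq}"

    @property
    def pg_service(self) -> str:
        # Derived: ${pg_cluster}-${pg_role}
        return f"{self.pg_cluster}-{self.pg_role}"


if __name__ == "__main__":
    ident = Identity(pg_cluster="pg-test", pg_role="replica", pg_seq=2)
    print(ident.pg_instance, ident.pg_service)
```

Note that after a failover only `pg_role`, and therefore `pg_service`, changes; `pg_instance` stays the same, which is the reason the role is kept out of the instance name.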

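Identity association

After objects in the system are named, the identity information still has to be associated with specific instances.

Identity information is metadata assigned by the business. A database instance itself is unaware of it: it does not know whom it serves, which business it belongs to, or which number it is in the cluster.

Identity can be assigned in many ways. The most primitive is the operator's memory: the DBA remembers that the database instance at IP 10.2.3.4 is used for payments and that the instance on another machine is used for user management. A better approach is to manage the identity of cluster members through configuration files or through service discovery.

Pigsty offers both: service discovery based on Consul and service discovery based on configuration files.

The parameter prometheus_sd_method (consul|static) controls this behavior:

  • consul: service discovery based on Consul (the default)
  • static: service discovery based on local configuration files

Pigsty recommends consul service discovery: when a failover occurs, the monitoring system automatically corrects the identity registered for the affected instance.

Consul service discovery

By default Pigsty manages the services in the environment through Consul service discovery.

Pigsty has built-in DCS-based configuration management and automatic service discovery, so users can view all nodes and services in the system, and their health, at a glance. All services in Pigsty register themselves in the DCS automatically, so when database clusters are created, destroyed, or modified, the metadata is corrected automatically and the monitoring system discovers its targets without manual configuration.

Users can also use the DNS and service discovery mechanisms provided by Consul to implement DNS-based automatic traffic switching.

Consul uses a client/server architecture. The environment contains between 1 and 5 Consul Servers, which store the actual metadata. A Consul Agent is deployed on every node and proxies communication between local services and the Consul Servers. By default Pigsty registers services through local Consul configuration files.

Service registration

A consul agent runs on every node. Services are registered in the DCS by the consul agent through JSON configuration files.

The default location of the JSON configuration files is /etc/consul.d/, with the naming rule svc-<service>.json. Taking postgres as an example: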

{
  "service": {
    "name": "postgres",
    "port": {{ pg_port }},
    "tags": [
      "{{ pg_role }}",
      "{{ pg_cluster }}"
    ],
    "meta": {
      "type": "postgres",
      "role": "{{ pg_role }}",
      "seq": "{{ pg_seq }}",
      "instance": "{{ pg_instance }}",
      "service": "{{ pg_service }}",
      "cluster": "{{ pg_cluster }}",
      "version": "{{ pg_version }}"
    },
    "check": {
      "tcp": "127.0.0.1:{{ pg_port }}",
      "interval": "15s",
      "timeout": "1s"
    }
  }
}

The meta and tags sections hold the service's metadata and store the instance's identity information.
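
To make the template concrete, the sketch below fills in the same fields for one instance and prints the resulting JSON. It only mirrors the structure shown above; the function name and the default values for `pg_port` and `pg_version` are assumptions made for illustration, and in Pigsty the actual file is rendered from the template by the playbooks.

```python
import json


def postgres_service(pg_cluster: str, pg_role: str, pg_seq: int,
                     pg_port: int = 5432, pg_version: str = "13") -> dict:
    """Build the service definition with the template variables filled in."""
    return {
        "service": {
            "name": "postgres",
            "port": pg_port,
            "tags": [pg_role, pg_cluster],
            "meta": {
                "type": "postgres",
                "role": pg_role,
                "seq": str(pg_seq),
                "instance": f"{pg_cluster}-{pg_seq}",  # pg_instance
                "service": f"{pg_cluster}-{pg_role}",  # pg_service
                "cluster": pg_cluster,
                "version": str(pg_version),
            },
            "check": {
                "tcp": f"127.0.0.1:{pg_port}",
                "interval": "15s",
                "timeout": "1s",
            },
        }
    }


if __name__ == "__main__":
    definition = postgres_service("pg-test", "primary", 1)
    print(json.dumps(definition, indent=2))
```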

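Service query

Users can discover services registered in Consul through the DNS service provided by Consul or by calling the Consul API directly.

For how to look up consul services through the DNS API, see the Consul documentation.

Figure: querying the pg_exporter service on pg-bench-1.

Service discovery

Prometheus automatically discovers the monitoring targets in the environment through consul_sd_configs. Services that carry both the pg and exporter tags are automatically recognized as scrape targets: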

- job_name: pg
  # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
  consul_sd_configs:
    - server: localhost:8500
      refresh_interval: 5s
      tags:
        - pg
        - exporter

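Figure: services discovered by Prometheus, with identity information attached to the metric dimensions of each instance.

Service maintenance

Sometimes a primary/replica switchover leaves the registered role out of step with the instance's actual role. An anti-entropy process is then needed to handle the anomaly.

Failover driven by Patroni corrects the registered role through callback logic, but a role change performed manually requires manual intervention.

The following script automatically detects and repairs the service registration of a database. Configuring it in crontab on database instances, or setting up a periodic inspection task on the meta node, is recommended.

/pg/bin/pg-register $(pg-role)

Static file service discovery

static service discovery relies on the configuration in /etc/prometheus/targets/*.yml. Its advantage is that it does not depend on Consul.

When the Pigsty monitoring system is integrated with an external management solution, this mode is less intrusive to the existing system. The drawback is that when a primary/replica switchover happens within a cluster, users must maintain instance role information themselves. For manual maintenance, the following command generates the monitoring target file required by Prometheus from the configuration file and reloads it.

See Prometheus service discovery for details.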

./infra.yml --tags=prometheus_targets,prometheus_reload

An example of the static monitoring target file Pigsty generates by default:

#==============================================================#
# File      :   targets/all.yml
# Ctime     :   2021-02-18
# Mtime     :   2021-02-18
# Desc      :   Prometheus Static Monitoring Targets Definition
# Path      :   /etc/prometheus/targets/all.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

#======> pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]

#======> pg-test-1 [primary]
- labels: {cls: pg-test, ins: pg-test-1, ip: 10.10.10.11, role: primary, svc: pg-test-primary}
  targets: [10.10.10.11:9630, 10.10.10.11:9100, 10.10.10.11:9631, 10.10.10.11:9101]

#======> pg-test-2 [replica]
- labels: {cls: pg-test, ins: pg-test-2, ip: 10.10.10.12, role: replica, svc: pg-test-replica}
  targets: [10.10.10.12:9630, 10.10.10.12:9100, 10.10.10.12:9631, 10.10.10.12:9101]

#======> pg-test-3 [replica]
- labels: {cls: pg-test, ins: pg-test-3, ip: 10.10.10.13, role: replica, svc: pg-test-replica}
  targets: [10.10.10.13:9630, 10.10.10.13:9100, 10.10.10.13:9631, 10.10.10.13:9101]
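
Each entry in the file above follows the same pattern, so it can be derived from an instance's identity and IP address. The sketch below is a hedged illustration of that derivation, not Pigsty's own generator: the function name is hypothetical, and the four exporter ports are the ones that appear in the sample file. It emits JSON, which Prometheus file-based discovery also accepts and which is valid YAML as well.

```python
import json

# Exporter ports as they appear in the sample file:
# pg_exporter, node_exporter, pgbouncer_exporter, haproxy admin.
EXPORTER_PORTS = (9630, 9100, 9631, 9101)


def target_entry(cluster: str, seq: int, role: str, ip: str) -> dict:
    """Build one static target group with identity labels attached."""
    return {
        "labels": {
            "cls": cluster,
            "ins": f"{cluster}-{seq}",
            "ip": ip,
            "role": role,
            "svc": f"{cluster}-{role}",
        },
        "targets": [f"{ip}:{port}" for port in EXPORTER_PORTS],
    }


if __name__ == "__main__":
    members = [
        ("pg-test", 1, "primary", "10.10.10.11"),
        ("pg-test", 2, "replica", "10.10.10.12"),
        ("pg-test", 3, "replica", "10.10.10.13"),
    ]
    print(json.dumps([target_entry(*m) for m in members], indent=2))
```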

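Identity association

Whether through Consul service discovery or static file service discovery, the end result is that identity information is associated with the instance's monitoring metrics.

This association is implemented through the dimension labels of the metrics.

Identity parameter   Dimension label   Example value
pg_cluster           cls               pg-test
pg_instance          ins               pg-test-1
pg_service           svc               pg-test-primary
pg_role              role              primary
node_ip              ip                10.10.10.11

Read the next section, Monitoring Metrics, to see how metrics are organized through these labels.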

3.3.4 - Monitoring Metrics

The form, model, number, hierarchy, and derivation rules of monitoring metrics

Metrics are the core concept of the Pigsty monitoring system.

Metric form

Formally, a metric is an additive, atomic logical unit of measurement that can be updated and aggregated over time periods.

Metrics usually exist as time series with dimension labels. For example, pg:ins:qps_realtime in the Pigsty sandbox shows the real-time QPS of every instance:

pg:ins:qps_realtime{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", role="primary"} 0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", role="primary"} 327.6
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", role="replica"} 517.0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", role="replica"} 0

Users can compute over metrics: sums, derivatives, aggregations, and so on. For example:

$ sum(pg:ins:qps_realtime) by (cls)        -- real-time instance QPS aggregated by cluster
{cls="pg-meta"} 0
{cls="pg-test"} 844.6

$ avg(pg:ins:qps_realtime) by (cls)        -- average real-time instance QPS across all instances in each cluster
{cls="pg-meta"} 0
{cls="pg-test"} 280

$ avg_over_time(pg:ins:qps_realtime[30m])  -- average QPS of each instance over the past 30 minutes
pg:ins:qps_realtime{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", role="primary"} 0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", role="primary"} 130
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", role="replica"} 100
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", role="replica"} 0
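
What the `by (cls)` aggregations do can be reproduced outside Prometheus in a few lines. The sketch below is only an illustration of the grouping semantics using the sample values shown earlier in this section; it is not how Prometheus is implemented, and the `aggregate` helper is hypothetical. With these inputs the per-cluster average for pg-test comes to roughly 281.5, which the listing shows rounded.

```python
from collections import defaultdict

# (labels, value) pairs for pg:ins:qps_realtime, as listed in the text.
samples = [
    ({"cls": "pg-meta", "ins": "pg-meta-1"}, 0.0),
    ({"cls": "pg-test", "ins": "pg-test-1"}, 327.6),
    ({"cls": "pg-test", "ins": "pg-test-2"}, 517.0),
    ({"cls": "pg-test", "ins": "pg-test-3"}, 0.0),
]


def aggregate(series, by, fn):
    """Group series by one label and reduce each group with fn."""
    groups = defaultdict(list)
    for labels, value in series:
        groups[labels[by]].append(value)
    return {key: fn(values) for key, values in groups.items()}


sum_by_cls = aggregate(samples, "cls", sum)
avg_by_cls = aggregate(samples, "cls", lambda v: sum(v) / len(v))

if __name__ == "__main__":
    print(sum_by_cls)
    print(avg_by_cls)
```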

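Metric model

Each metric is a class of data and usually corresponds to multiple time series. Different time series of the same metric are distinguished by dimensions.

A metric plus its dimensions pinpoints one time series. Each time series is an array of (timestamp, value) pairs.

Pigsty adopts the Prometheus metric model, whose logical concepts can be expressed with the following SQL DDL.

-- Metric table: metrics and time series form a 1:n relationship
CREATE TABLE metrics (
    id   INT PRIMARY KEY,         -- metric identifier
    name TEXT UNIQUE              -- metric name, [...other metric metadata, such as type]
);

-- Time series table: every time series belongs to one metric
CREATE TABLE series (
    id        BIGINT PRIMARY KEY,               -- time series identifier
    metric_id INTEGER REFERENCES metrics (id),  -- the metric this time series belongs to
    dimension JSONB DEFAULT '{}'                -- dimension information of the time series, as key-value pairs
);

-- Time series data table: stores the sampled data points; every sample belongs to one time series
CREATE TABLE series_data (
    series_id BIGINT REFERENCES series(id),     -- time series identifier
    ts        TIMESTAMP,                        -- sample timestamp
    value     FLOAT,                            -- sample value
    PRIMARY KEY (series_id, ts)                 -- a sample is uniquely identified by its time series and timestamp
);

Here we take the pg:ins:qps metric as an example:

-- Sample metric data
INSERT INTO metrics VALUES(1, 'pg:ins:qps');  -- the metric is named pg:ins:qps and is a GAUGE
INSERT INTO series VALUES                     -- the metric has four time series, distinguished by dimension labels
(1001, 1, '{"cls": "pg-meta", "ins": "pg-meta-1", "role": "primary", "other": "..."}'),
(1002, 1, '{"cls": "pg-test", "ins": "pg-test-1", "role": "primary", "other": "..."}'),
(1003, 1, '{"cls": "pg-test", "ins": "pg-test-2", "role": "replica", "other": "..."}'),
(1004, 1, '{"cls": "pg-test", "ins": "pg-test-3", "role": "replica", "other": "..."}');
INSERT INTO series_data VALUES                 -- the samples underlying each time series
(1001, now(), 1000),                           -- instance pg-meta-1 has a QPS of 1000 at the current moment
(1002, now(), 1000),                           -- instance pg-test-1 has a QPS of 1000 at the current moment
(1003, now(), 5000),                           -- instance pg-test-2 has a QPS of 5000 at the current moment
(1004, now(), 5001);                           -- instance pg-test-3 has a QPS of 5001 at the current moment

  • pg_up is a metric comprising 4 time series; it records the liveness of all instances in the whole environment.
  • pg_up{ins="pg-test-1", ...} is a time series; it records the liveness of the specific instance pg-test-1.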
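
The "metric + dimensions pinpoints one time series" idea can be exercised against the same three-table model. The sketch below recreates a reduced version of the schema in an in-memory SQLite database (an assumption made so the example is self-contained; dimensions are stored as JSON text rather than JSONB, and timestamps are plain integers) and looks up the samples of one series. The `samples` helper is hypothetical.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE metrics (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE series (
    id        INTEGER PRIMARY KEY,
    metric_id INTEGER REFERENCES metrics (id),
    dimension TEXT DEFAULT '{}'
);
CREATE TABLE series_data (
    series_id INTEGER REFERENCES series (id),
    ts        INTEGER,
    value     REAL,
    PRIMARY KEY (series_id, ts)
);
""")
db.execute("INSERT INTO metrics VALUES (1, 'pg:ins:qps')")
db.executemany("INSERT INTO series VALUES (?, 1, ?)", [
    (1001, json.dumps({"cls": "pg-meta", "ins": "pg-meta-1"})),
    (1002, json.dumps({"cls": "pg-test", "ins": "pg-test-1"})),
])
db.executemany("INSERT INTO series_data VALUES (?, ?, ?)", [
    (1001, 0, 1000.0), (1002, 0, 1000.0), (1002, 15, 1200.0),
])


def samples(metric: str, **dims: str):
    """Return (ts, value) pairs of all series of a metric matching dims."""
    rows = db.execute(
        """SELECT s.dimension, d.ts, d.value
           FROM metrics m
           JOIN series s ON s.metric_id = m.id
           JOIN series_data d ON d.series_id = s.id
           WHERE m.name = ?
           ORDER BY d.ts""",
        (metric,),
    )
    # Keep a row only if every requested label is present with that value.
    return [(ts, value) for dim, ts, value in rows
            if dims.items() <= json.loads(dim).items()]


if __name__ == "__main__":
    print(samples("pg:ins:qps", ins="pg-test-1"))
```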

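Metric sources

Pigsty's monitoring data comes from four main sources: the database, the connection pool, the operating system, and the load balancer, each exposed through a corresponding exporter.

The full list of sources:

  • PostgreSQL's own monitoring metrics
  • Statistics from PostgreSQL logs
  • PostgreSQL system catalog information
  • Metrics from the Pgbouncer connection pool middleware
  • PgExporter metrics
  • Metrics of the database working node
  • Haproxy load balancer metrics
  • DCS (Consul) operational metrics
  • Operational metrics of the monitoring system itself: Grafana, Prometheus, Nginx
  • Blackbox liveness probe metrics

For the full list of available metrics, see the Reference - Metric List section.

Number of metrics

So how many metrics does Pigsty include in total? Here is a pie chart of the share contributed by each source. The blue, green, and yellow parts on the right are metrics exposed by the database and its related components, the red and orange parts at the lower left are machine node metrics, and the purple part at the upper left is load balancer metrics.

Among database metrics, about 230 raw metrics relate to postgres itself and about 50 to middleware. On top of these raw metrics, Pigsty uses hierarchical aggregation and precomputation to define about 350 carefully designed DB-related derived metrics.

So for each database cluster, there are 621 metrics for the database and its accessories alone. Machines contribute 281 raw metrics and 83 derived metrics, 364 in total. Adding the 170 load balancer metrics gives close to 1200 classes of metrics in all.

Note that metrics and time series must be distinguished here. The unit used above is "class", not "individual series", because one metric may correspond to multiple time series. For example, if a database has 20 tables, a metric such as pg_table_index_scan corresponds to 20 time series.

As of 2021, Pigsty's metric coverage is far ahead of every open-source or commercial monitoring system known to the author. See the side-by-side comparison for details.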

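Metric hierarchy

Pigsty also processes existing metrics to produce derived metrics.

For example, metrics can be aggregated at different levels.

Between the raw time-series data and the finished chart there are several processing steps.

Take the derivation of the TPS metric as an example.

The raw data is the transaction counter scraped from Pgbouncer. The cluster has four instances, each with two databases, so there are 8 DB-level TPS metrics in total.

The chart below is a side-by-side comparison of QPS for each instance in the cluster. Here predefined rules first take the rate of the raw transaction counters to get the 8 DB-level TPS metrics, then aggregate the 8 DB-level time series into 4 instance-level TPS metrics, and finally aggregate those four into the cluster-level TPS metric.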
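
The three derivation steps described above (counter to rate, DB level to instance level, instance level to cluster level) can be sketched with plain arithmetic. The example below uses two instances with two databases each instead of four instances, to stay short; the counter values, database names, and the 15-second sample spacing are made up for illustration, and the real derivation is done by Prometheus recording rules.

```python
from collections import defaultdict

SCRAPE_INTERVAL = 15  # seconds between the two samples (assumed)

# (instance, database) -> (transaction counter at t0, counter at t0 + 15s)
counters = {
    ("pg-test-1", "db1"): (1000, 1150),
    ("pg-test-1", "db2"): (500, 530),
    ("pg-test-2", "db1"): (2000, 2300),
    ("pg-test-2", "db2"): (0, 45),
}

# Step 1: rate of each raw counter gives DB-level TPS.
db_tps = {key: (curr - prev) / SCRAPE_INTERVAL
          for key, (prev, curr) in counters.items()}

# Step 2: sum DB-level series per instance to get instance-level TPS.
ins_tps = defaultdict(float)
for (ins, _db), tps in db_tps.items():
    ins_tps[ins] += tps

# Step 3: sum instance-level series to get cluster-level TPS.
cls_tps = sum(ins_tps.values())

if __name__ == "__main__":
    print(db_tps)
    print(dict(ins_tps))
    print(cls_tps)
```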

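Pigsty defines 360 classes of derived aggregate metrics, and more will be added. See Reference - Derived Metrics for the definition rules.

Special metrics

The catalog is a special kind of metric.

Catalog and metrics are similar but not identical, and the boundary is blurry. The simplest example: should the number of pages and tuples of a table count as catalog or as metrics?

Setting aside this conceptual game, the main practical difference is that catalog information rarely changes, such as table definitions. Scraping it every few seconds like metrics would clearly be wasteful, so this mostly static information is classified as catalog.

Catalog is mainly collected by scheduled jobs (such as inspections) rather than by Prometheus. Some especially important catalog information, such as certain information in pg_class, is also converted to metrics and collected by Prometheus.

Summary

Having covered Pigsty's metrics, you may want to see how Pigsty's alerting system puts this metric data to actual production use.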

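3.3.5 - Alerting Rules

Introduces the database alerting rules shipped with Pigsty and how to customize them

Alerting is essential for day-to-day incident response and for improving system availability.

Missed alerts reduce availability, and false alerts reduce sensitivity, so alerting rules need to be designed with care.

  • Define alert levels sensibly, along with the corresponding handling procedures.
  • Define alert metrics sensibly, removing duplicate alerts and adding missing ones.
  • Set alert thresholds scientifically from historical monitoring data to reduce the false-alert rate.
  • Sort out exception rules sensibly to eliminate false alerts caused by maintenance work, ETL, and offline queries.

Alert taxonomy

By urgency

  • P0: FATAL: incidents with major external impact that require urgent intervention, for example primary down or replication broken. (Serious incident)

  • P1: ERROR: incidents with minor external impact or with redundancy in place; response is needed within minutes. (Incident)

  • P2: WARNING: impact is imminent and may worsen within hours if left alone; response is needed within hours. (Event of concern)

  • P3: NOTICE: needs attention but has no immediate impact; response is needed within days. (Deviation)

By alert level

  • System level: alerts on the operating system and hardware resources. DBAs pay special attention only to CPU and disk alerts; the rest is handled by operations.
  • Database level: alerts on the database itself, the DBA's main focus. Generated from the metrics of PG, PGB, and the exporters themselves.
  • Application level: application alerts are the responsibility of the business side, but DBAs set alerts on business metrics such as QPS, TPS, Rollback, and Seasonality.

By metric type

  • Errors: PG Down, PGB Down, Exporter Down, streaming replication broken, multiple primaries in one cluster
  • Traffic: QPS, TPS, Rollback, Seasonality
  • Latency: mean response time, replication lag
  • Saturation: connection backlog, idle transactions, CPU, disk, age (transaction ID), buffers

Alert visualization

Pigsty presents alert information with bar charts. The horizontal axis is time, and a colored bar segment represents an alert event. Only alerts in the Firing state are shown in the alert charts.

Alerting rules in detail

Alerting rules fall roughly into four types: errors, latency, saturation, and traffic.

  • Errors: mainly concerned with the aliveness of each component and with anomalies such as network interruption and split brain; usually high level (P0|P1).
  • Latency: mainly concerned with query response time, replication lag, slow queries, and long transactions.
  • Saturation: mainly concerned with CPU and disk (both belong to system monitoring but matter so much to the DB that they are included), connection pool queuing, the number of database backend connections, age (in essence the saturation of available transaction IDs), SSD lifetime, and so on.
  • Traffic: QPS, TPS, Rollback (traffic usually relates to business metrics and belongs to business monitoring, but is included because it matters to the DB), seasonality of QPS, and sudden increases in TPS.

Error alerts

Postgres instance downtime distinguishes primary from replica: a primary going down triggers a P0 alert and a replica going down triggers a P1 alert. Both require immediate intervention, but there are usually several replicas and queries can fall back to the primary, so there is more margin for handling; replica downtime is therefore rated P1.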

# primary|master instance down for 1m triggers a P0 alert
- alert: PG_PRIMARY_DOWN
  expr: pg_up{instance=~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P0
  annotations:
    summary: "P0 Postgres Primary Instance Down: {{$labels.instance}}"
    description: "pg_up = {{ $value }} {{$labels.instance}}"

# standby|slave instance down for 1m triggers a P1 alert
- alert: PG_STANDBY_DOWN
  expr: pg_up{instance!~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres Standby Instance Down: {{$labels.instance}}"
    description: "pg_up = {{ $value }} {{$labels.instance}}"

Pgbouncer instances map one to one to Postgres instances, so their aliveness alerting rules mirror those for Postgres.

# primary pgbouncer down for 1m triggers a P0 alert
- alert: PGB_PRIMARY_DOWN
  expr: pgbouncer_up{instance=~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P0
  annotations:
    summary: "P0 Pgbouncer Primary Instance Down: {{$labels.instance}}"
    description: "pgbouncer_up = {{ $value }} {{$labels.instance}}"

# standby pgbouncer down for 1m triggers a P1 alert
- alert: PGB_STANDBY_DOWN
  expr: pgbouncer_up{instance!~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Pgbouncer Standby Instance Down: {{$labels.instance}}"
    description: "pgbouncer_up = {{ $value }} {{$labels.instance}}"

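The liveness of the Prometheus exporters is rated P1. An exporter going down does not affect the database service by itself, but it usually signals that something is wrong, and the missing monitoring data will in turn trigger other alerts. Exporter liveness is detected through Prometheus's own up metric; watch out for special cases where a single instance hosts multiple databases.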

# exporter down for 1m triggers a P1 alert
- alert: PG_EXPORTER_DOWN
  expr: up{port=~"(9185|9127)"} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Exporter Down: {{$labels.instance}} {{$labels.port}}"
    description: "port = {{$labels.port}}, {{$labels.instance}}"

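All liveness checks use a duration threshold of 1 minute, which is four samples at the default 15s scrape interval. Routine restarts normally do not trigger liveness alerts.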
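
The effect of the 1-minute duration can be checked offline with `promtool test rules`. The sketch below is a test file under several assumptions: the rules above are saved inside a `groups:` block in a file named pgsql-alert.yml (a hypothetical name), PG_PRIMARY_DOWN fires on `pg_up == 0`, and the instance name is made up. With 15s samples, a primary that goes down at 30s is only pending at the 1m mark and is firing by 2m.

```yaml
# pgsql-alert-test.yml (sketch); run with: promtool test rules pgsql-alert-test.yml
rule_files:
  - pgsql-alert.yml                  # hypothetical file holding the rules above

evaluation_interval: 15s

tests:
  - interval: 15s                    # one sample per 15s scrape
    input_series:
      # up for two scrapes, then down from t=30s onward
      - series: 'pg_up{instance="pg-test-1-master"}'
        values: '1 1 0+0x10'
    alert_rule_test:
      - eval_time: 1m                # down for 30s only: still pending, nothing fires
        alertname: PG_PRIMARY_DOWN
      - eval_time: 2m                # down for 90s: the 1m `for` clause is satisfied
        alertname: PG_PRIMARY_DOWN
        exp_alerts:
          - exp_labels:
              instance: pg-test-1-master
              team: DBA
              urgency: P0
            exp_annotations:
              summary: "P0 Postgres Primary Instance Down: pg-test-1-master"
              description: "pg_up = 0 pg-test-1-master"
```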

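Latency Alerts

Three alerts relate to replication lag: replication break, high replication lag, and abnormal replication lag, rated P1, P2, and P3 respectively.

  • Replication break is an error, detected with the metric pg_repl_state_count{state="streaming"}: if the number of standbys currently in the streaming state changes in the negative direction, the break alert fires. The walsender determines the replication state. A standby disconnecting outright produces this symptom, and a standby dropping from streaming to catchup because of a buffer backlog triggers the alert as well. Taking a manual base backup with -Xs also produces this alert when the backup finishes. The alert resolves automatically after 10 minutes. A replication break causes clients to read stale data and has some external impact, so it is rated P1.

  • Replication lag can be judged by lag time or by lag bytes, with lag bytes as the authoritative metric. Under normal conditions, lag time on the order of hundreds of milliseconds and lag bytes on the order of hundreds of KB are both normal. The time thresholds currently in use are 5s and 15s. Based on historical data, the rules here use 8 seconds and 32MB, which yields roughly a single-digit number of alerts per day. Lag time is more intuitive, so the 8s threshold is used for the P2 alert; not every standby reports that metric reliably, however, so a 32MB byte threshold triggers a P3 alert as a backstop.

  • Special cases: antispam, stats, and coredb frequently experience replication lag.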

      # replication break for 1m triggers a P0 alert. auto-resolved after 10 minutes.
      - alert: PG_REPLICATION_BREAK
        expr: pg_repl_state_count{state="streaming"} - (pg_repl_state_count{state="streaming"} OFFSET 10m) < 0
        for: 1m
        labels:
          team: DBA
          urgency: P0
        annotations:
          summary: "P0 Postgres Streaming Replication Break: {{$labels.instance}}"
          description: "delta = {{ $value }} {{$labels.instance}}"

      # replication lag greater than 8 second for 3m triggers a P1 alert
      - alert: PG_REPLICATION_LAG
        expr: pg_repl_replay_lag{application_name="walreceiver"} > 8
        for: 3m
        labels:
          team: DBA
          urgency: P1
        annotations:
          summary: "P1 Postgres Replication Lagged: {{$labels.instance}}"
          description: "lag = {{ $value }} seconds, {{$labels.instance}}"

      # replication diff greater than 32MB for 5m triggers a P3 alert
      - alert: PG_REPLICATION_DIFF
        expr: pg_repl_lsn{application_name="walreceiver"} - pg_repl_replay_lsn{application_name="walreceiver"} > 33554432
        for: 5m
        labels:
          team: DBA
          urgency: P3
        annotations:
          summary: "P3 Postgres Replication Diff Deviant: {{$labels.instance}}"
          description: "delta = {{ $value }} {{$labels.instance}}"
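
One way to handle the special cases listed above is to exclude them from the general rule with a label matcher. The sketch below assumes that the cluster name appears in the `instance` label; the regular expression would need to be adjusted to the actual naming scheme.

```yaml
# replication lag rule that skips clusters known to lag (antispam, stats, coredb)
- alert: PG_REPLICATION_LAG
  expr: pg_repl_replay_lag{application_name="walreceiver", instance!~".*(antispam|stats|coredb).*"} > 8
  for: 3m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres Replication Lagged: {{$labels.instance}}"
    description: "lag = {{ $value }} seconds, {{$labels.instance}}"
```

Excluded clusters can then get their own rule with a looser threshold, so they are not left without coverage.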

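Saturation Alerts

Saturation metrics mainly concern resources and include many system-level metrics: CPU and disk (both are system metrics, but important enough to the database to be included), connection pool queuing, database backend connections, age (in essence the saturation of available transaction IDs), SSD lifespan, and so on.

Pile-up Detection

Pile-ups involve two kinds of metrics: the number of backend and active connections in PG itself, and the queuing in the connection pool.

PGB queuing is the decisive metric, because it means that blocking visible to clients has already occurred. The rule therefore raises a P0 alert when more than 8 clients have been queuing for 1 minute.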

# more than 8 client waiting in queue for 1 min triggers a P0 alert
- alert: PGB_QUEUING
  expr: sum(pgbouncer_pool_waiting_clients{datname!="pgbouncer"}) by (instance,datname) > 8
  for: 1m
  labels:
    team: DBA
    urgency: P0
  annotations:
    summary: "P0 Pgbouncer {{ $value }} Clients Wait in Queue: {{$labels.instance}}"
    description: "waiting clients = {{ $value }} {{$labels.instance}}"

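The number of backend connections is an important alert metric: if backend connections stay at the maximum, that usually means an avalanche. The connection pool's queue length reflects this too, but it does not cover applications that connect to the database directly. The main problem with the backend connection count is its close coupling with the connection pool: after a brief jam the pool quickly fills up the backend connections, and once the jam clears those connections are released only after a timeout of about 10 minutes by default. The metric is therefore strongly affected by brief pile-ups. The same thing happens during the backup at 1 a.m., which easily produces false alerts.

Note that the backend connection count differs from the active backend connection count; the alert currently uses the active count. Active backends usually number between 0 and 1, around a dozen on some slow databases, and possibly 20 to 30 on offline databases. The total number of backend connections or processes (active or not), however, typically averages around 50. The backend connection count is more intuitive and accurate.

Two alert levels are used for backend connections: above 90 for 3 minutes is P1, and above 80 for 10 minutes is P2, given that a database's maximum number of connections is usually 100. This detects avalanche pile-ups with as low a false-alert rate as possible.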

# num of backend exceed 90 for 3m
- alert: PG_BACKEND_HIGH
  expr: sum(pg_db_numbackends) by (node) > 90
  for: 3m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres Backend Number High: {{$labels.node}}"
    description: "numbackend = {{ $value }} {{$labels.node}}"

# num of backend exceed 80 for 10m (avoid pgbouncer jam false alert)
- alert: PG_BACKEND_WARN
  expr: sum(pg_db_numbackends) by (node) > 80
  for: 10m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Postgres Backend Number Warn: {{$labels.node}}"
    description: "numbackend = {{ $value }} {{$labels.node}}"

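Idle Transactions

Monitoring currently uses the absolute number of Idle in Xact sessions as the alert condition, but the longest Idle in Xact duration is arguably more meaningful, because the count is already covered by the backend connection count. What really matters is a long idle time, so the rule here uses the longest idle duration among all idle transactions as the alert metric, with 3 minutes as the P2 threshold. Non-offline databases that frequently show idle transactions include: moderation, location, stats, sms, device, moderationdevice.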

# max idle xact duration exceed 3m
- alert: PG_IDLE_XACT
  expr: pg_activity_max_duration{instance!~".*offline.*", state=~"^idle in transaction.*"} > 180
  for: 3m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Postgres Long Idle Transaction: {{$labels.instance}}"
    description: "duration = {{ $value }} {{$labels.instance}}"

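Resource Alerts

CPU, disk, AGE

The default vacuum (freeze) age is 200 million. An age above 1 billion raises P1, which leaves ample headroom without letting the problem go unnoticed.

# database age above 1 billion (about halfway to XID wraparound) triggers a P1 alert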
- alert: PG_XID_WRAP
  expr: pg_database_age{} > 1000000000
  for: 3m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres XID Wrap Around: {{$labels.instance}}"
    description: "age = {{ $value }} {{$labels.instance}}"

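Disk and CPU alerts are configured by operations and are left unchanged.

Traffic

Because workloads differ from one business to another, setting absolute thresholds for traffic metrics is difficult. Absolute thresholds are set only for TPS and rollbacks, and they are fairly loose.

A rollback rate above 4 per second raises a P3 warning; TPS above 24000 raises P2, and above 30000 raises P1.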

# more than 30k TPS lasts for 1m triggers a P1 (pgbouncer bottleneck)
- alert: PG_TPS_HIGH
  expr: rate(pg_db_xact_total{}[1m]) > 30000
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres TPS High: {{$labels.instance}} {{$labels.datname}}"
    description: "TPS = {{ $value }} {{$labels.instance}}"

# more than 24k TPS lasts for 3m triggers a P2
- alert: PG_TPS_WARN
  expr: rate(pg_db_xact_total{}[1m]) > 24000
  for: 3m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Postgres TPS Warning: {{$labels.instance}} {{$labels.datname}}"
    description: "TPS = {{ $value }} {{$labels.instance}}"

# more than 4 rollback per seconds lasts for 5m
- alert: PG_ROLLBACK_WARN
  expr: rate(pg_db_xact_rollback{}[1m]) > 4
  for: 5m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Postgres Rollback Warning: {{$labels.instance}}"
    description: "rollback per sec = {{ $value }} {{$labels.instance}}"

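QPS is highly business-specific, so absolute thresholds are a poor fit. An alert on sudden QPS increases can be configured instead.

A 30% increase over a short period (compared with 10 minutes earlier) triggers a P2 alert. To avoid firing on bursts at low QPS, an absolute threshold of 10k is also required.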

# QPS > 10000 and have a 30% inc for 3m triggers P2 alert
- alert: PG_QPS_BURST
  expr: sum by(datname,instance)(rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m]))/sum by(datname,instance) (rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m] offset 10m)) > 1.3 and sum by(datname,instance) (rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m])) > 10000
  for: 3m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Pgbouncer QPS Burst 30% and exceed 10000: {{$labels.instance}}"
    description: "qps = {{ $value }} {{$labels.instance}}"
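
The burst expression repeats the same per-database QPS aggregation three times. A recording rule can precompute it so that the alert is easier to read. The sketch below is one way to do that; the rule name `pgbouncer:db:qps1m` is a made-up example rather than a metric Pigsty ships.

```yaml
# precompute per-database QPS once per evaluation
- record: pgbouncer:db:qps1m
  expr: sum by (datname, instance) (rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m]))

# same condition as PG_QPS_BURST, written against the recorded series
- alert: PG_QPS_BURST
  expr: pgbouncer:db:qps1m / (pgbouncer:db:qps1m offset 10m) > 1.3 and pgbouncer:db:qps1m > 10000
  for: 3m
  labels:
    team: DBA
    urgency: P2
  annotations:
    summary: "P2 Pgbouncer QPS Burst 30% and exceed 10000: {{$labels.instance}}"
    description: "qps = {{ $value }} {{$labels.instance}}"
```

The recorded series needs at least 10 minutes of history before the `offset 10m` comparison returns data.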

Prometheus Alerting Rules

For the complete alerting rules, see: Reference - Alerting Rules
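
The snippets in this section are individual list items. To be loaded by Prometheus they need to sit inside a rule group in a rule file that the main configuration references. A minimal sketch, with hypothetical file paths and group name:

```yaml
# /etc/prometheus/rules/pgsql-alert.yml (hypothetical path)
groups:
  - name: pgsql-alert
    rules:
      - alert: PG_EXPORTER_DOWN
        expr: up{port=~"(9185|9127)"} == 0
        for: 1m
        labels:
          team: DBA
          urgency: P1
        annotations:
          summary: "P1 Exporter Down: {{$labels.instance}} {{$labels.port}}"
          description: "port = {{$labels.port}}, {{$labels.instance}}"
---
# prometheus.yml (fragment): load every rule file in the directory
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - /etc/prometheus/rules/*.yml
```

A rule file can be validated before reloading Prometheus with `promtool check rules <file>`.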

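3.4 - Provisioning

Concepts behind the Pigsty provisioning solution

A provisioning solution is a system that delivers database services and a monitoring system to users.

A provisioning solution is not a database; it is a database factory.

The user submits a configuration to the provisioning system, and the provisioning system creates the required database clusters in the environment according to that specification.

This is similar to submitting a YAML file to Kubernetes to create the resources you need.

Defining a Database Cluster

For example, the following configuration declares a PostgreSQL database cluster named pg-test.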

#-----------------------------
# cluster: pg-test
#-----------------------------
pg-test: # define cluster named 'pg-test'
  # - cluster members - #
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
    10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
    10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}

  # - cluster configs - #
  vars:
    # basic settings
    pg_cluster: pg-test                 # define actual cluster name
    pg_version: 13                      # define installed pgsql version
    node_tune: tiny                     # tune node into oltp|olap|crit|tiny mode
    pg_conf: tiny.yml                   # tune pgsql into oltp/olap/crit/tiny mode

    # business users, adjust on your own needs
    pg_users:
      - name: test                      # example production user have read-write access
        password: test                  # example user's password
        roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
        pgbouncer: true                 # production user that access via pgbouncer
        comment: default test user for production usage

    pg_databases:                       # create a business database 'test'
      - name: test                      # use the simplest form

    pg_default_database: test           # default database will be used as primary monitor target

    # proxy settings
    vip_mode: l2                        # enable/disable vip (require members in same LAN)
    vip_address: 10.10.10.3             # virtual ip address
    vip_cidrmask: 8                     # cidr network mask length
    vip_interface: eth1                 # interface to add virtual ip

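When the database provisioning playbook ./pgsql.yml is executed, the provisioning system follows the inventory definition and creates a one-primary, two-replica PostgreSQL cluster pg-test on the three machines 10.10.10.11, 10.10.10.12, and 10.10.10.13, along with a user and a database both named test. As requested, Pigsty also declares the VIP 10.10.10.3 and binds it to the cluster primary. The structure is shown in the figure below.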
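
Because the cluster is described declaratively, changing its shape is a matter of editing the inventory. As a sketch, the fragment below adds a fourth member to pg-test. The address 10.10.10.14 and the host name node-4 are hypothetical, and only variables already used in the definition above appear.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
    10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
    10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
    10.10.10.14: {pg_seq: 4, pg_role: replica, ansible_host: node-4}   # new replica (hypothetical node)
  vars:
    pg_cluster: pg-test                 # remaining cluster vars stay as defined above
```

Since the playbooks are idempotent, re-running the provisioning playbook against the new host (for example with Ansible's --limit option) would be the expected way to create the added instance.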

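Defining the Infrastructure

Users can define not only database clusters but the entire infrastructure.

Pigsty uses 154 variables to describe the database runtime environment in full.

For the detailed list of configurable items, see the Configuration Guide.

Responsibilities of a Provisioning Solution

A provisioning solution is normally responsible only for creating clusters. Once a cluster exists, day-to-day management should be handled by a control platform.

Pigsty does not currently include a control platform, however, so it also provides simple scripts for reclaiming and destroying resources, which can be used for updating and managing resources as well. Keep in mind that this is not the core job of a provisioning solution.

3.4.1 - Database Access

How to access the databases created by Pigsty

Pigsty offers a range of access methods, and users can choose an access mode according to their infrastructure and preferences.

Database Access Methods

Users can access the database service in several ways.

At the cluster level, users can reach the four default services provided by the cluster through the cluster domain name plus the service port. Pigsty strongly recommends this approach. Users can also bypass the domain name and access the cluster directly through its VIP (L2 or L4).

At the instance level, users can connect directly to Postgres through the node IP or domain name on port 5432, or go through Pgbouncer on port 6432. They can also reach the services of the cluster an instance belongs to through Haproxy on ports 5433~543x.

How the database is accessed ultimately depends on the traffic access scheme it uses.

Typical Access Schemes

Pigsty recommends the Haproxy-based access schemes (1/2). In production, where the infrastructure supports it, the scheme based on an L4 VIP or an equivalent load-balancing service (3) can also be used.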

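No. | Scheme           | Description
1   | DNS + Haproxy    | Standard high-availability access scheme with no single point of failure.
2   | L2 VIP + Haproxy | Standard access architecture used by the Pigsty sandbox; an L2 VIP keeps Haproxy highly available.
3   | L4 VIP + Haproxy | Variant of scheme 2; an L4 VIP keeps Haproxy highly available.
4   | L4 VIP           | Direct access through a DPVS L4 VIP, recommended for large-scale, high-performance production environments.
5   | Consul DNS       | Uses Consul DNS for service discovery, bypassing the VIP and Haproxy.
6   | Static DNS       | Traditional static DNS access.
7   | IP               | Access through smart clients.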

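DNS + Haproxy

Overview

The standard high-availability access scheme, with no single point of failure. It offers the best balance of flexibility, applicability, and performance.

The Haproxy instances in the cluster expose services uniformly in Node Port fashion. Every Haproxy is an equivalent instance that provides complete load balancing and service dispatch. Haproxy is deployed on every database node, so every member of the cluster behaves the same from the client's point of view. (For example, connecting to port 5433 on any member reaches the primary's connection pool, and connecting to port 5434 on any member reaches the connection pool of some replica.)

The availability of Haproxy itself comes from these equivalent replicas: every Haproxy can serve as an entry point, users may use one, two, several, or all Haproxy instances, and each one provides exactly the same functionality.

Users must make sure the application can reach some healthy Haproxy instance. In the simplest implementation, the cluster's DNS name resolves to several Haproxy instances with round-robin DNS responses enabled. The client can either skip DNS caching entirely, or use long-lived connections and retry when a connection attempt fails. Alternatively, following scheme 2, an extra L2/L4 VIP on the architecture side can keep Haproxy itself highly available.

Advantages

  • No single point of failure; highly available

  • The VIP is bound to the primary, allowing flexible access

Limitations

  • One extra hop

  • The client IP address is lost, so some HBA policies do not take effect properly

  • The high availability of Haproxy itself relies on equivalent replicas, DNS round robin, and client reconnection

    DNS should use round robin, and clients should use long-lived connections with retry on connection failure, so that when one Haproxy fails traffic moves automatically to another Haproxy instance in the cluster. If that is not possible, consider access scheme 2, which uses an L2/L4 VIP to keep Haproxy highly available.

Diagram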

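L2 VIP + Haproxy

Overview

The standard access scheme used by the Pigsty sandbox: a single domain name is bound to a single L2 VIP, and the VIP points to the Haproxy in the cluster.

The Haproxy instances in the cluster expose services uniformly in Node Port fashion. Every Haproxy is an equivalent instance that provides complete load balancing and service dispatch. The availability of Haproxy itself is guaranteed by the L2 VIP.

Each cluster is assigned one L2 VIP, which is bound to the cluster primary. When the primary changes, the L2 VIP moves to the new primary. This is implemented by vip-manager, which queries Consul for the cluster's current primary and then binds the VIP address on the primary.

The cluster's L2 VIP has a corresponding domain name. The domain name always resolves to this L2 VIP and does not change during the cluster's lifetime.

Advantages

  • No single point of failure; highly available

  • The VIP is bound to the primary, allowing flexible access

Limitations

  • One extra hop

  • The client IP address is lost, so some HBA policies do not take effect properly

  • All candidate primaries must be on the same layer 2 network

    As an alternative variant, an L4 VIP can be used to get around this limitation, at the cost of one more hop than an L2 VIP.

Diagram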

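L4 VIP + Haproxy

Overview

Another variant of access schemes 1/2, which uses an L4 VIP to keep Haproxy highly available.

Advantages

  • No single point of failure; highly available
  • All Haproxy instances can be used at the same time, spreading traffic evenly.
  • Candidate primaries do not need to be on the same layer 2 network.
  • Traffic can be switched by operating a single VIP (when several Haproxy instances are in use, there is no need to adjust them one by one).

Limitations

  • Two extra hops, which is wasteful. Where possible, use scheme 4 (direct L4 VIP access) instead.
  • The client IP address is lost, so some HBA policies do not take effect properly

Diagram

L4 VIP

Overview

For large-scale, high-performance production environments, L4 VIP access (FullNAT, DPVS) is recommended.

Advantages

  • Good performance and high throughput
  • The correct client IP address can be obtained through the toa module, so HBA takes full effect.

Limitations

  • Still one extra hop.
  • Depends on external infrastructure and is complex to deploy.
  • Without the toa kernel module enabled, the client IP address is still lost.
  • Without Haproxy to hide the difference between primary and replicas, the nodes in the cluster are no longer equivalent.

Diagram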

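Consul DNS

Overview

An L2 VIP is not always available; in particular, the requirement that all candidate primaries share the same layer 2 network cannot always be met.

In that case, DNS resolution can be used in place of the L2 VIP for service discovery.

Advantages

  • One hop fewer

Limitations

  • Depends on Consul DNS
  • Users must configure the DNS caching policy sensibly

Diagram

Static DNS

Overview

The traditional static DNS access method.

Advantages

  • One hop fewer
  • Simple to implement

Limitations

  • No flexibility
  • Prone to traffic loss during primary-replica switchover

Diagram

IP

Overview

Smart clients connect directly to the database IP addresses.

Advantages

  • Connects directly to the database or connection pool; one hop fewer
  • No extra component is needed to tell primary from replica, which reduces system complexity.

Limitations

  • Poor flexibility; scaling the cluster out or in is cumbersome.

Diagram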

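3.4.2 - Database Services

How to define new services in Pigsty

A service is the form in which a database cluster provides functionality to the outside. In general, a database cluster should provide at least two services:

  • Read-write service (primary): users can write to the database
  • Read-only service (replica): users can access read-only replicas

Depending on the business scenario, there may be other services as well:

  • Offline replica service (offline): a dedicated replica that takes no online read-only traffic, used for ETL and individual users' queries.
  • Synchronous replica service (standby): a read-only service that uses synchronous commit and has no replication lag.
  • Delayed replica service (delayed): lets the business access older data as of a fixed interval in the past.
  • Default direct service (default): lets (admin) users bypass the connection pool and manage the database directly.

Default Services

Pigsty provides four services by default: primary, replica, default, and offline.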

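Service | Port | Purpose                | Description
primary | 5433 | Production read-write  | Connects to the cluster primary through the connection pool
replica | 5434 | Production read-only   | Connects to cluster replicas through the connection pool
default | 5436 | Management             | Connects directly to the cluster primary
offline | 5438 | ETL / individual users | Connects directly to an available offline instance in the cluster

Service | Port | Description                                  | Example
primary | 5433 | Only production users can connect            | postgres://test@pg-test:5433/test
replica | 5434 | Only production users can connect            | postgres://test@pg-test:5434/test
default | 5436 | Administrators and DML executors can connect | postgres://dbuser_admin@pg-test:5436/test
offline | 5438 | ETL/STATS and individual users can connect   | postgres://dbuser_stats@pg-test-tt:5438/test
        |      |                                              | postgres://dbp_vonng@pg-test:5438/test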

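Primary Service

The Primary service handles online production read-write access. It maps the cluster's port 5433 to the primary's connection pool port (6432 by default).

The Primary service selects all instances in the cluster as members, but only the instance whose /primary health check succeeds actually receives traffic.

Exactly one instance in the cluster is the primary, and only its health check succeeds.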

- name: primary           # service name {{ pg_cluster }}_primary
  src_ip: "*"
  src_port: 5433
  dst_port: pgbouncer     # 5433 route to pgbouncer
  check_url: /primary     # primary health check, success when instance is primary
  selector: "[]"          # select all instance as primary service candidate

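Replica Service

The Replica service handles online production read-only access. It maps the cluster's port 5434 to the replicas' connection pool port (6432 by default).

The Replica service selects all instances in the cluster as members, but only instances whose /read-only health check succeeds actually receive traffic. That health check succeeds on every instance that can serve read-only traffic, including the primary, so any member of the cluster is able to carry read-only traffic.

By default, however, only replicas serve read-only requests. The Replica service defines selector_backup, which adds the cluster primary to the service as a backup instance. The primary starts taking read-only traffic only when all other instances in the Replica service, that is, all replicas, are down.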

# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica           # service name {{ pg_cluster }}_replica
  src_ip: "*"
  src_port: 5434
  dst_port: pgbouncer
  check_url: /read-only   # read-only health check. (including primary)
  selector: "[]"          # select all instance as replica service candidate
  selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

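Default Service

The Default service provides direct connections to the online primary. It maps the cluster's port 5436 to the primary's Postgres port (5432 by default).

The Default service targets interactive read-write access, including running administrative commands, applying DDL changes, running DML on the primary, and running CDC. Interactive operations should not go through the connection pool, so the Default service forwards traffic straight to Postgres, bypassing Pgbouncer.

The Default service is similar to the Primary service and uses the same configuration options. The default parameters are filled in explicitly for demonstration purposes.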

# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
  src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
  src_port: 5436          # bind port, mandatory
  dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
  check_method: http      # health check method: only http is available for now
  check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
  check_url: /primary     # health check url path, / as default
  check_code: 200         # health check http code, 200 as default
  selector: "[]"          # instance selector
  haproxy:                # haproxy specific fields
    maxconn: 3000         # default front-end connection
    balance: roundrobin   # load balance algorithm (roundrobin by default)
    default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

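Offline Service

The Offline service is used for offline access and individual queries. It maps the cluster's port 5438 to the Postgres port of the offline instance (5432 by default).

The Offline service targets interactive read-only access, including ETL, large offline analytical queries, and individual users' queries. Interactive operations should not go through the connection pool, so the Offline service forwards traffic straight to Postgres on the offline instance, bypassing Pgbouncer.

An offline instance is one with pg_role == offline or one carrying the pg_offline_query flag. Replicas other than the offline instances act as backup instances of the Offline service, so that when the offline instance goes down the Offline service can still be served by the other replicas.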

# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline           # service name {{ pg_cluster }}_offline
  src_ip: "*"
  src_port: 5438
  dst_port: postgres
  check_url: /replica     # offline MUST be a replica
  selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
  selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replica are used as backup server in offline service
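
The selector above also matches instances that carry the pg_offline_query flag. The sketch below shows an inventory in which an ordinary replica doubles as an offline instance, assuming the flag is set as a host-level variable next to pg_role:

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica, pg_offline_query: true}   # matched by selector, serves port 5438
    10.10.10.13: {pg_seq: 3, pg_role: replica}                           # matched by selector_backup only
```

With this layout 10.10.10.12 receives offline traffic, and 10.10.10.13 takes over only if 10.10.10.12 is unavailable.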

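Service Definition

An array of service definition objects defines the services that each database cluster exposes. Every cluster can define multiple services, each service contains any number of cluster members, and services are distinguished by port.

Services are defined through pg_services and pg_services_extra. The former defines services common to the whole environment, and the latter defines extra services specific to a cluster. Both are arrays of service definitions. The definitions of Pigsty's default services are shown below: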

# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary           # service name {{ pg_cluster }}_primary
  src_ip: "*"
  src_port: 5433
  dst_port: pgbouncer     # 5433 route to pgbouncer
  check_url: /primary     # primary health check, success when instance is primary
  selector: "[]"          # select all instance as primary service candidate

# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica           # service name {{ pg_cluster }}_replica
  src_ip: "*"
  src_port: 5434
  dst_port: pgbouncer
  check_url: /read-only   # read-only health check. (including primary)
  selector: "[]"          # select all instance as replica service candidate
  selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
  src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
  src_port: 5436          # bind port, mandatory
  dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
  check_method: http      # health check method: only http is available for now
  check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
  check_url: /primary     # health check url path, / as default
  check_code: 200         # health check http code, 200 as default
  selector: "[]"          # instance selector
  haproxy:                # haproxy specific fields
    maxconn: 3000         # default front-end connection
    balance: roundrobin   # load balance algorithm (roundrobin by default)
    default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline           # service name {{ pg_cluster }}_offline
  src_ip: "*"
  src_port: 5438
  dst_port: postgres
  check_url: /replica     # offline MUST be a replica
  selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
  selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replica are used as backup server in offline service

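Required Fields

  • Name (service.name)

    The service name. The full service name uses the database cluster name as prefix and service.name as suffix, joined with -. For example, the service with name=primary in the pg-test cluster has the full name pg-test-primary.

  • Port (service.src_port)

    In Pigsty, services are exposed as NodePort by default, so the exposed port is required. If you use an access scheme based on an external load balancer, you can distinguish services in other ways.

  • Selector (service.selector)

    The selector specifies the instance members of the service. It is a JMESPath expression that filters over the variables of all cluster instance members. The default [] selector picks every cluster member.

Optional Fields

  • Backup selector (service.selector_backup)

    The optional backup selector service.selector_backup selects or marks the list of instances used as service backups: backup instances take over only when all other members of the service have failed. For example, the primary instance can be added to the backup set of the replica service, so that the primary can still carry the cluster's read-only traffic after all replicas fail.

  • Source IP (service.src_ip)

    The IP address the service listens on. The default is *, meaning all IP addresses on the host. The value vip uses the value of the vip_address variable; a specific IP address supported by the network interface can also be given.

  • Destination port (service.dst_port)

    Which port on the target instance should service traffic go to? postgres points to the port the database listens on, pgbouncer points to the port the connection pool listens on, and a fixed port number can also be given.

  • Health check method (service.check_method)

    How does the service check instance health? Only HTTP is supported for now.

  • Health check port (service.check_port)

    Which port does the service query for instance health? patroni queries Patroni (8008 by default), pg_exporter queries PG Exporter (9630 by default), and a custom port number can also be given.

  • Health check path (service.check_url)

    The URL path used for the HTTP check. / is used by default. PG Exporter and Patroni offer a variety of health checks that can be used to separate primary and replica traffic. For example, /primary succeeds only on the primary, /replica succeeds only on replicas, and /read-only succeeds on any instance that supports read-only access (including the primary).

  • Health check code (service.check_code)

    The status code expected from the HTTP health check, 200 by default.

  • Haproxy-specific configuration (service.haproxy)

    Configuration items specific to the service provider software (Haproxy).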
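
As an illustration of the fields above, the sketch below defines one cluster-specific service through pg_services_extra: read-only access that bypasses the connection pool. The service name `readonly`, port 5435, and the `leastconn` balancing choice are assumptions made for the example; only field names documented above are used.

```yaml
pg_services_extra:
  - name: readonly                 # full name becomes {{ pg_cluster }}-readonly
    src_ip: "*"
    src_port: 5435                 # hypothetical port; must not clash with 5433/5434/5436/5438
    dst_port: postgres             # bypass pgbouncer and connect to postgres directly
    check_url: /read-only          # succeeds on any instance that can serve reads
    selector: "[? pg_role == `replica`]"          # replicas are the regular members
    selector_backup: "[? pg_role == `primary`]"   # primary serves reads only if all replicas fail
    haproxy:
      balance: leastconn           # favor the least-loaded replica for long-running queries
```

Pairing /read-only with a backup selector mirrors the default replica service: the health check admits the primary, and the backup marker keeps it idle until it is needed.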

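3.4.3 - High Availability

An introduction to the concept of availability and Pigsty's high-availability practice

The database clusters created by Pigsty are distributed, highly available database clusters.

In effect, as long as any instance in the cluster survives, the cluster can provide complete read-write and read-only service.

Every database instance in the cluster is equivalent in use: any instance can provide complete read-write service through the built-in load balancer.

The cluster performs failure detection and primary-replica failover automatically. Common failures heal within seconds to tens of seconds, and read-only traffic is unaffected in the meantime.

High Availability

Two core scenarios: Switchover and Failover

Four core problems: failure detection, fencing, leader election, and traffic switching

For drills of the core high-availability scenarios, see the High Availability Drill section.

Patroni-based High Availability

The Patroni-based high-availability solution is simple to deploy, needs no special hardware, and is backed by a large number of real production deployments.

Pigsty's high-availability solution is built on Patroni, vip-manager, and haproxy.

Patroni reaches leader-election consensus through a DCS (etcd/consul/zookeeper).

Patroni's failure detection relies on heartbeats and the DCS lease mechanism. The primary holds the lease; if it loses the lease, the other members compete for it.

Patroni's fencing is based on the Linux kernel watchdog module.

Patroni provides primary and replica health checks, which makes integration with external load balancers easy.

Access Layer Based on Haproxy and VIP

The Pigsty sandbox uses an access layer based on an L2 VIP and Haproxy by default. Pigsty offers several optional database access methods.

Haproxy is deployed identically on every instance of the cluster, and any one or more Haproxy instances can act as the cluster's load balancer.

Haproxy exposes services in a Node Port-like fashion. By default, port 5433 provides the cluster's read-write service and port 5434 provides its read-only service.

The high availability of Haproxy itself can be achieved in several ways:

  • Use a smart client that connects to the database through the DNS or service discovery provided by Consul.
  • Use a smart client that lists all instances in the cluster through the multi-host feature.
  • Use a VIP (layer 2 or layer 4) bound in front of Haproxy.
  • Use an external load balancer.
  • Use round-robin DNS that resolves to multiple Haproxy instances; the client re-resolves DNS and retries after a connection failure.

Patroni's Behavior During Failures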

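Scenario                                   | Location | Patroni's action
PG Down                                    | replica  | Tries to bring PG back up
Patroni Down                               | replica  | PG shuts down with it (unchanged in maintenance mode)
Patroni Crash                              | replica  | PG does not shut down along with Patroni
DCS Network Partition                      | replica  | Nothing happens
Promote                                    | replica  | Demotes PG to a replica and re-attaches it to the primary
PG Down                                    | primary  | Tries to restart PG; performs Failover after master_start_timeout is exceeded
Patroni Down                               | primary  | Shuts down PG and triggers Failover
Patroni Crash                              | primary  | Triggers Failover and may cause split brain; avoidable with watchdog fencing
DCS Network Partition                      | primary  | Primary is demoted to replica and Failover is triggered
DCS Down                                   | DCS      | Primary is demoted to replica; the cluster has no primary and cannot accept writes
No candidate available in synchronous mode |          | Temporarily switches to asynchronous replication; no Failover until synchronous replication is restored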

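A properly configured Patroni can handle the vast majority of failures. The DCS Down scenario (Consul/Etcd down or unreachable over the network), however, makes every production database cluster unwritable and needs special attention. The availability of the DCS must be higher than that of the databases.

Known Issue

Make sure, as far as possible, that the server's time synchronization service starts before Patroni.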
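
One way to respect this ordering during provisioning is to make sure the time service is running before any Patroni task executes. The following is a minimal Ansible sketch, assuming chronyd is the time synchronization service on the node (the usual default on CentOS 7) and that the task is placed ahead of the Patroni tasks in the play:

```yaml
- name: Make sure time sync service is running before patroni
  become: yes
  systemd:
    name: chronyd          # assumption: chrony is the time service in use
    state: started
    enabled: yes
```

This covers provisioning time only; ordering at boot would need a dependency between the systemd units instead.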

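3.4.4 - Directory Structure

The directory structure that Pigsty sets up by default

The following parameters relate to the Pigsty directory structure:

  • pg_dbsu_home: home directory of the default Postgres user, /var/lib/pgsql by default
  • pg_bin_dir: Postgres binary directory, /usr/pgsql/bin/ by default
  • pg_data: Postgres database directory, /pg/data by default
  • pg_fs_main: mount point of the main Postgres data disk, /export by default
  • pg_fs_bkup: mount point of the Postgres backup disk, /var/backups by default (optional; backups can also go to the main data disk)

Overview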

#------------------------------------------------------------------------------
# Create Directory
#------------------------------------------------------------------------------
# this assumes that
#   /pg is shortcut for postgres home
#   {{ pg_fs_main }} contains the main data             (MUST ALREADY MOUNTED)
#   {{ pg_fs_bkup }} contains archive and backup data   (MUST ALREADY MOUNTED)
#   cluster-version is the default parent folder for pgdata (e.g pg-test-12)
#------------------------------------------------------------------------------
# default variable:
#     pg_fs_main = /export           fast ssd
#     pg_fs_bkup = /var/backups      cheap hdd
#
#     /pg      -> /export/postgres/pg-test-12
#     /pg/data -> /export/postgres/pg-test-12/data
#------------------------------------------------------------------------------
- name: Create postgresql directories
  tags: pg_dir
  become: yes
  block:
    - name: Make sure main and backup dir exists
      file: path={{ item }} state=directory owner=root mode=0777
      with_items:
        - "{{ pg_fs_main }}"
        - "{{ pg_fs_bkup }}"

    # pg_cluster_dir:    "{{ pg_fs_main }}/postgres/{{ pg_cluster }}-{{ pg_version }}"
    - name: Create postgres directory structure
      file: path={{ item }} state=directory owner={{ pg_dbsu }} group=postgres mode=0700
      with_items:
        - "{{ pg_fs_main }}/postgres"
        - "{{ pg_cluster_dir }}"
        - "{{ pg_cluster_dir }}/bin"
        - "{{ pg_cluster_dir }}/log"
        - "{{ pg_cluster_dir }}/tmp"
        - "{{ pg_cluster_dir }}/conf"
        - "{{ pg_cluster_dir }}/data"
        - "{{ pg_cluster_dir }}/meta"
        - "{{ pg_cluster_dir }}/stat"
        - "{{ pg_cluster_dir }}/change"
        - "{{ pg_backup_dir }}/postgres"
        - "{{ pg_backup_dir }}/arcwal"
        - "{{ pg_backup_dir }}/backup"
        - "{{ pg_backup_dir }}/remote"

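PG Binary Directory Structure

On RedHat/CentOS, the default installation location of the Postgres distribution is

/usr/pgsql-${pg_version}/

The installation playbook automatically creates a symbolic link to the currently installed version. For example, if Postgres 13 is installed:

/usr/pgsql -> /usr/pgsql-13

The default pg_bin_dir is therefore /usr/pgsql/bin/, and /etc/profile.d/pgsql.sh adds this path to the PATH environment variable of all users.

PG Data Directory Structure

Pigsty assumes that each node used to deploy a database instance has at least one main data disk (pg_fs_main) and an optional backup data disk (pg_fs_bkup). The main data disk is usually a high-performance SSD, and the backup disk a large, inexpensive HDD.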

#------------------------------------------------------------------------------
# Create Directory
#------------------------------------------------------------------------------
# this assumes that
#   /pg is shortcut for postgres home
#   {{ pg_fs_main }} contains the main data             (MUST ALREADY MOUNTED)
#   {{ pg_fs_bkup }} contains archive and backup data   (MAYBE ALREADY MOUNTED)
#   {{ pg_cluster }}-{{ pg_version }} is the default parent folder 
#    for pgdata (e.g pg-test-12)
#------------------------------------------------------------------------------
# default variable:
#     pg_fs_main = /export           fast ssd
#     pg_fs_bkup = /var/backups      cheap hdd
#
#     /pg      -> /export/postgres/pg-test-12
#     /pg/data -> /export/postgres/pg-test-12/data

PG Database Cluster Directory Structure

# basic
{{ pg_fs_main }}     /export                      # contains all business data (pg,consul,etc..)
{{ pg_dir_main }}    /export/postgres             # contains postgres main data
{{ pg_cluster_dir }} /export/postgres/pg-test-13  # contains cluster `pg-test` data (of version 13)
                     /export/postgres/pg-test-13/bin            # binary scripts
                     /export/postgres/pg-test-13/log            # misc logs
                     /export/postgres/pg-test-13/tmp            # tmp, sql files, records
                     /export/postgres/pg-test-13/conf           # configurations
                     /export/postgres/pg-test-13/data           # main data directory
                     /export/postgres/pg-test-13/meta           # identity information
                     /export/postgres/pg-test-13/stat           # stats information
                     /export/postgres/pg-test-13/change         # changing records

{{ pg_fs_bkup }}     /var/backups                      # contains all backup data (pg,consul,etc..)
{{ pg_dir_bkup }}    /var/backups/postgres             # contains postgres backup data
{{ pg_backup_dir }}  /var/backups/postgres/pg-test-13  # contains cluster `pg-test` backup (of version 13)
                     /var/backups/postgres/pg-test-13/backup   # base backup
                     /var/backups/postgres/pg-test-13/arcwal   # WAL archive
                     /var/backups/postgres/pg-test-13/remote   # mount NFS/S3 remote resources here

# links
/pg             -> /export/postgres/pg-test-13               # pg root link
/pg/data        -> /export/postgres/pg-test-13/data          # real data dir
/pg/backup      -> /var/backups/postgres/pg-test-13/backup   # base backup
/pg/arcwal      -> /var/backups/postgres/pg-test-13/arcwal   # WAL archive
/pg/remote      -> /var/backups/postgres/pg-test-13/remote   # mount NFS/S3 remote resources here

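Pgbouncer Configuration File Structure

Pgbouncer runs as the Postgres user, and its configuration files live in /etc/pgbouncer. They include:

  • pgbouncer.ini: the main configuration file
  • userlist.txt: lists the users in the connection pool
  • pgb_hba.conf: lists the access permissions of connection pool users
  • database.txt: lists the databases in the connection pool

3.4.5 - Access Control

The access control model in Pigsty

PostgreSQL provides two kinds of access control mechanisms: authentication and privileges.

Pigsty ships with a basic access control model that covers the vast majority of use cases.

User System

Pigsty's default privilege system includes four default users and four default roles.

Users can change the names of the default users by modifying pg_default_roles, but new users are advised not to change the names of the default roles.

Default Roles

Pigsty ships with four default roles:

  • Read-only role (dbrole_readonly): read-only
  • Read-write role (dbrole_readwrite): read-write, inherits dbrole_readonly
  • Admin role (dbrole_admin): executes DDL changes, inherits dbrole_readwrite
  • Offline role (dbrole_offline): read-only, for slow queries, ETL, and interactive queries; access is allowed only on specific instances.

Default Users

Pigsty ships with four default users:

  • Superuser (postgres): owner and creator of the database, same as the operating system user.
  • Replication user (replicator): the user used for primary-replica replication.
  • Monitor user (dbuser_monitor): the user used to collect database metrics.
  • Admin user (dbuser_admin): performs day-to-day administration and database changes (normally used by DBAs).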

name             | attr                                                       | roles                                           | desc
dbrole_readonly  | Cannot login                                               |                                                 | role for global readonly access
dbrole_readwrite | Cannot login                                               | dbrole_readonly                                 | role for global read-write access
dbrole_offline   | Cannot login                                               |                                                 | role for restricted read-only access (offline instance)
dbrole_admin     | Cannot login, Bypass RLS                                   | pg_monitor, pg_signal_backend, dbrole_readwrite | role for object creation
postgres         | Superuser, Create role, Create DB, Replication, Bypass RLS |                                                 | system superuser
replicator       | Replication, Bypass RLS                                    | pg_monitor, dbrole_readonly                     | system replicator
dbuser_monitor   | 16 connections                                             | pg_monitor, dbrole_readonly                     | system monitor user
dbuser_admin     | Bypass RLS, Superuser                                      | dbrole_admin                                    | system admin user

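Related Configuration

The following variables relate to the 8 default users/roles.

The default users have dedicated username and password options, which override the entries in pg_default_roles. There is therefore no need to configure passwords for the default users there.

For security reasons, setting a password for the DBSU is not recommended, so pg_dbsu has no dedicated password option. If needed, a password for the superuser can be specified in pg_default_roles.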

# - system roles - #
pg_replication_username: replicator           # system replication user
pg_replication_password: DBUser.Replicator    # system replication password
pg_monitor_username: dbuser_monitor           # system monitor user
pg_monitor_password: DBUser.Monitor           # system monitor password
pg_admin_username: dbuser_admin               # system admin user
pg_admin_password: DBUser.Admin               # system admin password

# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
pg_default_roles:

  # common production readonly user
  - name: dbrole_readonly                 # production read-only roles
    login: false
    comment: role for global readonly access

  # common production read-write user
  - name: dbrole_readwrite                # production read-write roles
    login: false
    roles: [dbrole_readonly]             # read-write includes read-only access
    comment: role for global read-write access

  # offline have same privileges as readonly, but with limited hba access on offline instance only
  # for the purpose of running slow queries, interactive queries and perform ETL tasks
  - name: dbrole_offline
    login: false
    comment: role for restricted read-only access (offline instance)

  # admin have the privileges to issue DDL changes
  - name: dbrole_admin
    login: false
    bypassrls: true
    comment: role for object creation
    roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]

  # dbsu, name is designated by `pg_dbsu`. It's not recommend to set password for dbsu
  - name: postgres
    superuser: true
    comment: system superuser

  # default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
  - name: replicator
    replication: true
    roles: [pg_monitor, dbrole_readonly]
    comment: system replicator

  # default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
  - name: dbuser_monitor
    connlimit: 16
    comment: system monitor user
    roles: [pg_monitor, dbrole_readonly]

  # default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
  - name: dbuser_admin
    bypassrls: true
    comment: system admin user
    roles: [dbrole_admin]

  # default stats user, for ETL and slow queries
  - name: dbuser_stats
    password: DBUser.Stats
    comment: business offline user for offline queries and ETL
    roles: [dbrole_offline]
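
The passwords shown above are the published defaults, so a real deployment would normally override them. A sketch of such an override, using only the dedicated variables listed above; the values are placeholders to be replaced:

```yaml
# override the default system passwords (placeholder values)
pg_replication_password: "<replace-with-a-strong-password>"
pg_monitor_password: "<replace-with-a-strong-password>"
pg_admin_password: "<replace-with-a-strong-password>"
```

Because the dedicated options take precedence over pg_default_roles, the role definitions themselves do not need to change.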

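Pgbouncer Users

The operating system user for Pgbouncer is the same as the database superuser; both default to postgres.

By default, Pigsty uses the Postgres admin user as the Pgbouncer admin user, and the Postgres monitor user as the Pgbouncer monitor user.

Pgbouncer access rules are controlled by /etc/pgbouncer/pgb_hba.conf.

The Pgbouncer user list is controlled by /etc/pgbouncer/userlist.txt.

When defining users, only those explicitly marked with pgbouncer: true are added to the Pgbouncer user list.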
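The filtering rule above can be sketched in a few lines of shell. This is only an illustration, not Pigsty's actual template: the helper name render_userlist and the three-column input format (name, password, pgbouncer flag) are assumptions, and the password shown for dbuser_meta is made up. The output uses the quoted "username" "password" layout of a Pgbouncer userlist file.

```shell
#!/usr/bin/env bash
# Hypothetical input: one user per line as "name password pgbouncer_flag".
# Only users whose flag is "true" are emitted, one per line, in the form
#   "username" "password"
render_userlist() {
    while read -r name password flag; do
        [ "$flag" = "true" ] || continue
        printf '"%s" "%s"\n' "$name" "$password"
    done
}

# Example: only dbuser_meta ends up in the list (illustrative values)
printf '%s\n' \
    'dbuser_meta DBUser.Meta true' \
    'dbuser_stats DBUser.Stats false' | render_userlist
```

Users without the flag simply never reach the pool's user list, so they cannot authenticate through Pgbouncer even though they exist in Postgres.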

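Defining Users

Users in Pigsty are declared through the following two parameters, which share the same format:

  • pg_default_roles: default roles and users shared across the environment
  • pg_users: business users specific to a database cluster

Creating Users

Users can be created with the pgsql-createuser.yml playbook.

Privilege Model

By default, roles hold the following privileges: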

GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly
GRANT SELECT                        ON TABLES    TO dbrole_readonly
GRANT SELECT                        ON SEQUENCES TO dbrole_readonly
GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly
GRANT USAGE                         ON SCHEMAS   TO dbrole_offline
GRANT SELECT                        ON TABLES    TO dbrole_offline
GRANT SELECT                        ON SEQUENCES TO dbrole_offline
GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline
GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite
GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite
GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin
GRANT CREATE                        ON SCHEMAS   TO dbrole_admin
GRANT USAGE                         ON TYPES     TO dbrole_admin

Every other business user should belong to one of the four default roles: read-only, read-write, admin, or offline access.

Owner     Schema  Type      Access privileges
username          function  =X/postgres
                            postgres=X/postgres
                            dbrole_readonly=X/postgres
                            dbrole_offline=X/postgres
username          schema    postgres=UC/postgres
                            dbrole_readonly=U/postgres
                            dbrole_offline=U/postgres
                            dbrole_admin=C/postgres
username          sequence  postgres=rwU/postgres
                            dbrole_readonly=r/postgres
                            dbrole_readwrite=wU/postgres
                            dbrole_offline=r/postgres
username          table     postgres=arwdDxt/postgres
                            dbrole_readonly=r/postgres
                            dbrole_readwrite=awd/postgres
                            dbrole_offline=r/postgres
                            dbrole_admin=Dxt/postgres

All users can access all schemas; read-only users can read all tables; read-write users can run DML on all tables; admins can run DDL changes. Offline users are similar to read-only users, but may only access instances with pg_role == 'offline' or pg_offline_query = true.

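Database Privileges

A database has three privileges: CONNECT, CREATE, and TEMP, plus the special owner attribute (OWNERSHIP). Database definitions are controlled by the pg_databases parameter. A complete database definition looks like this: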

pg_databases:
  - name: meta                      # name is the only required field for a database
    owner: postgres                 # optional, database owner
    template: template1             # optional, template1 by default
    encoding: UTF8                  # optional, UTF8 by default
    locale: C                       # optional, C by default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
    tablespace: pg_default          # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - {name: postgis, schema: public}
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database

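By default, if a database has no owner configured, the database superuser dbsu becomes its default OWNER; otherwise the specified user does.

By default, all users have the CONNECT privilege on a newly created database. To revoke it, set revokeconn == true. In that case only the default users (dbsu|admin|monitor|replicator) and the database owner are explicitly granted CONNECT. In addition, admin|owner hold CONNECT with GRANT OPTION and can grant CONNECT to others.

To isolate access between databases, create a dedicated business user as the owner of each database and set the revokeconn option on all of them. This configuration is especially useful for multi-tenant instances.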
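The effect of revokeconn can be pictured as the SQL a provisioning step might issue. The sketch below is not Pigsty's actual template: the helper name revoke_connect_sql and the names app1 and dbuser_app1 are hypothetical, and the default user names are the ones from the role list earlier in this section.

```shell
#!/usr/bin/env bash
# Print SQL that roughly corresponds to revokeconn: true.
# $1 = database name, $2 = database owner (both illustrative)
revoke_connect_sql() {
    local db="$1" owner="$2"
    cat <<EOF
REVOKE CONNECT ON DATABASE "$db" FROM PUBLIC;
GRANT CONNECT ON DATABASE "$db" TO postgres, replicator, dbuser_monitor;
GRANT CONNECT ON DATABASE "$db" TO dbuser_admin, "$owner" WITH GRANT OPTION;
EOF
}

# Example: one database per tenant, each owned by its own business user
revoke_connect_sql app1 dbuser_app1
```

Because PUBLIC no longer holds CONNECT, a user from another tenant is rejected unless the owner or the admin grants access explicitly.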

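Creating New Objects

By default, for security reasons, Pigsty revokes from PUBLIC the privilege to CREATE new schemas in a database, and also revokes from PUBLIC the privilege to create new relations in the public schema. The database superuser and admins are not subject to this restriction and can always run DDL changes anywhere.

Pigsty strongly discourages running DDL changes as a business user, because PostgreSQL's ALTER DEFAULT PRIVILEGES only takes effect for objects created by specific users. By default, objects created by the superuser postgres and by dbuser_admin receive the default privilege configuration. If you grant dbrole_admin to a business user, run the following first whenever that business admin performs DDL changes:

SET ROLE dbrole_admin; -- objects created by dbrole_admin get the correct default privileges

The privilege to create objects in a database does not depend on whether the user owns the database. It depends only on whether the user was given admin privileges when it was created.

pg_users:
  - {name: test1, password: xxx , roles: [dbrole_readwrite]}  # cannot create schemas or objects
  - {name: test2, password: xxx , roles: [dbrole_admin]}      # can create schemas and objects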

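Authentication Model

HBA stands for Host Based Authentication; it can be thought of as an IP allowlist/denylist.

Configuring HBA

In Pigsty, the HBA of every instance is generated from the configuration file, and the final set of rules depends on the instance's role (pg_role). Pigsty's HBA is controlled by the following variables:

  • pg_hba_rules: HBA rules shared across the environment
  • pg_hba_rules_extra: HBA rules specific to an instance or cluster
  • pgbouncer_hba_rules: HBA rules used by the connection pool
  • pgbouncer_hba_rules_extra: connection pool HBA rules specific to an instance or cluster

Each variable is an array of rule groups in the following form: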

- title: allow intranet admin password access
  role: common
  rules:
    - host    all     +dbrole_admin               10.0.0.0/8          md5
    - host    all     +dbrole_admin               172.16.0.0/12       md5
    - host    all     +dbrole_admin               192.168.0.0/16      md5

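Role-Based HBA

HBA rule groups with role = common are installed on all instances, while other values, such as role: primary, are installed only on instances with pg_role = primary. This lets users define flexible HBA rules through the role system.

As a special case, HBA rules with role: offline are installed not only on instances with pg_role == 'offline' but also on instances with pg_offline_query == true.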
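The matching rule can be written down as a small predicate. This is a sketch of the logic as described in the text, not the code Pigsty uses to render HBA files; the function name hba_rule_applies and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Returns 0 (true) when an HBA rule group should be installed on an instance.
# $1 = role of the rule group, $2 = pg_role of the instance,
# $3 = pg_offline_query of the instance ("true" or "false", default "false")
hba_rule_applies() {
    local rule_role="$1" pg_role="$2" offline_query="${3:-false}"
    # common rules go everywhere
    [ "$rule_role" = "common" ] && return 0
    # role-specific rules go to instances with the same pg_role
    [ "$rule_role" = "$pg_role" ] && return 0
    # special case: offline rules also go to instances flagged for offline queries
    [ "$rule_role" = "offline" ] && [ "$offline_query" = "true" ] && return 0
    return 1
}

# Example: a replica flagged with pg_offline_query receives offline rules
if hba_rule_applies offline replica true; then
    echo "install offline rules"
fi
```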

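Default Configuration

Under the default configuration, primaries and replicas use the following HBA rules:

  • The superuser connects locally via operating system authentication
  • Other users can connect locally with a password
  • The replication user can connect from the LAN with a password
  • The monitor user can connect locally
  • Anyone can connect from the meta node with a password
  • Admins can connect from the LAN with a password
  • Anyone can connect from the intranet with a password
  • Read-write users (production business accounts) can connect locally through the connection pool; part of the access control is delegated to the connection pool
  • On replicas: read-only users (individuals) can connect locally through the connection pool, which implies that read-only users are rejected on the primary
  • On instances with pg_role == 'offline' or pg_offline_query == true, HBA rules that allow users in the dbrole_offline group are added.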
#==============================================================#
# Default HBA
#==============================================================#
# allow local su with ident
local   all             postgres                               ident
local   replication     postgres                               ident

# allow local user password access
local   all             all                                    md5

# allow local/intranet replication with password
local   replication     replicator                              md5
host    replication     replicator         127.0.0.1/32         md5
host    all             replicator         10.0.0.0/8           md5
host    all             replicator         172.16.0.0/12        md5
host    all             replicator         192.168.0.0/16       md5
host    replication     replicator         10.0.0.0/8           md5
host    replication     replicator         172.16.0.0/12        md5
host    replication     replicator         192.168.0.0/16       md5

# allow local role monitor with password
local   all             dbuser_monitor                          md5
host    all             dbuser_monitor      127.0.0.1/32        md5

#==============================================================#
# Extra HBA
#==============================================================#
# add extra hba rules here




#==============================================================#
# primary HBA
#==============================================================#


#==============================================================#
# special HBA for instance marked with 'pg_offline_query = true'
#==============================================================#



#==============================================================#
# Common HBA
#==============================================================#
#  allow meta node password access
host    all     all                         10.10.10.10/32      md5

#  allow intranet admin password access
host    all     +dbrole_admin               10.0.0.0/8          md5
host    all     +dbrole_admin               172.16.0.0/12       md5
host    all     +dbrole_admin               192.168.0.0/16      md5

#  allow intranet password access
host    all             all                 10.0.0.0/8          md5
host    all             all                 172.16.0.0/12       md5
host    all             all                 192.168.0.0/16      md5

#  allow local read/write (local production user via pgbouncer)
local   all     +dbrole_readonly                                md5
host    all     +dbrole_readonly           127.0.0.1/32         md5





#==============================================================#
# Ad Hoc HBA
#==============================================================#

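4 - Interface

Learn about the graphical user interface provided by Pigsty

Pigsty provides a professional, easy-to-use PostgreSQL monitoring system that distills industry best practices for monitoring.

Users can easily modify and customize it, reuse the monitoring infrastructure, or integrate it with other monitoring systems.

Global       Cluster                  Service     Instance            Database
Home         PG Cluster               PG Service  PG Instance         PG Database
PG Overview  PG Cluster Replication   PG DNS      PG Instance Log     PG Query
PG Shard     PG Cluster Activity                  Node                PG Catalog
PG Alert     PG Cluster Session                   PG Pgbouncer        PG Table
PG KPI       PG Cluster Node                      PG Proxy            PG Table Detail
PG Capacity  PG Cluster Persist                   PG Exporter
PG Change    PG Cluster Database                  PG Setting
PG Monitor   PG Cluster Stats                     PG Stat Activity
             PG Cluster Table                     PG Stat Statements
             PG Cluster Table Detail
             PG Cluster Query
             PG Cluster Health
             PG Cluster Log
             PG Cluster All

Note: dashboards shown in bold are provided by Pigsty by default; the others are extra features of the professional edition.

The default dashboards cover the vast majority of scenarios. If you need deeper control and insight, please contact professional support.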

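4.1 - Global Monitoring

Introduction to the global dashboards

4.1.1 - Home

Introduction to the Home dashboard

The Home dashboard is Pigsty's default home page and contains navigation links to other systems.

You can publish announcements here, add navigation to business systems, integrate other dashboards, and so on.

4.1.2 - PG Overview

Introduction to the PG Overview dashboard

PG Overview is the place to survey all database clusters in the environment.

It provides quick navigation to every database cluster and instance, and gives an intuitive view of resource status, abnormal events, system saturation, and more across the whole environment.

Its charts are mostly organized by cluster and are mainly used to locate abnormal clusters quickly from a global perspective.

Full-length screenshot

4.1.3 - PG Shard

PG Shard is designed specifically for horizontally sharded parallel clusters.

Horizontal sharding is an advanced feature of the Pigsty professional edition. It splits large (TB to PB) business datasets across multiple horizontal business clusters that serve traffic together.

PG Shard provides metrics similar to PG Overview, but uses a predefined regular expression to select all clusters that belong to the same shard.

Users can therefore compare activity and load across shards directly, which is especially helpful for locating data skew.

4.1.4 - PG Alert

Introduction to the PG Alert dashboard

PG Alert is the place to survey all alerts in the environment, including quick panels for all alert-related metrics.

4.1.5 - PG KPI

PG KPI shows an overview of key metrics in the environment, where you can quickly locate abnormal metrics and abnormal instances.

4.1.6 - PG Capacity

PG Capacity shows the capacity watermark of databases. This dashboard is provided by the Pigsty professional edition.

4.1.7 - PG Change

PG Change contains the history of DDL changes released across the environment.

This dashboard must be used together with a Pigsty professional edition feature, the DDL release system, and is not listed here.

4.1.8 - PG Monitor

Introduction to the PG Monitor dashboard

PG Monitor is the self-monitoring of the monitoring system, covering Grafana, Prometheus, Consul, and Nginx.

Self-monitoring is a Pigsty enterprise edition feature.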

4.2 - 集群监控

集群级别的监控面板

4.2.1 - PG Cluster

PG Cluster面板简介

PG Cluster 关注单个集群的整体情况,并提供到其他集群信息的导航。

DB监控:PG集群

PG集群监控是最常用的Dashboard,因为PG以集群为单位提供服务,因此Cluster集合了最完整全面的信息。

大多数监控图都是实例级监控的泛化与上卷,即从展示单个实例内的细节,变为展现集群内每个实例的信息,以及集群和服务层次聚合后的指标。

集群概览

Cluster级别的集群概览相比实例级别多了一些东西:

  • 时间线与领导权,当数据库发生Failover或Switchover时,时间线会步进,领导权会发生变化。
  • 集群拓扑,集群拓扑展现了集群中的复制拓扑,以及采用的复制方式(同步/异步)。
  • 集群负载,包括整个集群实时、1分钟、5分钟、15分钟的负载情况。以及集群中每个节点的Load1
  • 集群报警与事件。

集群复制

Cluster级别的Dashboard与Instance级别Dashboard最重要的区别之一就是提供了整个集群的复制全景。包括:

  • 集群中的主库与级联桥接库。集群是否启用同步提交,同步从库名称。桥接库与级联库数量,最大从库配置

  • 成对出现的Walsender与Walreceiver列表,体现一对主从关系的复制状态

  • 以秒和字节衡量的复制延迟(通常1秒的复制延迟对应10M~100M不等的字节延迟),复制槽堆积量。

  • 从库视角的复制延迟

  • 集群中从库的数量,备份或拉取从库时可以从这里看到异常。

  • 集群的LSN进度,用于整体展示集群的复制状态与持久化状态。

节点指标

PG机器的相关指标,按照集群进行聚合。

事务与查询

Similar to the instance level, but with aggregation added at the Service level (a cluster usually provides two services: primary and standby).

其他指标与实例级别差别不大。

4.2.2 - PG Cluster Replication

PG Cluster Replication focuses on replication activity within a single cluster.

Overview

4.2.3 - PG Cluster Activity

PG Cluster Activity focuses on the activity of a specific cluster: transactions, queries, locks, and so on.

4.2.4 - PG Cluster Session

PG Cluster Session focuses on the working state of connections and connection pools in a specific cluster.

4.2.5 - PG Cluster Node

PG Cluster Node focuses on machine resource usage across the whole cluster.

4.2.6 - PG Cluster Persist

PG Cluster Persist focuses on the cluster's persistence, checkpoints, and I/O state.

4.2.7 - PG Cluster Database

PG Cluster Database focuses on database-related metrics in a specific cluster: TPS, rows inserted, updated, deleted, and fetched, database age, and so on.

4.2.8 - PG Cluster Stat

PG Cluster Stat shows the cluster's usage over the past statistics period.

4.2.9 - PG Cluster Table

PG Cluster Table focuses on inserts, updates, deletes, and reads for all tables in a single cluster.

4.2.10 - PG Cluster Table Detail

PG Cluster Table Detail focuses on inserts, updates, deletes, and reads for one specific table in a single cluster.

You can jump from this dashboard to:

  • PG Cluster Table: roll up to all tables in the cluster
  • PG Instance Table Detail: view the detailed state of this table on a single specific instance in the cluster.

4.2.11 - PG Cluster Query

PG Cluster Query focuses on all queries within a specific cluster.

DB监控:PG慢查询平台

显示慢查询相关的指标,上方是本实例的查询总览。鼠标悬停查询ID可以看到查询语句,点击查询ID会跳转到对应的查询细分指标页(Query Detail)。

  • 左侧是格式化后的查询语句,右侧是查询的主要指标,包括
    • 每秒查询数量:QPS
    • 实时的平均响应时间(RT Realtime)
    • 每次查询平均返回的行数
    • 每次查询平均用于BlockIO的时长
    • 响应时间的均值,标准差,最小值,最大值(自从上一次统计周期以来)
    • 查询最近一天的调用次数,返回行数,总耗时。以及自重置以来的总调用次数。
  • 下方是指定时间段的查询指标图表,是概览指标的细化。

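4.2.12 - PG Cluster Health

PG Cluster Health scores the health of a cluster based on rules.

4.2.13 - PG Cluster Log

Introduction to the PG Cluster Log dashboard

PG Cluster Log focuses on all log events within a single cluster.

This dashboard links to an external Pgbadger-based log summary platform, which is a professional edition feature (that is, not yet available in the open-source edition).

4.2.14 - PG Cluster All

PG Cluster All contains all monitoring information for a cluster, for detailed comparison and analysis.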

4.3 - 服务监控

服务级别的监控面板

服务级监控

一个典型的数据库集群提供两种服务

读写服务:主库

只读服务:从库

而服务往往与域名、解析、负载均衡,路由,流量分发紧密相关

服务级监控主要关注以下内容

  • 主从流量分发与权重

  • 后端服务器健康检测

  • 负载均衡器统计信息

4.3.1 - PG Service

PG Service关注数据库角色层次的聚合信息,DNS解析,域名,代理流量权重等。

PG Service 关注数据库对外暴露的服务

Note that these metrics are only available when Haproxy is enabled as the service provider.

旧PG Service Dashboard

旧PG Service Dashboard按照角色层次进行信息聚合,呈现DNS解析,域名,代理流量权重等。现在已经弃用。

4.3.2 - PG DNS

PG DNS 关注服务域名的解析情况

PG DNS 关注服务域名的解析情况。以及与之绑定的VIP

但是鉴于各个用户定义与管理服务的方式不一,Pigsty不在公开发行版本提供更多关于服务级别的监控面板

4.4 - 实例监控

实例级监控关注单个组件的实例

实例级监控

实例级监控关注于单个实例,无论是一台机器,一个数据库实例,一个连接池实例,还是负载均衡器,指标导出器,都可以在实例级监控中找到最详细的信息。

4.4.1 - PG Instance

PG Instance 详细展示了单个数据库实例的完整指标信息

PG Instance 详细展示了单个数据库实例的完整指标信息

DB监控:PG实例

实例概览

  • 实例身份信息:集群名,ID,所属节点,软件版本,所属集群其他成员等
  • 实例配置信息:一些关键配置,目录,端口,配置路径等
  • 实例健康信息,实例角色(Primary,Standby)等。
  • 黄金指标:PG Load,复制延迟,活跃后端,排队连接,查询延迟,TPS,数据库年龄
  • 数据库负载:实时(Load0),1分钟,5分钟,15分钟
  • 数据库警报与提醒事件

节点概览

  • 四大基本资源:CPU,内存,磁盘,网卡的配置规格,关键功能,与核心指标
  • 右侧是网卡详情与磁盘详情

单日统计

以最近1日为周期的统计信息(从当前时刻算起的前24小时),比如最近一天的查询总数,返回的记录总数等。上面两行是节点级别的统计,下面两行是主要是PG相关的统计指标。

对于计量计费,水位评估特别有用。

复制

  • 当前节点的Replication配置
  • 复制延迟:以秒计,以字节计的复制延迟,复制槽堆积量
  • 下游节点对应的Walsender统计
  • 各种LSN进度,综合展示集群的复制状况与持久化状态。
  • 下游节点数量统计,可以看出复制中断的问题

事务

事务部分用于洞悉实例中的活动情况,包括TPS,响应时间,锁等。

  • TPS概览信息:TPS,TPS与过去两天的DoD环比。DB事务数与回滚数

  • 回滚事务数量与回滚率

  • TPS详情:绿色条带为±1σ,黄色条带为±3σ,以过去30分钟作为计算标准,通常超出黄色条带可认为TPS波动过大

  • Xact RT,事务平均响应时间,从连接池抓取。绿色条带为±1σ,黄色条带为±3σ。

  • Deviation of TPS and RT: a dimensionless value that can be compared across instances; the larger it is, the more the metric jitters. $(σ/μ)^2$

  • 按照DB细分的TPS与事务响应时间,通常一个实例只有一个DB,但少量实例有多个DB。

  • 事务数,回滚数(TPS来自连接池,而这两个指标直接来自DB本身)

  • 锁的数量,按模式聚合(8种表锁),按大类聚合(读锁,写锁,排他锁)

查询

大多数指标与事务中的指标类似,不过统计单位从事务变成了查询语句。查询部分可用于分析实例上的慢查询,定位性能瓶颈。

  • QPS 每秒查询数,与Query RT查询平均响应时间,以及这两者的波动程度,QPS的周期环比等
  • 生产环境对查询平均响应时间有要求:1ms为黄线,100ms为红线

语句

语句展示了查询中按语句细分的指标。每条语句(查询语法树抽离常量变量后如果一致,则算同一条查询)都会有一个查询ID,可以在慢查询平台中获取到具体的语句与详细指标与统计。

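  • The slow query list on the left is sorted by mean response time from pg_stat_statements, in descending order; clicking a query ID jumps to the slow query platform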
  • 这里列出的查询,是累计查询耗时最长的32个查询,但排除只有零星调用的长耗时单次查询与监控查询。
  • 右侧包括了每个查询的实时QPS,平均响应时间。按照RT与总耗时的排名。

后端进程

后端进程用于显示与PG本身的连接,后端进程相关的统计指标。特别是按照各种维度进行聚合的结果,特别适合定位雪崩,慢查询,其他疑难杂症。

  • 后端进程数按种类聚合,后端进程按状态聚合,后端进程按DB聚合,后端进程按等待事件类型聚合。
  • 活跃状态的进程/连接,在事务中空闲的连接,长事务。

连接池

连接池部分与后端进程部分类似,但全都是从Pgbouncer中间件上获取的监控指标

  • 连接池后端连接的状态:活跃,刚用过,空闲,测试过,登录状态。
  • 分别按照User,按照DB,按照Pool(User:DB)聚合的前端连接,用于排查异常连接问题。
  • 等待客户端数(重要),以及队首客户端等待的时长,用于定位连接堆积问题。
  • 连接池可用连接使用比例。

数据库概览

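The Database section comes mainly from pg_stat_database and pg_database, and contains database-related metrics:

  • WAL Rate: the database's write load, measured as WAL bytes generated per second.
  • Buffer Hit Rate: the hit rate of the database's shared buffers; pages that miss are fetched from the operating system page cache or from disk.
  • Rows inserted, updated, deleted, and fetched per second
  • Number and size of temporary files, useful for locating large queries.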

持久化

持久化主要包含数据落盘,Checkpoint,块访问相关的指标

  • 重要的持久化参数,比如是否出现数据校验和验证失败(如果启用可以检测到数据腐坏)
  • 数据库文件(DB,WAL,Log)的大小与增速。
  • 检查点的数量与检查点耗时。
  • 每秒分配的块,与每秒刷盘的块。每秒访问的块,以及每秒从磁盘中读取的块。(以字节计,注意一个Buffer Page是8192,一个Disk Block是4096)

监控Exporter

Exporter展示了监控系统组件本身的监控指标,包括:

  • Exporter是否存活,Uptime,Exporter每分钟被抓取的次数
  • 每个监控查询的耗时,产生的指标数量与错误数量。

4.4.2 - PG Instance Log

PG Instance Log shows the logs of a single database instance.

Pigsty logging is based on Loki and Promtail and is an optional extra module.

You must first run infra-loki.yml on the meta node and pgsql-promtail.yml on ordinary database nodes to enable this feature.

From here users can browse the Postgres, Pgbouncer, and Patroni logs of each instance.

The three charts at the top show the Log Rate for the current time range, that is, the number of log entries per unit of time.

Enter a keyword in the Search box to search; the Log Rate in the upper right shows the rate of log entries containing that keyword.

4.4.3 - Node

Node shows detailed metrics for a single machine node. This dashboard works for any node with Node Exporter installed.

4.4.4 - PG Pgbouncer

PG Pgbouncer shows the complete metrics of a single connection pool instance.

4.4.5 - PG Proxy

PG Proxy shows the detailed status of a single Haproxy database proxy.

4.4.6 - PG Exporter

PG Exporter shows the health of the metrics exporter of a single database instance.

4.4.7 - PG Setting

PG Setting shows the configuration of a single database instance in detail.

4.4.8 - PG Stat Activity

PG Stat Activity shows real-time activity within a single database instance. Note that the data here is fetched live from the catalog rather than collected by the monitoring system.

4.4.9 - PG Stat Statements

PG Stat Statements shows real-time query statistics within a single database instance.

4.5 - 数据库监控

数据库级别的监控面板

数据库级监控

数据库级监控更像是“业务级”监控,它会展现出系统中每一张表,每一个索引,每一个函数的详细使用情况。

对于业务优化与故障分析而言有着巨大的作用。

但是当心监控信息也可能透露出关键的业务数据,例如对用户表的更新QPS可能反映出业务的日活数。请在生产环境中对Grafana做好权限控制,避免不必要的风险。

4.5.1 - PG Database

PG Database 关注单个数据库内发生的细节

PG Database 关注单个数据库内发生的详细情况,对于单实例多DB的情况尤其实用。

4.5.2 - PG Pool

PG Pool关注连接池中的单个连接池,即用户与数据库构成的二元组

PG Pool关注连接池中的单个User-DB对,当您使用多租户特性时,这个面板对于连接池问题的排查会很有帮助。

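4.5.3 - PG Query

PG Query focuses on the overall query details within a single database.

You can use this dashboard to locate specific abnormal queries within an instance, then jump to the PG Query Detail dashboard for details on a specific query.

Query Overview

Database Statements

Statement RT

Statement Time Spend per Second

Statement RT Ranking

4.5.4 - PG Table Catalog

PG Catalog fetches the metadata of a specific table directly from the database catalog and displays it.

Note that Catalog-type information is obtained by querying the database catalog over a direct connection, which may introduce unnecessary security risks.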

身份信息

基本指标

标识符

表特性

关键数值描述

持久化

访问权限

表选项

统计指标

垃圾清理

分析诊断

IO统计

字段详情

索引详情

关系大小

4.5.5 - PG Table

PG Table关注单个数据库中的所有表的增删改查等。

PG Table关注单个数据库中的所有表,增删改查,访问等。

您可以点击具体的表,跳转至PG Table Detail查阅这张表的详细指标。

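4.5.6 - PG Table Detail

PG Table Detail focuses on a single table in a single database.

From this dashboard you can jump to PG Cluster Table Detail to see how this table behaves on different instances across the cluster.

4.5.7 - PG Query Detail

PG Query Detail focuses on the details of a single query within a single database.

Note that queries here are identified by QueryID. You can use the live query interface of the PG Stat Statements dashboard to obtain the statement behind a QueryID. Showing SQL text directly in the dashboard may introduce unnecessary security risks, but this feature is available in the Pigsty professional edition.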

5 - 部署

如何将Pigsty部署至生产环境

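Whether for the sandbox or a real production environment, Pigsty uses the same three-step deployment process: prepare resources, modify configuration, run playbooks.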

Pigsty在部署前需要进行一些准备工作:配置带有正确权限配置的节点,下载安装相关软件。置备完成后,用户应当按照自己的需求修改配置。并执行剧本将系统调整至配置描述的状态。

如果用户希望使用Pigsty监控现有数据库集群,或只希望部署Pigsty监控系统部分,请参考 仅监控部署

准备工作

修改配置

执行剧本

5.1 - 准备资源

如何完成Pigsty资源准备工作

节点置备

在部署Pigsty前,用户需要准备机器节点资源,包括至少一个元节点,与任意数量的数据库节点。

数据库节点可以使用任意SSH可达节点:物理机、虚拟机、容器等,但目前Pigsty仅支持CentOS 7操作系统。

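Pigsty recommends deploying on physical machines or virtual machines. For the local sandbox, Pigsty uses Vagrant and Virtualbox to bring up local virtual machines quickly; see the Vagrant tutorial for details.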

元节点置备

Pigsty需要元节点作为整个环境的控制中心,并提供 基础设施 服务。元节点的数量要求最少1个,推荐3个,建议不超过5个。如果将DCS部署至元节点上,建议在生产环境使用3个元节点,以充分保证DCS服务的可用性。

用户应当确保自己可以登录元节点,并能从元节点上 免密码SSH登录 其他节点,并 免密码 执行sudo命令。

用户应当确保自己可以直接或间接访问元节点的80端口,以访问Pigsty提供的用户界面。

软件置备

用户应当在元节点上 下载本项目,以及 离线软件包(可选)。

使用本地沙箱拉起Pigsty时,用户还需要在宿主机上额外安装:

5.1.1 - Vagrant

如何安装使用Vagrant

通常为了测试“数据库集群”这样的系统,用户需要事先准备若干台虚拟机。尽管云服务已经非常方便,但本地虚拟机访问通常比云虚拟机访问方便,响应迅速,成本低廉。本地虚拟机配置相对繁琐,Vagrant 可解决这一问题。

Pigsty用户无需了解vagrant的原理,只需要知道vagrant可以简单、快捷地按照用户的需求,在笔记本、PC或Mac上拉起若干台虚拟机。用户需要完成的工作,就是将自己的虚拟机需求,以vagrant配置文件的形式表达出来。

Vagrant安装

访问Vagrant官网

https://www.vagrantup.com/downloads

下载Vagrant

最新版本为2.2.14

安装Vagrant

Click vagrant.pkg to run the installer; the installation asks for a password.

Vagrant配置文件

https://github.com/Vonng/pigsty/blob/master/vagrant/Vagrantfile 提供了一个Vagrantfile样例。

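This is the Vagrantfile used by the Pigsty sandbox. It defines four virtual machines: one control/meta node with 2 cores and 4GB of memory, and three database nodes with 1 core and 2GB each.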

vagrant 二进制程序根据 Vagrantfile 中的定义,默认调用 Virtualbox 完成本地虚拟机的创建工作。

进入Pigsty根目录下的vagrant目录,执行vagrant up,即可拉起所有的四台虚拟机。

IMAGE_NAME = "centos/7"
N=3  # 数据库机器节点数量,可修改为0

Vagrant.configure("2") do |config|
    config.vm.box = IMAGE_NAME
    config.vm.box_check_update = false
    config.ssh.insert_key = false

    # 元节点
    config.vm.define "meta", primary: true do |meta|  # 元节点默认的ssh别名为`meta`
        meta.vm.hostname = "meta"
        meta.vm.network "private_network", ip: "10.10.10.10"
        meta.vm.provider "virtualbox" do |v|
            v.linked_clone = true
            v.customize [
                    "modifyvm", :id,
                    "--memory", 4096, "--cpus", "2",   # 元节点的内存与CPU核数:默认为2核/4GB
                    "--nictype1", "virtio", "--nictype2", "virtio",
                    "--hwvirtex", "on", "--ioapic", "on", "--rtcuseutc", "on", "--vtxvpid", "on", "--largepages", "on"
                ]
        end
        meta.vm.provision "shell", path: "provision.sh"
    end

    # 初始化N个数据库节点
    (1..N).each do |i|
        config.vm.define "node-#{i}" do |node|  # 数据库节点默认的ssh别名分别为`node-{1,2,3}`
            node.vm.box = IMAGE_NAME
            node.vm.network "private_network", ip: "10.10.10.#{i + 10}"
            node.vm.hostname = "node-#{i}"
            node.vm.provider "virtualbox" do |v|
                v.linked_clone = true
                v.customize [
                        "modifyvm", :id,
                        "--memory", 2048, "--cpus", "1", # 数据库节点的内存与CPU核数:默认为1核/2GB
                        "--nictype1", "virtio", "--nictype2", "virtio",
                        "--hwvirtex", "on", "--ioapic", "on", "--rtcuseutc", "on", "--vtxvpid", "on", "--largepages", "on"
                    ]
            end
            node.vm.provision "shell", path: "provision.sh"
        end
    end
end

定制Vagrantfile

如果用户的机器配置不足,则可以考虑使用更小的N值,减少数据库节点的数量。如果只希望运行单个元节点,将其修改为0即可。
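To see what a given N produces, the naming and addressing scheme of the Vagrantfile above can be mirrored in shell. This is only a reading aid under the assumption that the Vagrantfile is used unchanged; the function name sandbox_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# List the sandbox VMs implied by N, following the Vagrantfile:
# the meta node is 10.10.10.10 and node-i gets 10.10.10.(i + 10).
sandbox_nodes() {
    local n="$1" i=1
    echo "meta 10.10.10.10"
    while [ "$i" -le "$n" ]; do
        echo "node-$i 10.10.10.$((i + 10))"
        i=$((i + 1))
    done
}

# Example: the default sandbox (N=3) and a meta-only sandbox (N=0)
sandbox_nodes 3
sandbox_nodes 0
```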

用户还可以修改每台机器的CPU核数和内存资源等,如配置文件中的注释所述,详情参阅Vagrant与Pigsty文档。

沙箱环境默认使用IMAGE_NAME = "centos/7",首次执行时会从vagrant官方下载centos 7.8 virtualbox 镜像,确保宿主机拥有合适的网络访问权限(科学上网)!

快捷方式

Pigsty已经提供了对常用vagrant命令的包装,用户可以在项目的Makefile中看到虚拟机管理的相关命令:

make        # 启动集群
make new    # 销毁并创建新集群
make dns    # 将Pigsty域名记录写入本机/etc/hosts (需要sudo权限)
make ssh    # 将虚拟机SSH配置信息写入 ~/.ssh/config
make clean	# 销毁现有本地集群
make cache	# 制作离线安装包,并拷贝至宿主机本地,加速后续集群创建
make upload # 将离线安装缓存包 pkg.tgz 上传并解压至默认目录 /www/pigsty

更多信息,请参考Makefile

###############################################################
# vm management
###############################################################
clean:
	cd vagrant && vagrant destroy -f --parallel; exit 0
up:
	cd vagrant && vagrant up
halt:
	cd vagrant && vagrant halt
down: halt
status:
	cd vagrant && vagrant status
suspend:
	cd vagrant && vagrant suspend
resume:
	cd vagrant && vagrant resume
provision:
	cd vagrant && vagrant provision
# sync ntp time
sync:
	echo meta node-1 node-2 node-3 | xargs -n1 -P4 -I{} ssh {} 'sudo ntpdate pool.ntp.org'; true
	# echo meta node-1 node-2 node-3 | xargs -n1 -P4 -I{} ssh {} 'sudo chronyc -a makestep'; true
# show vagrant cluster status
st: status
start: up ssh sync
stop: halt

# only init partial of cluster
meta-up:
	cd vagrant && vagrant up meta
node-up:
	cd vagrant && vagrant up node-1 node-2 node-3
node-new:
	cd vagrant && vagrant destroy -f node-1 node-2 node-3
	cd vagrant && vagrant up node-1 node-2 node-3

5.1.2 - Virtualbox

如何在MacOS上安装Virtualbox

在MacOS上安装Virtualbox非常简单,其他操作系统上与之类似。

前往Virtualbox官网

https://www.virtualbox.org/

下载Virtualbox

最新版本为6.1.18

安装Virtualbox

点击 VirtualBox.pkg 执行安装,安装过程需要输入密码并重启。

如果安装失败,请检查您的 系统偏好设置 - 安全性与隐私 - 通用 - 允许以下位置的App中点击“允许”按钮。

就这?

没错,您已经成功安装完Oracle Virtualbox了!

5.1.3 - Ansible

How to install and use Ansible

Ansible是一个流行的简单的自动化IT工具,广泛用于运维管理与软件部署。

Ansible是Pigsty剧本的执行载体,如果不需要定制本项目,用户并不需要了解太多Ansible的细节,将其看作一个高级的Shell或Python解释器即可。

如何安装

Ansible可以通过包管理器安装

brew install ansible # macos
yum  install ansible # linux

检查安装的软件版本:

$ echo $(ansible --version)
ansible 2.10.3

建议使用2.9以上版本的Ansible,更低版本的Ansible可能遭遇兼容性问题。

如何使用

Pigsty项目根目录下提供了一系列Ansible剧本,在其开头的Hashbang中调用ansible-playbook来执行自己。

#!/usr/bin/env ansible-playbook

因此,您通常不需要关心Ansible如何使用,安装完成后,直接使用下面的方式执行Ansible剧本即可。

./pgsql.yml

离线安装Ansible

Pigsty依赖Ansible进行环境初始化。但如果元节点本身没有安装Ansible,也没有互联网访问怎么办?

离线安装包中本身带有 Ansible,可以直接通过本地文件Yum源的方式使用,假设用户已经将离线安装包解压至默认位置:/www/pigsty

那么将以下Repo文件写入/etc/yum.repos.d/pigsty-local.repo 中,就可以直接使用该源。

[pigsty-local]
name=Local Yum Repo pigsty
baseurl=file:///www/pigsty
skip_if_unavailable = 1
enabled = 1
priority = 1
gpgcheck = 0

执行以下命令,在元节点上离线安装Ansible

yum clean all
yum makecache
yum install ansible

5.1.4 - 管理用户

如何配置SSH免密码登陆,以及免密码sudo

Pigsty需要一个管理用户,该用户能够从元节点上免密码SSH登陆其他节点,并免密码执行sudo命令。

管理用户

Pigsty推荐将管理用户的创建,权限配置与密钥分发放在虚拟机的Provisioning阶段完成,作为交付内容的一部分。

沙箱环境的默认用户vagrant默认已经配置有免密登陆和免密sudo,您可以从宿主机或沙箱元节点使用vagrant登陆所有的数据库节点。对于生产环境来说,即机器交付时,应当已经配置有这样一个具有免密远程SSH登陆并执行免密sudo的用户。

如果没有,则需要用户自行创建。如果用户拥有root权限,也可以用root身份直接执行初始化,Pigsty可以在初始化过程中完成管理用户的创建。相关配置参数包括:

node_admin_setup

是否在每个节点上创建管理员用户(免密sudo与ssh),默认会创建。

Pigsty默认会创建名为admin (uid=88)的管理用户,可以从元节点上通过SSH免密访问环境中的其他节点并执行免密sudo。

node_admin_uid

管理员用户的uid,默认为88

node_admin_username

管理员用户的名称,默认为admin

node_admin_ssh_exchange

是否在当前执行命令的机器之间相互交换管理员用户的SSH密钥?

默认会执行交换,这样管理员可以在机器间快速跳转。

node_admin_pks

写入到管理员~/.ssh/authorized_keys中的密钥

持有对应私钥的用户可以以管理员身份登陆。

Pigsty默认会创建uid=88的管理员用户admin,并将该用户的密钥在集群范围内进行交换。

node_admin_pks 中给出的公钥会被安装至管理员账户的authorized_keys中,持有对应私钥的用户可以直接远程免密登陆。

配置SSH免密访问

在元节点上,假设执行命令的用户名为vagrant

生成密钥

vagrant用户的身份执行以下命令,会为vagrant生成公私钥对,用于登陆。

ssh-keygen
  • 默认公钥:~/.ssh/id_rsa.pub
  • 默认私钥:~/.ssh/id_rsa

安装密钥

将公钥添加至需要登陆机器的对应用户上:/home/vagrant/.ssh/authorized_keys

如果您已经可以直接通过密码访问远程机器,可以直接通过ssh-copy-id的方式拷贝公钥。

# 输入密码以完成公钥拷贝
ssh-copy-id <ip>

# 直接将密码嵌入命令中,避免交互式密码输入
sshpass -p <password> ssh-copy-id <ip>

然后便可以通过该用户免密码SSH登陆远程机器。

配置免密SUDO

假设用户名为vagrant,则通过visudo 命令,或创建/etc/sudoers.d/vagrant 文件添加以下记录:

%vagrant ALL=(ALL) NOPASSWD: ALL

则 vagrant 用户即可免密sudo执行所有命令
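In sudoers syntax a leading % matches a group, while a bare name matches a single user. The sketch below writes the per-user form into a drop-in file; the function name write_nopasswd_sudoers is hypothetical, and on a real system the target would be a file under /etc/sudoers.d written as root.

```shell
#!/usr/bin/env bash
# Write a passwordless-sudo entry for one user into a drop-in file.
# $1 = user name, $2 = target file (e.g. /etc/sudoers.d/<user>, needs root)
write_nopasswd_sudoers() {
    local user="$1" file="$2"
    printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user" > "$file" || return 1
    chmod 0440 "$file"   # sudoers drop-ins should not be writable
}

# Example against a scratch file; check the real file with: visudo -cf <file>
scratch=$(mktemp)
write_nopasswd_sudoers vagrant "$scratch" && cat "$scratch"
rm -f "$scratch"
```

Validating the file with visudo before relying on it is worthwhile, because a syntax error in a sudoers drop-in can lock out sudo entirely.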

5.1.5 - 软件置备

How to download the Pigsty source code and the offline package

用户需要将Pigsty项目下载至元节点(在沙箱环境中,也可以使用宿主机发起控制)

下载Pigsty源码

用户可以使用 git 直接从 Github 克隆项目,或从 Github Release 页面下载最新版本的Pigsty源码包:

git clone https://github.com/Vonng/pigsty
git clone git@github.com:Vonng/pigsty.git

也可以从 Pigsty CDN 下载最新版本的Pigsty: pigsty.tar.gz

http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pigsty.tar.gz

下载离线安装包

Pigsty自带了一个沙箱环境,沙箱环境的离线安装包默认放置于files目录中,可以从Github Release页面下载。

cd <pigsty>/files/
wget https://github.com/Vonng/pigsty/releases/download/v0.6.0/pkg.tgz 

Pigsty的官方CDN也提供最新版本的 pkg.tgz 下载,只需要执行以下命令即可。

make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz

离线安装包的具体使用方法,请参考 离线安装 一节。

仅监控模式资源

如果用户希望采用仅监控部署,通常建议使用拷贝监控组件二进制的方式部署监控Agent。因此需要预先将Linux Binary下载并放置于files 目录中。

files
   ^---- pg_exporter    (linux amd64 binary)
   ^---- node_exporter  (linux amd64 binary)

The bundled script files/download-exporter.sh automatically downloads the latest versions of node_exporter and pg_exporter from the Internet.

5.1.6 - 离线安装

如何离线安装Pigsty

Pigsty是一个复杂的软件系统,为了确保系统的稳定,Pigsty会在初始化过程中从互联网下载所有依赖的软件包并建立本地仓库 (本地Yum源)。

所有依赖的软件总大小约1GB左右,下载速度取决于用户的网络情况。尽管Pigsty已经尽量使用镜像源以加速下载,但少量包的下载仍可能受到防火墙的阻挠,可能出现非常慢的情况。用户可以通过 proxy_env 配置项设置下载代理,以完成首次下载。

如果您使用了不同于CentOS 7.8的操作系统,通常建议用户采用完整的在线下载安装流程。并在首次初始化完成后缓存下载的软件,参见制作离线安装包

如果您希望跳过漫长的下载过程,或者执行控制的元节点没有互联网访问,则可以考虑下载预先打包好的离线安装包

离线安装包的内容

为了快速拉起Pigsty,建议使用离线下载软件包并上传的方式完成安装。

离线安装包收纳了本地Yum源的所有软件包。默认情况下,Pigsty会在基础设施初始化时创建本地Yum源,

{{ repo_home }}
  |---- {{ repo_name }}.repo
  ^---- {{ repo_name}}/repo_complete
  ^---- {{ repo_name}}/**************.rpm

By default, {{ repo_home }} is the root directory of the Nginx static file server, /www, and repo_name is the custom name of the local repo, pigsty.

以默认情况为例,/www/pigsty 目录包含了所有 RPM 软件包,离线安装包实际上就是 /www/pigsty 目录的压缩包 。

离线安装包的原理是,Pigsty在执行基础设施初始化的过程中,会检查本地Yum源相关文件是否已经存在。如果已经存在,则会跳过下载软件包及其依赖的过程。

检测所用的标记文件为{{ repo_home }}/{{ repo_name }}/repo_complete,默认情况下为/www/pigsty/repo_complete,如果该标记文件存在,(通常是由Pigsty在创建本地源之后设置),则表示本地源已经建立完成,可以直接使用。否则,Pigsty会执行常规的下载逻辑。下载完毕后,您可以将该目录压缩复制归档,用于加速其他环境的初始化。
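The marker check can be reproduced by hand to predict whether initialization will download packages. This is a sketch of the check as described, not Pigsty's playbook code; the function name local_repo_ready is hypothetical and the defaults are the ones stated in the text.

```shell
#!/usr/bin/env bash
# Succeeds when the local repo marker file exists, meaning the download
# step would be skipped. Defaults follow the text: /www and pigsty.
local_repo_ready() {
    local repo_home="${1:-/www}" repo_name="${2:-pigsty}"
    [ -f "$repo_home/$repo_name/repo_complete" ]
}

if local_repo_ready; then
    echo "local repo complete: packages will not be downloaded"
else
    echo "local repo missing: packages will be downloaded"
fi
```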

沙箱环境

下载离线安装包

Pigsty自带了一个沙箱环境,沙箱环境的离线安装包默认放置于files目录中,可以从Github Release页面下载。

cd <pigsty>/files/
wget https://github.com/Vonng/pigsty/releases/download/v0.6.0/pkg.tgz 

Pigsty的官方CDN也提供最新版本的pkg.tgz下载,只需要执行以下命令即可。

make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz

上传离线安装包

使用Pigsty沙箱时,下载离线安装至本地files目录后,则可以直接使用 Makefile 提供的快捷指令make upload上传离线安装包至元节点上。

使用 make upload,也会将本地的离线安装包(Yum缓存)拷贝至元节点上。

# upload rpm cache to meta controller
upload:
	ssh -t meta "sudo rm -rf /tmp/pkg.tgz"
	scp -r files/pkg.tgz meta:/tmp/pkg.tgz
	ssh -t meta "sudo mkdir -p /www/pigsty/; sudo rm -rf /www/pigsty/*; sudo tar -xf /tmp/pkg.tgz --strip-component=1 -C /www/pigsty/"

制作离线安装包

使用 Pigsty 沙箱时,可以通过 make cache 将沙箱中元节点的缓存制为离线安装包,并拷贝到本地。

# cache rpm packages from meta controller
cache:
	rm -rf pkg/* && mkdir -p pkg;
	ssh -t meta "sudo tar -zcf /tmp/pkg.tgz -C /www pigsty; sudo chmod a+r /tmp/pkg.tgz"
	scp -r meta:/tmp/pkg.tgz files/pkg.tgz
	ssh -t meta "sudo rm -rf /tmp/pkg.tgz"

Using the Offline Package in Production

在生产环境使用离线安装包前,您必须确保生产环境的操作系统与制作该离线安装包的机器操作系统一致。Pigsty提供的离线安装包默认使用CentOS 7.8。

使用不同操作系统版本的离线安装包可能会出错,也可能不会,我们强烈建议不要这么做。

如果需要在其他版本的操作系统(例如CentOS7.3,7.7等)上运行Pigsty,建议用户在安装有同版本操作系统的沙箱中完整执行一遍初始化流程,不使用离线安装包,而是直接从上游源下载的方式进行初始化。对于没有网络访问的生产环境元节点而言,制作离线软件包是至关重要的。

常规初始化完成后,用户可以通过make cache或手工执行相关命令,将特定操作系统的软件缓存打为离线安装包。供生产环境使用。

从初始化完成的本地元节点构建离线安装包:

tar -zcf /tmp/pkg.tgz -C /www pigsty     # 制作离线软件包

在生产环境使用离线安装包与沙箱环境类似,用户需要将pkg.tgz复制到元节点上,然后将离线安装包解压至目标地址。

这里以默认的 /www/pigsty 为例,将压缩包中的所有内容(RPM包,repo_complete标记文件,repodata 源的元数据库等)解压至目标目录/www/pigsty中,可以使用以下命令。

sudo mkdir -p /www/pigsty/
sudo rm -rf /www/pigsty/*
sudo tar -xf /tmp/pkg.tgz --strip-components=1 -C /www/pigsty/
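The same steps can be wrapped in a function that also confirms the marker file arrived, which catches a package built with an unexpected directory layout. This is a sketch with a hypothetical function name, unpack_offline_pkg; unlike the commands above it does not clear the target directory first, and it needs sufficient privileges when the target is /www/pigsty.

```shell
#!/usr/bin/env bash
# Unpack an offline package and verify that the repo marker is present.
# $1 = tarball whose top-level directory is "pigsty", $2 = target directory
unpack_offline_pkg() {
    local tarball="$1" target="$2"
    mkdir -p "$target" || return 1
    # drop the leading "pigsty/" path component while extracting
    tar -xf "$tarball" --strip-components=1 -C "$target" || return 1
    [ -f "$target/repo_complete" ]
}

# Usage (as root on the meta node):
#   unpack_offline_pkg /tmp/pkg.tgz /www/pigsty && echo "offline repo ready"
```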

5.2 - 修改配置

如何根据环境修改Pigsty配置

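Users can configure the infrastructure and database clusters through the configuration items below.

Generally, most parameters can be used with their default values.

The infrastructure part needs very few changes; usually the only change is replacing the meta node's IP address in the text.

By contrast, users need to focus on the definition and configuration of database clusters. Database clusters are deployed on database nodes, so users must provide the identity of each database cluster and the connection information of each database node. Identity (such as cluster name and instance number) describes the entities in a database cluster, while connection information (such as IP address) is used to reach the database nodes. Users should also define the default business users and business databases when creating a cluster.

In addition, users can modify parameters to customize the default access control model, the template database, and the exposed services.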

数据库定制

在Pigsty中,数据库初始化分为五个部分:

1. 安装数据库软件

安装什么版本,安装哪些插件,用什么用户

通常这一部分的参数不需要修改任何内容即可直接使用(当PG版本升级时需要进行调整)。

2. 供给数据库集群

在哪创建目录,创建什么用途的集群,监听哪些IP端口,采用何种连接池模式

在这一部分中,身份信息 是必选参数,除此之外需要修改默认参数的地方很少。

通过 pg_conf 可以使用默认的数据库集群模板(普通事务型 OLTP/普通分析型 OLAP/核心金融型 CRIT/微型虚机 TINY)。如果希望创建自定义的模板,可以在roles/postgres/templates中克隆默认配置并自行修改后采用,详见 Patroni模板定制

3. 定制数据库模板

创建哪些角色、用户、数据库、模式,启用哪些扩展,如何设置权限与白名单

重点关注,因为这里是业务声明自己所需数据库的地方。用户可以通过数据库模板定制:

  • 业务用户:(使用哪些用户访问数据库?属性,限制,角色,权限……)
  • 业务数据库:(需要什么样的数据库?扩展,模式,参数,权限……)
  • 默认模板数据库 (template1) (模式、扩展、默认权限)
  • 访问控制系统(角色,用户,HBA)
  • 暴露的服务 (使用哪些端口,将流量导向哪些实例,健康检测,权重……)

4. 拉起数据库监控

部署Pigsty监控系统组件

通常情况下不需要调整,但在 仅监控部署 模式下需要重点关注,进行调整。

5. 暴露数据库服务

通过HAproxy/VIP对外提供数据库服务

除非用户希望定义额外的服务,否则不需要调整这里的配置。

配置项参考

大多数参数都提供了合理的默认值,请参考配置项手册按需修改。

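No  Category        Name          Group               Function
1   Connection      connect       Infrastructure      Proxy server settings and connection info for managed objects
2   Local repo      repo          Infrastructure      Customize the local Yum repo and offline package
3   Node provision  node          Infrastructure      Configure infrastructure on ordinary nodes
4   Infrastructure  meta          Infrastructure      Install and enable infrastructure services on meta nodes
5   Metadata store  dcs           Infrastructure      Configure the DCS service (consul/etcd) on all nodes
6   PG install      pg-install    Database-Cluster    Install PostgreSQL
7   PG provision    pg-provision  Database-Cluster    Bring up PostgreSQL database clusters
8   PG template     pg-template   Database-Template   Customize PostgreSQL database content
9   Monitoring      monitor       Database-Auxiliary  Install the Pigsty database monitoring system
10  Service         service       Database-Auxiliary  Expose database services through Haproxy or VIP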

5.2.1 - 配置身份信息

如何配置数据库集群与节点的身份信息

Pigsty基于 身份标识(Identity) 管理数据库对象。

身份参数

身份参数是定义数据库集群时必须提供的信息,包括:

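Name        Attribute                 Description      Example
pg_cluster  required, cluster level   cluster name     pg-test
pg_role     required, instance level  instance role    primary, replica
pg_seq      required, instance level  instance number  1, 2, 3, ...
pg_shard    optional, cluster level   shard name       test
pg_sindex   optional, cluster level   shard index      1

Identity parameters follow the Pigsty naming convention. Among them, pg_cluster, pg_role, and pg_seq are the core identity parameters, the minimum set required to define a database cluster. Core identity parameters must be specified explicitly and assigned manually.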

  • pg_cluster 标识了集群的名称,在集群层面进行配置,作为集群资源的顶层命名空间。
  • pg_role在实例层面进行配置,标识了实例在集群中扮演的角色。可选值包括:
    • primary:集群中的唯一主库,集群领导者,提供写入服务。
    • replica:集群中的普通从库,承接常规生产只读流量。
    • offline:集群中的离线从库,承接ETL/SAGA/个人用户/交互式/分析型查询。
    • standby:集群中的同步从库,采用同步复制,没有复制延迟。
    • delayed:集群中的延迟从库,显式指定复制延迟,用于执行回溯查询与数据抢救。
  • pg_seq 用于在集群内标识实例,通常采用从0或1开始递增的整数,一旦分配不再更改。
  • pg_shard 用于标识集群所属的上层 分片集簇,只有当集群是水平分片集簇的一员时需要设置。
  • pg_sindex 用于标识集群的分片集簇编号,只有当集群是水平分片集簇的一员时需要设置。
  • pg_instance衍生身份参数,用于唯一命名标识一个数据库实例,其规则为

    {{ pg_cluster }}-{{ pg_seq }} 因为pg_seq是集群内唯一的,因此该标识符全局唯一。

定义数据库集群

以下配置文件定义了一个名为pg-test的集群。集群中包含三个实例:pg-test-1pg-test-2pg-test-3,分别为主库,从库,离线库。该配置是一个集群定义所需的最小配置

  pg-test:
    vars: { pg_cluster: pg-test }
    hosts:
      10.10.10.11: {pg_seq: 1, pg_role: primary}
      10.10.10.12: {pg_seq: 2, pg_role: replica}
      10.10.10.13: {pg_seq: 3, pg_role: offline}

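pg_cluster, pg_role, and pg_seq are identity parameters.

Apart from the IP address, these three parameters are the minimal set required to define a new database cluster, as the configuration above shows.

All other parameters can be inherited from the global or default configuration, but identity parameters must be specified explicitly and assigned by hand.

  • pg_cluster names the cluster and is configured at the cluster level.
  • pg_role is configured at the instance level and identifies the instance's role. Only the primary role receives special handling. If omitted, the role defaults to replica. There are also the special delayed and offline roles.
  • pg_seq identifies an instance within its cluster. It is usually an integer counting up from 0 or 1, and it never changes once assigned.
  • {{ pg_cluster }}-{{ pg_seq }} uniquely identifies an instance, i.e. pg_instance.
  • {{ pg_cluster }}-{{ pg_role }} identifies a service within the cluster, i.e. pg_service.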
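
The two naming rules can be illustrated with a minimal Python sketch. The function names `pg_instance` and `pg_service` are hypothetical helpers chosen to mirror the derived parameters; this is not Pigsty's own code, only an illustration of the `{{ pg_cluster }}-{{ pg_seq }}` and `{{ pg_cluster }}-{{ pg_role }}` rules applied to the pg-test example.

```python
# Sketch only: helper names are hypothetical; the two naming rules come from the text.
def pg_instance(pg_cluster: str, pg_seq: int) -> str:
    """Instance name: {{ pg_cluster }}-{{ pg_seq }}."""
    return f"{pg_cluster}-{pg_seq}"


def pg_service(pg_cluster: str, pg_role: str = "replica") -> str:
    """Service name: {{ pg_cluster }}-{{ pg_role }}; the role falls back to replica."""
    return f"{pg_cluster}-{pg_role}"


# The pg-test cluster from the example, keyed by node IP address.
hosts = {
    "10.10.10.11": {"pg_seq": 1, "pg_role": "primary"},
    "10.10.10.12": {"pg_seq": 2, "pg_role": "replica"},
    "10.10.10.13": {"pg_seq": 3, "pg_role": "offline"},
}

names = {
    ip: (pg_instance("pg-test", h["pg_seq"]), pg_service("pg-test", h["pg_role"]))
    for ip, h in hosts.items()
}
for ip, (instance, service) in names.items():
    print(ip, instance, service)
```

Each node IP maps to exactly one instance name, while several instances with the same role share one service name.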

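Defining a Horizontally Sharded Cluster Set

pg_shard and pg_sindex are optional identity parameters used to define a sharded cluster set.

Suppose a user has a horizontally sharded cluster set (Shard) named test, made up of four independent clusters: pg-test1, pg-test2, pg-test3, and pg-test4. The user can bind the identity pg_shard: test to every one of these clusters, and bind pg_sindex: 1|2|3|4 to the clusters one each, as shown below: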

pg-test1:
  vars: {pg_cluster: pg-test1, pg_shard: test, pg_sindex: 1}
  hosts: {10.10.10.10: {pg_seq: 1, pg_role: primary}}
pg-test2:
  vars: {pg_cluster: pg-test2, pg_shard: test, pg_sindex: 2}
  hosts: {10.10.10.11: {pg_seq: 1, pg_role: primary}}
pg-test3:
  vars: {pg_cluster: pg-test3, pg_shard: test, pg_sindex: 3}
  hosts: {10.10.10.12: {pg_seq: 1, pg_role: primary}}
pg-test4:
  vars: {pg_cluster: pg-test4, pg_shard: test, pg_sindex: 4}
  hosts: {10.10.10.13: {pg_seq: 1, pg_role: primary}}
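
Copying cluster definitions makes it easy to leave the same pg_cluster or pg_sindex on several groups. The following is a minimal sketch of a sanity check, assuming the cluster-level vars have already been loaded into a plain dict. The function name `check_shard_identity` and the dict layout are hypothetical, and Pigsty itself may validate these parameters differently.

```python
from collections import Counter


def check_shard_identity(groups: dict) -> list:
    """Return human-readable problems found in cluster-level identity vars."""
    problems = []
    # Every group should carry its own pg_cluster name.
    names = Counter(g["pg_cluster"] for g in groups.values())
    for name, count in names.items():
        if count > 1:
            problems.append(f"pg_cluster {name!r} is used by {count} groups")
    # Within one shard, every cluster should have a distinct pg_sindex.
    slots = Counter(
        (g["pg_shard"], g["pg_sindex"]) for g in groups.values() if "pg_shard" in g
    )
    for (shard, index), count in slots.items():
        if count > 1:
            problems.append(f"shard {shard!r} index {index} is used by {count} clusters")
    return problems


groups = {
    f"pg-test{i}": {"pg_cluster": f"pg-test{i}", "pg_shard": "test", "pg_sindex": i}
    for i in range(1, 5)
}
problems = check_shard_identity(groups)
print(problems)
```

An empty list means every cluster has a distinct name and a distinct index within its shard.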

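Database Nodes and Database Instances

A database cluster is deployed onto database nodes. Pigsty uses a deployment model in which database nodes and database instances map one to one.

A database node is identified by its IP address; a database instance is identified by a name such as pg-test-1. Because of the one-to-one mapping, the identifier of a database node (Node) and that of a database instance (Instance) correspond to each other and can be converted back and forth.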

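Connection Information

If identity parameters identify the database cluster, then connection information identifies the database node.

For example, in the Defining a Database Cluster example, the instance with pg_cluster = pg-test and pg_seq = 1 (pg-test-1) is deployed on the database node with IP address 10.10.10.11. The IP address 10.10.10.11 is the connection information.

Pigsty uses the IP address as the unique identifier of a database node. It must be the IP address that the database instance listens on and serves traffic from.

This is important: even if you reach the node through a jump host or an SSH proxy, the configuration must still satisfy this requirement.

Other Connection Methods

If the target machine sits behind an SSH jump host, or cannot be reached directly with ssh ip, consider using the connection parameters that Ansible provides.

In the example below, ansible_host is set to an SSH alias, telling Pigsty to reach the target database node with ssh node-1 rather than ssh 10.10.10.11.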

  pg-test:
    vars: { pg_cluster: pg-test }
    hosts:
      10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
      10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
      10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}

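This way, users are free to choose how each database node is reached, and can keep the connection settings in the admin user's ~/.ssh/config.

What's Next

Once the identity parameters are configured, users can customize the database cluster further.

5.2.2 - Customizing Business Users

Configuring business users in Pigsty

Use pg_users to define business users specific to a cluster. This option is normally set at the cluster level and takes the same form as pg_default_roles.

Example

A complete user definition is a single JSON/YAML object, as shown below: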

# complete example of user/role definition for production user
- name: dbuser_meta               # example production user have read-write access
  password: DBUser.Meta           # example user's password, can be encrypted
  login: true                     # can login, true by default (should be false for role)
  superuser: false                # is superuser? false by default
  createdb: false                 # can create database? false by default
  createrole: false               # can create role? false by default
  inherit: true                   # can this role use inherited privileges?
  replication: false              # can this role do replication? false by default
  bypassrls: false                # can this role bypass row level security? false by default
  connlimit: -1                   # connection limit, -1 disable limit
  expire_at: '2030-12-31'         # 'timestamp' when this role is expired
  expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
  roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly
  pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
  parameters:                     # user's default search path
    search_path: public
  comment: test user

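Description

A user object consists of the following keys. Only the user name is required; every other key is optional and falls back to its default when omitted.

  • name(string): the user name. Required.

  • password(string): the user's password. May be an encrypted password starting with md5 or sha.

  • login(bool): whether the user can log in. Defaults to true. For a business role, set this to false.

  • superuser(bool): whether the user has superuser privileges. Defaults to false.

  • createdb(bool): whether the user can create databases. Defaults to false.

  • createrole(bool): whether the user can create new roles. Defaults to false.

  • inherit(bool): whether the user inherits the privileges of its roles. Defaults to true.

  • replication(bool): whether the user has replication privileges. Defaults to false.

  • bypassrls(bool): whether the user can bypass row-level security policies. Defaults to false.

  • connlimit(number): limit on the number of connections for the user. Leave blank or set to -1 for no limit, which is the default.

  • expire_at(date): when the user expires. By default the user never expires.

  • expire_in(number): the user expires n days after creation. If set, it overrides expire_at.

  • roles(string[]): the roles/groups the user belongs to.

  • pgbouncer(bool): whether to add the user to the connection pool user list. Defaults to false. Production users that connect through the pool should set this to true explicitly; interactive personal users and ETL users should set it to false or leave it blank.

  • parameters(dict): per-user configuration parameters, as key-value pairs.

  • comment(string): a free-form note about the user.

Pigsty recommends the prefixes dbuser_ and dbrole_ to tell users and roles apart. A user should set login to true to allow login; a role should set login to false to deny it.

pg_users and pg_default_roles are both arrays of user objects. Entries are created in definition order, so a user defined later can belong to a role defined earlier.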
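
The interplay between expire_at and expire_in can be illustrated with a minimal Python sketch. The function name `effective_expiry` is hypothetical and this is not Pigsty's template logic; it only restates the documented rule that expire_in, when set, overrides expire_at.

```python
from datetime import date, timedelta
from typing import Optional


def effective_expiry(user: dict, today: date) -> Optional[str]:
    """Return the expiry date of a user object, or None if it never expires."""
    if user.get("expire_in") is not None:
        # expire_in (days from today) takes precedence over expire_at
        return (today + timedelta(days=user["expire_in"])).isoformat()
    return user.get("expire_at")


user = {"name": "dbuser_meta", "expire_at": "2030-12-31", "expire_in": 365}
print(effective_expiry(user, date(2021, 3, 22)))  # prints 2022-03-22
```

With both keys present, the result is 365 days after the creation date rather than the fixed expire_at date.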

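Implementation

The users in pg_default_roles are rendered into a single SQL file on the cluster primary:

/pg/tmp/pg-init-roles.sql

The users in pg_users are rendered into SQL files on the cluster primary, one per user:

/pg/tmp/pg-user-{{ user.name }}.sql

These files are executed in order. A real rendered example looks like this: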

----------------------------------------------------------------------
-- File      :   pg-user-dbuser_meta.sql
-- Path      :   /pg/tmp/pg-user-dbuser_meta.sql
-- Time      :   2021-03-22 22:52
-- Note      :   managed by ansible, DO NOT CHANGE
-- Desc      :   creation sql script for user dbuser_meta
----------------------------------------------------------------------

--==================================================================--
--                            EXECUTION                             --
--==================================================================--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_meta';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_meta.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbuser_meta" ;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbuser_meta" ;

-- password
ALTER USER "dbuser_meta" PASSWORD 'DBUser.Meta';

-- expire
-- expire at 2022-03-22 in 365 days since 2021-03-22
ALTER USER "dbuser_meta" VALID UNTIL '2022-03-22';

-- conn limit
-- remove conn limit
-- ALTER USER "dbuser_meta" CONNECTION LIMIT -1;

-- parameters
ALTER USER "dbuser_meta" SET search_path = public;

-- comment
COMMENT ON ROLE "dbuser_meta" IS 'test user';


--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "dbrole_readwrite" TO "dbuser_meta";


--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbuser_meta' will be added to /etc/pgbouncer/userlist.txt via
-- /pg/bin/pgbouncer-create-user 'dbuser_meta' 'DBUser.Meta'


--==================================================================--

Connection Pool

Pgbouncer keeps its own user definition file, usually a subset of the PostgreSQL users.

In Pigsty, the Pgbouncer user definition file is located at /etc/pgbouncer/userlist.txt

$ cat userlist.txt
"postgres" ""
"dbuser_monitor" "md57bbcca538453edba8be026725c530b05"

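Only users listed in this file can reach the database through Pgbouncer.

Only users whose pgbouncer option is explicitly set to true are added to the connection pool user list.

After changing this file, reload Pgbouncer for the change to take effect.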
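
For orientation, here is a minimal Python sketch of how a userlist.txt entry with an md5 secret is formed. It assumes the PostgreSQL md5 convention, where the stored secret is the string md5 followed by the MD5 hex digest of the password concatenated with the user name. The function names and the sample user are hypothetical; in Pigsty the entry is written by /pg/bin/pgbouncer-create-user, whose implementation is not shown here.

```python
import hashlib


def md5_secret(user: str, password: str) -> str:
    """PostgreSQL-style md5 secret: 'md5' + md5(password + username)."""
    digest = hashlib.md5((password + user).encode("utf-8")).hexdigest()
    return "md5" + digest


def userlist_line(user: str, password: str) -> str:
    """One entry in the userlist.txt format: "name" "secret"."""
    return f'"{user}" "{md5_secret(user, password)}"'


# Hypothetical user and password, for illustration only.
line = userlist_line("dbuser_test", "example-password")
print(line)
```

Because the user name is mixed into the hash, the same password produces a different secret for each user.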

Export

The following SQL query exports the users in a database as JSON (the output needs minor adjustments):

SELECT row_to_json(u) FROM
    (SELECT r.rolname AS name,
            a.rolpassword AS password,
            r.rolcanlogin AS login,
            r.rolsuper AS superuser,
            r.rolcreatedb AS createdb,
            r.rolcreaterole AS createrole,
            r.rolinherit AS inherit,
            r.rolreplication AS replication,
            r.rolbypassrls AS bypassrls,
            r.rolconnlimit AS connlimit,
            r.rolvaliduntil AS expire_at,
            setconfig AS parameters,
            ARRAY(SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid) as roles,
            pg_catalog.shobj_description(r.oid, 'pg_authid') AS comment
     FROM pg_catalog.pg_roles r
              LEFT JOIN pg_db_role_setting rs ON r.oid = rs.setrole
              LEFT JOIN pg_authid a ON r.oid = a.oid
     WHERE r.rolname !~ '^pg_'
     ORDER BY 1) u;

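Creation

Whenever possible, create business users and business databases declaratively rather than by hand inside the database, because both must be changed in the database and in the connection pool at the same time. For details, see: Creating Business Users.

To create a new business user in a running cluster, first add the user's definition to the cluster-level configuration, for example by adding a new user object to pg-test.vars.pg_users. Then use the pgsql-createuser playbook to create the user.

For example, to create or modify the user dbuser_test in the pg-test cluster, run the following command with <pg_cluster> replaced by pg-test.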

./pgsql-createuser.yml -l <pg_cluster>  -e pg_user=dbuser_test

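If no definition for dbuser_test exists, the playbook fails during the check phase.

5.2.3 - Customizing Business Databases

Configuring business databases in Pigsty

Use pg_databases to define business databases specific to a cluster.

Example

A complete database definition is a single JSON/YAML object, as shown below: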

- name: meta                      # name is the only required field for a database
  owner: postgres                 # optional, database owner
  template: template1             # optional, template1 by default
  encoding: UTF8                  # optional, UTF8 by default, must be the same as the template database, leave blank to use the db default
  locale: C                       # optional, C by default, must be the same as the template database, leave blank to use the db default
  lc_collate: C                   # optional, C by default, must be the same as the template database, leave blank to use the db default
  lc_ctype: C                     # optional, C by default, must be the same as the template database, leave blank to use the db default
  allowconn: true                 # optional, true by default, false disables connections entirely
  revokeconn: false               # optional, false by default, true revokes connect from public (only the default user and owner keep connect privilege on the database)
  tablespace: pg_default          # optional, 'pg_default' is the default tablespace
  connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
  schemas: [public,monitor]       # create additional schema
  extensions:                     # optional, extension name and where to create
    - {name: postgis, schema: public}
  parameters:                     # optional, extra parameters with ALTER DATABASE
    enable_partitionwise_join: true
  pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
  comment: pigsty meta database   # optional, comment string for database

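Description

A database object consists of the following keys. Only the database name is required; every other key is optional and falls back to its default when omitted.

  • name(string): the database name. Required.

  • owner(string): the database owner. Must be an existing user (users are created before databases).

  • template(string): the template used to create the database. Defaults to template1.

  • encoding(enum): the character set encoding of the database. Defaults to UTF8 and must match the instance and the template database.

  • locale(enum): the locale of the database. Defaults to that of the instance and the template database. Changing it is not recommended.

  • lc_collate(enum): the string collation order of the database. Defaults to that of the instance and the template database. Changing it is not recommended.

  • lc_ctype(enum): the character classification rule of the database. Defaults to that of the instance and the template database. Changing it is not recommended.

  • allowconn(bool): whether connections to the database are allowed. Defaults to true.

  • revokeconn(bool): whether to revoke the default CONNECT privilege on the database from PUBLIC. Defaults to false. Enabling it is recommended on instances that host multiple databases.

  • tablespace(string): the default tablespace of the database. Defaults to pg_default.

  • connlimit(number): limit on the number of connections to the database. Leave blank or set to -1 for no limit, which is the default.

  • schemas(string[]): additional schemas to create in the database (the monitor schema is created by default).

  • extensions(extension[]): additional extensions to install in the database. Each entry has two fields, name and schema.

    For example, {name: postgis, schema: public} tells Pigsty to install the PostGIS extension in the public schema of the database.

  • pgbouncer(bool): whether to add the database to the connection pool database list. Defaults to true.

  • parameters(dict): additional per-database configuration parameters, as key-value pairs.

  • comment(string): a free-form note about the database.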

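Implementation

pg_databases is an array of database definition objects. Each is rendered in turn into an SQL file on the primary:

/pg/tmp/pg-db-{{ database.name }}.sql

These files are executed in order. A real rendered example looks like this: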

----------------------------------------------------------------------
-- File      :   pg-db-meta.sql
-- Path      :   /pg/tmp/pg-db-meta.sql
-- Time      :   2021-03-22 22:52
-- Note      :   managed by ansible, DO NOT CHANGE
-- Desc      :   creation sql script for database meta
----------------------------------------------------------------------


--==================================================================--
--                            EXECUTION                             --
--==================================================================--
-- run as dbsu (postgres by default)
-- createdb -w -p 5432 'meta';
-- psql meta -p 5432 -AXtwqf /pg/tmp/pg-db-meta.sql

--==================================================================--
--                         CREATE DATABASE                          --
--==================================================================--
-- create database with following commands
-- CREATE DATABASE "meta" ;
-- following commands are executed within database "meta"


--==================================================================--
--                         ALTER DATABASE                           --
--==================================================================--
-- owner

-- tablespace

-- allow connection
ALTER DATABASE "meta" ALLOW_CONNECTIONS True;

-- connection limit
ALTER DATABASE "meta" CONNECTION LIMIT -1;

-- parameters
ALTER DATABASE "meta" SET enable_partitionwise_join = True;

-- comment
COMMENT ON DATABASE "meta" IS 'pigsty meta database';


--==================================================================--
--                       REVOKE/GRANT CONNECT                       --
--==================================================================--

--==================================================================--
--                       REVOKE/GRANT CREATE                        --
--==================================================================--
-- revoke create (schema) privilege from public
REVOKE CREATE ON DATABASE "meta" FROM PUBLIC;

-- only admin role have create privilege
GRANT CREATE ON DATABASE "meta" TO "dbrole_admin";

-- revoke public schema creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

-- admin can create objects in public schema
GRANT CREATE ON SCHEMA public TO "dbrole_admin";


--==================================================================--
--                          CREATE SCHEMAS                          --
--==================================================================--
-- create schemas


--==================================================================--
--                        CREATE EXTENSIONS                        --
--==================================================================--
-- create extensions
CREATE EXTENSION IF NOT EXISTS "postgis" WITH SCHEMA "public";


--==================================================================--
--                        PGBOUNCER DATABASE                        --
--==================================================================--
-- database will be added to pgbouncer database list by default,
-- unless pgbouncer is explicitly set to 'false', means hidden database

-- Database 'meta' will be added to /etc/pgbouncer/database.txt via
-- /pg/bin/pgbouncer-create-db 'meta'


--==================================================================--
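
To make the mapping from a database object to SQL more concrete, here is a minimal Python sketch that produces a few of the statements seen in the rendered script. The function name `render_db_sql` is hypothetical. This is not the Jinja2 template Pigsty uses: it covers only a subset of keys and does not escape quotes in values.

```python
def render_db_sql(db: dict) -> list:
    """Build a subset of the SQL statements for one database object."""
    name = db["name"]
    allowconn = db.get("allowconn", True)   # connections allowed by default
    connlimit = db.get("connlimit", -1)     # -1 means no limit
    sql = [
        f'ALTER DATABASE "{name}" ALLOW_CONNECTIONS {allowconn};',
        f'ALTER DATABASE "{name}" CONNECTION LIMIT {connlimit};',
    ]
    for key, value in db.get("parameters", {}).items():
        sql.append(f'ALTER DATABASE "{name}" SET {key} = {value};')
    if "comment" in db:
        comment = db["comment"]  # not escaped: this is only a sketch
        sql.append(f'COMMENT ON DATABASE "{name}" IS \'{comment}\';')
    for ext in db.get("extensions", []):
        ext_name, ext_schema = ext["name"], ext["schema"]
        sql.append(f'CREATE EXTENSION IF NOT EXISTS "{ext_name}" WITH SCHEMA "{ext_schema}";')
    return sql


meta = {
    "name": "meta",
    "parameters": {"enable_partitionwise_join": True},
    "extensions": [{"name": "postgis", "schema": "public"}],
    "comment": "pigsty meta database",
}
statements = render_db_sql(meta)
print("\n".join(statements))
```

Keys that are absent from the object fall back to the documented defaults, which is why the sample still emits ALLOW_CONNECTIONS and CONNECTION LIMIT statements.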

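Connection Pool

Pgbouncer keeps its own database definition file, usually a subset of the PostgreSQL databases.

In Pigsty, the Pgbouncer database definition file is located at /etc/pgbouncer/database.txt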

$ cat database.txt
meta = host=/var/run/postgresql

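Only databases listed in this file can be reached through Pgbouncer. Databases whose pgbouncer option is explicitly set to false are not added to the connection pool database list. After changing this file, reload Pgbouncer for the change to take effect.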
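
The inclusion rule can be sketched in a few lines of Python. The function name `pgbouncer_database_lines` and the `socket_dir` parameter are hypothetical; the entry format simply mirrors the meta line shown in database.txt, and Pigsty itself adds entries through /pg/bin/pgbouncer-create-db.

```python
def pgbouncer_database_lines(databases: list, socket_dir: str = "/var/run/postgresql") -> list:
    """Return database.txt entries for every database not explicitly hidden."""
    return [
        f"{db['name']} = host={socket_dir}"
        for db in databases
        if db.get("pgbouncer", True)  # added by default unless set to false
    ]


databases = [{"name": "meta"}, {"name": "hidden", "pgbouncer": False}]
lines = pgbouncer_database_lines(databases)
print("\n".join(lines))
```

Note the asymmetry with users: databases are added to the pool unless pgbouncer is false, whereas users are added only when pgbouncer is true.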

Export

The following SQL query exports the definition of the current database as JSON (the output needs minor adjustments):

psql  -AXtw  <<-EOF
SELECT jsonb_pretty(row_to_json(final)::JSONB)
FROM (SELECT datname               AS name,
             datdba::RegRole::Text AS owner,
             encoding,
             datcollate            AS lc_collate,
             datctype              AS lc_ctype,
             datallowconn          AS allowconn,
             datconnlimit          AS connlimit,
             (SELECT json_agg(nspname) AS schemas FROM pg_namespace WHERE nspname !~ '^pg_' AND nspname NOT IN ('information_schema', 'monitor', 'repack')),
             (SELECT json_agg(row_to_json(ex)) AS extensions FROM (SELECT extname, extnamespace::RegNamespace AS schema FROM pg_extension WHERE extnamespace::RegNamespace::TEXT NOT IN ('information_schema', 'monitor', 'repack', 'pg_catalog')) ex),
             (SELECT json_object_agg(substring(cfg, 0 , strpos(cfg, '=')), substring(cfg, strpos(cfg, '=')+1)) AS value  FROM
                 (SELECT unnest(setconfig) AS cfg FROM pg_db_role_setting s JOIN pg_database d ON d.oid = s.setdatabase WHERE d.datname = current_database()) cf
             )
      FROM pg_database WHERE datname = current_database()
     ) final;
EOF

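Creation

Whenever possible, create business databases declaratively rather than by hand inside the database, because business users and business databases must be changed in both the database and the connection pool at the same time.

To create a new business database in a running cluster, first add the database's definition to the cluster-level configuration, for example by adding a new database object to pg-test.vars.pg_databases. Then use the pgsql-createdb playbook to create the database.

For example, to create or modify the database test in the pg-test cluster, run the following command with <pg_cluster> replaced by pg-test.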

./pgsql-createdb.yml -l <pg_cluster>  -e pg_database=test

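If no definition for the database test exists, the playbook fails during the check phase.

5.2.4 - Customizing the Template Database

Customizing the template database in Pigsty

Related Parameters

Users can use the PG template options to customize the template database template1 in a cluster.

This ensures that every database newly created in the cluster carries the same defaults: schemas, extensions, and default privileges.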

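| Name | Type | Level | Description |
|------|------|-------|-------------|
| pg_init | string | G/C | Custom PG initialization script |
| pg_replication_username | string | G | PG replication user |
| pg_replication_password | string | G | Password of the PG replication user |
| pg_monitor_username | string | G | PG monitor user |
| pg_monitor_password | string | G | Password of the PG monitor user |
| pg_admin_username | string | G | PG admin user |
| pg_admin_password | string | G | Password of the PG admin user |
| pg_default_roles | role[] | G | Roles and users created by default |
| pg_default_privileges | string[] | G | Default privilege settings for databases |
| pg_default_schemas | string[] | G | Schemas created by default |
| pg_default_extensions | extension[] | G | Extensions installed by default |
| pg_hba_rules | rule[] | G | Global HBA rules |
| pg_hba_rules_extra | rule[] | C/I | Cluster/instance-specific HBA rules |
| pgbouncer_hba_rules | rule[] | G/C | Global Pgbouncer HBA rules |
| pgbouncer_hba_rules_extra | rule[] | G/C | Pgbouncer-specific HBA rules |

^---/pg/bin/pg-init
          |
          ^---(1)--- /pg/tmp/pg-init-roles.sql
          ^---(2)--- /pg/tmp/pg-init-template.sql
          ^---(3)--- <other customize logic in pg-init>

# Business users and databases are not created during template customization
^-------------(4)--- /pg/tmp/pg-user-{{ user.name }}.sql
^-------------(5)--- /pg/tmp/pg-db-{{ db.name }}.sql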

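pg-init

pg-init, set through the pg_init option, is the path to the shell script used to customize template initialization. The script runs as the postgres user on the primary only. By the time it runs, the cluster primary is already up, so the script can execute arbitrary shell commands, or arbitrary SQL through psql.

If this option is not set, Pigsty uses the default pg-init shell script shown below.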

#!/usr/bin/env bash
set -uo pipefail


#==================================================================#
#                          Default Roles                           #
#==================================================================#
psql postgres -qAXwtf /pg/tmp/pg-init-roles.sql


#==================================================================#
#                          System Template                         #
#==================================================================#
# system default template
psql template1 -qAXwtf /pg/tmp/pg-init-template.sql

# make postgres same as templated database (optional)
psql postgres  -qAXwtf /pg/tmp/pg-init-template.sql



#==================================================================#
#                          Customize Logic                         #
#==================================================================#
# add your template logic here

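If you need more complex customization logic, append it to this script. Note that pg-init customizes the database cluster, usually by modifying the template database. When the script runs, the cluster has already started but business users and business databases have not yet been created, so changes to the template database are reflected in the business databases that are subsequently created from it.

pg-init-roles.sql

pg_default_roles defines a globally uniform role system. Its entries are rendered into /pg/tmp/pg-init-roles.sql. A rendered sample from the pg-test cluster is shown below: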

```sql
----------------------------------------------------------------------
-- File      :   pg-init-roles.sql
-- Path      :   /pg/tmp/pg-init-roles
-- Time      :   2021-03-16 21:24
-- Note      :   managed by ansible, DO NOT CHANGE
-- Desc      :   creation sql script for default roles
----------------------------------------------------------------------

--###################################################################--
--                          dbrole_readonly                          --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_readonly';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_readonly.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbrole_readonly" NOLOGIN;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbrole_readonly" NOLOGIN;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbrole_readonly" IS 'role for global readonly access';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbrole_readonly' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                         dbrole_readwrite                          --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_readwrite';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_readwrite.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbrole_readwrite" NOLOGIN;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbrole_readwrite" NOLOGIN;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbrole_readwrite" IS 'role for global read-write access';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "dbrole_readonly" TO "dbrole_readwrite";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbrole_readwrite' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                          dbrole_offline                           --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_offline';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_offline.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbrole_offline" NOLOGIN;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbrole_offline" NOLOGIN;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbrole_offline" IS 'role for restricted read-only access (offline instance)';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbrole_offline' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                           dbrole_admin                            --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_admin';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_admin.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbrole_admin" NOLOGIN BYPASSRLS;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbrole_admin" NOLOGIN BYPASSRLS;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbrole_admin" IS 'role for object creation';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "dbrole_readwrite" TO "dbrole_admin";
GRANT "pg_monitor" TO "dbrole_admin";
GRANT "pg_signal_backend" TO "dbrole_admin";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbrole_admin' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                             postgres                              --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --superuser 'postgres';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-postgres.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "postgres" SUPERUSER;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "postgres" SUPERUSER;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "postgres" IS 'system superuser';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'postgres' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                            replicator                             --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --replication 'replicator';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-replicator.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "replicator" REPLICATION BYPASSRLS;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "replicator" REPLICATION BYPASSRLS;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "replicator" IS 'system replicator';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "pg_monitor" TO "replicator";
GRANT "dbrole_readonly" TO "replicator";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'replicator' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                          dbuser_monitor                           --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_monitor';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_monitor.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbuser_monitor" ;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbuser_monitor" ;

-- password

-- expire

-- conn limit
ALTER USER "dbuser_monitor" CONNECTION LIMIT 16;

-- parameters

-- comment
COMMENT ON ROLE "dbuser_monitor" IS 'system monitor user';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "pg_monitor" TO "dbuser_monitor";
GRANT "dbrole_readonly" TO "dbuser_monitor";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbuser_monitor' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                           dbuser_admin                            --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --superuser 'dbuser_admin';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_admin.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbuser_admin" SUPERUSER BYPASSRLS;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbuser_admin" SUPERUSER BYPASSRLS;

-- password

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbuser_admin" IS 'system admin user';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "dbrole_admin" TO "dbuser_admin";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbuser_admin' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--###################################################################--
--                           dbuser_stats                            --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_stats';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_stats.sql

--==================================================================--
--                           CREATE USER                            --
--==================================================================--
CREATE USER "dbuser_stats" ;

--==================================================================--
--                           ALTER USER                             --
--==================================================================--
-- options
ALTER USER "dbuser_stats" ;

-- password
ALTER USER "dbuser_stats" PASSWORD 'DBUser.Stats';

-- expire

-- conn limit

-- parameters

-- comment
COMMENT ON ROLE "dbuser_stats" IS 'business offline user for offline queries and ETL';

--==================================================================--
--                           GRANT ROLE                             --
--==================================================================--
GRANT "dbrole_offline" TO "dbuser_stats";

--==================================================================--
--                          PGBOUNCER USER                          --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user

-- User 'dbuser_stats' will NOT be added to /etc/pgbouncer/userlist.txt

--==================================================================--

--==================================================================--
--                        PASSWORD OVERWRITE                        --
--==================================================================--
ALTER ROLE "replicator" PASSWORD 'DBUser.Replicator';
ALTER ROLE "dbuser_monitor" PASSWORD 'DBUser.Monitor';
ALTER ROLE "dbuser_admin" PASSWORD 'DBUser.Admin';
--==================================================================--
```

## pg-init-template.sql

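[`pg-init-template.sql`](https://github.com/Vonng/pigsty/blob/master/roles/postgres/templates/pg-init-template.sql) is the script template used to initialize the `template1` database. Most of the variables in the PG template section are rendered through this SQL template into the SQL commands that are finally executed. The template is rendered to `/pg/tmp/pg-init-template.sql` on the cluster primary and executed there.

Pigsty strongly recommends handling complex customization by supplying a custom `pg-init` script. Unless it is really necessary, avoid changing the existing logic in `pg-init-template.sql`.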
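To make the rendering step concrete, here is a minimal Python sketch that imitates the default-privilege loops of the template. It is an illustration only: the real file is rendered by the template engine, `render_default_privileges` is a hypothetical helper, and the two sample privilege strings are shortened from the rendered `pg-test` example (the real list is longer and column-aligned).

```python
def render_default_privileges(roles, privileges):
    """Emit one ALTER DEFAULT PRIVILEGES statement per (role, privilege) pair."""
    statements = []
    for role in roles:            # the template repeats the loop once per role
        for priv in privileges:   # mirrors {% for priv in pg_default_privileges %}
            statements.append(f"ALTER DEFAULT PRIVILEGES FOR ROLE {role} {priv};")
    return statements


# Sample values, assumed for illustration only.
pg_dbsu = "postgres"
pg_admin_username = "dbuser_admin"
pg_default_privileges = [
    "GRANT USAGE ON SCHEMAS TO dbrole_readonly",
    "GRANT SELECT ON TABLES TO dbrole_readonly",
]

statements = render_default_privileges(
    [pg_dbsu, pg_admin_username, '"dbrole_admin"'],
    pg_default_privileges,
)
print("\n".join(statements))
```

Each entry of `pg_default_privileges` is applied three times: once for the database superuser, once for the admin user, and once for the `dbrole_admin` role.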

```sql
--==================================================================--
--                           Executions                             --
--==================================================================--
-- psql template1 -AXtwqf /pg/tmp/pg-init-template.sql
-- this sql scripts is responsible for post-init procedure
-- it will
--    * create system users such as replicator, monitor user, admin user
--    * create system default roles
--    * create schema, extensions in template1 & postgres
--    * create monitor views in template1 & postgres


--==================================================================--
--                          Default Privileges                      --
--==================================================================--
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_dbsu }} {{ priv }};
{% endfor %}

{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_admin_username }} {{ priv }};
{% endfor %}

-- for additional business admin, they can SET ROLE to dbrole_admin
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" {{ priv }};
{% endfor %}

--==================================================================--
--                              Schemas                             --
--==================================================================--
{% for schema_name in pg_default_schemas %}
CREATE SCHEMA IF NOT EXISTS "{{ schema_name }}";
{% endfor %}

-- revoke public creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

--==================================================================--
--                             Extensions                           --
--==================================================================--
{% for extension in pg_default_extensions %}
CREATE EXTENSION IF NOT EXISTS "{{ extension.name }}"{% if 'schema' in extension %} WITH SCHEMA "{{ extension.schema }}"{% endif %};
{% endfor %}
```

The default template initialization logic also creates the monitoring schema, extensions, and related views.

<details>

```sql
--==================================================================--
--                            Monitor Views                         --
--==================================================================--

----------------------------------------------------------------------
-- cleanse
----------------------------------------------------------------------
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "{{ pg_monitor_username }}";
GRANT USAGE ON SCHEMA monitor TO "{{ pg_admin_username }}";
GRANT USAGE ON SCHEMA monitor TO "{{ pg_replication_username }}";

DROP VIEW IF EXISTS monitor.pg_table_bloat_human;
DROP VIEW IF EXISTS monitor.pg_index_bloat_human;
DROP VIEW IF EXISTS monitor.pg_table_bloat;
DROP VIEW IF EXISTS monitor.pg_index_bloat;
DROP VIEW IF EXISTS monitor.pg_session;
DROP VIEW IF EXISTS monitor.pg_kill;
DROP VIEW IF EXISTS monitor.pg_cancel;
DROP VIEW IF EXISTS monitor.pg_seq_scan;


----------------------------------------------------------------------
-- Table bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat AS
    SELECT CURRENT_CATALOG AS datname, nspname, relname , bs * tblpages AS size,
           CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff)/tblpages::FLOAT ELSE 0 END AS ratio
    FROM (
             SELECT ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
                    tblpages, fillfactor, bs, tblid, nspname, relname, is_na
             FROM (
                      SELECT
                          ( 4 + tpl_hdr_size + tpl_data_size + (2 * ma)
                              - CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
                              - CASE WHEN ceil(tpl_data_size)::INT % ma = 0 THEN ma ELSE ceil(tpl_data_size)::INT % ma END
                              ) AS tpl_size, (heappages + toastpages) AS tblpages, heappages,
                          toastpages, reltuples, toasttuples, bs, page_hdr, tblid, nspname, relname, fillfactor, is_na
                      FROM (
                               SELECT
                                   tbl.oid AS tblid, ns.nspname , tbl.relname, tbl.reltuples,
                                   tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
                                   coalesce(toast.reltuples, 0) AS toasttuples,
                                   coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
                                   current_setting('block_size')::numeric AS bs,
                                   CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
                                   24 AS page_hdr,
                                   23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN ( 7 + count(s.attname) ) / 8 ELSE 0::int END
                                       + CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
                                   sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0) ) AS tpl_data_size,
                                   bool_or(att.atttypid = 'pg_catalog.name'::regtype)
                                       OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
                               FROM pg_attribute AS att
                                        JOIN pg_class AS tbl ON att.attrelid = tbl.oid
                                        JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
                                        LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
                                        LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
                               WHERE NOT att.attisdropped AND tbl.relkind = 'r' AND nspname NOT IN ('pg_catalog','information_schema')
                               GROUP BY 1,2,3,4,5,6,7,8,9,10
                           ) AS s
                  ) AS s2
         ) AS s3
    WHERE NOT is_na;
COMMENT ON VIEW monitor.pg_table_bloat IS 'postgres table bloat estimate';

----------------------------------------------------------------------
-- Index bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat AS
    SELECT CURRENT_CATALOG AS datname, nspname, idxname AS relname, relpages::BIGINT * bs AS size,
           COALESCE((relpages - ( reltuples * (6 + ma - (CASE WHEN index_tuple_hdr % ma = 0 THEN ma ELSE index_tuple_hdr % ma END)
                                                   + nulldatawidth + ma - (CASE WHEN nulldatawidth % ma = 0 THEN ma ELSE nulldatawidth % ma END))
                                      / (bs - pagehdr)::FLOAT  + 1 )), 0) / relpages::FLOAT AS ratio
    FROM (
             SELECT nspname,
                    idxname,
                    reltuples,
                    relpages,
                    current_setting('block_size')::INTEGER                                                               AS bs,
                    (CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END)  AS ma,
                    24                                                                                                   AS pagehdr,
                    (CASE WHEN max(COALESCE(pg_stats.null_frac, 0)) = 0 THEN 2 ELSE 6 END)                               AS index_tuple_hdr,
                    sum((1.0 - COALESCE(pg_stats.null_frac, 0.0)) *
                        COALESCE(pg_stats.avg_width, 1024))::INTEGER                                                     AS nulldatawidth
             FROM pg_attribute
                      JOIN (
                 SELECT pg_namespace.nspname,
                        ic.relname                                                   AS idxname,
                        ic.reltuples,
                        ic.relpages,
                        pg_index.indrelid,
                        pg_index.indexrelid,
                        tc.relname                                                   AS tablename,
                        regexp_split_to_table(pg_index.indkey::TEXT, ' ') :: INTEGER AS attnum,
                        pg_index.indexrelid                                          AS index_oid
                 FROM pg_index
                          JOIN pg_class ic ON pg_index.indexrelid = ic.oid
                          JOIN pg_class tc ON pg_index.indrelid = tc.oid
                          JOIN pg_namespace ON pg_namespace.oid = ic.relnamespace
                          JOIN pg_am ON ic.relam = pg_am.oid
                 WHERE pg_am.amname = 'btree' AND ic.relpages > 0 AND nspname NOT IN ('pg_catalog', 'information_schema')
             ) ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
                      JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
                 AND ((pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE))
                     OR (pg_stats.tablename = ind_atts.idxname AND pg_stats.attname = pg_attribute.attname))
             WHERE pg_attribute.attnum > 0
             GROUP BY 1, 2, 3, 4, 5, 6
         ) est
    LIMIT 512;
COMMENT ON VIEW monitor.pg_index_bloat IS 'postgres index bloat estimate (btree-only)';

----------------------------------------------------------------------
-- table bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat_human AS
SELECT nspname || '.' || relname AS name,
       pg_size_pretty(size)      AS size,
       pg_size_pretty((size * ratio)::BIGINT) AS wasted,
       round(100 * ratio::NUMERIC, 2)  as ratio
FROM monitor.pg_table_bloat ORDER BY wasted DESC NULLS LAST;
COMMENT ON VIEW monitor.pg_table_bloat_human IS 'postgres table bloat pretty';

----------------------------------------------------------------------
-- index bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat_human AS
SELECT nspname || '.' || relname              AS name,
       pg_size_pretty(size)                   AS size,
       pg_size_pretty((size * ratio)::BIGINT) AS wasted,
       round(100 * ratio::NUMERIC, 2)         as ratio
FROM monitor.pg_index_bloat;
COMMENT ON VIEW monitor.pg_index_bloat_human IS 'postgres index bloat pretty';


----------------------------------------------------------------------
-- pg session
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_session AS
SELECT coalesce(datname, 'all') AS datname,
       numbackends,
       active,
       idle,
       ixact,
       max_duration,
       max_tx_duration,
       max_conn_duration
FROM (
         SELECT datname,
                count(*)                                         AS numbackends,
                count(*) FILTER ( WHERE state = 'active' )       AS active,
                count(*) FILTER ( WHERE state = 'idle' )         AS idle,
                count(*) FILTER ( WHERE state = 'idle in transaction'
                    OR state = 'idle in transaction (aborted)' ) AS ixact,
                max(extract(epoch from now() - state_change))
                FILTER ( WHERE state = 'active' )                AS max_duration,
                max(extract(epoch from now() - xact_start))      AS max_tx_duration,
                max(extract(epoch from now() - backend_start))   AS max_conn_duration
         FROM pg_stat_activity
         WHERE backend_type = 'client backend'
           AND pid <> pg_backend_pid()
         GROUP BY ROLLUP (1)
         ORDER BY 1 NULLS FIRST
     ) t;
COMMENT ON VIEW monitor.pg_session IS 'postgres session stats';


----------------------------------------------------------------------
-- pg kill
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_kill AS
SELECT pid,
       pg_terminate_backend(pid)                 AS killed,
       datname                                   AS dat,
       usename                                   AS usr,
       application_name                          AS app,
       client_addr                               AS addr,
       state,
       extract(epoch from now() - state_change)  AS query_time,
       extract(epoch from now() - xact_start)    AS xact_time,
       extract(epoch from now() - backend_start) AS conn_time,
       substring(query, 1, 40)                   AS query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_kill IS 'kill all backend session';


----------------------------------------------------------------------
-- quick cancel view
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_cancel;
CREATE OR REPLACE VIEW monitor.pg_cancel AS
SELECT pid,
       pg_cancel_backend(pid)                    AS cancel,
       datname                                   AS dat,
       usename                                   AS usr,
       application_name                          AS app,
       client_addr                               AS addr,
       state,
       extract(epoch from now() - state_change)  AS query_time,
       extract(epoch from now() - xact_start)    AS xact_time,
       extract(epoch from now() - backend_start) AS conn_time,
       substring(query, 1, 40)
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
  and pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_cancel IS 'cancel backend queries';


----------------------------------------------------------------------
-- seq scan
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_seq_scan;
CREATE OR REPLACE VIEW monitor.pg_seq_scan AS
SELECT schemaname                             AS nspname,
       relname,
       seq_scan,
       seq_tup_read,
       seq_tup_read / seq_scan                AS seq_tup_avg,
       idx_scan,
       n_live_tup + n_dead_tup                AS tuples,
       n_live_tup / (n_live_tup + n_dead_tup) AS dead_ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
  and (n_live_tup + n_dead_tup) > 0
ORDER BY seq_tup_read DESC
LIMIT 50;
COMMENT ON VIEW monitor.pg_seq_scan IS 'table that have seq scan';


{% if pg_version >= 13 %}
----------------------------------------------------------------------
-- pg_shmem auxiliary function
-- PG 13 ONLY!
----------------------------------------------------------------------
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
    pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';
{% endif %}


--==================================================================--
--                          Customize Logic                         --
--==================================================================--
-- This script will be execute on primary instance among a newly created
-- postgres cluster. it will be executed as dbsu on template1 database
-- put your own customize logic here
-- make sure they are idempotent
```


</details>



A rendered example for the `pg-test` cluster is shown below:

<details>


```sql
----------------------------------------------------------------------
-- File      :   pg-init-template.sql
-- Ctime     :   2018-10-30
-- Mtime     :   2021-02-27
-- Desc      :   init postgres cluster template
-- Path      :   /pg/tmp/pg-init-template.sql
-- Author    :   Vonng(fengruohang@outlook.com)
-- Copyright (C) 2018-2021 Ruohang Feng
----------------------------------------------------------------------


--==================================================================--
--                           Executions                             --
--==================================================================--
-- psql template1 -AXtwqf /pg/tmp/pg-init-template.sql
-- this sql scripts is responsible for post-init procedure
-- it will
--    * create system users such as replicator, monitor user, admin user
--    * create system default roles
--    * create schema, extensions in template1 & postgres
--    * create monitor views in template1 & postgres


--==================================================================--
--                          Default Privileges                      --
--==================================================================--
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT                        ON TABLES    TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT                        ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE                         ON SCHEMAS   TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT                        ON TABLES    TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT                        ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT CREATE                        ON SCHEMAS   TO dbrole_admin;

ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT                        ON TABLES    TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT                        ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE                         ON SCHEMAS   TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT                        ON TABLES    TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT                        ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT CREATE                        ON SCHEMAS   TO dbrole_admin;

-- for additional business admin, they can SET ROLE to dbrole_admin
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT                        ON TABLES    TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT                        ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE                         ON SCHEMAS   TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT                        ON TABLES    TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT                        ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT CREATE                        ON SCHEMAS   TO dbrole_admin;

--==================================================================--
--                              Schemas                             --
--==================================================================--
CREATE SCHEMA IF NOT EXISTS "monitor";

-- revoke public creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;

--==================================================================--
--                             Extensions                           --
--==================================================================--
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pgstattuple" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_qualstats" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_buffercache" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pageinspect" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_prewarm" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_visibility" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_freespacemap" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_repack" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "postgres_fdw";
CREATE EXTENSION IF NOT EXISTS "file_fdw";
CREATE EXTENSION IF NOT EXISTS "btree_gist";
CREATE EXTENSION IF NOT EXISTS "btree_gin";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
CREATE EXTENSION IF NOT EXISTS "intagg";
CREATE EXTENSION IF NOT EXISTS "intarray";



--==================================================================--
--                            Monitor Views                         --
--==================================================================--

----------------------------------------------------------------------
-- cleanse
----------------------------------------------------------------------
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "dbuser_monitor";
GRANT USAGE ON SCHEMA monitor TO "dbuser_admin";
GRANT USAGE ON SCHEMA monitor TO "replicator";

DROP VIEW IF EXISTS monitor.pg_table_bloat_human;
DROP VIEW IF EXISTS monitor.pg_index_bloat_human;
DROP VIEW IF EXISTS monitor.pg_table_bloat;
DROP VIEW IF EXISTS monitor.pg_index_bloat;
DROP VIEW IF EXISTS monitor.pg_session;
DROP VIEW IF EXISTS monitor.pg_kill;
DROP VIEW IF EXISTS monitor.pg_cancel;
DROP VIEW IF EXISTS monitor.pg_seq_scan;


----------------------------------------------------------------------
-- Table bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat AS
    SELECT CURRENT_CATALOG AS datname, nspname, relname , bs * tblpages AS size,
           CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff)/tblpages::FLOAT ELSE 0 END AS ratio
    FROM (
             SELECT ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
                    tblpages, fillfactor, bs, tblid, nspname, relname, is_na
             FROM (
                      SELECT
                          ( 4 + tpl_hdr_size + tpl_data_size + (2 * ma)
                              - CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
                              - CASE WHEN ceil(tpl_data_size)::INT % ma = 0 THEN ma ELSE ceil(tpl_data_size)::INT % ma END
                              ) AS tpl_size, (heappages + toastpages) AS tblpages, heappages,
                          toastpages, reltuples, toasttuples, bs, page_hdr, tblid, nspname, relname, fillfactor, is_na
                      FROM (
                               SELECT
                                   tbl.oid AS tblid, ns.nspname , tbl.relname, tbl.reltuples,
                                   tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
                                   coalesce(toast.reltuples, 0) AS toasttuples,
                                   coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
                                   current_setting('block_size')::numeric AS bs,
                                   CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
                                   24 AS page_hdr,
                                   23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN ( 7 + count(s.attname) ) / 8 ELSE 0::int END
                                       + CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
                                   sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0) ) AS tpl_data_size,
                                   bool_or(att.atttypid = 'pg_catalog.name'::regtype)
                                       OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
                               FROM pg_attribute AS att
                                        JOIN pg_class AS tbl ON att.attrelid = tbl.oid
                                        JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
                                        LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
                                        LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
                               WHERE NOT att.attisdropped AND tbl.relkind = 'r' AND nspname NOT IN ('pg_catalog','information_schema')
                               GROUP BY 1,2,3,4,5,6,7,8,9,10
                           ) AS s
                  ) AS s2
         ) AS s3
    WHERE NOT is_na;
COMMENT ON VIEW monitor.pg_table_bloat IS 'postgres table bloat estimate';

----------------------------------------------------------------------
-- Index bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat AS
    SELECT CURRENT_CATALOG AS datname, nspname, idxname AS relname, relpages::BIGINT * bs AS size,
           COALESCE((relpages - ( reltuples * (6 + ma - (CASE WHEN index_tuple_hdr % ma = 0 THEN ma ELSE index_tuple_hdr % ma END)
                                                   + nulldatawidth + ma - (CASE WHEN nulldatawidth % ma = 0 THEN ma ELSE nulldatawidth % ma END))
                                      / (bs - pagehdr)::FLOAT  + 1 )), 0) / relpages::FLOAT AS ratio
    FROM (
             SELECT nspname,
                    idxname,
                    reltuples,
                    relpages,
                    current_setting('block_size')::INTEGER                                                               AS bs,
                    (CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END)  AS ma,
                    24                                                                                                   AS pagehdr,
                    (CASE WHEN max(COALESCE(pg_stats.null_frac, 0)) = 0 THEN 2 ELSE 6 END)                               AS index_tuple_hdr,
                    sum((1.0 - COALESCE(pg_stats.null_frac, 0.0)) *
                        COALESCE(pg_stats.avg_width, 1024))::INTEGER                                                     AS nulldatawidth
             FROM pg_attribute
                      JOIN (
                 SELECT pg_namespace.nspname,
                        ic.relname                                                   AS idxname,
                        ic.reltuples,
                        ic.relpages,
                        pg_index.indrelid,
                        pg_index.indexrelid,
                        tc.relname                                                   AS tablename,
                        regexp_split_to_table(pg_index.indkey::TEXT, ' ') :: INTEGER AS attnum,
                        pg_index.indexrelid                                          AS index_oid
                 FROM pg_index
                          JOIN pg_class ic ON pg_index.indexrelid = ic.oid
                          JOIN pg_class tc ON pg_index.indrelid = tc.oid
                          JOIN pg_namespace ON pg_namespace.oid = ic.relnamespace
                          JOIN pg_am ON ic.relam = pg_am.oid
                 WHERE pg_am.amname = 'btree' AND ic.relpages > 0 AND nspname NOT IN ('pg_catalog', 'information_schema')
             ) ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
                      JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
                 AND ((pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE))
                     OR (pg_stats.tablename = ind_atts.idxname AND pg_stats.attname = pg_attribute.attname))
             WHERE pg_attribute.attnum > 0
             GROUP BY 1, 2, 3, 4, 5, 6
         ) est
    LIMIT 512;
COMMENT ON VIEW monitor.pg_index_bloat IS 'postgres index bloat estimate (btree-only)';

----------------------------------------------------------------------
-- table bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat_human AS
SELECT nspname || '.' || relname AS name,
       pg_size_pretty(size)      AS size,
       pg_size_pretty((size * ratio)::BIGINT) AS wasted,
       round(100 * ratio::NUMERIC, 2)  as ratio
FROM monitor.pg_table_bloat ORDER BY wasted DESC NULLS LAST;
COMMENT ON VIEW monitor.pg_table_bloat_human IS 'postgres table bloat pretty';

----------------------------------------------------------------------
-- index bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat_human AS
SELECT nspname || '.' || relname              AS name,
       pg_size_pretty(size)                   AS size,
       pg_size_pretty((size * ratio)::BIGINT) AS wasted,
       round(100 * ratio::NUMERIC, 2)         as ratio
FROM monitor.pg_index_bloat;
COMMENT ON VIEW monitor.pg_index_bloat_human IS 'postgres index bloat pretty';


----------------------------------------------------------------------
-- pg session
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_session AS
SELECT coalesce(datname, 'all') AS datname,
       numbackends,
       active,
       idle,
       ixact,
       max_duration,
       max_tx_duration,
       max_conn_duration
FROM (
         SELECT datname,
                count(*)                                         AS numbackends,
                count(*) FILTER ( WHERE state = 'active' )       AS active,
                count(*) FILTER ( WHERE state = 'idle' )         AS idle,
                count(*) FILTER ( WHERE state = 'idle in transaction'
                    OR state = 'idle in transaction (aborted)' ) AS ixact,
                max(extract(epoch from now() - state_change))
                FILTER ( WHERE state = 'active' )                AS max_duration,
                max(extract(epoch from now() - xact_start))      AS max_tx_duration,
                max(extract(epoch from now() - backend_start))   AS max_conn_duration
         FROM pg_stat_activity
         WHERE backend_type = 'client backend'
           AND pid <> pg_backend_pid()
         GROUP BY ROLLUP (1)
         ORDER BY 1 NULLS FIRST
     ) t;
COMMENT ON VIEW monitor.pg_session IS 'postgres session stats';


----------------------------------------------------------------------
-- pg kill
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_kill AS
SELECT pid,
       pg_terminate_backend(pid)                 AS killed,
       datname                                   AS dat,
       usename                                   AS usr,
       application_name                          AS app,
       client_addr                               AS addr,
       state,
       extract(epoch from now() - state_change)  AS query_time,
       extract(epoch from now() - xact_start)    AS xact_time,
       extract(epoch from now() - backend_start) AS conn_time,
       substring(query, 1, 40)                   AS query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
  AND pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_kill IS 'kill all backend session';


----------------------------------------------------------------------
-- quick cancel view
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_cancel;
CREATE OR REPLACE VIEW monitor.pg_cancel AS
SELECT pid,
       pg_cancel_backend(pid)                    AS cancel,
       datname                                   AS dat,
       usename                                   AS usr,
       application_name                          AS app,
       client_addr                               AS addr,
       state,
       extract(epoch from now() - state_change)  AS query_time,
       extract(epoch from now() - xact_start)    AS xact_time,
       extract(epoch from now() - backend_start) AS conn_time,
       substring(query, 1, 40)
FROM pg_stat_activity
WHERE state = 'active'
  AND backend_type = 'client backend'
  and pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_cancel IS 'cancel backend queries';


----------------------------------------------------------------------
-- seq scan
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_seq_scan;
CREATE OR REPLACE VIEW monitor.pg_seq_scan AS
SELECT schemaname                             AS nspname,
       relname,
       seq_scan,
       seq_tup_read,
       seq_tup_read / seq_scan                AS seq_tup_avg,
       idx_scan,
       n_live_tup + n_dead_tup                AS tuples,
       n_live_tup / (n_live_tup + n_dead_tup) AS dead_ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
  and (n_live_tup + n_dead_tup) > 0
ORDER BY seq_tup_read DESC
LIMIT 50;
COMMENT ON VIEW monitor.pg_seq_scan IS 'table that have seq scan';


----------------------------------------------------------------------
-- pg_shmem auxiliary function
-- PG 13 ONLY!
----------------------------------------------------------------------
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
    pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';


--==================================================================--
--                          Customize Logic                         --
--==================================================================--
-- This script will be executed on the primary instance of a newly created
-- postgres cluster. It will be executed as dbsu on the template1 database.
-- Put your own customization logic here,
-- and make sure it is idempotent.

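5.2.5 - Customizing Business ACL

Configuring business users in Pigsty

Access control (ACL) in PostgreSQL consists of two parts: the user privilege system (Privileges) and Host Based Authentication (HBA).

Pigsty provides a default access control system that users can customize further. The ACL-related configuration entries are: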

Name                       Type        Level  Description
pg_default_roles           role[]      G      Roles and users created by default
pg_default_privilegs       string[]    G      Default database privilege settings
pg_hba_rules               rule[]      G      Global HBA rules
pg_hba_rules_extra         rule[]      C/I    Cluster/instance-specific HBA rules
pgbouncer_hba_rules        rule[]      G/C    Global Pgbouncer HBA rules
pgbouncer_hba_rules_extra  rule[]      G/C    Pgbouncer specific HBA rules
pg_users                   user[]      C      Business users
pg_databases               database[]  C      Business databases

HBA Rules

Users can customize the Postgres HBA rules through pg_hba_rules and pg_hba_rules_extra, and the Pgbouncer HBA rules through pgbouncer_hba_rules and pgbouncer_hba_rules_extra.

An HBA rule is an object with three required fields: title, role, and rules.

title: intranet password access
role: common
rules:
  - host   all          all                     10.0.0.0/8      md5
  - host   all          all                     172.16.0.0/12   md5
  - host   all          all                     192.168.0.0/16  md5
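  • title describes the rule and is rendered as a comment.
  • role is the scope of the rule: it determines which instances the rule is installed on.
  • rules is an array of concrete HBA rules. Each element is a five-field rule record; see the official PostgreSQL documentation.

A rule like this is rendered into the /pg/data/pg_hba.conf file: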

#  allow intranet password access
host    all             all                 10.0.0.0/8          md5
host    all             all                 172.16.0.0/12       md5
host    all             all                 192.168.0.0/16      md5
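The rendering step described above can be sketched in a few lines of Python. This is only an illustration, not the actual Pigsty template: the function name render_hba_rule and the column widths are assumptions, and the comment line simply reuses the title as-is.

```python
# Assumed column widths; chosen only to make the output readable.
WIDTHS = (8, 16, 20, 20)


def render_hba_rule(rule):
    """Render one HBA rule object into a list of pg_hba.conf lines."""
    lines = ["#  " + rule["title"]]  # the title becomes a comment line
    for entry in rule["rules"]:
        fields = entry.split()  # normalize whatever spacing the user typed
        # Pad the leading fields to fixed widths, always keeping one space.
        padded = [f.ljust(w - 1) + " " for f, w in zip(fields, WIDTHS)]
        tail = " ".join(fields[len(WIDTHS):])
        lines.append(("".join(padded) + tail).rstrip())
    return lines


rule = {
    "title": "intranet password access",
    "role": "common",
    "rules": [
        "host   all          all                     10.0.0.0/8      md5",
        "host   all          all                     172.16.0.0/12   md5",
        "host   all          all                     192.168.0.0/16  md5",
    ],
}

for line in render_hba_rule(rule):
    print(line)
```

The role field is deliberately ignored here: it decides where a rule is installed, not how it is rendered.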

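Rule Scope

The role of a rule controls where the rule is installed.

HBA rule groups with role = common are installed on all instances. Other values, such as role: primary, are installed only on instances with pg_role = primary. This lets users define flexible HBA rules through the role system.

As a special case, HBA rules with role: offline are installed not only on instances with pg_role == 'offline', but also on instances with pg_offline_query == true, so that offline users can access them.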
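The scoping logic can be written down as a small predicate. The following Python sketch restates the three cases described in this section; the function name rule_applies is hypothetical and is not part of Pigsty.

```python
def rule_applies(rule_role, pg_role, pg_offline_query=False):
    """Return True if a rule with `rule_role` is installed on this instance."""
    if rule_role == "common":
        return True  # common rules go to every instance
    if rule_role == "offline":
        # special case: offline rules also apply where offline queries are allowed
        return pg_role == "offline" or pg_offline_query
    return rule_role == pg_role  # otherwise the roles must match exactly


print(rule_applies("common", "replica"))
print(rule_applies("primary", "replica"))
print(rule_applies("offline", "replica", pg_offline_query=True))
```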

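Rule Order

The defined HBA rules take effect in the following order:

Special Note

Note that in real production use, HBA is usually differentiated and fine-tuned according to instance roles. Pigsty therefore does not recommend managing HBA configuration through Patroni. If HBA rules are configured in Patroni, the database's HBA will be overwritten by Patroni on restart.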

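5.3 - Running Playbooks

How to complete a full initialization with the playbooks Pigsty provides.

Pigsty uses a declarative interface: once configuration is done, running a fixed set of playbooks completes the deployment.

Basic Deployment

Sandbox Deployment

  • Sandbox deployment: for the local sandbox environment, Pigsty provides a fast initialization playbook that uses interleaved deployment: sandbox.yml

Monitor-Only Deployment

Routine Management

Pigsty also provides several preset playbooks for routine operations and management: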

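5.3.1 - Infrastructure Initialization

How to initialize the infrastructure with playbooks

Overview

Infrastructure initialization is done with infra.yml. This playbook installs and deploys the infrastructure on the meta node.

infra.yml targets the meta node (the default group name is meta).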

./infra.yml

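Notes

❗️The meta node must be initialized before database nodes can be initialized properly.

infra.yml always acts on the group named meta in the configuration file.

The meta node can be reused as an ordinary node, that is, PostgreSQL databases can also be defined and created on it.

Pigsty recommends the default configuration, which creates a pg-meta metadata database cluster on the meta node to support Pigsty's advanced features.

A full run of the initialization may take 2 to 8 minutes, depending on the machine.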

Selective Execution

Users can run a subset of the playbook through Ansible's tag mechanism.

For example, to run only the local repo initialization, use:

./infra.yml --tags=repo

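For the specific tags, see Task Details.

Some commonly used task subsets:

./infra.yml --tags=repo -e repo_rebuild=true            # force rebuilding the local repo
./infra.yml --tags=prometheus_reload                    # reload the Prometheus configuration
./infra.yml --tags=nginx_haproxy                        # regenerate the Nginx Haproxy index page
./infra.yml --tags=prometheus_targets,prometheus_reload # regenerate the Prometheus static target files and apply them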
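As a rough mental model of tag selection: a task runs when at least one of its tags is among the requested ones (Ansible additionally runs tasks tagged always). The Python sketch below mimics that rule on a handful of tasks taken from the infra.yml task listing; it is a simplified illustration and does not call Ansible, and select_tasks is a hypothetical name.

```python
# (task name, tags) pairs copied from the infra.yml task listing
TASKS = [
    ("repo : Create local repo directory", ["repo", "repo_dir"]),
    ("nginx : Templating /etc/nginx/haproxy.conf", ["meta", "nginx", "nginx_haproxy"]),
    ("prometheus : Render prometheus targets in cluster mode",
     ["meta", "prometheus", "prometheus_targets"]),
    ("prometheus : Reload prometheus service",
     ["meta", "prometheus", "prometheus_reload"]),
]


def select_tasks(tasks, tags_option):
    """Return the names of tasks that a --tags=<tags_option> run would keep."""
    requested = set(tags_option.split(","))
    return [
        name
        for name, tags in tasks
        # keep a task if it shares a tag with the request, or is tagged "always"
        if requested & set(tags) or "always" in tags
    ]


for name in select_tasks(TASKS, "prometheus_targets,prometheus_reload"):
    print(name)
```

Because every task of a role also carries the role-level tag, --tags=prometheus would select both Prometheus tasks above, while --tags=prometheus_reload selects only the reload step.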

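Playbook Description

infra.yml mainly does the following:

  • Deploy and enable the local repo
  • Initialize the meta node
  • Initialize the meta node infrastructure
    • CA infrastructure
    • DNS Nameserver
    • Nginx
    • Prometheus & Alertmanager
    • Grafana
  • Copy Pigsty itself to the meta node
  • Initialize the database on the meta node (optional; users can reuse the meta node through the standard database cluster initialization process)

Original Content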

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   infra.yml
# Ctime     :   2020-04-13
# Mtime     :   2020-07-23
# Desc      :   init infrastructure on meta nodes
# Path      :   infra.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#


#------------------------------------------------------------------------------
# init local yum repo (only run on meta nodes)
#------------------------------------------------------------------------------
- name: Init local repo
  become: yes
  hosts: meta
  gather_facts: no
  tags: repo
  roles:
    - repo


#------------------------------------------------------------------------------
# provision nodes
#------------------------------------------------------------------------------
- name: Provision Node
  become: yes
  hosts: meta
  gather_facts: no
  tags: node
  roles:
    - node


#------------------------------------------------------------------------------
# init meta service (only run on meta nodes)
#------------------------------------------------------------------------------
- name: Init meta service
  become: yes
  hosts: meta
  gather_facts: no
  tags: meta
  roles:
    - role: ca
      tags: ca

    - role: nameserver
      tags: nameserver

    - role: nginx
      tags: nginx

    - role: prometheus
      tags: prometheus

    - role: grafana
      tags: grafana


#------------------------------------------------------------------------------
# init dcs on nodes
#------------------------------------------------------------------------------
- name: Init dcs
  become: yes
  hosts: meta
  gather_facts: no
  roles:
    - role: consul
      tags: dcs


#------------------------------------------------------------------------------
# copy scripts to meta node
#------------------------------------------------------------------------------
- name: Copy ansible scripts
  become: yes
  hosts: meta
  gather_facts: no
  ignore_errors: yes
  tags: ansible
  tasks:
    - name: Copy ansible scripts
      when: node_admin_setup is defined and node_admin_setup|bool and node_admin_username != ''
      block:
        # create copy of this repo
        - name: Create ansible tarball
          become: no
          connection: local
          run_once: true
          command:
            cmd: tar -cf files/meta.tgz roles templates ansible.cfg infra.yml pgsql.yml pgsql-remove.yml pgsql-createdb.yml pgsql-createuser.yml pgsql-service.yml pgsql-monitor.yml pigsty.yml Makefile
            chdir: "{{ playbook_dir }}"

        - name: Create ansible directory
          file: path="/home/{{ node_admin_username }}/meta" state=directory owner={{ node_admin_username }}

        - name: Copy ansible tarball
          copy: src="meta.tgz" dest="/home/{{ node_admin_username }}/meta/meta.tgz" owner={{ node_admin_username }}

        - name: Extract tarball
          shell: |
            cd /home/{{ node_admin_username }}/meta/
            tar -xf meta.tgz
            chown -R {{ node_admin_username }} /home/{{ node_admin_username }}
            rm -rf meta.tgz
            chmod a+x *.yml            



#------------------------------------------------------------------------------
# meta node database (optional)
#------------------------------------------------------------------------------
# this play will create database clusters on meta nodes.
# it's good to reuse meta node as normal database nodes too
# but it's always better to leave it be.
#------------------------------------------------------------------------------
#- name: Pgsql Initialization
#  become: yes
#  hosts: meta
#  gather_facts: no
#  roles:
#    - role: postgres                        # init postgres
#      tags: [pgsql, postgres]
#
#    - role: monitor                         # init monitor system
#      tags: [pgsql, monitor]
#
#    - role: service                         # init haproxy
#      tags: [service]


...

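Task Details

The following command lists all tasks that infrastructure initialization will run, along with the available tags:

./infra.yml --list-tasks

The default tasks are: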

playbook: ./infra.yml

  play #1 (meta): Init local repo	TAGS: [repo]
    tasks:
      repo : Create local repo directory	TAGS: [repo, repo_dir]
      repo : Backup & remove existing repos	TAGS: [repo, repo_upstream]
      repo : Add required upstream repos	TAGS: [repo, repo_upstream]
      repo : Check repo pkgs cache exists	TAGS: [repo, repo_prepare]
      repo : Set fact whether repo_exists	TAGS: [repo, repo_prepare]
      repo : Move upstream repo to backup	TAGS: [repo, repo_prepare]
      repo : Add local file system repos	TAGS: [repo, repo_prepare]
      repo : Remake yum cache if not exists	TAGS: [repo, repo_prepare]
      repo : Install repo bootstrap packages	TAGS: [repo, repo_boot]
      repo : Render repo nginx server files	TAGS: [repo, repo_nginx]
      repo : Disable selinux for repo server	TAGS: [repo, repo_nginx]
      repo : Launch repo nginx server	TAGS: [repo, repo_nginx]
      repo : Waits repo server online	TAGS: [repo, repo_nginx]
      repo : Download web url packages	TAGS: [repo, repo_download]
      repo : Download repo packages	TAGS: [repo, repo_download]
      repo : Download repo pkg deps	TAGS: [repo, repo_download]
      repo : Create local repo index	TAGS: [repo, repo_download]
      repo : Copy bootstrap scripts	TAGS: [repo, repo_download, repo_script]
      repo : Mark repo cache as valid	TAGS: [repo, repo_download]

  play #2 (meta): Provision Node	TAGS: [node]
    tasks:
      node : Update node hostname	TAGS: [node, node_name]
      node : Add new hostname to /etc/hosts	TAGS: [node, node_name]
      node : Write static dns records	TAGS: [node, node_dns]
      node : Get old nameservers	TAGS: [node, node_resolv]
      node : Truncate resolv file	TAGS: [node, node_resolv]
      node : Write resolv options	TAGS: [node, node_resolv]
      node : Add new nameservers	TAGS: [node, node_resolv]
      node : Append old nameservers	TAGS: [node, node_resolv]
      node : Node configure disable firewall	TAGS: [node, node_firewall]
      node : Node disable selinux by default	TAGS: [node, node_firewall]
      node : Backup existing repos	TAGS: [node, node_repo]
      node : Install upstream repo	TAGS: [node, node_repo]
      node : Install local repo	TAGS: [node, node_repo]
      node : Install node basic packages	TAGS: [node, node_pkgs]
      node : Install node extra packages	TAGS: [node, node_pkgs]
      node : Install meta specific packages	TAGS: [node, node_pkgs]
      node : Install node basic packages	TAGS: [node, node_pkgs]
      node : Install node extra packages	TAGS: [node, node_pkgs]
      node : Install meta specific packages	TAGS: [node, node_pkgs]
      node : Node configure disable numa	TAGS: [node, node_feature]
      node : Node configure disable swap	TAGS: [node, node_feature]
      node : Node configure unmount swap	TAGS: [node, node_feature]
      node : Node setup static network	TAGS: [node, node_feature]
      node : Node configure disable firewall	TAGS: [node, node_feature]
      node : Node configure disk prefetch	TAGS: [node, node_feature]
      node : Enable linux kernel modules	TAGS: [node, node_kernel]
      node : Enable kernel module on reboot	TAGS: [node, node_kernel]
      node : Get config parameter page count	TAGS: [node, node_tuned]
      node : Get config parameter page size	TAGS: [node, node_tuned]
      node : Tune shmmax and shmall via mem	TAGS: [node, node_tuned]
      node : Create tuned profiles	TAGS: [node, node_tuned]
      node : Render tuned profiles	TAGS: [node, node_tuned]
      node : Active tuned profile	TAGS: [node, node_tuned]
      node : Change additional sysctl params	TAGS: [node, node_tuned]
      node : Copy default user bash profile	TAGS: [node, node_profile]
      node : Setup node default pam ulimits	TAGS: [node, node_ulimit]
      node : Create os user group admin	TAGS: [node, node_admin]
      node : Create os user admin	TAGS: [node, node_admin]
      node : Grant admin group nopass sudo	TAGS: [node, node_admin]
      node : Add no host checking to ssh config	TAGS: [node, node_admin]
      node : Add admin ssh no host checking	TAGS: [node, node_admin]
      node : Fetch all admin public keys	TAGS: [node, node_admin]
      node : Exchange all admin ssh keys	TAGS: [node, node_admin]
      node : Install public keys	TAGS: [node, node_admin]
      node : Install ntp package	TAGS: [node, ntp_install]
      node : Install chrony package	TAGS: [node, ntp_install]
      node : Setup default node timezone	TAGS: [node, ntp_config]
      node : Copy the ntp.conf file	TAGS: [node, ntp_config]
      node : Copy the chrony.conf template	TAGS: [node, ntp_config]
      node : Launch ntpd service	TAGS: [node, ntp_launch]
      node : Launch chronyd service	TAGS: [node, ntp_launch]

  play #3 (meta): Init meta service	TAGS: [meta]
    tasks:
      ca : Create local ca directory	TAGS: [ca, ca_dir, meta]
      ca : Copy ca cert from local files	TAGS: [ca, ca_copy, meta]
      ca : Check ca key cert exists	TAGS: [ca, ca_create, meta]
      ca : Create self-signed CA key-cert	TAGS: [ca, ca_create, meta]
      nameserver : Make sure dnsmasq package installed	TAGS: [meta, nameserver]
      nameserver : Copy dnsmasq /etc/dnsmasq.d/config	TAGS: [meta, nameserver]
      nameserver : Add dynamic dns records to meta	TAGS: [meta, nameserver]
      nameserver : Launch meta dnsmasq service	TAGS: [meta, nameserver]
      nameserver : Wait for meta dnsmasq online	TAGS: [meta, nameserver]
      nameserver : Register consul dnsmasq service	TAGS: [meta, nameserver]
      nameserver : Reload consul	TAGS: [meta, nameserver]
      nginx : Make sure nginx installed	TAGS: [meta, nginx, nginx_install]
      nginx : Create local html directory	TAGS: [meta, nginx, nginx_content]
      nginx : Create nginx config directory	TAGS: [meta, nginx, nginx_content]
      nginx : Update default nginx index page	TAGS: [meta, nginx, nginx_content]
      nginx : Copy nginx default config	TAGS: [meta, nginx, nginx_config]
      nginx : Copy nginx upstream conf	TAGS: [meta, nginx, nginx_config]
      nginx : Templating /etc/nginx/haproxy.conf	TAGS: [meta, nginx, nginx_haproxy]
      nginx : Render haproxy upstream in cluster mode	TAGS: [meta, nginx, nginx_haproxy]
      nginx : Render haproxy location in cluster mode	TAGS: [meta, nginx, nginx_haproxy]
      nginx : Templating haproxy cluster index	TAGS: [meta, nginx, nginx_haproxy]
      nginx : Templating haproxy cluster index	TAGS: [meta, nginx, nginx_haproxy]
      nginx : Restart meta nginx service	TAGS: [meta, nginx, nginx_restart]
      nginx : Wait for nginx service online	TAGS: [meta, nginx, nginx_restart]
      nginx : Make sure nginx exporter installed	TAGS: [meta, nginx, nginx_exporter]
      nginx : Config nginx_exporter options	TAGS: [meta, nginx, nginx_exporter]
      nginx : Restart nginx_exporter service	TAGS: [meta, nginx, nginx_exporter]
      nginx : Wait for nginx exporter online	TAGS: [meta, nginx, nginx_exporter]
      nginx : Register cosnul nginx service	TAGS: [meta, nginx, nginx_register]
      nginx : Register consul nginx-exporter service	TAGS: [meta, nginx, nginx_register]
      nginx : Reload consul	TAGS: [meta, nginx, nginx_register]
      prometheus : Install prometheus and alertmanager	TAGS: [meta, prometheus]
      prometheus : Wipe out prometheus config dir	TAGS: [meta, prometheus, prometheus_clean]
      prometheus : Wipe out existing prometheus data	TAGS: [meta, prometheus, prometheus_clean]
      prometheus : Create postgres directory structure	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy prometheus bin scripts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy prometheus rules scripts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy altermanager config	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Render prometheus config	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Config /etc/prometheus opts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Launch prometheus service	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Launch alertmanager service	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Wait for prometheus online	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Wait for alertmanager online	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Render prometheus targets in cluster mode	TAGS: [meta, prometheus, prometheus_targets]
      prometheus : Reload prometheus service	TAGS: [meta, prometheus, prometheus_reload]
      prometheus : Copy prometheus service definition	TAGS: [meta, prometheus, prometheus_register]
      prometheus : Copy alertmanager service definition	TAGS: [meta, prometheus, prometheus_register]
      prometheus : Reload consul to register prometheus	TAGS: [meta, prometheus, prometheus_register]
      grafana : Make sure grafana is installed	TAGS: [grafana, grafana_install, meta]
      grafana : Check grafana plugin cache exists	TAGS: [grafana, grafana_plugin, meta]
      grafana : Provision grafana plugins via cache	TAGS: [grafana, grafana_plugin, meta]
      grafana : Download grafana plugins from web	TAGS: [grafana, grafana_plugin, meta]
      grafana : Download grafana plugins from web	TAGS: [grafana, grafana_plugin, meta]
      grafana : Create grafana plugins cache	TAGS: [grafana, grafana_plugin, meta]
      grafana : Copy /etc/grafana/grafana.ini	TAGS: [grafana, grafana_config, meta]
      grafana : Remove grafana provision dir	TAGS: [grafana, grafana_config, meta]
      grafana : Copy provisioning content	TAGS: [grafana, grafana_config, meta]
      grafana : Copy pigsty dashboards	TAGS: [grafana, grafana_config, meta]
      grafana : Copy pigsty icon image	TAGS: [grafana, grafana_config, meta]
      grafana : Replace grafana icon with pigsty	TAGS: [grafana, grafana_config, grafana_customize, meta]
      grafana : Launch grafana service	TAGS: [grafana, grafana_launch, meta]
      grafana : Wait for grafana online	TAGS: [grafana, grafana_launch, meta]
      grafana : Update grafana default preferences	TAGS: [grafana, grafana_provision, meta]
      grafana : Register consul grafana service	TAGS: [grafana, grafana_register, meta]
      grafana : Reload consul	TAGS: [grafana, grafana_register, meta]

  play #4 (meta): Init dcs	TAGS: []
    tasks:
      consul : Check for existing consul	TAGS: [consul_check, dcs]
      consul : Consul exists flag fact set	TAGS: [consul_check, dcs]
      consul : Abort due to consul exists	TAGS: [consul_check, dcs]
      consul : Clean existing consul instance	TAGS: [consul_clean, dcs]
      consul : Stop any running consul instance	TAGS: [consul_clean, dcs]
      consul : Remove existing consul dir	TAGS: [consul_clean, dcs]
      consul : Recreate consul dir	TAGS: [consul_clean, dcs]
      consul : Make sure consul is installed	TAGS: [consul_install, dcs]
      consul : Make sure consul dir exists	TAGS: [consul_config, dcs]
      consul : Get dcs server node names	TAGS: [consul_config, dcs]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs]
      consul : Fetch hostname as dcs node name	TAGS: [consul_config, dcs]
      consul : Get dcs name from hostname	TAGS: [consul_config, dcs]
      consul : Copy /etc/consul.d/consul.json	TAGS: [consul_config, dcs]
      consul : Copy consul agent service	TAGS: [consul_config, dcs]
      consul : Get dcs bootstrap expect quroum	TAGS: [consul_server, dcs]
      consul : Copy consul server service unit	TAGS: [consul_server, dcs]
      consul : Launch consul server service	TAGS: [consul_server, dcs]
      consul : Wait for consul server online	TAGS: [consul_server, dcs]
      consul : Launch consul agent service	TAGS: [consul_agent, dcs]
      consul : Wait for consul agent online	TAGS: [consul_agent, dcs]

  play #5 (meta): Copy ansible scripts	TAGS: [ansible]
    tasks:
      Create ansible tarball	TAGS: [ansible]
      Create ansible directory	TAGS: [ansible]
      Copy ansible tarball	TAGS: [ansible]
      Extract tarball	TAGS: [ansible]
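With a listing this long, it can help to index tasks by tag. The Python sketch below parses text in the --list-tasks format shown above and builds a tag-to-tasks mapping. The sample input is a short excerpt of the listing; the function name tasks_by_tag and the regular expression are assumptions made for this illustration.

```python
import re

# A task or play line looks like: "<indent><name><whitespace>TAGS: [a, b, c]"
LINE_RE = re.compile(r"^\s+(?P<name>.+?)\s+TAGS: \[(?P<tags>[^\]]*)\]\s*$")

SAMPLE = (
    "playbook: ./infra.yml\n"
    "\n"
    "  play #1 (meta): Init local repo\tTAGS: [repo]\n"
    "    tasks:\n"
    "      repo : Create local repo directory\tTAGS: [repo, repo_dir]\n"
    "      repo : Launch repo nginx server\tTAGS: [repo, repo_nginx]\n"
)


def tasks_by_tag(text):
    """Map each tag to the list of task names that carry it."""
    index = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group("name").startswith("play #"):
            continue  # skip headers, blank lines, and play-level lines
        for tag in match.group("tags").split(","):
            tag = tag.strip()
            if tag:
                index.setdefault(tag, []).append(match.group("name"))
    return index


for tag, names in sorted(tasks_by_tag(SAMPLE).items()):
    print(tag, len(names))
```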

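5.3.2 - Database Cluster Initialization

How to define and bring up a PostgreSQL database cluster

Playbook Overview

After infrastructure initialization is complete, users can initialize database clusters with pgsql.yml.

First define the database cluster in the Pigsty configuration file, then run pgsql.yml to apply the change to the actual environment.

./pgsql.yml                      # initialize database clusters on every machine in the inventory (dangerous!)
./pgsql.yml -l pg-test           # initialize the database cluster on machines in the pg-test group (recommended!)
./pgsql.yml -l pg-meta,pg-test   # initialize the pg-meta and pg-test clusters together
./pgsql.yml -l 10.10.10.11       # initialize the database instance on machine 10.10.10.11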
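To see why the -l argument matters, the Python sketch below resolves a limit string against an inventory. It handles only the forms used above (no limit, group names, single addresses, comma-separated lists); Ansible's real pattern syntax is richer. The inventory contents and the function name resolve_limit are hypothetical and serve only as an example.

```python
# Hypothetical inventory: group name -> host addresses (illustration only)
INVENTORY = {
    "pg-meta": ["10.10.10.10"],
    "pg-test": ["10.10.10.11", "10.10.10.12", "10.10.10.13"],
}


def resolve_limit(inventory, limit=None):
    """Return the hosts a playbook run would target for a given -l value."""
    all_hosts = [host for hosts in inventory.values() for host in hosts]
    if limit is None:
        return all_hosts  # no -l: every host in the inventory is targeted
    selected = []
    for pattern in limit.split(","):
        # a pattern is either a group name or a single host address
        matched = inventory.get(pattern) or [h for h in all_hosts if h == pattern]
        for host in matched:
            if host not in selected:
                selected.append(host)
    return selected


print(resolve_limit(INVENTORY))
print(resolve_limit(INVENTORY, "pg-test"))
print(resolve_limit(INVENTORY, "10.10.10.11"))
```

Without a limit the run covers every host, which is the reason the unrestricted form is marked dangerous.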

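Notes

  • Running pgsql.yml without arguments is convenient, but it is a high-risk operation in production.

    It is strongly recommended to add the -l argument to limit the scope of the command.

  • The meta node can be reused as an ordinary node, that is, PostgreSQL databases can be defined and created on it.

    In the default sandbox environment, running ./pgsql.yml initializes both pg-meta and pg-test.

  • When initializing only the replicas of a cluster, users must make sure the primary has already been initialized. This requirement does not apply when the primary and its replicas are initialized together.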

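Protection Mechanism

pgsql.yml provides a protection mechanism controlled by the configuration parameter pg_exists_action. If a PostgreSQL instance is already running on the target machine before the playbook runs, Pigsty acts according to the pg_exists_action setting: abort, clean, or skip.

  • abort: recommended as the default setting. If an existing instance is found, abort the playbook to avoid accidentally deleting a database.
  • clean: recommended for the local sandbox environment. If an existing instance is found, wipe the existing database.
  • skip: run the subsequent logic directly on the existing database cluster.
  • You can override the configuration file option with ./pgsql.yml -e pg_exists_action=clean to force wiping existing instances.

The pg_disable_purge option provides double protection. If it is enabled, pg_exists_action is forced to abort, and a running database instance will not be wiped under any circumstances.

dcs_exists_action and dcs_disable_purge work the same way as the two options above, but apply to DCS (Consul Agent) instances.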
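The decision described in this section can be summarized as a small function. The Python sketch below follows the rules stated above; the function name resolve_exists_action and the "proceed" return value are hypothetical, and the real playbook implements this with Ansible tasks rather than Python.

```python
VALID_ACTIONS = ("abort", "clean", "skip")


def resolve_exists_action(pg_exists_action, pg_disable_purge, instance_running):
    """Return what the playbook does about an existing PostgreSQL instance."""
    if pg_exists_action not in VALID_ACTIONS:
        raise ValueError("pg_exists_action must be abort, clean, or skip")
    if not instance_running:
        return "proceed"  # nothing to protect, continue normally
    if pg_disable_purge:
        return "abort"  # double protection: the action is forced to abort
    return pg_exists_action


print(resolve_exists_action("clean", False, True))
print(resolve_exists_action("clean", True, True))
print(resolve_exists_action("abort", False, False))
```

The same table of outcomes would apply to dcs_exists_action and dcs_disable_purge, with the Consul agent in place of the PostgreSQL instance.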

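Selective Execution

Users can run a subset of the playbook through Ansible's tag mechanism.

For example, to run only the service initialization, use:

./pgsql.yml --tags=service

Commonly used command subsets:

./pgsql.yml --tags=infra        # initialize infrastructure, including node initialization and DCS deployment
./pgsql.yml --tags=node         # initialize machine nodes
./pgsql.yml --tags=dcs          # initialize DCS: consul/etcd
./pgsql.yml --tags=dcs -e dcs_exists_action=clean # initialize consul/etcd, wiping any existing consul agent

./pgsql.yml --tags=pgsql        # deploy the database and monitoring
./pgsql.yml --tags=postgres     # deploy the database
./pgsql.yml --tags=monitor      # deploy monitoring

./pgsql.yml --tags=service       # deploy load balancing, including Haproxy and VIP
./pgsql.yml --tags=haproxy_config,haproxy_reload  # update the Haproxy configuration and apply it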

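Playbook Description

pgsql.yml mainly does the following:

  • Initialize database node infrastructure (node)
  • Initialize the DCS Agent service, or the DCS Server on meta nodes (consul)
  • Install, deploy, and initialize PostgreSQL, Pgbouncer, and Patroni (postgres)
  • Install the PostgreSQL monitoring system (monitor)
  • Install and deploy Haproxy and VIP to expose services (service)

For task-level tags, see Task Details.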

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql.yml
# Ctime     :   2020-05-12
# Mtime     :   2021-03-15
# Desc      :   initialize pigsty cluster
# Path      :   pgsql.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#


#------------------------------------------------------------------------------
# init node and database
#------------------------------------------------------------------------------
- name: Pgsql Initialization
  become: yes
  hosts: all
  gather_facts: no
  roles:

    - role: node                            # init node
      tags: [infra, node]

    - role: consul                          # init consul
      tags: [infra, dcs]

    - role: postgres                        # init postgres
      tags: [pgsql, postgres]

    - role: monitor                         # init monitor system
      tags: [pgsql, monitor]

    - role: service                         # init service
      tags: [service]

...

Task Details

The following command lists all tasks of database cluster initialization, along with the available tags:

./pgsql.yml --list-tasks

The default tasks are:

playbook: ./pgsql.yml

  play #1 (all): Pgsql Initialization	TAGS: []
    tasks:
      node : Update node hostname	TAGS: [infra, node, node_name]
      node : Add new hostname to /etc/hosts	TAGS: [infra, node, node_name]
      node : Write static dns records	TAGS: [infra, node, node_dns]
      node : Get old nameservers	TAGS: [infra, node, node_resolv]
      node : Truncate resolv file	TAGS: [infra, node, node_resolv]
      node : Write resolv options	TAGS: [infra, node, node_resolv]
      node : Add new nameservers	TAGS: [infra, node, node_resolv]
      node : Append old nameservers	TAGS: [infra, node, node_resolv]
      node : Node configure disable firewall	TAGS: [infra, node, node_firewall]
      node : Node disable selinux by default	TAGS: [infra, node, node_firewall]
      node : Backup existing repos	TAGS: [infra, node, node_repo]
      node : Install upstream repo	TAGS: [infra, node, node_repo]
      node : Install local repo	TAGS: [infra, node, node_repo]
      node : Install node basic packages	TAGS: [infra, node, node_pkgs]
      node : Install node extra packages	TAGS: [infra, node, node_pkgs]
      node : Install meta specific packages	TAGS: [infra, node, node_pkgs]
      node : Install node basic packages	TAGS: [infra, node, node_pkgs]
      node : Install node extra packages	TAGS: [infra, node, node_pkgs]
      node : Install meta specific packages	TAGS: [infra, node, node_pkgs]
      node : Node configure disable numa	TAGS: [infra, node, node_feature]
      node : Node configure disable swap	TAGS: [infra, node, node_feature]
      node : Node configure unmount swap	TAGS: [infra, node, node_feature]
      node : Node setup static network	TAGS: [infra, node, node_feature]
      node : Node configure disable firewall	TAGS: [infra, node, node_feature]
      node : Node configure disk prefetch	TAGS: [infra, node, node_feature]
      node : Enable linux kernel modules	TAGS: [infra, node, node_kernel]
      node : Enable kernel module on reboot	TAGS: [infra, node, node_kernel]
      node : Get config parameter page count	TAGS: [infra, node, node_tuned]
      node : Get config parameter page size	TAGS: [infra, node, node_tuned]
      node : Tune shmmax and shmall via mem	TAGS: [infra, node, node_tuned]
      node : Create tuned profiles	TAGS: [infra, node, node_tuned]
      node : Render tuned profiles	TAGS: [infra, node, node_tuned]
      node : Active tuned profile	TAGS: [infra, node, node_tuned]
      node : Change additional sysctl params	TAGS: [infra, node, node_tuned]
      node : Copy default user bash profile	TAGS: [infra, node, node_profile]
      node : Setup node default pam ulimits	TAGS: [infra, node, node_ulimit]
      node : Create os user group admin	TAGS: [infra, node, node_admin]
      node : Create os user admin	TAGS: [infra, node, node_admin]
      node : Grant admin group nopass sudo	TAGS: [infra, node, node_admin]
      node : Add no host checking to ssh config	TAGS: [infra, node, node_admin]
      node : Add admin ssh no host checking	TAGS: [infra, node, node_admin]
      node : Fetch all admin public keys	TAGS: [infra, node, node_admin]
      node : Exchange all admin ssh keys	TAGS: [infra, node, node_admin]
      node : Install public keys	TAGS: [infra, node, node_admin]
      node : Install ntp package	TAGS: [infra, node, ntp_install]
      node : Install chrony package	TAGS: [infra, node, ntp_install]
      node : Setup default node timezone	TAGS: [infra, node, ntp_config]
      node : Copy the ntp.conf file	TAGS: [infra, node, ntp_config]
      node : Copy the chrony.conf template	TAGS: [infra, node, ntp_config]
      node : Launch ntpd service	TAGS: [infra, node, ntp_launch]
      node : Launch chronyd service	TAGS: [infra, node, ntp_launch]
      consul : Check for existing consul	TAGS: [consul_check, dcs, infra]
      consul : Consul exists flag fact set	TAGS: [consul_check, dcs, infra]
      consul : Abort due to consul exists	TAGS: [consul_check, dcs, infra]
      consul : Clean existing consul instance	TAGS: [consul_clean, dcs, infra]
      consul : Stop any running consul instance	TAGS: [consul_clean, dcs, infra]
      consul : Remove existing consul dir	TAGS: [consul_clean, dcs, infra]
      consul : Recreate consul dir	TAGS: [consul_clean, dcs, infra]
      consul : Make sure consul is installed	TAGS: [consul_install, dcs, infra]
      consul : Make sure consul dir exists	TAGS: [consul_config, dcs, infra]
      consul : Get dcs server node names	TAGS: [consul_config, dcs, infra]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs, infra]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs, infra]
      consul : Fetch hostname as dcs node name	TAGS: [consul_config, dcs, infra]
      consul : Get dcs name from hostname	TAGS: [consul_config, dcs, infra]
      consul : Copy /etc/consul.d/consul.json	TAGS: [consul_config, dcs, infra]
      consul : Copy consul agent service	TAGS: [consul_config, dcs, infra]
      consul : Get dcs bootstrap expect quroum	TAGS: [consul_server, dcs, infra]
      consul : Copy consul server service unit	TAGS: [consul_server, dcs, infra]
      consul : Launch consul server service	TAGS: [consul_server, dcs, infra]
      consul : Wait for consul server online	TAGS: [consul_server, dcs, infra]
      consul : Launch consul agent service	TAGS: [consul_agent, dcs, infra]
      consul : Wait for consul agent online	TAGS: [consul_agent, dcs, infra]
      postgres : Create os group postgres	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Make sure dcs group exists	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Create dbsu {{ pg_dbsu }}	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Grant dbsu nopass sudo	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Grant dbsu all sudo	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Grant dbsu limited sudo	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Config patroni watchdog support	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Add dbsu ssh no host checking	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Fetch dbsu public keys	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Exchange dbsu ssh keys	TAGS: [instal, pg_dbsu, pgsql, postgres]
      postgres : Install offical pgdg yum repo	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Install pg packages	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Install pg extensions	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Link /usr/pgsql to current version	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Add pg bin dir to profile path	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Fix directory ownership	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Remove default postgres service	TAGS: [instal, pg_install, pgsql, postgres]
      postgres : Check necessary variables exists	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Fetch variables via pg_cluster	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Set cluster basic facts for hosts	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Assert cluster primary singleton	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Setup cluster primary ip address	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Setup repl upstream for primary	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Setup repl upstream for replicas	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Debug print instance summary	TAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Check for existing postgres instance	TAGS: [pg_check, pgsql, postgres, prepare]
      postgres : Set fact whether pg port is open	TAGS: [pg_check, pgsql, postgres, prepare]
      postgres : Abort due to existing postgres instance	TAGS: [pg_check, pgsql, postgres, prepare]
      postgres : Clean existing postgres instance	TAGS: [pg_check, pgsql, postgres, prepare]
      postgres : Shutdown existing postgres service	TAGS: [pg_clean, pgsql, postgres, prepare]
      postgres : Remove registerd consul service	TAGS: [pg_clean, pgsql, postgres, prepare]
      postgres : Remove postgres metadata in consul	TAGS: [pg_clean, pgsql, postgres, prepare]
      postgres : Remove existing postgres data	TAGS: [pg_clean, pgsql, postgres, prepare]
      postgres : Make sure main and backup dir exists	TAGS: [pg_dir, pgsql, postgres, prepare]
      postgres : Create postgres directory structure	TAGS: [pg_dir, pgsql, postgres, prepare]
      postgres : Create pgbouncer directory structure	TAGS: [pg_dir, pgsql, postgres, prepare]
      postgres : Create links from pgbkup to pgroot	TAGS: [pg_dir, pgsql, postgres, prepare]
      postgres : Create links from current cluster	TAGS: [pg_dir, pgsql, postgres, prepare]
      postgres : Copy pg_cluster to /pg/meta/cluster	TAGS: [pg_meta, pgsql, postgres, prepare]
      postgres : Copy pg_version to /pg/meta/version	TAGS: [pg_meta, pgsql, postgres, prepare]
      postgres : Copy pg_instance to /pg/meta/instance	TAGS: [pg_meta, pgsql, postgres, prepare]
      postgres : Copy pg_seq to /pg/meta/sequence	TAGS: [pg_meta, pgsql, postgres, prepare]
      postgres : Copy pg_role to /pg/meta/role	TAGS: [pg_meta, pgsql, postgres, prepare]
      postgres : Copy postgres scripts to /pg/bin/	TAGS: [pg_scripts, pgsql, postgres, prepare]
      postgres : Copy alias profile to /etc/profile.d	TAGS: [pg_scripts, pgsql, postgres, prepare]
      postgres : Copy psqlrc to postgres home	TAGS: [pg_scripts, pgsql, postgres, prepare]
      postgres : Setup hostname to pg instance name	TAGS: [pg_hostname, pgsql, postgres, prepare]
      postgres : Copy consul node-meta definition	TAGS: [pg_nodemeta, pgsql, postgres, prepare]
      postgres : Restart consul to load new node-meta	TAGS: [pg_nodemeta, pgsql, postgres, prepare]
      postgres : Config patroni watchdog support	TAGS: [pg_watchdog, pgsql, postgres, prepare]
      postgres : Get config parameter page count	TAGS: [pg_config, pgsql, postgres]
      postgres : Get config parameter page size	TAGS: [pg_config, pgsql, postgres]
      postgres : Tune shared buffer and work mem	TAGS: [pg_config, pgsql, postgres]
      postgres : Hanlde small size mem occasion	TAGS: [pg_config, pgsql, postgres]
      postgres : Calculate postgres mem params	TAGS: [pg_config, pgsql, postgres]
      postgres : create patroni config dir	TAGS: [pg_config, pgsql, postgres]
      postgres : use predefined patroni template	TAGS: [pg_config, pgsql, postgres]
      postgres : Render default /pg/conf/patroni.yml	TAGS: [pg_config, pgsql, postgres]
      postgres : Link /pg/conf/patroni to /pg/bin/	TAGS: [pg_config, pgsql, postgres]
      postgres : Link /pg/bin/patroni.yml to /etc/patroni/	TAGS: [pg_config, pgsql, postgres]
      postgres : Config patroni watchdog support	TAGS: [pg_config, pgsql, postgres]
      postgres : Copy patroni systemd service file	TAGS: [pg_config, pgsql, postgres]
      postgres : create patroni systemd drop-in dir	TAGS: [pg_config, pgsql, postgres]
      postgres : Copy postgres systemd service file	TAGS: [pg_config, pgsql, postgres]
      postgres : Drop-In consul dependency for patroni	TAGS: [pg_config, pgsql, postgres]
      postgres : Render default initdb scripts	TAGS: [pg_config, pgsql, postgres]
      postgres : Launch patroni on primary instance	TAGS: [pg_primary, pgsql, postgres]
      postgres : Wait for patroni primary online	TAGS: [pg_primary, pgsql, postgres]
      postgres : Wait for postgres primary online	TAGS: [pg_primary, pgsql, postgres]
      postgres : Check primary postgres service ready	TAGS: [pg_primary, pgsql, postgres]
      postgres : Check replication connectivity to primary	TAGS: [pg_primary, pgsql, postgres]
      postgres : Render init roles sql	TAGS: [pg_init, pg_init_role, pgsql, postgres]
      postgres : Render init template sql	TAGS: [pg_init, pg_init_tmpl, pgsql, postgres]
      postgres : Render default pg-init scripts	TAGS: [pg_init, pg_init_main, pgsql, postgres]
      postgres : Execute initialization scripts	TAGS: [pg_init, pg_init_exec, pgsql, postgres]
      postgres : Check primary instance ready	TAGS: [pg_init, pg_init_exec, pgsql, postgres]
      postgres : Add dbsu password to pgpass if exists	TAGS: [pg_pass, pgsql, postgres]
      postgres : Add system user to pgpass	TAGS: [pg_pass, pgsql, postgres]
      postgres : Check replication connectivity to primary	TAGS: [pg_replica, pgsql, postgres]
      postgres : Launch patroni on replica instances	TAGS: [pg_replica, pgsql, postgres]
      postgres : Wait for patroni replica online	TAGS: [pg_replica, pgsql, postgres]
      postgres : Wait for postgres replica online	TAGS: [pg_replica, pgsql, postgres]
      postgres : Check replica postgres service ready	TAGS: [pg_replica, pgsql, postgres]
      postgres : Render hba rules	TAGS: [pg_hba, pgsql, postgres]
      postgres : Reload hba rules	TAGS: [pg_hba, pgsql, postgres]
      postgres : Pause patroni	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Stop patroni on replica instance	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Stop patroni on primary instance	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Launch raw postgres on primary	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Launch raw postgres on primary	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Wait for postgres online	TAGS: [pg_patroni, pgsql, postgres]
      postgres : Check pgbouncer is installed	TAGS: [pgbouncer, pgbouncer_check, pgsql, postgres]
      postgres : Stop existing pgbouncer service	TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
      postgres : Remove existing pgbouncer dirs	TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
      postgres : Recreate dirs with owner postgres	TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
      postgres : Copy /etc/pgbouncer/pgbouncer.ini	TAGS: [pgbouncer, pgbouncer_config, pgbouncer_ini, pgsql, postgres]
      postgres : Copy /etc/pgbouncer/pgb_hba.conf	TAGS: [pgbouncer, pgbouncer_config, pgbouncer_hba, pgsql, postgres]
      postgres : Touch userlist and database list	TAGS: [pgbouncer, pgbouncer_config, pgsql, postgres]
      postgres : Add default users to pgbouncer	TAGS: [pgbouncer, pgbouncer_config, pgsql, postgres]
      postgres : Copy pgbouncer systemd service	TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
      postgres : Launch pgbouncer pool service	TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
      postgres : Wait for pgbouncer service online	TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
      postgres : Check pgbouncer service is ready	TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
      include_tasks	TAGS: [pg_user, pgsql, postgres]
      include_tasks	TAGS: [pg_db, pgsql, postgres]
      postgres : Reload pgbouncer to add db and users	TAGS: [pgbouncer_reload, pgsql, postgres]
      postgres : Copy pg service definition to consul	TAGS: [pg_register, pgsql, postgres, register]
      postgres : Reload postgres consul service	TAGS: [pg_register, pgsql, postgres, register]
      postgres : Render grafana datasource definition	TAGS: [pg_grafana, pgsql, postgres, register]
      postgres : Register datasource to grafana	TAGS: [pg_grafana, pgsql, postgres, register]
      monitor : Install exporter yum repo	TAGS: [exporter_install, exporter_yum_install, monitor, pgsql]
      monitor : Install node_exporter and pg_exporter	TAGS: [exporter_install, exporter_yum_install, monitor, pgsql]
      monitor : Copy node_exporter binary	TAGS: [exporter_binary_install, exporter_install, monitor, pgsql]
      monitor : Copy pg_exporter binary	TAGS: [exporter_binary_install, exporter_install, monitor, pgsql]
      monitor : Create /etc/pg_exporter conf dir	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Copy default pg_exporter.yaml	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Config /etc/default/pg_exporter	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Config pg_exporter service unit	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Launch pg_exporter systemd service	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Wait for pg_exporter service online	TAGS: [monitor, pg_exporter, pgsql]
      monitor : Register pg-exporter consul service	TAGS: [monitor, pg_exporter_register, pgsql]
      monitor : Reload pg-exporter consul service	TAGS: [monitor, pg_exporter_register, pgsql]
      monitor : Config pgbouncer_exporter opts	TAGS: [monitor, pgbouncer_exporter, pgsql]
      monitor : Config pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter, pgsql]
      monitor : Launch pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter, pgsql]
      monitor : Wait for pgbouncer_exporter online	TAGS: [monitor, pgbouncer_exporter, pgsql]
      monitor : Register pgb-exporter consul service	TAGS: [monitor, node_exporter_register, pgsql]
      monitor : Reload pgb-exporter consul service	TAGS: [monitor, node_exporter_register, pgsql]
      monitor : Copy node_exporter systemd service	TAGS: [monitor, node_exporter, pgsql]
      monitor : Config default node_exporter options	TAGS: [monitor, node_exporter, pgsql]
      monitor : Launch node_exporter service unit	TAGS: [monitor, node_exporter, pgsql]
      monitor : Wait for node_exporter online	TAGS: [monitor, node_exporter, pgsql]
      monitor : Register node-exporter service to consul	TAGS: [monitor, node_exporter_register, pgsql]
      monitor : Reload node-exporter consul service	TAGS: [monitor, node_exporter_register, pgsql]
      service : Make sure haproxy is installed	TAGS: [haproxy_install, service]
      service : Create haproxy directory	TAGS: [haproxy_install, service]
      service : Copy haproxy systemd service file	TAGS: [haproxy_install, haproxy_unit, service]
      service : Fetch postgres cluster memberships	TAGS: [haproxy_config, service]
      service : Templating /etc/haproxy/haproxy.cfg	TAGS: [haproxy_config, service]
      service : Launch haproxy load balancer service	TAGS: [haproxy_launch, haproxy_restart, service]
      service : Wait for haproxy load balancer online	TAGS: [haproxy_launch, service]
      service : Reload haproxy load balancer service	TAGS: [haproxy_reload, service]
      service : Copy haproxy exporter definition	TAGS: [haproxy_register, service]
      service : Copy haproxy service definition	TAGS: [haproxy_register, service]
      service : Reload haproxy consul service	TAGS: [haproxy_register, service]
      service : Make sure vip-manager is installed	TAGS: [service, vip_l2_install]
      service : Copy vip-manager systemd service file	TAGS: [service, vip_l2_install]
      service : create vip-manager systemd drop-in dir	TAGS: [service, vip_l2_install]
      service : create vip-manager systemd drop-in file	TAGS: [service, vip_l2_install]
      service : Templating /etc/default/vip-manager.yml	TAGS: [service, vip_l2_config, vip_manager_config]
      service : Launch vip-manager	TAGS: [service, vip_l2_reload]
      service : Fetch postgres cluster memberships	TAGS: [service, vip_l4_config]
      service : Render L4 VIP configs	TAGS: [service, vip_l4_config]
      include_tasks	TAGS: [service, vip_l4_reload]
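
Listings like the one above are useful for deciding which tags to pass to a partial run. The following is a minimal Python sketch, not part of Pigsty, that parses a few task lines copied from the listing and estimates which tasks a given tag selection would run. It assumes the line format shown above (optional role prefix, task name, then TAGS: [...]) and relies on Ansible's documented rule that tasks tagged always run unless they are explicitly skipped. The names parse_tasks and select_tasks are hypothetical helpers introduced for illustration.

```python
import re

# A few lines copied from the listing above: "role : task name<TAB>TAGS: [...]".
LISTING = """\
      postgres : Debug print instance summary\tTAGS: [always, pg_preflight, pgsql, postgres, preflight]
      postgres : Render hba rules\tTAGS: [pg_hba, pgsql, postgres]
      postgres : Reload hba rules\tTAGS: [pg_hba, pgsql, postgres]
      monitor : Launch node_exporter service unit\tTAGS: [monitor, node_exporter, pgsql]
      service : Reload haproxy load balancer service\tTAGS: [haproxy_reload, service]
      include_tasks\tTAGS: [pg_user, pgsql, postgres]
"""

# Optional "role : " prefix, then the task name, then the tag list.
TASK_LINE = re.compile(
    r"^\s*(?:(?P<role>\S+) : )?(?P<name>.+?)\s+TAGS: \[(?P<tags>.*)\]\s*$"
)


def parse_tasks(text):
    """Return a list of (role, name, tags) tuples, one per task line."""
    tasks = []
    for line in text.splitlines():
        if line.strip().startswith("play #"):
            continue  # play headers also carry TAGS but are not tasks
        match = TASK_LINE.match(line)
        if match is None:
            continue  # blank lines, "tasks:" markers, and similar
        tags = {t.strip() for t in match["tags"].split(",") if t.strip()}
        tasks.append((match["role"], match["name"], tags))
    return tasks


def select_tasks(tasks, wanted):
    """Keep tasks carrying any wanted tag, plus tasks tagged 'always'."""
    wanted = set(wanted)
    return [task for task in tasks if task[2] & wanted or "always" in task[2]]


if __name__ == "__main__":
    for role, name, _tags in select_tasks(parse_tasks(LISTING), ["pg_hba"]):
        print(role, ":", name)
```

In this sample, selecting pg_hba keeps the two hba tasks and also the preflight task, because the latter carries the always tag. The sketch ignores other Ansible tag features such as never and --skip-tags.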

5.3.3 - Sandbox Initialization

How to quickly deploy the sandbox environment

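The regular initialization procedure first initializes the meta node (the infrastructure) and then initializes the remaining database nodes.

To speed up sandbox initialization, Pigsty provides a sandbox-specific playbook, sandbox.yml, which interleaves the two procedures and initializes the infrastructure on the meta node and the ordinary nodes in a single run. This approach is fast, but it is not recommended for production environments.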

Playbook Overview

Users can run sandbox.yml directly, or use the make init shortcut, to initialize the sandbox environment in one step.

./sandbox.yml

Caveats

The caveats for sandbox initialization are the same as those for Infrastructure Deployment and PG Cluster Deployment.

Playbook Description

sandbox.yml interleaves the work of infra.yml and pgsql.yml, as shown below:

#------------------------------------------------------------------------------
# init local yum repo on meta node
#------------------------------------------------------------------------------
- name: Init local repo
  become: yes
  hosts: meta
  gather_facts: no
  tags: repo
  roles:
    - repo
#------------------------------------------------------------------------------
# provision all nodes
#------------------------------------------------------------------------------
# node provision depends on existing repo on meta node
- name: Provision Node
  become: yes
  hosts: all
  gather_facts: no
  tags: node
  roles:
    - node
#------------------------------------------------------------------------------
# init meta service on meta node
#------------------------------------------------------------------------------
# meta provision depends on node provision. You'll have to provision node on meta node
# then provision meta infrastructure on meta node
- name: Init meta service
  become: yes
  hosts: meta
  gather_facts: no
  tags: meta
  roles:
    - role: ca
      tags: ca
    - role: nameserver
      tags: nameserver
    - role: nginx
      tags: nginx
    - role: prometheus
      tags: prometheus
    - role: grafana
      tags: grafana
#------------------------------------------------------------------------------
# init dcs on nodes
#------------------------------------------------------------------------------
# typically you'll have to bootstrap dcs on meta node first (or use external dcs)
# but pigsty allows you to setup server and agent at the same time.
- name: Init dcs
  become: yes
  hosts: all            # provision all nodes or just meta nodes
  gather_facts: no
  roles:
    - role: consul
      tags: dcs
#------------------------------------------------------------------------------
# create or recreate postgres database clusters
#------------------------------------------------------------------------------
- name: Init database cluster
  become: yes
  hosts: all
  gather_facts: false

  roles:
    - role: postgres                        # init postgres
      tags: postgres

    - role: monitor                         # init monitor system
      tags: monitor

    - role: haproxy                         # init haproxy
      tags: haproxy

    - role: vip                             # init vip-manager
      tags: vip
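
To make the interleaving concrete, the following Python sketch models the five plays above as data and works out which roles each host ends up receiving, in play order. The play names, host patterns, and roles are taken from the playbook; the inventory (one meta host and two ordinary nodes, with placeholder host names) is an assumption made only for illustration, and the function roles_per_host is a hypothetical helper, not part of Pigsty or Ansible.

```python
# The five plays of sandbox.yml in file order: (play name, host pattern, roles).
PLAYS = [
    ("Init local repo", "meta", ["repo"]),
    ("Provision Node", "all", ["node"]),
    ("Init meta service", "meta",
     ["ca", "nameserver", "nginx", "prometheus", "grafana"]),
    ("Init dcs", "all", ["consul"]),
    ("Init database cluster", "all", ["postgres", "monitor", "haproxy", "vip"]),
]

# Hypothetical inventory: group name -> hosts. Host names are placeholders.
INVENTORY = {
    "meta": ["meta-1"],
    "all": ["meta-1", "node-1", "node-2"],
}


def roles_per_host(plays, inventory):
    """Map each host to the roles applied to it, in play order."""
    applied = {host: [] for host in inventory["all"]}
    for _name, pattern, roles in plays:
        for host in inventory[pattern]:
            applied[host].extend(roles)
    return applied


if __name__ == "__main__":
    for host, roles in roles_per_host(PLAYS, INVENTORY).items():
        print(f"{host}: {' -> '.join(roles)}")
```

Under these assumptions the meta host receives all twelve roles, while an ordinary node receives only node, consul, postgres, monitor, haproxy, and vip. The model deliberately ignores tags, conditionals, and per-host variables.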

Default Tasks

The following command lists every task that sandbox initialization executes, along with the tags that can be used:

./sandbox.yml --list-tasks
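
The output of this command is plain text: one header line per play (play number, host pattern, play name, play-level tags), followed by that play's task lines. As a rough sketch, assuming that header format, the Python snippet below summarizes a shortened sample of such output into one record per play. The sample text is abridged from this section's listing, and summarize_plays is a hypothetical helper introduced for illustration.

```python
import re

# Abridged sample in the format printed by --list-tasks.
SAMPLE = """\
playbook: ./sandbox.yml

  play #1 (meta): Init local repo\tTAGS: [repo]
    tasks:
      repo : Create local repo directory\tTAGS: [repo, repo_dir]
      repo : Launch repo nginx server\tTAGS: [repo, repo_nginx]

  play #4 (all): Init dcs\tTAGS: []
    tasks:
      consul : Check for existing consul\tTAGS: [consul_check, dcs]
"""

PLAY_LINE = re.compile(r"^\s*play #(\d+) \(([^)]*)\): (.*?)\s+TAGS: \[(.*)\]\s*$")
TASK_LINE = re.compile(r"^\s+\S.*\s+TAGS: \[.*\]\s*$")


def summarize_plays(text):
    """Return one dict per play: number, hosts, name, tags, task count."""
    plays = []
    for line in text.splitlines():
        header = PLAY_LINE.match(line)
        if header:
            number, hosts, name, tags = header.groups()
            plays.append({
                "number": int(number),
                "hosts": hosts,
                "name": name,
                "tags": [t.strip() for t in tags.split(",") if t.strip()],
                "tasks": 0,
            })
        elif plays and TASK_LINE.match(line):
            plays[-1]["tasks"] += 1  # a task line belonging to the latest play
    return plays


if __name__ == "__main__":
    for play in summarize_plays(SAMPLE):
        print(play)
```

A play whose header shows an empty tag list, such as Init dcs, has no play-level tags; in sandbox.yml its tags are attached to the role instead, so they still appear on every task line.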

The task list is as follows:

playbook: ./sandbox.yml

  play #1 (meta): Init local repo	TAGS: [repo]
    tasks:
      repo : Create local repo directory	TAGS: [repo, repo_dir]
      repo : Backup & remove existing repos	TAGS: [repo, repo_upstream]
      repo : Add required upstream repos	TAGS: [repo, repo_upstream]
      repo : Check repo pkgs cache exists	TAGS: [repo, repo_prepare]
      repo : Set fact whether repo_exists	TAGS: [repo, repo_prepare]
      repo : Move upstream repo to backup	TAGS: [repo, repo_prepare]
      repo : Add local file system repos	TAGS: [repo, repo_prepare]
      repo : Remake yum cache if not exists	TAGS: [repo, repo_prepare]
      repo : Install repo bootstrap packages	TAGS: [repo, repo_boot]
      repo : Render repo nginx server files	TAGS: [repo, repo_nginx]
      repo : Disable selinux for repo server	TAGS: [repo, repo_nginx]
      repo : Launch repo nginx server	TAGS: [repo, repo_nginx]
      repo : Waits repo server online	TAGS: [repo, repo_nginx]
      repo : Download web url packages	TAGS: [repo, repo_download]
      repo : Download repo packages	TAGS: [repo, repo_download]
      repo : Download repo pkg deps	TAGS: [repo, repo_download]
      repo : Create local repo index	TAGS: [repo, repo_download]
      repo : Copy bootstrap scripts	TAGS: [repo, repo_download, repo_script]
      repo : Mark repo cache as valid	TAGS: [repo, repo_download]

  play #2 (all): Provision Node	TAGS: [node]
    tasks:
      node : Update node hostname	TAGS: [node, node_name]
      node : Add new hostname to /etc/hosts	TAGS: [node, node_name]
      node : Write static dns records	TAGS: [node, node_dns]
      node : Get old nameservers	TAGS: [node, node_resolv]
      node : Truncate resolv file	TAGS: [node, node_resolv]
      node : Write resolv options	TAGS: [node, node_resolv]
      node : Add new nameservers	TAGS: [node, node_resolv]
      node : Append old nameservers	TAGS: [node, node_resolv]
      node : Node configure disable firewall	TAGS: [node, node_firewall]
      node : Node disable selinux by default	TAGS: [node, node_firewall]
      node : Backup existing repos	TAGS: [node, node_repo]
      node : Install upstream repo	TAGS: [node, node_repo]
      node : Install local repo	TAGS: [node, node_repo]
      node : Install node basic packages	TAGS: [node, node_pkgs]
      node : Install node extra packages	TAGS: [node, node_pkgs]
      node : Install meta specific packages	TAGS: [node, node_pkgs]
      node : Install node basic packages	TAGS: [node, node_pkgs]
      node : Install node extra packages	TAGS: [node, node_pkgs]
      node : Install meta specific packages	TAGS: [node, node_pkgs]
      node : Node configure disable numa	TAGS: [node, node_feature]
      node : Node configure disable swap	TAGS: [node, node_feature]
      node : Node configure unmount swap	TAGS: [node, node_feature]
      node : Node setup static network	TAGS: [node, node_feature]
      node : Node configure disable firewall	TAGS: [node, node_feature]
      node : Node configure disk prefetch	TAGS: [node, node_feature]
      node : Enable linux kernel modules	TAGS: [node, node_kernel]
      node : Enable kernel module on reboot	TAGS: [node, node_kernel]
      node : Get config parameter page count	TAGS: [node, node_tuned]
      node : Get config parameter page size	TAGS: [node, node_tuned]
      node : Tune shmmax and shmall via mem	TAGS: [node, node_tuned]
      node : Create tuned profiles	TAGS: [node, node_tuned]
      node : Render tuned profiles	TAGS: [node, node_tuned]
      node : Active tuned profile	TAGS: [node, node_tuned]
      node : Change additional sysctl params	TAGS: [node, node_tuned]
      node : Copy default user bash profile	TAGS: [node, node_profile]
      node : Setup node default pam ulimits	TAGS: [node, node_ulimit]
      node : Create os user group admin	TAGS: [node, node_admin]
      node : Create os user admin	TAGS: [node, node_admin]
      node : Grant admin group nopass sudo	TAGS: [node, node_admin]
      node : Add no host checking to ssh config	TAGS: [node, node_admin]
      node : Add admin ssh no host checking	TAGS: [node, node_admin]
      node : Fetch all admin public keys	TAGS: [node, node_admin]
      node : Exchange all admin ssh keys	TAGS: [node, node_admin]
      node : Install public keys	TAGS: [node, node_admin]
      node : Install ntp package	TAGS: [node, ntp_install]
      node : Install chrony package	TAGS: [node, ntp_install]
      node : Setup default node timezone	TAGS: [node, ntp_config]
      node : Copy the ntp.conf file	TAGS: [node, ntp_config]
      node : Copy the chrony.conf template	TAGS: [node, ntp_config]
      node : Launch ntpd service	TAGS: [node, ntp_launch]
      node : Launch chronyd service	TAGS: [node, ntp_launch]

  play #3 (meta): Init meta service	TAGS: [meta]
    tasks:
      ca : Create local ca directory	TAGS: [ca, ca_dir, meta]
      ca : Copy ca cert from local files	TAGS: [ca, ca_copy, meta]
      ca : Check ca key cert exists	TAGS: [ca, ca_create, meta]
      ca : Create self-signed CA key-cert	TAGS: [ca, ca_create, meta]
      nameserver : Make sure dnsmasq package installed	TAGS: [meta, nameserver]
      nameserver : Copy dnsmasq /etc/dnsmasq.d/config	TAGS: [meta, nameserver]
      nameserver : Add dynamic dns records to meta	TAGS: [meta, nameserver]
      nameserver : Launch meta dnsmasq service	TAGS: [meta, nameserver]
      nameserver : Wait for meta dnsmasq online	TAGS: [meta, nameserver]
      nameserver : Register consul dnsmasq service	TAGS: [meta, nameserver]
      nameserver : Reload consul	TAGS: [meta, nameserver]
      nginx : Make sure nginx package installed	TAGS: [meta, nginx, nginx_install]
      nginx : Create local html directory	TAGS: [meta, nginx, nginx_dir]
      nginx : Update default nginx index page	TAGS: [meta, nginx, nginx_dir]
      nginx : Copy nginx default config	TAGS: [meta, nginx, nginx_config]
      nginx : Copy nginx upstream conf	TAGS: [meta, nginx, nginx_config]
      nginx : Fetch haproxy facts	TAGS: [meta, nginx, nginx_config, nginx_haproxy]
      nginx : Templating /etc/nginx/haproxy.conf	TAGS: [meta, nginx, nginx_config, nginx_haproxy]
      nginx : Templating haproxy.html	TAGS: [meta, nginx, nginx_config, nginx_haproxy]
      nginx : Launch nginx server	TAGS: [meta, nginx, nginx_reload]
      nginx : Restart meta nginx service	TAGS: [meta, nginx, nginx_launch]
      nginx : Wait for nginx service online	TAGS: [meta, nginx, nginx_launch]
      nginx : Make sure nginx exporter installed	TAGS: [meta, nginx, nginx_exporter]
      nginx : Config nginx_exporter options	TAGS: [meta, nginx, nginx_exporter]
      nginx : Restart nginx_exporter service	TAGS: [meta, nginx, nginx_exporter]
      nginx : Wait for nginx exporter online	TAGS: [meta, nginx, nginx_exporter]
      nginx : Register cosnul nginx service	TAGS: [meta, nginx, nginx_register]
      nginx : Register consul nginx-exporter service	TAGS: [meta, nginx, nginx_register]
      nginx : Reload consul	TAGS: [meta, nginx, nginx_register]
      prometheus : Install prometheus and alertmanager	TAGS: [meta, prometheus, prometheus_install]
      prometheus : Wipe out prometheus config dir	TAGS: [meta, prometheus, prometheus_clean]
      prometheus : Wipe out existing prometheus data	TAGS: [meta, prometheus, prometheus_clean]
      prometheus : Create postgres directory structure	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy prometheus bin scripts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy prometheus rules scripts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Copy altermanager config	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Render prometheus config	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Config /etc/prometheus opts	TAGS: [meta, prometheus, prometheus_config]
      prometheus : Fetch prometheus static monitoring targets	TAGS: [meta, prometheus, prometheus_config, prometheus_targets]
      prometheus : Render prometheus static targets	TAGS: [meta, prometheus, prometheus_config, prometheus_targets]
      prometheus : Launch prometheus service	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Launch alertmanager service	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Wait for prometheus online	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Wait for alertmanager online	TAGS: [meta, prometheus, prometheus_launch]
      prometheus : Reload prometheus service	TAGS: [meta, prometheus, prometheus_reload]
      prometheus : Copy prometheus service definition	TAGS: [meta, prometheus, prometheus_register]
      prometheus : Copy alertmanager service definition	TAGS: [meta, prometheus, prometheus_register]
      prometheus : Reload consul to register prometheus	TAGS: [meta, prometheus, prometheus_register]
      grafana : Make sure grafana is installed	TAGS: [grafana, grafana_install, meta]
      grafana : Check grafana plugin cache exists	TAGS: [grafana, grafana_plugin, meta]
      grafana : Provision grafana plugins via cache	TAGS: [grafana, grafana_plugin, meta]
      grafana : Download grafana plugins from web	TAGS: [grafana, grafana_plugin, meta]
      grafana : Download grafana plugins from web	TAGS: [grafana, grafana_plugin, meta]
      grafana : Create grafana plugins cache	TAGS: [grafana, grafana_plugin, meta]
      grafana : Copy /etc/grafana/grafana.ini	TAGS: [grafana, grafana_config, meta]
      grafana : Remove grafana provision dir	TAGS: [grafana, grafana_config, meta]
      grafana : Copy provisioning content	TAGS: [grafana, grafana_config, meta]
      grafana : Copy pigsty dashboards	TAGS: [grafana, grafana_config, meta]
      grafana : Copy pigsty icon image	TAGS: [grafana, grafana_config, meta]
      grafana : Replace grafana icon with pigsty	TAGS: [grafana, grafana_config, grafana_customize, meta]
      grafana : Launch grafana service	TAGS: [grafana, grafana_launch, meta]
      grafana : Wait for grafana online	TAGS: [grafana, grafana_launch, meta]
      grafana : Update grafana default preferences	TAGS: [grafana, grafana_provision, meta]
      grafana : Register consul grafana service	TAGS: [grafana, grafana_register, meta]
      grafana : Reload consul	TAGS: [grafana, grafana_register, meta]

  play #4 (all): Init dcs	TAGS: []
    tasks:
      consul : Check for existing consul	TAGS: [consul_check, dcs]
      consul : Consul exists flag fact set	TAGS: [consul_check, dcs]
      consul : Abort due to consul exists	TAGS: [consul_check, dcs]
      consul : Clean existing consul instance	TAGS: [consul_clean, dcs]
      consul : Stop any running consul instance	TAGS: [consul_clean, dcs]
      consul : Remove existing consul dir	TAGS: [consul_clean, dcs]
      consul : Recreate consul dir	TAGS: [consul_clean, dcs]
      consul : Make sure consul is installed	TAGS: [consul_install, dcs]
      consul : Make sure consul dir exists	TAGS: [consul_config, dcs]
      consul : Get dcs server node names	TAGS: [consul_config, dcs]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs]
      consul : Get dcs node name from var	TAGS: [consul_config, dcs]
      consul : Fetch hostname as dcs node name	TAGS: [consul_config, dcs]
      consul : Get dcs name from hostname	TAGS: [consul_config, dcs]
      consul : Copy /etc/consul.d/consul.json	TAGS: [consul_config, dcs]
      consul : Copy consul agent service	TAGS: [consul_config, dcs]
      consul : Get dcs bootstrap expect quroum	TAGS: [consul_server, dcs]
      consul : Copy consul server service unit	TAGS: [consul_server, dcs]
      consul : Launch consul server service	TAGS: [consul_server, dcs]
      consul : Wait for consul server online	TAGS: [consul_server, dcs]
      consul : Launch consul agent service	TAGS: [consul_agent, dcs]
      consul : Wait for consul agent online	TAGS: [consul_agent, dcs]

  play #5 (all): Init database cluster	TAGS: []
    tasks:
      postgres : Create os group postgres	TAGS: [instal, pg_dbsu, postgres]
      postgres : Make sure dcs group exists	TAGS: [instal, pg_dbsu, postgres]
      postgres : Create dbsu {{ pg_dbsu }}	TAGS: [instal, pg_dbsu, postgres]
      postgres : Grant dbsu nopass sudo	TAGS: [instal, pg_dbsu, postgres]
      postgres : Grant dbsu all sudo	TAGS: [instal, pg_dbsu, postgres]
      postgres : Grant dbsu limited sudo	TAGS: [instal, pg_dbsu, postgres]
      postgres : Config patroni watchdog support	TAGS: [instal, pg_dbsu, postgres]
      postgres : Add dbsu ssh no host checking	TAGS: [instal, pg_dbsu, postgres]
      postgres : Fetch dbsu public keys	TAGS: [instal, pg_dbsu, postgres]
      postgres : Exchange dbsu ssh keys	TAGS: [instal, pg_dbsu, postgres]
      postgres : Install offical pgdg yum repo	TAGS: [instal, pg_install, postgres]
      postgres : Install pg packages	TAGS: [instal, pg_install, postgres]
      postgres : Install pg extensions	TAGS: [instal, pg_install, postgres]
      postgres : Link /usr/pgsql to current version	TAGS: [instal, pg_install, postgres]
      postgres : Add pg bin dir to profile path	TAGS: [instal, pg_install, postgres]
      postgres : Fix directory ownership	TAGS: [instal, pg_install, postgres]
      postgres : Remove default postgres service	TAGS: [instal, pg_install, postgres]
      postgres : Check necessary variables exists	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Fetch variables via pg_cluster	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Set cluster basic facts for hosts	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Assert cluster primary singleton	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Setup cluster primary ip address	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Setup repl upstream for primary	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Setup repl upstream for replicas	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Debug print instance summary	TAGS: [always, pg_preflight, postgres, preflight]
      postgres : Check for existing postgres instance	TAGS: [pg_check, postgres, prepare]
      postgres : Set fact whether pg port is open	TAGS: [pg_check, postgres, prepare]
      postgres : Abort due to existing postgres instance	TAGS: [pg_check, postgres, prepare]
      postgres : Clean existing postgres instance	TAGS: [pg_check, postgres, prepare]
      postgres : Shutdown existing postgres service	TAGS: [pg_clean, postgres, prepare]
      postgres : Remove registerd consul service	TAGS: [pg_clean, postgres, prepare]
      postgres : Remove postgres metadata in consul	TAGS: [pg_clean, postgres, prepare]
      postgres : Remove existing postgres data	TAGS: [pg_clean, postgres, prepare]
      postgres : Make sure main and backup dir exists	TAGS: [pg_dir, postgres, prepare]
      postgres : Create postgres directory structure	TAGS: [pg_dir, postgres, prepare]
      postgres : Create pgbouncer directory structure	TAGS: [pg_dir, postgres, prepare]
      postgres : Create links from pgbkup to pgroot	TAGS: [pg_dir, postgres, prepare]
      postgres : Create links from current cluster	TAGS: [pg_dir, postgres, prepare]
      postgres : Copy pg_cluster to /pg/meta/cluster	TAGS: [pg_meta, postgres, prepare]
      postgres : Copy pg_version to /pg/meta/version	TAGS: [pg_meta, postgres, prepare]
      postgres : Copy pg_instance to /pg/meta/instance	TAGS: [pg_meta, postgres, prepare]
      postgres : Copy pg_seq to /pg/meta/sequence	TAGS: [pg_meta, postgres, prepare]
      postgres : Copy pg_role to /pg/meta/role	TAGS: [pg_meta, postgres, prepare]
      postgres : Copy postgres scripts to /pg/bin/	TAGS: [pg_scripts, postgres, prepare]
      postgres : Copy alias profile to /etc/profile.d	TAGS: [pg_scripts, postgres, prepare]
      postgres : Copy psqlrc to postgres home	TAGS: [pg_scripts, postgres, prepare]
      postgres : Setup hostname to pg instance name	TAGS: [pg_hostname, postgres, prepare]
      postgres : Copy consul node-meta definition	TAGS: [pg_nodemeta, postgres, prepare]
      postgres : Restart consul to load new node-meta	TAGS: [pg_nodemeta, postgres, prepare]
      postgres : Config patroni watchdog support	TAGS: [pg_watchdog, postgres, prepare]
      postgres : Get config parameter page count	TAGS: [pg_config, postgres]
      postgres : Get config parameter page size	TAGS: [pg_config, postgres]
      postgres : Tune shared buffer and work mem	TAGS: [pg_config, postgres]
      postgres : Hanlde small size mem occasion	TAGS: [pg_config, postgres]
      postgres : Calculate postgres mem params	TAGS: [pg_config, postgres]
      postgres : create patroni config dir	TAGS: [pg_config, postgres]
      postgres : use predefined patroni template	TAGS: [pg_config, postgres]
      postgres : Render default /pg/conf/patroni.yml	TAGS: [pg_config, postgres]
      postgres : Link /pg/conf/patroni to /pg/bin/	TAGS: [pg_config, postgres]
      postgres : Link /pg/bin/patroni.yml to /etc/patroni/	TAGS: [pg_config, postgres]
      postgres : Config patroni watchdog support	TAGS: [pg_config, postgres]
      postgres : create patroni systemd drop-in dir	TAGS: [pg_config, postgres]
      postgres : Copy postgres systemd service file	TAGS: [pg_config, postgres]
      postgres : create patroni systemd drop-in file	TAGS: [pg_config, postgres]
      postgres : Render default initdb scripts	TAGS: [pg_config, postgres]
      postgres : Launch patroni on primary instance	TAGS: [pg_primary, postgres]
      postgres : Wait for patroni primary online	TAGS: [pg_primary, postgres]
      postgres : Wait for postgres primary online	TAGS: [pg_primary, postgres]
      postgres : Check primary postgres service ready	TAGS: [pg_primary, postgres]
      postgres : Check replication connectivity to primary	TAGS: [pg_primary, postgres]
      postgres : Render default pg-init scripts	TAGS: [pg_init, pg_init_config, postgres]
      postgres : Render template init script	TAGS: [pg_init, pg_init_config, postgres]
      postgres : Execute initialization scripts	TAGS: [pg_init, postgres]
      postgres : Check primary instance ready	TAGS: [pg_init, postgres]
      postgres : Add dbsu password to pgpass if exists	TAGS: [pg_pass, postgres]
      postgres : Add system user to pgpass	TAGS: [pg_pass, postgres]
      postgres : Check replication connectivity to primary	TAGS: [pg_replica, postgres]
      postgres : Launch patroni on replica instances	TAGS: [pg_replica, postgres]
      postgres : Wait for patroni replica online	TAGS: [pg_replica, postgres]
      postgres : Wait for postgres replica online	TAGS: [pg_replica, postgres]
      postgres : Check replica postgres service ready	TAGS: [pg_replica, postgres]
      postgres : Render hba rules	TAGS: [pg_hba, postgres]
      postgres : Reload hba rules	TAGS: [pg_hba, postgres]
      postgres : Pause patroni	TAGS: [pg_patroni, postgres]
      postgres : Stop patroni on replica instance	TAGS: [pg_patroni, postgres]
      postgres : Stop patroni on primary instance	TAGS: [pg_patroni, postgres]
      postgres : Launch raw postgres on primary	TAGS: [pg_patroni, postgres]
      postgres : Launch raw postgres on primary	TAGS: [pg_patroni, postgres]
      postgres : Wait for postgres online	TAGS: [pg_patroni, postgres]
      postgres : Check pgbouncer is installed	TAGS: [pgbouncer, pgbouncer_check, postgres]
      postgres : Stop existing pgbouncer service	TAGS: [pgbouncer, pgbouncer_clean, postgres]
      postgres : Remove existing pgbouncer dirs	TAGS: [pgbouncer, pgbouncer_clean, postgres]
      postgres : Recreate dirs with owner postgres	TAGS: [pgbouncer, pgbouncer_clean, postgres]
      postgres : Copy /etc/pgbouncer/pgbouncer.ini	TAGS: [pgbouncer, pgbouncer_config, pgbouncer_ini, postgres]
      postgres : Copy /etc/pgbouncer/pgb_hba.conf	TAGS: [pgbouncer, pgbouncer_config, pgbouncer_hba, postgres]
      postgres : Touch userlist and database list	TAGS: [pgbouncer, pgbouncer_config, postgres]
      postgres : Add default users to pgbouncer	TAGS: [pgbouncer, pgbouncer_config, postgres]
      postgres : Copy pgbouncer systemd service	TAGS: [pgbouncer, pgbouncer_launch, postgres]
      postgres : Launch pgbouncer pool service	TAGS: [pgbouncer, pgbouncer_launch, postgres]
      postgres : Wait for pgbouncer service online	TAGS: [pgbouncer, pgbouncer_launch, postgres]
      postgres : Check pgbouncer service is ready	TAGS: [pgbouncer, pgbouncer_launch, postgres]
      postgres : Render business init script	TAGS: [business, pg_biz_config, pg_biz_init, postgres]
      postgres : Render database baseline sql	TAGS: [business, pg_biz_config, pg_biz_init, postgres]
      postgres : Execute business init script	TAGS: [business, pg_biz_init, postgres]
      postgres : Execute database baseline sql	TAGS: [business, pg_biz_init, postgres]
      postgres : Add pgbouncer busniess users	TAGS: [business, pg_biz_pgbouncer, postgres]
      postgres : Add pgbouncer busniess database	TAGS: [business, pg_biz_pgbouncer, postgres]
      postgres : Restart pgbouncer	TAGS: [business, pg_biz_pgbouncer, postgres]
      postgres : Copy pg service definition to consul	TAGS: [pg_register, postgres, register]
      postgres : Reload postgres consul service	TAGS: [pg_register, postgres, register]
      postgres : Render grafana datasource definition	TAGS: [pg_grafana, postgres, register]
      postgres : Register datasource to grafana	TAGS: [pg_grafana, postgres, register]
      monitor : Create /etc/pg_exporter conf dir	TAGS: [monitor, pg_exporter]
      monitor : Copy default pg_exporter.yaml	TAGS: [monitor, pg_exporter]
      monitor : Config /etc/default/pg_exporter	TAGS: [monitor, pg_exporter]
      monitor : Copy pg_exporter binary	TAGS: [monitor, pg_exporter, pg_exporter_binary]
      monitor : Config pg_exporter service unit	TAGS: [monitor, pg_exporter]
      monitor : Launch pg_exporter systemd service	TAGS: [monitor, pg_exporter]
      monitor : Wait for pg_exporter service online	TAGS: [monitor, pg_exporter]
      monitor : Register pg-exporter consul service	TAGS: [monitor, pg_exporter_register]
      monitor : Reload pg-exporter consul service	TAGS: [monitor, pg_exporter_register]
      monitor : Config pgbouncer_exporter opts	TAGS: [monitor, pgbouncer_exporter]
      monitor : Config pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter]
      monitor : Launch pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter]
      monitor : Wait for pgbouncer_exporter online	TAGS: [monitor, pgbouncer_exporter]
      monitor : Register pgb-exporter consul service	TAGS: [monitor, node_exporter_register]
      monitor : Reload pgb-exporter consul service	TAGS: [monitor, node_exporter_register]
      monitor : Copy node_exporter binary	TAGS: [monitor, node_exporter, node_exporter_binary]
      monitor : Copy node_exporter systemd service	TAGS: [monitor, node_exporter]
      monitor : Config default node_exporter options	TAGS: [monitor, node_exporter]
      monitor : Launch node_exporter service unit	TAGS: [monitor, node_exporter]
      monitor : Wait for node_exporter online	TAGS: [monitor, node_exporter]
      monitor : Register node-exporter service to consul	TAGS: [monitor, node_exporter_register]
      monitor : Reload node-exporter consul service	TAGS: [monitor, node_exporter_register]
      haproxy : Make sure haproxy is installed	TAGS: [haproxy, haproxy_install]
      haproxy : Create haproxy directory	TAGS: [haproxy, haproxy_install]
      haproxy : Copy haproxy systemd service file	TAGS: [haproxy, haproxy_install, haproxy_unit]
      haproxy : Fetch postgres cluster memberships	TAGS: [haproxy, haproxy_config]
      haproxy : Templating /etc/haproxy/haproxy.cfg	TAGS: [haproxy, haproxy_config]
      haproxy : Launch haproxy load balancer service	TAGS: [haproxy, haproxy_launch, haproxy_restart]
      haproxy : Wait for haproxy load balancer online	TAGS: [haproxy, haproxy_launch]
      haproxy : Reload haproxy load balancer service	TAGS: [haproxy, haproxy_reload]
      haproxy : Copy haproxy service definition	TAGS: [haproxy, haproxy_register]
      haproxy : Reload haproxy consul service	TAGS: [haproxy, haproxy_register]
      vip : Templating /etc/default/vip-manager.yml	TAGS: [vip]
      vip : create vip-manager. systemd drop-in dir	TAGS: [vip]
      vip : create vip-manager systemd drop-in file	TAGS: [vip]
      vip : Launch vip-manager	TAGS: [vip]

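5.3.4 - Decommission Database Clusters

How to decommission PostgreSQL database clusters and instances

Playbook Overview

Database decommissioning: removes an existing database cluster or instance and reclaims its nodes: pgsql-remove.yml

Routine Administration

./pgsql-remove.yml -l pg-test  # decommission the pg-test cluster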
./pgsql-remove.yml -l 10.10.10.13  # decommission a single instance of the pg-test cluster

Playbook Description

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql-remove.yml
# Ctime     :   2020-05-12
# Mtime     :   2021-03-15
# Desc      :   remove postgres & consul services
# Path      :   pgsql-remove.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# this playbook aims at removing postgres & consul & related services
# from existing instances, so that the node can be recycled for
# re-initialization or other database clusters.

#------------------------------------------------------------------------------
# Remove load balancer
#------------------------------------------------------------------------------
- name: Remove load balancer
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-lb
  tasks:
    - name: Stop load balancer
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - vip-manager
        - haproxy
        # - keepalived


#------------------------------------------------------------------------------
# Remove pg monitor
#------------------------------------------------------------------------------
- name: Remove monitor
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-monitor
  tasks:

    - name: Stop monitor service
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - pg_exporter
        - pgbouncer_exporter

    - name: Deregister exporter service
      ignore_errors: true
      file: path=/etc/consul.d/svc-{{ item }}.json state=absent
      with_items:
        - haproxy
        - pg-exporter
        - pgbouncer-exporter

    - name: Reload consul
      systemd: name=consul state=reloaded


#------------------------------------------------------------------------------
# Remove watchdog owner
#------------------------------------------------------------------------------
- name: Remove monitor
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-watchdog
  tasks:
    # - watchdog owner - #
    - name: Remove patroni watchdog ownership
      ignore_errors: true
      file: path=/dev/watchdog owner=root group=root


#------------------------------------------------------------------------------
# Remove postgres service
#------------------------------------------------------------------------------
- name: Remove Postgres service
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-pg
  tasks:
    - name: Remove postgres replica services
      when: pg_role != 'primary'
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - patroni
        - postgres
        - pgbouncer

    # if in resume mode, postgres will not be stopped
    - name: Force stop postgres non-primary process
      become_user: "{{ pg_dbsu }}"
      when: pg_role != 'primary'
      ignore_errors: true
      shell: |
        {{ pg_bin_dir }}/pg_ctl -D {{ pg_data }} stop -m immediate
        exit 0        

    - name: Remove postgres primary services
      when: pg_role == 'primary'
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - patroni
        - postgres
        - pgbouncer

    - name: Force stop postgres primary process
      become_user: "{{ pg_dbsu }}"
      when: pg_role == 'primary'
      ignore_errors: true
      shell: |
        {{ pg_bin_dir }}/pg_ctl -D {{ pg_data }} stop -m immediate
        exit 0        

    - name: Deregister postgres services
      ignore_errors: true
      file: path=/etc/consul.d/svc-{{ item }}.json state=absent
      with_items:
        - postgres
        - pgbouncer
        - patroni


#------------------------------------------------------------------------------
# Remove infrastructure (consul & node_exporter)
#------------------------------------------------------------------------------
- name: Remove Infrastructure
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-infra
  tasks:

    - name: Consul leave cluster
      ignore_errors: true
      command: /usr/bin/consul leave

    - name: Stop consul and node_exporter
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - node_exporter
        - consul

#------------------------------------------------------------------------------
# Uninstall postgres and consul
#------------------------------------------------------------------------------
- name: Uninstall Packages
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-pkgs
  tasks:
    - name: Uninstall postgres and consul
      when: yum_remove is defined and yum_remove|bool
      shell: |
        yum remove -y consul
        yum remove -y postgresql{{ pg_version }}*        

...
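
The Uninstall Packages play above is guarded by `yum_remove is defined and yum_remove|bool`, so packages stay installed unless that variable is set. A minimal sketch of enabling it for one cluster follows. It assumes the cluster is defined as `pg-test` with a `vars` section in the inventory; that layout is an assumption, and only `yum_remove` itself comes from the playbook. Passing `-e yum_remove=true` on the command line should have the same effect.

```yaml
# Hypothetical inventory fragment: enable package removal for pg-test.
# Only `yum_remove` is taken from the playbook; the surrounding layout is assumed.
pg-test:
  vars:
    yum_remove: true   # lets the rm-pkgs play run `yum remove` for consul and postgresql
```

Leaving the variable unset keeps the packages in place, which makes re-initializing the node faster.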

Usage Example

./pgsql-remove.yml -l pg-test

Execution Result


Task Details

The default tasks are as follows:

playbook: ./pgsql-remove.yml

  play #1 (all): Remove load balancer	TAGS: [rm-lb]
    tasks:
      Stop load balancer	TAGS: [rm-lb]

  play #2 (all): Remove monitor	TAGS: [rm-monitor]
    tasks:
      Stop monitor service	TAGS: [rm-monitor]
      Deregister exporter service	TAGS: [rm-monitor]
      Reload consul	TAGS: [rm-monitor]

  play #3 (all): Remove monitor	TAGS: [rm-watchdog]
    tasks:
      Remove patroni watchdog ownership	TAGS: [rm-watchdog]

  play #4 (all): Remove Postgres service	TAGS: [rm-pg]
    tasks:
      Remove postgres replica services	TAGS: [rm-pg]
      Force stop postgres non-primary process	TAGS: [rm-pg]
      Remove postgres primary services	TAGS: [rm-pg]
      Force stop postgres primary process	TAGS: [rm-pg]
      Deregister postgres services	TAGS: [rm-pg]

  play #5 (all): Remove Infrastructure	TAGS: [rm-infra]
    tasks:
      Consul leave cluster	TAGS: [rm-infra]
      Stop consul and node_exporter	TAGS: [rm-infra]

  play #6 (all): Uninstall Packages	TAGS: [rm-pkgs]
    tasks:
      Uninstall postgres and consul	TAGS: [rm-pkgs]

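5.3.5 - Monitor-Only Deployment

How to deploy the Pigsty monitoring system on its own?

Playbook Overview

Monitor-only deployment: deploys the Pigsty monitoring components on existing database clusters: pgsql-monitor.yml

Routine Administration

# deploy monitoring on the pg-test cluster
./pgsql-monitor.yml -l pg-test

Playbook Description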

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql-monitor.yml
# Ctime     :   2021-02-23
# Mtime     :   2021-02-27
# Desc      :   deploy monitor components only
# Path      :   pgsql-monitor.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# this is pgsql monitor setup playbook for MONITOR ONLY mode

# MONITOR-ONLY (monly) mode is a special deployment mode for
# integration with exterior provisioning solution or existing
# postgres clusters.
# with limited functionalities

# For monly deployment, the infra part is still the same.
# You MUST use static service discovery for prometheus
# You CAN NOT use service_registry


#------------------------------------------------------------------------------
# Deploy monitor on selected targets
#------------------------------------------------------------------------------
- name: Monitor Only Deployment
  become: yes
  hosts: all
  gather_facts: no
  tags: monitor
  roles:
    - role: monitor                         # init monitor system
  vars:
    #------------------------------------------------------------------------------
    # RECOMMEND CHANGES
    #------------------------------------------------------------------------------
    # You'd better change those options in your main config file
    # prometheus_sd_method: static          # MUST use static sd for monitor only mode
    service_registry: none                  # MUST NOT register services
    exporter_install: binary                # none|yum|binary, none by default

    # exporter_install controls how node_exporter & pg_exporter are installed
    #    none   : I've already installed manually
    #    yum    : Use yum install, `exporter_repo_url` will be added if specified
    #    binary : Copy binary to /usr/bin. You must have binary in your `files` dir

    #------------------------------------------------------------------------------
    # MONITOR PROVISION
    #------------------------------------------------------------------------------
    # - install - #
    # exporter_install: none                        # none|yum|binary, none by default
    # exporter_repo_url: ''                         # if set, repo will be added to /etc/yum.repos.d/ before yum installation

    # - collect - #
    # exporter_metrics_path: /metrics               # default metric path for pg related exporter

    # - node exporter - #
    # node_exporter_enabled: true                   # setup node_exporter on instance
    # node_exporter_port: 9100                      # default port for node exporter
    # node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

    # - pg exporter - #
    # pg_exporter_config: pg_exporter-demo.yaml     # default config files for pg_exporter
    # pg_exporter_enabled: true                     # setup pg_exporter on instance
    # pg_exporter_port: 9630                        # default port for pg exporter
    # pg_exporter_url: ''                           # optional, if not set, generate from reference parameters

    # - pgbouncer exporter - #
    # pgbouncer exporter require pgbouncer to work, so it is disabled by default in monitor-only mode
    # pgbouncer_exporter_enabled: false             # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
    # pgbouncer_exporter_port: 9631                 # default port for pgbouncer exporter
    # pgbouncer_exporter_url: ''                    # optional, if not set, generate from reference parameters

    # - postgres variables reference - #
    # pg_dbsu: postgres
    # pg_port: 5432                                 # postgres port (5432 by default)
    # pgbouncer_port: 6432                          # pgbouncer port (6432 by default)
    # pg_localhost: /var/run/postgresql             # localhost unix socket dir for connection
    # pg_default_database: postgres                 # default database will be used as primary monitor target
    # pg_monitor_username: dbuser_monitor           # system monitor username, for postgres and pgbouncer
    # pg_monitor_password: DBUser.Monitor           # system monitor user's password
    # service_registry: consul                      # none | consul | etcd | both



#------------------------------------------------------------------------------
# update static inventory in meta node and reload
#------------------------------------------------------------------------------
- name: Update prometheus static sd files
  become: yes
  hosts: meta
  tags: prometheus
  gather_facts: no
  vars:
    #------------------------------------------------------------------------------
    # RECOMMEND CHANGES
    #------------------------------------------------------------------------------
    prometheus_sd_method: static                  # service discovery method: static|consul|etcd

  tasks:
    - include_tasks: roles/prometheus/tasks/targets.yml
    - include_tasks: roles/prometheus/tasks/reload.yml

...
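
The playbook comments above recommend setting the monitor-only options in the main config file instead of relying on the play-level vars. A sketch of such a fragment follows. Every name and value is taken from the playbook above; where these keys belong in your own config file is an assumption.

```yaml
# Sketch of main-config overrides for monitor-only mode.
# All names and values come from the pgsql-monitor.yml comments; placement is assumed.
prometheus_sd_method: static         # MUST use static service discovery in monitor-only mode
service_registry: none               # MUST NOT register services
exporter_install: binary             # none|yum|binary; binary needs the binaries in your `files` dir
pgbouncer_exporter_enabled: false    # disable when the target has no pgbouncer
pg_exporter_port: 9630               # default port for pg_exporter
node_exporter_port: 9100             # default port for node_exporter
```

Setting them once in the main config keeps both plays (exporter deployment and the prometheus static target update) consistent.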

Usage Example

./pgsql-monitor.yml -l pg-test

Execution Result


Task Details

The default tasks are as follows:

playbook: ./pgsql-monitor.yml

  play #1 (all): Monitor Only Deployment	TAGS: [monitor]
    tasks:
      monitor : Install exporter yum repo	TAGS: [exporter_install, exporter_yum_install, monitor]
      monitor : Install node_exporter and pg_exporter	TAGS: [exporter_install, exporter_yum_install, monitor]
      monitor : Copy node_exporter binary	TAGS: [exporter_binary_install, exporter_install, monitor]
      monitor : Copy pg_exporter binary	TAGS: [exporter_binary_install, exporter_install, monitor]
      monitor : Create /etc/pg_exporter conf dir	TAGS: [monitor, pg_exporter]
      monitor : Copy default pg_exporter.yaml	TAGS: [monitor, pg_exporter]
      monitor : Config /etc/default/pg_exporter	TAGS: [monitor, pg_exporter]
      monitor : Config pg_exporter service unit	TAGS: [monitor, pg_exporter]
      monitor : Launch pg_exporter systemd service	TAGS: [monitor, pg_exporter]
      monitor : Wait for pg_exporter service online	TAGS: [monitor, pg_exporter]
      monitor : Register pg-exporter consul service	TAGS: [monitor, pg_exporter_register]
      monitor : Reload pg-exporter consul service	TAGS: [monitor, pg_exporter_register]
      monitor : Config pgbouncer_exporter opts	TAGS: [monitor, pgbouncer_exporter]
      monitor : Config pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter]
      monitor : Launch pgbouncer_exporter service	TAGS: [monitor, pgbouncer_exporter]
      monitor : Wait for pgbouncer_exporter online	TAGS: [monitor, pgbouncer_exporter]
      monitor : Register pgb-exporter consul service	TAGS: [monitor, node_exporter_register]
      monitor : Reload pgb-exporter consul service	TAGS: [monitor, node_exporter_register]
      monitor : Copy node_exporter systemd service	TAGS: [monitor, node_exporter]
      monitor : Config default node_exporter options	TAGS: [monitor, node_exporter]
      monitor : Launch node_exporter service unit	TAGS: [monitor, node_exporter]
      monitor : Wait for node_exporter online	TAGS: [monitor, node_exporter]
      monitor : Register node-exporter service to consul	TAGS: [monitor, node_exporter_register]
      monitor : Reload node-exporter consul service	TAGS: [monitor, node_exporter_register]

  play #2 (meta): Update prometheus static sd files	TAGS: [prometheus]
    tasks:
      include_tasks	TAGS: [prometheus]
      include_tasks	TAGS: [prometheus]

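5.3.6 - Create Business Users

How to create or modify business users in a database cluster?

Playbook Overview

Create business users: creates new users or modifies existing users in an existing cluster: pgsql-createuser.yml

Routine Administration

# create a user named test in the pg-test cluster
./pgsql-createuser.yml -l pg-test -e pg_user=test

Note that the user named by pg_user must already exist in the cluster's pg_users definition, otherwise the playbook fails. In other words, define the user first, then create it.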
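
As an illustration of defining the user first, here is a sketch of what such an entry might look like. The field names and values mirror the `debug` output of the sample run shown later in this section. The surrounding layout (`pg-test` with a `vars` section) is an assumption based on the `<cluster>.vars` path mentioned in the playbook header.

```yaml
# Hypothetical cluster definition fragment; the enclosing layout is assumed.
# Field names and values are copied from the sample run's debug output.
pg-test:
  vars:
    pg_users:
      - name: test                 # must match the value passed as pg_user
        password: test
        pgbouncer: true            # also add the user to pgbouncer on all cluster members
        roles: [dbrole_readwrite]
        comment: default test user for production usage
```

With this entry in place, running the playbook with `-e pg_user=test` passes the pre-flight check.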

Playbook Description

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql-createuser.yml
# Ctime     :   2021-02-27
# Mtime     :   2021-02-27
# Desc      :   create user on running cluster
# Path      :   pgsql-createuser.yml
# Deps      :   templates/pg-user.sql
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#


#=============================================================================#
# How to create user ?
#   1. define user in your configuration file! <cluster>.vars.pg_users
#   2. execute this playbook with pg_user set to your new user.name
#   3. run playbook on target cluster
# It essentially does:
#   1. create sql file in /pg/tmp/pg-user-{{ user.name }}.sql
#   2. create user on primary instance with that sql
#   3. if {{ user.pgbouncer }}, add to all cluster members and reload
#=============================================================================#


- name: Create user in cluster
  become: yes
  hosts: all
  gather_facts: no
  vars:

    ##################################################################################
    # IMPORTANT: Change this or use cli-arg to specify target user in inventory  #
    ##################################################################################
    pg_user: test

  tasks:
    #------------------------------------------------------------------------------
    # pre-flight check: validate pg_user and user definition
    # ------------------------------------------------------------------------------
    - name: Preflight
      block:
        - name: Check parameter pg_user
          connection: local
          assert:
            that:
              - pg_user is defined
              - pg_user != ''
              - pg_user != 'postgres'
            fail_msg: variable 'pg_user' should be specified to create target user

        - name: Fetch user definition
          connection: local
          set_fact:
            pg_user_definition={{ pg_users | json_query(pg_user_definition_query) }}
          vars:
            pg_user_definition_query: "[?name=='{{ pg_user }}'] | [0]"

        # print user definition
        - debug:
            msg: "{{ pg_user_definition }}"

        - name: Check user definition
          assert:
            that:
              - pg_user_definition is defined
              - pg_user_definition != None
              - pg_user_definition != ''
              - pg_user_definition != {}
            fail_msg: user definition for {{ pg_user }} should exists in pg_users

    #------------------------------------------------------------------------------
    # Create user on cluster primary and add pgbouncer entry to cluster members
    #------------------------------------------------------------------------------
    # create user according to user definition
    - include_tasks: roles/postgres/tasks/createuser.yml
      vars:
        user: "{{ pg_user_definition }}"


    #------------------------------------------------------------------------------
    # Pgbouncer Reload (entire cluster)
    #------------------------------------------------------------------------------
    - name: Reload pgbouncer to add user
      when: pg_user_definition.pgbouncer is defined and pg_user_definition.pgbouncer|bool
      tags: pgbouncer_reload
      systemd: name=pgbouncer state=reloaded enabled=yes daemon_reload=yes


...

Usage Example

./pgsql-createuser.yml -l pg-test -e pg_user=test

Execution Result

$ ./pgsql-createuser.yml -l pg-test -e pg_user=test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

PLAY [Create user in cluster] *****************************************************************************************************************************************************

TASK [Check parameter pg_user] ****************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [Fetch user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "msg": {
        "comment": "default test user for production usage",
        "name": "test",
        "password": "test",
        "pgbouncer": true,
        "roles": [
            "dbrole_readwrite"
        ]
    }
}
ok: [10.10.10.12] => {
    "msg": {
        "comment": "default test user for production usage",
        "name": "test",
        "password": "test",
        "pgbouncer": true,
        "roles": [
            "dbrole_readwrite"
        ]
    }
}
ok: [10.10.10.13] => {
    "msg": {
        "comment": "default test user for production usage",
        "name": "test",
        "password": "test",
        "pgbouncer": true,
        "roles": [
            "dbrole_readwrite"
        ]
    }
}

TASK [Check user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [Render user test creation sql] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [Execute user test creation sql on primary] **********************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [Add user to pgbouncer] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [Reload pgbouncer to add user] ***********************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11                : ok=9    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.10.10.12                : ok=7    changed=2    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
10.10.10.13                : ok=7    changed=2    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

Task Details

The default tasks are as follows:

playbook: ./pgsql-createuser.yml

  play #1 (all): Create user in cluster	TAGS: []
    tasks:
      Check parameter pg_user	TAGS: []
      Fetch user definition	TAGS: []
      debug	TAGS: []
      Check user definition	TAGS: []
      include_tasks	TAGS: []
      Reload pgbouncer to add user	TAGS: [pgbouncer_reload]

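5.3.7 - Create and Modify Services

How to create or modify services in a database cluster?

Playbook Overview

Create and modify services: creates new services or modifies existing services in an existing cluster: pgsql-service.yml

Routine Administration

# create all services in the pg-test cluster
./pgsql-service.yml -l pg-test

Playbook Description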

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql-service.yml
# Ctime     :   2021-03-12
# Mtime     :   2021-03-12
# Desc      :   reload service for postgres clusters
# Path      :   pgsql-service.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# PLEASE USE COMPLETE INVENTORY (at least contains a complete cluster definition!)

#------------------------------------------------------------------------------
# haproxy reload
#   will not reload if haproxy_reload=false
#------------------------------------------------------------------------------
- name: Reload haproxy
  become: yes
  hosts: all
  gather_facts: no
  tags: haproxy
  tasks:
    - include_tasks: roles/service/tasks/haproxy_config.yml
      when: haproxy_enabled
    - include_tasks: roles/service/tasks/haproxy_reload.yml
      when: haproxy_enabled and haproxy_reload|bool


#------------------------------------------------------------------------------
# l2-vip reload
#   will only config without reload if vip_reload=false
#------------------------------------------------------------------------------
- name: Reload l2 VIP
  become: yes
  hosts: all
  gather_facts: no
  tags: vip_l2
  tasks:
    - include_tasks: roles/service/tasks/vip_l2_config.yml
      when: vip_mode == 'l2'
    - include_tasks: roles/service/tasks/vip_l2_reload.yml
      when: vip_mode == 'l2' and vip_reload|bool


#------------------------------------------------------------------------------
# l4-vip reload
#   will not reload if vip_reload=false
#------------------------------------------------------------------------------
- name: Reload l4 VIP
  become: yes
  hosts: all
  gather_facts: no
  tags: vip_l4
  tasks:
    - include_tasks: roles/service/tasks/vip_l4_config.yml
    - include_tasks: roles/service/tasks/vip_l4_reload.yml

...
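
The `when` conditions in the first two plays decide, per host, which task files are included. The sketch below restates those conditions in plain Python so the combinations are easier to follow. It is an illustration only, not Ansible or Pigsty code: the names `as_bool` and `planned_task_files` are hypothetical, `as_bool` is a simplified stand-in for Ansible's `bool` filter, and the l4 VIP play is left out because the listing shows no conditions for it.

```python
from typing import Any, Dict, List


def as_bool(value: Any) -> bool:
    """Simplified stand-in for Ansible's `bool` filter (an assumption, not its source)."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def planned_task_files(host_vars: Dict[str, Any]) -> List[str]:
    """Return the task files that plays 1 and 2 would include for one host."""
    files: List[str] = []

    # Play 1: the haproxy config is rendered when haproxy is enabled;
    # the reload additionally requires haproxy_reload to be true.
    if as_bool(host_vars["haproxy_enabled"]):
        files.append("haproxy_config.yml")
        if as_bool(host_vars["haproxy_reload"]):
            files.append("haproxy_reload.yml")

    # Play 2: the l2 VIP config is rendered when vip_mode == 'l2';
    # the reload additionally requires vip_reload to be true.
    if host_vars["vip_mode"] == "l2":
        files.append("vip_l2_config.yml")
        if as_bool(host_vars["vip_reload"]):
            files.append("vip_l2_reload.yml")

    return files


# Example: render the haproxy config without reloading it, and reload the l2 VIP.
print(planned_task_files({
    "haproxy_enabled": True,
    "haproxy_reload": False,
    "vip_mode": "l2",
    "vip_reload": True,
}))
# ['haproxy_config.yml', 'vip_l2_config.yml', 'vip_l2_reload.yml']
```

Setting `haproxy_reload` or `vip_reload` to false therefore still renders the configuration files and only skips the reload step, which matches the comments in the playbook header.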

Usage Example

./pgsql-service.yml -l pg-test 

Execution Result

$ ./pgsql-service.yml -l pg-test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

PLAY [Reload haproxy] *************************************************************************************************************************************************************

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/haproxy_config.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [Fetch postgres cluster memberships] *****************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [Templating /etc/haproxy/haproxy.cfg] ****************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/haproxy_reload.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [Reload haproxy load balancer service] ***************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]

PLAY [Reload l2 VIP] **************************************************************************************************************************************************************

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/vip_l2_config.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [Templating /etc/default/vip-manager.yml] ************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.13]
ok: [10.10.10.12]

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/vip_l2_reload.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [Launch vip-manager] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

PLAY [Reload l4 VIP] **************************************************************************************************************************************************************

TASK [include_tasks] **************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [include_tasks] **************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11                : ok=9    changed=2    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
10.10.10.12                : ok=9    changed=2    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
10.10.10.13                : ok=9    changed=2    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0

Task Details

The default tasks are as follows:

playbook: ./pgsql-service.yml

  play #1 (all): Reload haproxy	TAGS: [haproxy]
    tasks:
      include_tasks	TAGS: [haproxy]
      include_tasks	TAGS: [haproxy]

  play #2 (all): Reload l2 VIP	TAGS: [vip_l2]
    tasks:
      include_tasks	TAGS: [vip_l2]
      include_tasks	TAGS: [vip_l2]

  play #3 (all): Reload l4 VIP	TAGS: [vip_l4]
    tasks:
      include_tasks	TAGS: [vip_l4]
      include_tasks	TAGS: [vip_l4]

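5.3.8 - Creating Business Databases

How do you create or modify a business database in a database cluster?

Playbook Overview

Create business databases: use pgsql-createdb.yml to create a new database or modify an existing database in an existing cluster.

Routine Management

# Create a database named test on the pg-test cluster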
./pgsql-createdb.yml -l pg-test -e pg_database=test

Playbook Description

#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File      :   pgsql-createdb.yml
# Ctime     :   2021-02-27
# Mtime     :   2021-02-27
# Desc      :   create database on running cluster
# Deps      :   templates/pg-db.sql
# Path      :   pgsql-createdb.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#


#=============================================================================#
# How to create database ?
#   1. define database in your configuration file! <cluster>.vars.pg_databases
#   2. execute this playbook with pg_database set to your new database.name
#   3. run playbook on target cluster
# It essentially does:
#   1. create sql file in /pg/tmp/pg-db-{{ database.name }}.sql
#   2. create database on primary instance with that sql
#   3. if {{ database.pgbouncer }}, add to all cluster members and reload
#=============================================================================#

- name: Create Database In Cluster
  become: yes
  hosts: all
  gather_facts: no
  vars:

    ##################################################################################
    # IMPORTANT: Change this or use cli-arg to specify target database in inventory  #
    ##################################################################################
    pg_database: test

  tasks:
    #------------------------------------------------------------------------------
    # pre-flight check: validate pg_database and database definition
    # ------------------------------------------------------------------------------
    - name: Preflight
      block:
        - name: Check parameter pg_database
          connection: local
          assert:
            that:
              - pg_database is defined
              - pg_database != ''
              - pg_database != 'postgres'
            fail_msg: variable 'pg_database' should be specified to create target database

        - name: Fetch database definition
          connection: local
          set_fact:
            pg_database_definition={{ pg_databases | json_query(pg_database_definition_query) }}
          vars:
            pg_database_definition_query: "[?name=='{{ pg_database }}'] | [0]"

        # print database definition
        - debug:
            msg: "{{ pg_database_definition }}"

        - name: Check database definition
          assert:
            that:
              - pg_database_definition is defined
              - pg_database_definition != None
              - pg_database_definition != ''
              - pg_database_definition != {}
            fail_msg: database definition for {{ pg_database }} should exists in pg_databases

    #------------------------------------------------------------------------------
    # Create database on cluster primary and add pgbouncer entry to cluster members
    #------------------------------------------------------------------------------
    # create database according to database definition
    - include_tasks: roles/postgres/tasks/createdb.yml
      vars:
        database: "{{ pg_database_definition }}"


    #------------------------------------------------------------------------------
    # Pgbouncer Reload (entire cluster)
    #------------------------------------------------------------------------------
    - name: Reload pgbouncer to add database
      when: pg_database_definition.pgbouncer is not defined or pg_database_definition.pgbouncer|bool
      tags: pgbouncer_reload
      systemd: name=pgbouncer state=reloaded enabled=yes daemon_reload=yes


...
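
The preflight steps in the playbook above amount to a lookup followed by a few sanity checks. The sketch below mirrors that logic in plain Python. It is an illustration only, not how Ansible or Pigsty implements it: the function names and the sample `pg_databases` list are hypothetical, the list comprehension stands in for the JMESPath query `[?name=='test'] | [0]`, and the pgbouncer check assumes a real boolean rather than a string.

```python
from typing import Any, Dict, List, Optional


def fetch_database_definition(
    pg_databases: List[Dict[str, Any]], pg_database: str
) -> Dict[str, Any]:
    """Validate the requested name, then return its entry from pg_databases."""
    # Same three assertions as the "Check parameter pg_database" task.
    if not pg_database or pg_database == "postgres":
        raise ValueError(
            "variable 'pg_database' should be specified to create target database"
        )

    # Stand-in for the JMESPath query: keep the entries whose name matches,
    # then take the first one (or None when nothing matches).
    matches = [db for db in pg_databases if db.get("name") == pg_database]
    definition: Optional[Dict[str, Any]] = matches[0] if matches else None

    # Same intent as the "Check database definition" task.
    if not definition:
        raise LookupError(
            f"database definition for {pg_database} should exist in pg_databases"
        )
    return definition


def needs_pgbouncer_reload(definition: Dict[str, Any]) -> bool:
    """pgbouncer is reloaded unless the definition sets pgbouncer to false."""
    return bool(definition.get("pgbouncer", True))


# Hypothetical inventory content, shaped like <cluster>.vars.pg_databases.
pg_databases = [{"name": "meta", "pgbouncer": True}, {"name": "test"}]
definition = fetch_database_definition(pg_databases, "test")
print(definition)                          # {'name': 'test'}
print(needs_pgbouncer_reload(definition))  # True
```

This is why the database must be defined in the configuration file before the playbook runs: a name with no matching entry fails the definition check and nothing is created.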

Usage Example

./pgsql-createdb.yml -l pg-test -e pg_database=test

Execution Result

$ ./pgsql-createdb.yml -l pg-test -e pg_database=test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

PLAY [Create Database In Cluster] *************************************************************************************************************************************************

TASK [Check parameter pg_database] ************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [Fetch database definition] **************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "msg": {
        "name": "test"
    }
}
ok: [10.10.10.12] => {
    "msg": {
        "name": "test"
    }
}
ok: [10.10.10.13] => {
    "msg": {
        "name": "test"
    }
}

TASK [Check database definition] **************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13

TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "msg": {
        "name": "test"
    }
}
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [Render database test creation sql] ******************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [Render database test baseline sql] ******************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [Execute database test creation command] *************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [Execute database test creation sql] *****************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [Execute database test creation sql] *****************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [Add pgbouncer busniess database] ********************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [Reload pgbouncer to add database] *******************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11                : ok=11   changed=5    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
10.10.10.12                : ok=7    changed=2    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0
10.10.10.13                : ok=7    changed=2    unreachable=0    failed=0    skipped=6    rescued=0    ignored=0

Task Details

The default tasks are as follows:

playbook: ./pgsql-createdb.yml

  play #1 (all): Create Database In Cluster	TAGS: []
    tasks:
      Check parameter pg_database	TAGS: []
      Fetch database definition	TAGS: []
      debug	TAGS: []
      Check database definition	TAGS: []
      include_tasks	TAGS: []
      Reload pgbouncer to add database	TAGS: [pgbouncer_reload]

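5.4 - Deployment Examples

A few examples of deploying Pigsty in real environments

Several typical deployment examples are given here, for reference only.

5.4.1 - Vagrant Sandbox Environment

Example Pigsty configuration for the local Vagrant sandbox

Overview

This is the configuration file used by the sandbox environment that ships with Pigsty.

The original file on GitHub: https://github.com/Vonng/pigsty/blob/master/pigsty.yml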

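This configuration file can serve as a standard learning example. When deploying to virtual machines of the same specification, you usually need only a few changes to this file: replace 10.10.10.10 with the IP of your meta node, replace 10.10.10.* with the IPs of your database nodes, and modify or remove the ansible_host connection parameters so that they carry the correct connection information. Pigsty can then be deployed to that set of virtual machines.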
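
The IP substitution described above can be done by hand or scripted. The following is a minimal sketch of the scripted route, assuming you keep a mapping from sandbox addresses to your own. The function name `remap_ips` and the 192.168.1.x addresses are hypothetical. The regular expression matches whole IPv4-looking tokens only, so entries such as `10.0.0.0/8` in `no_proxy` are left alone unless they appear in the mapping.

```python
import re
from typing import Dict

# A whole dotted-quad token, not preceded or followed by another digit or dot.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def remap_ips(text: str, mapping: Dict[str, str]) -> str:
    """Replace every address that appears in `mapping`; leave all others untouched."""
    return IPV4.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


# Hypothetical target addresses; adjust to your own environment.
mapping = {
    "10.10.10.10": "192.168.1.10",  # meta node
    "10.10.10.11": "192.168.1.11",  # database node
}

sample = (
    "hosts: {10.10.10.10: {ansible_host: meta}}\n"
    "10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}\n"
    'no_proxy: "localhost,127.0.0.1,10.0.0.0/8"\n'
)
print(remap_ips(sample, mapping))
```

The virtual IP addresses (`vip_address`) and the `ansible_host` connection parameters are not covered by this mapping and still need to be reviewed manually.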

Configuration File

---
######################################################################
# File      :   pigsty.yml
# Path      :   pigsty.yml
# Desc      :   Pigsty Configuration file
# Note      :   follow ansible inventory file format
# Ctime     :   2020-05-22
# Mtime     :   2021-03-16
# Copyright (C) 2018-2021 Ruohang Feng
######################################################################


######################################################################
#               Development Environment Inventory                    #
######################################################################
all: # top-level namespace, match all hosts


  #==================================================================#
  #                           Clusters                               #
  #==================================================================#
  # postgres database clusters are defined as kv pair in `all.children`
  # where the key is cluster name and the value is the object consist
  # of cluster members (hosts) and ad-hoc variables (vars)
  # meta node are defined in special group "meta" with `meta_node=true`

  children:

    #-----------------------------
    # meta controller
    #-----------------------------
    meta:       # special group 'meta' defines the main controller machine
      vars:
        meta_node: true                     # mark node as meta controller
        ansible_group_priority: 99          # meta group is top priority

      # nodes in meta group
      hosts: {10.10.10.10: {ansible_host: meta}}

    #-----------------------------
    # cluster: pg-meta
    #-----------------------------
    pg-meta:
      # - cluster members - #
      hosts:
        10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}

      # - cluster configs - #
      vars:
        pg_cluster: pg-meta                 # define actual cluster name
        pg_version: 13                      # define installed pgsql version
        node_tune: tiny                     # tune node into oltp|olap|crit|tiny mode
        pg_conf: tiny.yml                   # tune pgsql into oltp/olap/crit/tiny mode
        patroni_mode: pause                 # enter maintenance mode, {default|pause|remove}
        patroni_watchdog_mode: off          # disable watchdog (require|automatic|off)
        pg_lc_ctype: en_US.UTF8             # enabled pg_trgm i18n char support

        pg_users:
          # complete example of user/role definition for production user
          - name: dbuser_meta               # example production user have read-write access
            password: DBUser.Meta           # example user's password, can be encrypted
            login: true                     # can login, true by default (should be false for role)
            superuser: false                # is superuser? false by default
            createdb: false                 # can create database? false by default
            createrole: false               # can create role? false by default
            inherit: true                   # can this role use inherited privileges?
            replication: false              # can this role do replication? false by default
            bypassrls: false                # can this role bypass row level security? false by default
            connlimit: -1                   # connection limit, -1 disable limit
            expire_at: '2030-12-31'         # 'timestamp' when this role is expired
            expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
            roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly
            pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
            parameters:                     # user's default search path
              search_path: public
            comment: test user

          # simple example for personal user definition
          - name: dbuser_vonng2              # personal user example which only have limited access to offline instance
            password: DBUser.Vonng          # or instance with explicit mark `pg_offline_query = true`
            roles: [dbrole_offline]         # personal/stats/ETL user should be grant with dbrole_offline
            expire_in: 365                  # expire in 365 days since creation
            pgbouncer: false                # personal user should NOT be allowed to login with pgbouncer
            comment: example personal user for interactive queries

        pg_databases:
          - name: meta                      # name is the only required field for a database
            # owner: postgres                 # optional, database owner
            # template: template1             # optional, template1 by default
            # encoding: UTF8                # optional, UTF8 by default , must same as template database, leave blank to set to db default
            # locale: C                     # optional, C by default , must same as template database, leave blank to set to db default
            # lc_collate: C                 # optional, C by default , must same as template database, leave blank to set to db default
            # lc_ctype: C                   # optional, C by default , must same as template database, leave blank to set to db default
            allowconn: true                 # optional, true by default, false disable connect at all
            revokeconn: false               # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
            # tablespace: pg_default          # optional, 'pg_default' is the default tablespace
            connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
            extensions:                     # optional, extension name and where to create
              - {name: postgis, schema: public}
            parameters:                     # optional, extra parameters with ALTER DATABASE
              enable_partitionwise_join: true
            pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
            comment: pigsty meta database   # optional, comment string for database

        pg_default_database: meta           # default database will be used as primary monitor target

        # proxy settings
        vip_mode: l2                        # enable/disable vip (require members in same LAN)
        vip_address: 10.10.10.2             # virtual ip address
        vip_cidrmask: 8                     # cidr network mask length
        vip_interface: eth1                 # interface to add virtual ip


    #-----------------------------
    # cluster: pg-test
    #-----------------------------
    pg-test: # define cluster named 'pg-test'
      # - cluster members - #
      hosts:
        10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
        10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
        10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}

      # - cluster configs - #
      vars:
        # basic settings
        pg_cluster: pg-test                 # define actual cluster name
        pg_version: 13                      # define installed pgsql version
        node_tune: tiny                     # tune node into oltp|olap|crit|tiny mode
        pg_conf: tiny.yml                   # tune pgsql into oltp/olap/crit/tiny mode

        # business users, adjust on your own needs
        pg_users:
          - name: test                      # example production user have read-write access
            password: test                  # example user's password
            roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
            pgbouncer: true                 # production user that access via pgbouncer
            comment: default test user for production usage

        pg_databases:                       # create a business database 'test'
          - name: test                      # use the simplest form

        pg_default_database: test           # default database will be used as primary monitor target

        # proxy settings
        vip_mode: l2                        # enable/disable vip (require members in same LAN)
        vip_address: 10.10.10.3             # virtual ip address
        vip_cidrmask: 8                     # cidr network mask length
        vip_interface: eth1                 # interface to add virtual ip


  #==================================================================#
  #                           Globals                                #
  #==================================================================#
  vars:

    #------------------------------------------------------------------------------
    # CONNECTION PARAMETERS
    #------------------------------------------------------------------------------
    # this section defines connection parameters

    # ansible_user: vagrant                       # admin user with ssh access and sudo privilege

    proxy_env: # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
      # http_proxy: ''
      # https_proxy: ''
      # all_proxy: ''


    #------------------------------------------------------------------------------
    # REPO PROVISION
    #------------------------------------------------------------------------------
    # this section defines how to build a local repo

    # - repo basic - #
    repo_enabled: true                            # build local yum repo on meta nodes?
    repo_name: pigsty                             # local repo name
    repo_address: yum.pigsty                      # repo external address (ip:port or url)
    repo_port: 80                                 # listen port, must match repo_address
    repo_home: /www                               # default repo dir location
    repo_rebuild: false                           # force re-download packages
    repo_remove: true                             # remove existing repos

    # - where to download - #
    repo_upstreams:
      - name: base
        description: CentOS-$releasever - Base - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: updates
        description: CentOS-$releasever - Updates - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: extras
        description: CentOS-$releasever - Extras - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: epel
        description: CentOS $releasever - EPEL - Aliyun Mirror
        baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
        gpgcheck: no
        failovermethod: priority

      - name: grafana
        description: Grafana - TsingHua Mirror
        gpgcheck: no
        baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm

      - name: prometheus
        description: Prometheus and exporters
        gpgcheck: no
        baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch

      # consider using ZJU PostgreSQL mirror in mainland china
      - name: pgdg-common
        description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
        gpgcheck: no
        # baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch
        baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch

      - name: pgdg13
        description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch
        gpgcheck: no
        # baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch
        baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch

      - name: centos-sclo
        description: CentOS-$releasever - SCLo
        gpgcheck: no
        mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo

      - name: centos-sclo-rh
        description: CentOS-$releasever - SCLo rh
        gpgcheck: no
        mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh

      - name: nginx
        description: Nginx Official Yum Repo
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: http://nginx.org/packages/centos/$releasever/$basearch/

      - name: haproxy
        description: Copr repo for haproxy
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/

      # for latest consul & kubernetes
      - name: harbottle
        description: Copr repo for main owned by harbottle
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/

    # - what to download - #
    repo_packages:
      # repo bootstrap packages
      - epel-release nginx wget yum-utils yum createrepo                                      # bootstrap packages

      # node basic packages
      - ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned   # basic system util
      - readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates     # basic pg dependency
      - numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet  # system utils

      # dcs & monitor packages
      - grafana prometheus2 pushgateway alertmanager                                          # monitor and ui
      - node_exporter postgres_exporter nginx_exporter blackbox_exporter                      # exporter
      - consul consul_exporter consul-template etcd                                           # dcs

      # python3 dependencies
      - ansible python python-pip python-psycopg2 audit                                       # ansible & python
      - python3 python3-psycopg2 python36-requests python3-etcd python3-consul                # python3
      - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography               # python3 patroni extra deps

      # proxy and load balancer
      - haproxy keepalived dnsmasq                                                            # proxy and dns

      # postgres common Packages
      - patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity               # major components
      - pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel                # other common utils
      - barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail

      # postgres 13 packages
      - postgresql13* postgis31* citus_13 timescaledb_13 # pgrouting_13                         # postgres 13 and postgis 31
      - pg_repack13 pg_squeeze13                                                                # maintenance extensions
      - pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13                        # stats extensions
      - plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13                                    # PL extensions
      - hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13             # FDW extensions
      - wal2json13 count_distinct13 ddlx_13 geoip13 orafce13                                    # MISC extensions
      - rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
      - pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
      - pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
      - pguint13 pguri13 prefix13  safeupdate_13 semver13  table_version13 tdigest13

    repo_url_packages:
      - https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm
      - https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
      - http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm


    #------------------------------------------------------------------------------
    # NODE PROVISION
    #------------------------------------------------------------------------------
    # this section defines how to provision nodes
    # nodename:                                   # if defined, node's hostname will be overwritten

    # - node dns - #
    node_dns_hosts: # static dns records in /etc/hosts
      - 10.10.10.10 yum.pigsty
    node_dns_server: add                          # add (default) | none (skip) | overwrite (remove old settings)
    node_dns_servers:                             # dynamic nameserver in /etc/resolv.conf
      - 10.10.10.10
    node_dns_options:                             # dns resolv options
      - options single-request-reopen timeout:1 rotate
      - domain service.consul

    # - node repo - #
    node_repo_method: local                       # none|local|public (use local repo for production env)
    node_repo_remove: true                        # whether remove existing repo
    node_local_repo_url:                          # local repo url (if method=local, make sure firewall is configured or disabled)
      - http://yum.pigsty/pigsty.repo

    # - node packages - #
    node_packages:                                # common packages for all nodes
      - wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,unzip,git,readline,zlib,openssl
      - numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,pv,jq
      - python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
      - python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
      - node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
    node_extra_packages:                          # extra packages for all nodes
      - patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
    node_meta_packages:                           # packages for meta nodes only
      - grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
      - dnsmasq,nginx,ansible,pgbadger,polysh

    # - node features - #
    node_disable_numa: false                      # disable numa, important for production database, reboot required
    node_disable_swap: false                      # disable swap, important for production database
    node_disable_firewall: true                   # disable firewall (required if using kubernetes)
    node_disable_selinux: true                    # disable selinux  (required if using kubernetes)
    node_static_network: true                     # keep dns resolver settings after reboot
    node_disk_prefetch: false                     # setup disk prefetch on HDD to increase performance

    # - node kernel modules - #
    node_kernel_modules:
      - softdog
      - br_netfilter
      - ip_vs
      - ip_vs_rr
      - ip_vs_wrr
      - ip_vs_sh
      - nf_conntrack_ipv4

    # - node tuned - #
    node_tune: tiny                               # install and activate tuned profile: none|oltp|olap|crit|tiny
    node_sysctl_params:                           # set additional sysctl parameters, k:v format
      net.bridge.bridge-nf-call-iptables: 1       # for kubernetes

    # - node user - #
    node_admin_setup: true                        # set up a default admin user?
    node_admin_uid: 88                            # uid and gid for admin user
    node_admin_username: admin                    # default admin user
    node_admin_ssh_exchange: true                 # exchange ssh key among cluster ?
    node_admin_pks:                               # public key list that will be installed
      - 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'

    # - node ntp - #
    node_ntp_service: ntp                         # ntp or chrony
    node_ntp_config: true                         # overwrite existing ntp config?
    node_timezone: Asia/Shanghai                  # default node timezone
    node_ntp_servers:                             # default NTP servers
      - pool cn.pool.ntp.org iburst
      - pool pool.ntp.org iburst
      - pool time.pool.aliyun.com iburst
      - server 10.10.10.10 iburst


    #------------------------------------------------------------------------------
    # META PROVISION
    #------------------------------------------------------------------------------
    # - ca - #
    ca_method: create                             # create|copy|recreate
    ca_subject: "/CN=root-ca"                     # self-signed CA subject
    ca_homedir: /ca                               # ca cert directory
    ca_cert: ca.crt                               # ca public key/cert
    ca_key: ca.key                                # ca private key

    # - nginx - #
    nginx_upstream:
      - { name: home,          host: pigsty,   url: "127.0.0.1:3000"}
      - { name: consul,        host: c.pigsty, url: "127.0.0.1:8500" }
      - { name: grafana,       host: g.pigsty, url: "127.0.0.1:3000" }
      - { name: prometheus,    host: p.pigsty, url: "127.0.0.1:9090" }
      - { name: alertmanager,  host: a.pigsty, url: "127.0.0.1:9093" }
      - { name: haproxy,       host: h.pigsty, url: "127.0.0.1:9091" }

    # - nameserver - #
    dns_records: # dynamic dns record resolved by dnsmasq
      - 10.10.10.2  pg-meta                       # sandbox vip for pg-meta
      - 10.10.10.3  pg-test                       # sandbox vip for pg-test
      - 10.10.10.10 meta-1                        # sandbox node meta-1 (node-0)
      - 10.10.10.11 node-1                        # sandbox node node-1
      - 10.10.10.12 node-2                        # sandbox node node-2
      - 10.10.10.13 node-3                        # sandbox node node-3
      - 10.10.10.10 pigsty
      - 10.10.10.10 y.pigsty yum.pigsty
      - 10.10.10.10 c.pigsty consul.pigsty
      - 10.10.10.10 g.pigsty grafana.pigsty
      - 10.10.10.10 p.pigsty prometheus.pigsty
      - 10.10.10.10 a.pigsty alertmanager.pigsty
      - 10.10.10.10 n.pigsty ntp.pigsty
      - 10.10.10.10 h.pigsty haproxy.pigsty

    # - prometheus - #
    prometheus_data_dir: /export/prometheus/data  # prometheus data dir
    prometheus_options: '--storage.tsdb.retention=30d'
    prometheus_reload: false                      # reload prometheus instead of recreate it
    prometheus_sd_method: consul                  # service discovery method: static|consul|etcd
    prometheus_scrape_interval: 2s                # global scrape & evaluation interval
    prometheus_scrape_timeout: 1s                 # scrape timeout
    prometheus_sd_interval: 2s                    # service discovery refresh interval

    # - grafana - #
    grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
    grafana_admin_password: admin                  # default grafana admin user password
    grafana_plugin: install                        # none|install|reinstall
    grafana_cache: /www/pigsty/grafana/plugins.tar.gz # path to grafana plugins tarball
    grafana_customize: true                        # customize grafana resources
    grafana_plugins: # default grafana plugins list
      - redis-datasource
      - simpod-json-datasource
      - fifemon-graphql-datasource
      - sbueringer-consul-datasource
      - camptocamp-prometheus-alertmanager-datasource
      - ryantxu-ajax-panel
      - marcusolsson-hourly-heatmap-panel
      - michaeldmoore-multistat-panel
      - marcusolsson-treemap-panel
      - pr0ps-trackmap-panel
      - dalvany-image-panel
      - magnesium-wordcloud-panel
      - cloudspout-button-panel
      - speakyourcode-button-panel
      - jdbranham-diagram-panel
      - grafana-piechart-panel
      - snuids-radar-panel
      - digrich-bubblechart-panel
    grafana_git_plugins:
      - https://github.com/Vonng/grafana-echarts



    #------------------------------------------------------------------------------
    # DCS PROVISION
    #------------------------------------------------------------------------------
    service_registry: consul                      # where to register services: none | consul | etcd | both
    dcs_type: consul                              # consul | etcd | both
    dcs_name: pigsty                              # consul dc name | etcd initial cluster token
    dcs_servers:                                  # dcs server dict in name:ip format
      meta-1: 10.10.10.10                         # you could use existing dcs cluster
      # meta-2: 10.10.10.11                       # hosts whose IPs are listed here will be initialized as servers
      # meta-3: 10.10.10.12                       # 3 or 5 dcs nodes are recommended for production environments
    dcs_exists_action: clean                      # abort|skip|clean if dcs server already exists
    dcs_disable_purge: false                      # set to true to disable purge functionality for good (force dcs_exists_action = abort)
    consul_data_dir: /var/lib/consul              # consul data dir (/var/lib/consul by default)
    etcd_data_dir: /var/lib/etcd                  # etcd data dir (/var/lib/etcd by default)


    #------------------------------------------------------------------------------
    # POSTGRES INSTALLATION
    #------------------------------------------------------------------------------
    # - dbsu - #
    pg_dbsu: postgres                             # os user for database, postgres by default (changing it is not recommended!)
    pg_dbsu_uid: 26                               # os dbsu uid and gid, 26 for default postgres users and groups
    pg_dbsu_sudo: limit                           # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
    pg_dbsu_home: /var/lib/pgsql                  # home directory of the dbsu
    pg_dbsu_ssh_exchange: false                   # exchange ssh key among same cluster

    # - postgres packages - #
    pg_version: 13                                # default postgresql version
    pgdg_repo: false                              # use official pgdg yum repo (disable if you have local mirror)
    pg_add_repo: false                            # add postgres related repo before install (useful if you want a simple install)
    pg_bin_dir: /usr/pgsql/bin                    # postgres binary dir
    pg_packages:
      - postgresql${pg_version}*
      - postgis31_${pg_version}*
      - pgbouncer patroni pg_exporter pgbadger
      - patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
      - python3 python3-psycopg2 python36-requests python3-etcd python3-consul
      - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography

    pg_extensions:
      - pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
      # - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
      # - count_distinct${pg_version}  ddlx_${pg_version}  geoip${pg_version}  orafce${pg_version}                                   # popular features
      # - hypopg_${pg_version}  ip4r${pg_version}  jsquery_${pg_version}  logerrors_${pg_version}  periods_${pg_version}  pg_auto_failover_${pg_version}  pg_catcheck${pg_version}
      # - pg_fkpart${pg_version}  pg_jobmon${pg_version}  pg_partman${pg_version}  pg_prioritize_${pg_version}  pg_track_settings${pg_version}  pgaudit15_${pg_version}
      # - pgcryptokey${pg_version}  pgexportdoc${pg_version}  pgimportdoc${pg_version}  pgmemcache-${pg_version}  pgmp${pg_version}  pgq-${pg_version}  pgquarrel pgrouting_${pg_version}
      # - pguint${pg_version}  pguri${pg_version}  prefix${pg_version}   safeupdate_${pg_version}  semver${pg_version}   table_version${pg_version}  tdigest${pg_version}



    #------------------------------------------------------------------------------
    # POSTGRES PROVISION
    #------------------------------------------------------------------------------
    # - identity - #
    # pg_cluster:                                 # [REQUIRED] cluster name (validated during pg_preflight)
    # pg_seq: 0                                   # [REQUIRED] instance seq (validated during pg_preflight)
    # pg_role: replica                            # [REQUIRED] service role (validated during pg_preflight)
    pg_hostname: false                            # overwrite node hostname with pg instance name
    pg_nodename: true                             # overwrite consul nodename with pg instance name

    # - retention - #
    # pg_exists_action, available options: abort|clean|skip
    #  - abort: abort entire play's execution (default)
    #  - clean: remove existing cluster (dangerous)
    #  - skip: end current play for this host
    # pg_exists: false                            # auxiliary flag variable (DO NOT SET THIS)
    pg_exists_action: clean
    pg_disable_purge: false                       # set to true to disable pg purge functionality for good (force pg_exists_action = abort)

    # - storage - #
    pg_data: /pg/data                             # postgres data directory
    pg_fs_main: /export                           # data disk mount point     /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
    pg_fs_bkup: /var/backups                      # backup disk mount point   /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*

    # - connection - #
    pg_listen: '0.0.0.0'                          # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
    pg_port: 5432                                 # postgres port (5432 by default)
    pg_localhost: /var/run/postgresql             # localhost unix socket dir for connection

    # - patroni - #
    # patroni_mode, available options: default|pause|remove
    #   - default: default ha mode
    #   - pause:   into maintenance mode
    #   - remove:  remove patroni after bootstrap
    patroni_mode: default                         # pause|default|remove
    pg_namespace: /pg                             # top level key namespace in dcs
    patroni_port: 8008                            # default patroni port
    patroni_watchdog_mode: automatic              # watchdog mode: off|automatic|required
    pg_conf: tiny.yml                             # user provided patroni config template path

    # - localization - #
    pg_encoding: UTF8                             # default to UTF8
    pg_locale: C                                  # default to C
    pg_lc_collate: C                              # default to C
    pg_lc_ctype: en_US.UTF8                       # default to en_US.UTF8

    # - pgbouncer - #
    pgbouncer_port: 6432                          # pgbouncer port (6432 by default)
    pgbouncer_poolmode: transaction               # pooling mode: (transaction pooling by default)
    pgbouncer_max_db_conn: 100                    # important! do not set this larger than postgres max conn or conn limit


    #------------------------------------------------------------------------------
    # POSTGRES TEMPLATE
    #------------------------------------------------------------------------------
    # - template - #
    pg_init: pg-init                              # init script for cluster template

    # - system roles - #
    pg_replication_username: replicator           # system replication user
    pg_replication_password: DBUser.Replicator    # system replication password
    pg_monitor_username: dbuser_monitor           # system monitor user
    pg_monitor_password: DBUser.Monitor           # system monitor password
    pg_admin_username: dbuser_admin               # system admin user
    pg_admin_password: DBUser.Admin               # system admin password

    # - default roles - #
    # check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
    pg_default_roles:

      # common production readonly user
      - name: dbrole_readonly                 # production read-only roles
        login: false
        comment: role for global readonly access

      # common production read-write user
      - name: dbrole_readwrite                # production read-write roles
        login: false
        roles: [dbrole_readonly]             # read-write includes read-only access
        comment: role for global read-write access

      # offline has the same privileges as readonly, but its hba access is limited to offline instances only
      # intended for slow queries, interactive queries, and ETL tasks
      - name: dbrole_offline
        login: false
        comment: role for restricted read-only access (offline instance)

      # admin have the privileges to issue DDL changes
      - name: dbrole_admin
        login: false
        bypassrls: true
        comment: role for object creation
        roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]

      # dbsu, name is designated by `pg_dbsu`. It's not recommended to set a password for dbsu
      - name: postgres
        superuser: true
        comment: system superuser

      # default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
      - name: replicator
        replication: true                          # for replication user
        bypassrls: true                            # logical replication require bypassrls
        roles: [pg_monitor, dbrole_readonly]       # logical replication require select privileges
        comment: system replicator

      # default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
      - name: dbuser_monitor
        connlimit: 16
        comment: system monitor user
        roles: [pg_monitor, dbrole_readonly]

      # default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
      - name: dbuser_admin
        bypassrls: true
        superuser: true
        comment: system admin user
        roles: [dbrole_admin]

      # default stats user, for ETL and slow queries
      - name: dbuser_stats
        password: DBUser.Stats
        comment: business offline user for offline queries and ETL
        roles: [dbrole_offline]


    # - privileges - #
    # object created by dbsu and admin will have their privileges properly set
    pg_default_privileges:
      - GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly
      - GRANT SELECT                        ON TABLES    TO dbrole_readonly
      - GRANT SELECT                        ON SEQUENCES TO dbrole_readonly
      - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly
      - GRANT USAGE                         ON SCHEMAS   TO dbrole_offline
      - GRANT SELECT                        ON TABLES    TO dbrole_offline
      - GRANT SELECT                        ON SEQUENCES TO dbrole_offline
      - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline
      - GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite
      - GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite
      - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin
      - GRANT CREATE                        ON SCHEMAS   TO dbrole_admin

    # - schemas - #
    pg_default_schemas: [monitor]                 # default schemas to be created

    # - extension - #
    pg_default_extensions:                        # default extensions to be created
      - { name: 'pg_stat_statements',  schema: 'monitor' }
      - { name: 'pgstattuple',         schema: 'monitor' }
      - { name: 'pg_qualstats',        schema: 'monitor' }
      - { name: 'pg_buffercache',      schema: 'monitor' }
      - { name: 'pageinspect',         schema: 'monitor' }
      - { name: 'pg_prewarm',          schema: 'monitor' }
      - { name: 'pg_visibility',       schema: 'monitor' }
      - { name: 'pg_freespacemap',     schema: 'monitor' }
      - { name: 'pg_repack',           schema: 'monitor' }
      - name: postgres_fdw
      - name: file_fdw
      - name: btree_gist
      - name: btree_gin
      - name: pg_trgm
      - name: intagg
      - name: intarray

    # - hba - #
    pg_offline_query: false                       # set to true to enable offline query on instance
    pg_reload: true                               # reload postgres after hba changes
    pg_hba_rules:                                 # postgres host-based authentication rules
      - title: allow meta node password access
        role: common
        rules:
          - host    all     all                         10.10.10.10/32      md5

      - title: allow intranet admin password access
        role: common
        rules:
          - host    all     +dbrole_admin               10.0.0.0/8          md5
          - host    all     +dbrole_admin               172.16.0.0/12       md5
          - host    all     +dbrole_admin               192.168.0.0/16      md5

      - title: allow intranet password access
        role: common
        rules:
          - host    all             all                 10.0.0.0/8          md5
          - host    all             all                 172.16.0.0/12       md5
          - host    all             all                 192.168.0.0/16      md5

      - title: allow local read/write (local production user via pgbouncer)
        role: common
        rules:
          - local   all     +dbrole_readonly                                md5
          - host    all     +dbrole_readonly           127.0.0.1/32         md5

      - title: allow offline query (ETL,SAGA,Interactive) on offline instance
        role: offline
        rules:
          - host    all     +dbrole_offline               10.0.0.0/8        md5
          - host    all     +dbrole_offline               172.16.0.0/12     md5
          - host    all     +dbrole_offline               192.168.0.0/16    md5

    pg_hba_rules_extra: []                        # extra hba rules (for cluster/instance overwrite)

    pgbouncer_hba_rules:                          # pgbouncer host-based authentication rules
      - title: local password access
        role: common
        rules:
          - local  all          all                                     md5
          - host   all          all                     127.0.0.1/32    md5

      - title: intranet password access
        role: common
        rules:
          - host   all          all                     10.0.0.0/8      md5
          - host   all          all                     172.16.0.0/12   md5
          - host   all          all                     192.168.0.0/16  md5

    pgbouncer_hba_rules_extra: []                 # extra pgbouncer hba rules (for cluster/instance overwrite)
    # pg_users: []                                # business users
    # pg_databases: []                            # business databases

    #------------------------------------------------------------------------------
    # MONITOR PROVISION
    #------------------------------------------------------------------------------
    # - install - #
    exporter_install: none                        # none|yum|binary, none by default
    exporter_repo_url: ''                         # if set, repo will be added to /etc/yum.repos.d/ before yum installation

    # - collect - #
    exporter_metrics_path: /metrics               # default metric path for pg related exporter

    # - node exporter - #
    node_exporter_enabled: true                   # setup node_exporter on instance
    node_exporter_port: 9100                      # default port for node exporter
    node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

    # - pg exporter - #
    pg_exporter_config: pg_exporter-demo.yaml     # default config files for pg_exporter
    pg_exporter_enabled: true                     # setup pg_exporter on instance
    pg_exporter_port: 9630                        # default port for pg exporter
    pg_exporter_url: ''                           # optional, if not set, generate from reference parameters

    # - pgbouncer exporter - #
    pgbouncer_exporter_enabled: true              # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
    pgbouncer_exporter_port: 9631                 # default port for pgbouncer exporter
    pgbouncer_exporter_url: ''                    # optional, if not set, generate from reference parameters


    #------------------------------------------------------------------------------
    # SERVICE PROVISION
    #------------------------------------------------------------------------------
    pg_weight: 100              # default load balance weight (instance level)

    # - service - #
    pg_services:                                  # how to expose postgres service in cluster?
      # primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
      - name: primary           # service name {{ pg_cluster }}_primary
        src_ip: "*"
        src_port: 5433
        dst_port: pgbouncer     # 5433 route to pgbouncer
        check_url: /primary     # primary health check, success when instance is primary
        selector: "[]"          # select all instance as primary service candidate

      # replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
      - name: replica           # service name {{ pg_cluster }}_replica
        src_ip: "*"
        src_port: 5434
        dst_port: pgbouncer
        check_url: /read-only   # read-only health check. (including primary)
        selector: "[]"          # select all instance as replica service candidate
        selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

      # default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
      - name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
        src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
        src_port: 5436          # bind port, mandatory
        dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
        check_method: http      # health check method: only http is available for now
        check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
        check_url: /primary     # health check url path, / as default
        check_code: 200         # health check http code, 200 as default
        selector: "[]"          # instance selector
        haproxy:                # haproxy specific fields
          maxconn: 3000         # default front-end connection
          balance: roundrobin   # load balance algorithm (roundrobin by default)
          default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

      # offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
      - name: offline           # service name {{ pg_cluster }}_offline
        src_ip: "*"
        src_port: 5438
        dst_port: postgres
        check_url: /replica     # offline MUST be a replica
        selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
        selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replica are used as backup server in offline service

    pg_services_extra: []        # extra services to be added

    # - haproxy - #
    haproxy_enabled: true                         # enable haproxy among every cluster members
    haproxy_reload: true                          # reload haproxy after config
    haproxy_admin_auth_enabled: false             # enable authentication for haproxy admin?
    haproxy_admin_username: admin                 # default haproxy admin username
    haproxy_admin_password: admin                 # default haproxy admin password
    haproxy_exporter_port: 9101                   # default admin/exporter port
    haproxy_client_timeout: 3h                    # client side connection timeout
    haproxy_server_timeout: 3h                    # server side connection timeout

    # - vip - #
    vip_mode: none                                # none | l2 | l4
    vip_reload: true                              # whether reload service after config
    # vip_address: 127.0.0.1                      # virtual ip address ip (l2 or l4)
    # vip_cidrmask: 24                            # virtual ip address cidr mask (l2 only)
    # vip_interface: eth0                         # virtual ip network interface (l2 only)

...
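
Each entry in `pg_services` above pairs a listening port with a destination and an instance selector. The selectors look like JMESPath filter expressions; the sketch below does not evaluate them, but mimics the two selectors of the `offline` service with plain Python predicates to show which instances would be picked as regular and backup members. The instance list and the `pick_members` helper are hypothetical, made up for illustration, and are not Pigsty's implementation.

```python
# Hypothetical cluster members; names and flags are made up for illustration.
instances = [
    {"name": "pg-test-1", "pg_role": "primary", "pg_offline_query": False},
    {"name": "pg-test-2", "pg_role": "replica", "pg_offline_query": False},
    {"name": "pg-test-3", "pg_role": "offline", "pg_offline_query": False},
]


def pick_members(members, selector, selector_backup=None):
    """Return (regular, backup) member names chosen by two predicate functions."""
    regular = [m["name"] for m in members if selector(m)]
    backup = []
    if selector_backup is not None:
        backup = [m["name"] for m in members if selector_backup(m)]
    return regular, backup


# Mirrors: selector "[? pg_role == `offline` || pg_offline_query ]"
def offline_selector(m):
    return m["pg_role"] == "offline" or m["pg_offline_query"]


# Mirrors: selector_backup "[? pg_role == `replica` && !pg_offline_query]"
def offline_backup(m):
    return m["pg_role"] == "replica" and not m["pg_offline_query"]


regular, backup = pick_members(instances, offline_selector, offline_backup)
print("offline service members:", regular)
print("offline service backups:", backup)
```

The selector only decides which instances are candidates; the health check configured through `check_url` still determines which candidates actually receive traffic.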

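Entries in `pg_packages` and `pg_extensions` above carry a `${pg_version}` placeholder and may list several packages on one line. A minimal sketch of how such a list could be expanded into concrete package names, assuming plain string substitution and whitespace splitting; the `expand_packages` helper is hypothetical and not necessarily how Pigsty does it.

```python
def expand_packages(entries, pg_version):
    """Replace the ${pg_version} placeholder and split entries into package names."""
    packages = []
    for entry in entries:
        resolved = entry.replace("${pg_version}", str(pg_version))
        packages.extend(resolved.split())
    return packages


# One entry copied from the pg_extensions list above.
pg_extensions = [
    "pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}",
]

print(expand_packages(pg_extensions, 13))
```
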
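5.4.2 - Tencent Cloud VPC Deployment

Deploy Pigsty on Tencent Cloud VPC virtual machines

This example deploys Pigsty on a Tencent Cloud VPC.

Resource Preparation

Provision Virtual Machines

Purchase a few virtual machines, as shown in the figure below. The machine at 172.21.0.11 serves as the meta node and has a public IP; the three database nodes only need an ordinary 1-core, 1 GB spec.

Configure SSH Remote Login

Assume the admin username is vonng (that's me!). First, set up passwordless SSH access from the meta node to the other three nodes.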

# vonng@172.21.0.11           # meta
ssh-copy-id root@172.21.0.3   # pg-test-1
ssh-copy-id root@172.21.0.4   # pg-test-2
ssh-copy-id root@172.21.0.16  # pg-test-3
scp ~/.ssh/id_rsa.pub root@172.21.0.3:/tmp/
scp ~/.ssh/id_rsa.pub root@172.21.0.4:/tmp/
scp ~/.ssh/id_rsa.pub root@172.21.0.16:/tmp/
ssh root@172.21.0.3 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'
ssh root@172.21.0.4 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'
ssh root@172.21.0.16 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'

Then grant that user passwordless sudo:

ssh root@172.21.0.3  "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"
ssh root@172.21.0.4  "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"
ssh root@172.21.0.16 "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"

# verify that the configuration works
ssh 172.21.0.3 'sudo ls'
ssh 172.21.0.4 'sudo ls'
ssh 172.21.0.16 'sudo ls'
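
The per-node commands above differ only in the target IP. As a sketch, the same sudo setup and verification steps could be generated from a host list. The snippet only builds and prints the command strings and does not run them; the `sudoers_commands` helper is a hypothetical name.

```python
NODES = ["172.21.0.3", "172.21.0.4", "172.21.0.16"]  # database nodes from this example
ADMIN = "vonng"                                       # admin user from this example


def sudoers_commands(nodes, admin):
    """Build the passwordless-sudo setup and verification commands for each node."""
    commands = []
    rule = f"%{admin} ALL=(ALL) NOPASSWD: ALL"
    for ip in nodes:
        commands.append(f"ssh root@{ip} \"echo '{rule}' > /etc/sudoers.d/{admin}\"")
        commands.append(f"ssh {ip} 'sudo ls'")
    return commands


for command in sudoers_commands(NODES, ADMIN):
    print(command)
```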

Download the Project

# clone the source code from GitHub
git clone https://github.com/Vonng/pigsty

# if you cannot reach GitHub, download the source tarball from the Pigsty CDN instead
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pigsty.tar.gz -o pigsty.tgz && tar -xf pigsty.tgz && cd pigsty

Download the Offline Installation Package

# download from the GitHub Release page
# https://github.com/Vonng/pigsty

# if you cannot reach GitHub, download the offline package from the Pigsty CDN instead
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pkg.tgz -o files/pkg.tgz

# extract the offline package to the expected location on the meta node (may require sudo)
mv -f /www/pigsty /www/pigsty-backup && mkdir -p /www/pigsty
tar -xf files/pkg.tgz --strip-components=1 -C /www/pigsty/
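
Stripping one leading path component during extraction means the package contents land directly under /www/pigsty rather than in a nested directory. A small sketch of that path rule follows; the `strip_components` helper and the member names are hypothetical, and the real layout of pkg.tgz may differ.

```python
from pathlib import PurePosixPath


def strip_components(member_path, count):
    """Drop the first `count` path components; return None if nothing remains."""
    parts = PurePosixPath(member_path).parts
    if len(parts) <= count:
        return None  # such members are skipped during extraction
    return str(PurePosixPath(*parts[count:]))


# Hypothetical archive members, for illustration only.
for member in ["pkg/", "pkg/repodata/repomd.xml", "pkg/example.rpm"]:
    print(member, "->", strip_components(member, 1))
```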

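Adjust the Configuration

We can start from the Pigsty sandbox configuration file and adjust it. Because these are all ordinary low-spec virtual machines, no substantive configuration changes are needed; only the connection parameters and node information have to change. In short, just change the IP addresses!

Now replace every sandbox IP address with the actual IP address in the cloud environment. (If you use an L2 VIP, the VIP must also be replaced with a valid address.)

Description       Sandbox IP     VM IP
Meta node         10.10.10.10    172.21.0.11
Database node 1   10.10.10.11    172.21.0.3
Database node 2   10.10.10.12    172.21.0.4
Database node 3   10.10.10.13    172.21.0.16
pg-meta VIP       10.10.10.2     172.21.0.8
pg-test VIP       10.10.10.3     172.21.0.9

Edit the configuration file pigsty.yml. If the virtual machines have similar specs, you usually only need to change the IP addresses. Note in particular that the sandbox connects through SSH aliases (such as meta and node-1); remember to remove every ansible_host setting, because here we connect to the target nodes directly by IP address.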

cat pigsty.yml | \
	sed 's/10\.10\.10\.10/172.21.0.11/g' |\
	sed 's/10\.10\.10\.11/172.21.0.3/g' |\
	sed 's/10\.10\.10\.12/172.21.0.4/g' |\
	sed 's/10\.10\.10\.13/172.21.0.16/g' |\
	sed 's/10\.10\.10\.2/172.21.0.8/g' |\
	sed 's/10\.10\.10\.3/172.21.0.9/g' |\
	sed 's/, ansible_host: meta//g' |\
	sed 's/ansible_host: meta//g' |\
	sed 's/, ansible_host: node-[123]//g' |\
	sed 's/vip_interface: eth1/vip_interface: eth0/g' |\
	sed 's/vip_cidrmask: 8/vip_cidrmask: 24/g' > pigsty2.yml
mv pigsty.yml pigsty-backup.yml; mv pigsty2.yml pigsty.yml

Is that all?

Yes, the configuration file is now fully updated. Let's look at what actually changed:

$ diff pigsty.yml pigsty-backup.yml
38c38
<       hosts: {172.21.0.11: {}}
---
>       hosts: {10.10.10.10: {ansible_host: meta}}
46c46
<         172.21.0.11: {pg_seq: 1, pg_role: primary}
---
>         10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}
109,111c109,111
<         vip_address: 172.21.0.8             # virtual ip address
<         vip_cidrmask: 24                     # cidr network mask length
<         vip_interface: eth0                 # interface to add virtual ip
---
>         vip_address: 10.10.10.2             # virtual ip address
>         vip_cidrmask: 8                     # cidr network mask length
>         vip_interface: eth1                 # interface to add virtual ip
120,122c120,122
<         172.21.0.3: {pg_seq: 1, pg_role: primary}
<         172.21.0.4: {pg_seq: 2, pg_role: replica}
<         172.21.0.16: {pg_seq: 3, pg_role: offline}
---
>         10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
>         10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
>         10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
147,149c147,149
<         vip_address: 172.21.0.9             # virtual ip address
<         vip_cidrmask: 24                     # cidr network mask length
<         vip_interface: eth0                 # interface to add virtual ip
---
>         vip_address: 10.10.10.3             # virtual ip address
>         vip_cidrmask: 8                     # cidr network mask length
>         vip_interface: eth1                 # interface to add virtual ip
326c326
<       - 172.21.0.11 yum.pigsty
---
>       - 10.10.10.10 yum.pigsty
329c329
<       - 172.21.0.11
---
>       - 10.10.10.10
393c393
<       - server 172.21.0.11 iburst
---
>       - server 10.10.10.10 iburst
417,430c417,430
<       - 172.21.0.8  pg-meta                       # sandbox vip for pg-meta
<       - 172.21.0.9  pg-test                       # sandbox vip for pg-test
<       - 172.21.0.11 meta-1                        # sandbox node meta-1 (node-0)
<       - 172.21.0.3 node-1                        # sandbox node node-1
<       - 172.21.0.4 node-2                        # sandbox node node-2
<       - 172.21.0.16 node-3                        # sandbox node node-3
<       - 172.21.0.11 pigsty
<       - 172.21.0.11 y.pigsty yum.pigsty
<       - 172.21.0.11 c.pigsty consul.pigsty
<       - 172.21.0.11 g.pigsty grafana.pigsty
<       - 172.21.0.11 p.pigsty prometheus.pigsty
<       - 172.21.0.11 a.pigsty alertmanager.pigsty
<       - 172.21.0.11 n.pigsty ntp.pigsty
<       - 172.21.0.11 h.pigsty haproxy.pigsty
---
>       - 10.10.10.2  pg-meta                       # sandbox vip for pg-meta
>       - 10.10.10.3  pg-test                       # sandbox vip for pg-test
>       - 10.10.10.10 meta-1                        # sandbox node meta-1 (node-0)
>       - 10.10.10.11 node-1                        # sandbox node node-1
>       - 10.10.10.12 node-2                        # sandbox node node-2
>       - 10.10.10.13 node-3                        # sandbox node node-3
>       - 10.10.10.10 pigsty
>       - 10.10.10.10 y.pigsty yum.pigsty
>       - 10.10.10.10 c.pigsty consul.pigsty
>       - 10.10.10.10 g.pigsty grafana.pigsty
>       - 10.10.10.10 p.pigsty prometheus.pigsty
>       - 10.10.10.10 a.pigsty alertmanager.pigsty
>       - 10.10.10.10 n.pigsty ntp.pigsty
>       - 10.10.10.10 h.pigsty haproxy.pigsty
442c442
<     grafana_url: http://admin:admin@172.21.0.11:3000 # grafana url
---
>     grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
478,480c478,480
<       meta-1: 172.21.0.11                         # you could use existing dcs cluster
<       # meta-2: 172.21.0.3                       # host which have their IP listed here will be init as server
<       # meta-3: 172.21.0.4                       # 3 or 5 dcs nodes are recommend for production environment
---
>       meta-1: 10.10.10.10                         # you could use existing dcs cluster
>       # meta-2: 10.10.10.11                       # host which have their IP listed here will be init as server
>       # meta-3: 10.10.10.12                       # 3 or 5 dcs nodes are recommend for production environment
692c692
<           - host    all     all                         172.21.0.11/32      md5
---
>           - host    all     all                         10.10.10.10/32      md5
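The address substitutions shown in the diff can also be scripted with exact matching, so that a pattern such as 10.10.10.2 cannot accidentally match part of a longer address. The following is only a minimal sketch in Python: the helper name remap_ips and the IP_MAP constant are hypothetical, the mapping is copied from the table above, and removing the ansible_host entries is not covered.

```python
import re

# Sandbox-to-cloud address mapping, copied from the table in this section.
IP_MAP = {
    "10.10.10.10": "172.21.0.11",
    "10.10.10.11": "172.21.0.3",
    "10.10.10.12": "172.21.0.4",
    "10.10.10.13": "172.21.0.16",
    "10.10.10.2": "172.21.0.8",
    "10.10.10.3": "172.21.0.9",
}

# Match a whole dotted quad only: it must not be preceded or followed
# by another digit or dot, so 10.10.10.2 never matches inside 10.10.10.20.
_IP_RE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def remap_ips(text, ip_map=IP_MAP):
    """Replace every address listed in ip_map; leave all other text untouched."""
    return _IP_RE.sub(lambda m: ip_map.get(m.group(0), m.group(0)), text)
```

Applied to the whole file, this would be called as remap_ips(open("pigsty.yml").read()), with the result written to a new file so the original can be kept as a backup.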

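Run the playbooks

You can use the same sandbox initialization procedure to initialize the infrastructure and the database clusters.

Apart from the IP addresses, the output is no different from the sandbox. See the reference output.

Access the demo

You can now reach the services on the meta node through its public IP. Be sure to take care of information security.

Unlike the sandbox, if you want to reach the Pigsty admin UI from the public network, you have to add the configured domain names to /etc/hosts yourself, or use a domain name you have actually registered.

Otherwise you can only connect directly by IP and port, for example: http://<meta_node_public_ip>:3000

The domain names Nginx listens on can be configured through the nginx_upstream option.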

nginx_upstream:
  - { name: home,          host: pigsty.cc,   url: "127.0.0.1:3000"}
  - { name: consul,        host: c.pigsty.cc, url: "127.0.0.1:8500" }
  - { name: grafana,       host: g.pigsty.cc, url: "127.0.0.1:3000" }
  - { name: prometheus,    host: p.pigsty.cc, url: "127.0.0.1:9090" }
  - { name: alertmanager,  host: a.pigsty.cc, url: "127.0.0.1:9093" }
  - { name: haproxy,       host: h.pigsty.cc, url: "127.0.0.1:9091" }
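As an illustration of the /etc/hosts step mentioned above, the sketch below derives one hosts line per distinct domain in a nginx_upstream list. It is only a sketch: hosts_entries is a hypothetical helper, not part of Pigsty, and 203.0.113.10 is a documentation placeholder standing in for the meta node's public IP.

```python
def hosts_entries(public_ip, upstreams):
    """Return '/etc/hosts' style lines mapping each distinct upstream host to public_ip."""
    seen = []
    for item in upstreams:
        host = item["host"]
        if host not in seen:  # keep first-seen order, drop repeated domains
            seen.append(host)
    return [f"{public_ip} {host}" for host in seen]


if __name__ == "__main__":
    # Entries mirror the nginx_upstream example above.
    nginx_upstream = [
        {"name": "home", "host": "pigsty.cc", "url": "127.0.0.1:3000"},
        {"name": "consul", "host": "c.pigsty.cc", "url": "127.0.0.1:8500"},
        {"name": "grafana", "host": "g.pigsty.cc", "url": "127.0.0.1:3000"},
    ]
    print("\n".join(hosts_entries("203.0.113.10", nginx_upstream)))
```

The printed lines would be appended to /etc/hosts on the machine from which the admin UI is opened, not on the meta node itself.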

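5.4.3 - Production deployment

Production deployment on high-spec hardware

This example is based on a real production environment.

The environment consists of 200 high-spec x86 physical machines: Dell R740, 64 CPU cores / 400GB memory / 4TB PCI-E SSD / dual 10GbE NICs

Resource preparation

Adjust the configuration

Run the playbooks

Access the services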

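5.4.4 - Integration with Alibaba Cloud MyBase

How to deploy the Pigsty monitoring system on its own to monitor Alibaba Cloud MyBase for PostgreSQL

Pigsty ships with a database provisioning solution, but it can also be used purely as a monitoring system integrated with an external provisioning solution, such as Alibaba Cloud MyBase for PostgreSQL.

When integrating with an external system, the user only needs to deploy one meta node to host the monitoring infrastructure. In addition, Node Exporter and PG Exporter must be installed on the monitored machines to collect metrics.

Pigsty provides a static service discovery mechanism and a binary deployment mode for exporters to minimize intrusion into the external system.

The following walks through a real example of monitoring Alibaba Cloud MyBase with Pigsty.

Requesting resources

Deploying the monitoring infrastructure

Deploying the monitoring exporters

Managing instance identity

Updating the instance list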

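5.5 - Monitor-only deployment

How to integrate Pigsty with an external provisioning solution and use only Pigsty's monitoring system.

If you only want the monitoring part of Pigsty, for example to monitor existing PostgreSQL instances, you can use the monitor-only deployment mode.

The monitor-only deployment process is largely the same as the standard one, but omits many steps:

  • Infrastructure initialization on the meta node is identical to the standard process.
  • Edit the configuration file; in monitor-only mode, usually only the monitoring system parameters need to change.
  • Complete the monitor-only deployment on the database nodes with the dedicated playbook: ./pgsql-monitor.yml

Deployment notes

Monitoring user

Pigsty creates the monitoring user during the PG provisioning stage. Monitor-only mode skips those steps, so you must create the monitoring user yourself.

On the target database cluster, create the monitoring user together with the monitoring schema and extensions (only pg_stat_statements is mandatory). Run the following SQL on each database instance to be monitored.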

-- Create the monitoring user
CREATE USER "dbuser_monitor" ;
ALTER ROLE "dbuser_monitor" PASSWORD 'DBUser.Monitor';
ALTER USER "dbuser_monitor" CONNECTION LIMIT 16;
GRANT "pg_monitor" TO "dbuser_monitor";
-- dbrole_readonly is one of Pigsty's default roles; it must already exist on the target
GRANT "dbrole_readonly" TO "dbuser_monitor";

-- Create the monitoring schema and extension
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "dbuser_monitor";
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements" WITH SCHEMA "monitor";

-- Extra monitoring function for shared memory metrics; only needed on PG13 and above.
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
    pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';

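Monitoring connection strings

By default, Pigsty generates the connection strings for the database and the connection pool using the following rules.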

PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pg_port }}/{{ pg_default_database }}?host={{ pg_localhost }}&sslmode=disable'
PGBOUNCER_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pgbouncer_port }}/pgbouncer?host={{ pg_localhost }}&sslmode=disable'

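If the connection string for your monitoring role cannot be produced by these rules, set the pg_exporter_url and pgbouncer_exporter_url parameters to configure the database and connection pool connection information directly.

As an example, in the sandbox the meta node connects to its database with: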

PG_EXPORTER_URL='postgres://dbuser_monitor:DBUser.Monitor@:5432/meta?host=/var/run/postgresql&sslmode=disable'
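The generation rule can be sketched in a few lines of Python. This is only an illustration of the templates shown above, not Pigsty's actual implementation (which renders them through Ansible templates); exporter_urls is a hypothetical helper, and it assumes that a non-empty pg_exporter_url or pgbouncer_exporter_url takes precedence over the generated value. Note that the sketch does not URL-encode the password.

```python
def exporter_urls(cfg):
    """Render the default PG and Pgbouncer exporter URLs from Pigsty-style parameters.

    A non-empty pg_exporter_url / pgbouncer_exporter_url in cfg overrides
    the generated value (assumed behaviour, based on the parameter comments).
    """
    head = "postgres://{pg_monitor_username}:{pg_monitor_password}@:"
    tail = "?host={pg_localhost}&sslmode=disable"
    pg_url = (head + "{pg_port}/{pg_default_database}" + tail).format(**cfg)
    pgb_url = (head + "{pgbouncer_port}/pgbouncer" + tail).format(**cfg)
    return (
        cfg.get("pg_exporter_url") or pg_url,
        cfg.get("pgbouncer_exporter_url") or pgb_url,
    )


if __name__ == "__main__":
    # Values of the sandbox meta node, taken from this section.
    sandbox = {
        "pg_monitor_username": "dbuser_monitor",
        "pg_monitor_password": "DBUser.Monitor",
        "pg_port": 5432,
        "pgbouncer_port": 6432,
        "pg_default_database": "meta",
        "pg_localhost": "/var/run/postgresql",
    }
    for url in exporter_urls(sandbox):
        print(url)
```

With the sandbox values, the first URL reproduces the PG_EXPORTER_URL sample shown above.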

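The lazy approach

If you are not particularly concerned about security and privileges, you can also monitor directly as the dbsu (for example the postgres user) through ident authentication.

pg_exporter runs as the dbsu by default. If the dbsu is allowed passwordless access to the database through local ident authentication (the Pigsty default), the superuser can be used to monitor the database directly. Pigsty strongly discourages this kind of deployment, but it is undeniably convenient: there is no new user to create and no privilege to configure.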

PG_EXPORTER_URL='postgres:///postgres?host=/var/run/postgresql&sslmode=disable'

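Related parameters

A monitor-only deployment uses only a subset of Pigsty's parameters.

Infrastructure

The infrastructure and the meta node are configured as in a regular deployment, except that the following two parameters must be set to the values shown.

service_registry: none            # service registration must be disabled, because the target environment may have no DCS infrastructure
prometheus_sd_method: static      # static file service discovery must be used, because the target instances may not use service discovery and registration

Target nodes

The identity parameters of the target nodes remain mandatory. Beyond those, usually only the monitoring system parameters need adjusting.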

---
#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none                        # none|yum|binary, none by default
exporter_repo_url: ''                         # if set, repo will be added to /etc/yum.repos.d/ before yum installation

# - collect - #
exporter_metrics_path: /metrics               # default metric path for pg related exporter

# - node exporter - #
node_exporter_enabled: true                   # setup node_exporter on instance
node_exporter_port: 9100                      # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml     # default config files for pg_exporter
pg_exporter_enabled: true                     # setup pg_exporter on instance
pg_exporter_port: 9630                        # default port for pg exporter
pg_exporter_url: ''                           # optional, if not set, generate from reference parameters

# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true              # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631                 # default port for pgbouncer exporter
pgbouncer_exporter_url: ''                    # optional, if not set, generate from reference parameters

# - postgres variables reference - #
pg_dbsu: postgres
pg_port: 5432                                 # postgres port (5432 by default)
pgbouncer_port: 6432                          # pgbouncer port (6432 by default)
pg_localhost: /var/run/postgresql             # localhost unix socket dir for connection
pg_default_database: postgres                 # default database will be used as primary monitor target
pg_monitor_username: dbuser_monitor           # system monitor username, for postgres and pgbouncer
pg_monitor_password: DBUser.Monitor           # system monitor user's password
service_registry: consul                      # none | consul | etcd | both
...

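The parameters that typically need adjusting are:

exporter_install: binary          # none|yum|binary; copying binaries is the recommended way to install exporters
pgbouncer_exporter_enabled: false # disable Pgbouncer monitoring if the target instance has no associated Pgbouncer
pg_exporter_url: ''               # URL for connecting to Postgres; set this if the default URL generation rule does not apply
pgbouncer_exporter_url: ''        # URL for connecting to Pgbouncer; set this if the default URL generation rule does not apply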

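Limitations

The Pigsty monitoring system and the Pigsty provisioning solution are tightly coupled and work best together. Pigsty does not recommend using them separately, but doing so is feasible, with some limitations.

Missing metrics

Pigsty integrates metrics from several sources: machine nodes, the database, the Pgbouncer connection pool, and the Haproxy load balancer. If your own provisioning solution lacks any of these components, the corresponding metrics will be missing.

Node and PG metrics are normally always present, while the absence of Pgbouncer and Haproxy typically costs roughly 100 to 200 metrics.

In particular, the Pgbouncer metrics include the critically important PG QPS, TPS, and RT, which cannot be obtained from PostgreSQL itself.

Working assumptions

For the Pigsty monitoring system to work with an external provisioning solution and monitor existing database clusters, a few working assumptions must hold:

  • Databases are deployed exclusively, one database per node. Only then can node metrics be meaningfully associated with database metrics.
  • Target nodes can be managed by Ansible (NOPASS SSH and NOPASS SUDO); some cloud vendors' RDS products do not allow this.
  • The database has a monitoring user that can access the monitoring metrics, the required monitoring schema and extensions are installed, and access control is configured appropriately.

Service discovery

External provisioning solutions usually have their own identity management mechanism, so Pigsty does not overstep by deploying a DCS for service discovery. This means the identity of monitored objects can only be managed through static configuration files, which is usually not a problem.

In the Pigsty sandbox, when the role of an instance changes, the system promptly corrects the instance's role information through callbacks and an anti-entropy process, for example changing primary to replica, or another role to primary.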

pg_up{cls="pg-meta", ins="pg-meta-1", instance="10.10.10.10:9630", ip="10.10.10.10", job="pg", role="primary", svc="pg-meta-primary"}

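When integrating with an external provisioning solution, however, the monitoring system cannot notice a primary/replica switchover unless the user explicitly notifies it, or triggers a callback, so that the configuration file is regenerated from the latest role definitions. In the sample metric above, the role and svc labels are the ones affected by a late role update, which means service-level monitoring data (the pg:svc:* family of metrics, such as a service's QPS) loses accuracy. Metrics and dashboards at other levels are not affected by switchovers, so the impact is limited, and there are other ways to work around it.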
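To make the idea of regenerating the static configuration concrete, the sketch below builds target groups in the JSON shape accepted by Prometheus file-based service discovery (a list of objects with targets and labels). It is a hypothetical illustration, not the file Pigsty itself generates: the function names are made up, and the ins and svc naming rules are inferred from the sample metric above (pg-meta-1, pg-meta-primary).

```python
import json


def build_targets(cluster, members, port=9630):
    """Build file_sd style target groups from a role definition.

    members maps an IP address to {"pg_seq": ..., "pg_role": ...}.
    Label names (cls, ins, ip, role, svc) follow the sample metric above.
    """
    groups = []
    for ip, spec in sorted(members.items()):
        groups.append({
            "targets": [f"{ip}:{port}"],
            "labels": {
                "cls": cluster,
                "ins": f"{cluster}-{spec['pg_seq']}",
                "ip": ip,
                "role": spec["pg_role"],
                "svc": f"{cluster}-{spec['pg_role']}",
            },
        })
    return groups


def render_targets(cluster, members, port=9630):
    """Serialize the target groups as JSON, ready to be written to a targets file."""
    return json.dumps(build_targets(cluster, members, port), indent=2)
```

After a switchover, the caller would update pg_role in the member definitions and write the rendered output again, so that the role and svc labels catch up with the new topology.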

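Administrative privileges

Pigsty's monitoring metrics are collected by node_exporter and pg_exporter.

Although pg_exporter can be deployed so that the exporter pulls information from a remote database instance, node_exporter must be deployed on the node where the database runs.

This means you must have SSH login and sudo privileges on the database machine to complete the deployment. In other words, the target node must be manageable by Ansible, and cloud vendors' RDS products usually do not grant such privileges.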

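6 - Configuration

Configuration parameters and customization options provided by Pigsty

Pigsty uses declarative configuration: the user's configuration describes a state, and Pigsty is responsible for bringing the real components to the desired state.

Pigsty configuration files follow Ansible conventions and use YAML; see Configuration files for details.

Pigsty has 168 configuration entries, organized into ten categories and five levels; see Configuration entries for details.

The vast majority of parameters can be left at their defaults; defining a new database cluster requires only three mandatory identity parameters.

No  Category               Name          Group           Count  Function
1   Connection parameters  connect       Infrastructure  1      Proxy server configuration, connection information of managed objects
2   Local repo             repo          Infrastructure  11     Customize the local Yum repo and offline installation package
3   Node provisioning      node          Infrastructure  30     Configure infrastructure on ordinary nodes
4   Infrastructure         meta          Infrastructure  23     Install and enable infrastructure services on the meta node
5   DCS                    dcs           Infrastructure  8      Configure the DCS service (consul/etcd) on all nodes
6   PG install             pg-install    Database        11     Install the PostgreSQL database
7   PG provisioning        pg-provision  Database        30     Bring up PostgreSQL database clusters
8   PG template            pg-template   Database        19     Customize the content of PostgreSQL databases
9   Monitoring             monitor       Database        18     Install the Pigsty database monitoring system
10  Service provisioning   service       Database        17     Expose database services through Haproxy or VIP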

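6.1 - Configuration files

The structure and content of Pigsty configuration files, and how to merge and split them.

Pigsty configuration files follow Ansible conventions and use YAML. A single configuration file is used by default; see the reference example.

The default Pigsty configuration file is pigsty.yml. It is used together with Ansible, a popular DevOps tool.

You can set the default configuration file path in ansible.cfg in the current directory, or specify it explicitly on the command line with -i pigsty.yml when running a playbook.

Configuration file structure

Pigsty's configuration file uses the Ansible YAML inventory format. Its top-level structure is as follows: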


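all:                      # top-level object: all
  vars: <123 keys>        # global configuration: all.vars
  children:               # group definitions: each entry under all.children defines one database cluster
    meta: <2 keys>...
    pg-meta: <2 keys>...
    pg-test: <2 keys>...  # detailed definition of one specific database cluster, pg-test
...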

Each specific database cluster exists as an Ansible group, as shown below:

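pg-test:                 # the cluster name is used as the group name by default
  vars:                  # cluster-level variables
    pg_cluster: pg-test  # a mandatory cluster-level entry, identical across all of pg-test
  hosts:                 # cluster members
    10.10.10.11: {pg_seq: 1, pg_role: primary} # a database instance member
    10.10.10.12: {pg_seq: 2, pg_role: replica} # identity parameters pg_role and pg_seq are mandatory
    10.10.10.13: {pg_seq: 3, pg_role: offline} # instance-level variables can be set here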
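A cluster definition like this can be sanity-checked before any playbook runs. The following is a minimal sketch assuming the inventory has already been loaded into plain Python dictionaries; check_cluster is a hypothetical helper, and the rule that pg_seq must be unique within a cluster is an assumption of this sketch rather than something stated here.

```python
REQUIRED_INSTANCE_KEYS = ("pg_seq", "pg_role")  # instance-level identity parameters


def check_cluster(name, group):
    """Return a list of problems found in one cluster (Ansible group) definition."""
    errors = []
    # pg_cluster is the mandatory cluster-level identity parameter.
    if (group.get("vars") or {}).get("pg_cluster") is None:
        errors.append(f"{name}: missing cluster-level pg_cluster")
    seen_seq = set()
    for host, spec in (group.get("hosts") or {}).items():
        for key in REQUIRED_INSTANCE_KEYS:
            if key not in spec:
                errors.append(f"{name}/{host}: missing {key}")
        seq = spec.get("pg_seq")
        if seq is not None:
            if seq in seen_seq:  # uniqueness is an assumption of this sketch
                errors.append(f"{name}/{host}: duplicate pg_seq {seq}")
            seen_seq.add(seq)
    return errors
```

An empty list means the three identity parameters are present for the cluster and every member.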

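Configuration entries

In a Pigsty configuration file, configuration entries can appear in three places:

Level     Scope     Priority  Description                                   Location
Global    Global    low       consistent within one deployment environment  all.vars.xxx
Cluster   Cluster   medium    consistent within one cluster                 all.children.<cls>.vars.xxx
Instance  Instance  high      the finest-grained configuration level        all.children.<cls>.hosts.<ins>.xxx

Every configuration entry is a key-value pair. The key is the name of the entry and the value is its content. Value types vary; see Configuration entries for details.

Entries defined in a cluster's vars override global entries with the same key, and entries defined on an instance in turn override both cluster and global entries. This lets you configure specific clusters and specific instances precisely, at different levels and granularities.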

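Split configuration files

Sometimes you may want one configuration file per database cluster rather than one huge shared inventory.

The benefit is that a mistake stays confined to that cluster instead of becoming an environment-wide incident. For example, when decommissioning a cluster, specifying the wrong execution scope could delete every database in the whole environment.

You can use any layout that satisfies Ansible's rules and Pigsty's variable hierarchy semantics, but Pigsty recommends splitting configuration files as follows:

  • group_vars/all.yml: define all global variables here.
  • group_vars/<pg_cluster>.yml: define the cluster variables of database cluster <pg_cluster> here.
  • pgsql/<pg_cluster>.yml: define the instance members of database cluster <pg_cluster>, and their instance variables, here.
  • host_vars/<pg_instance>.yml: if a single instance has very complex configuration, it can be moved into a separate file here.

The directory structure of the Pigsty sandbox with split configuration files looks like this: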

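pigsty
 |
 ^- group_vars               # global/cluster entry definitions (this directory name is fixed)
 |     ^------ all.yml       # global entries
 |     ^------ meta.yml      # meta node entries
 |     ^------ pg-meta.yml   # pg-meta cluster entries     (override global definitions)
 |     ^------ pg-test.yml   # pg-test cluster entries     (override global definitions)
 |     ^------ <cluster>.yml # <pg_cluster> cluster entries (override global definitions)
 |
 ^- host_vars                # [optional] instance-level variable definitions split out
 |     ^------ 10.10.10.10.  # instance-level entries of 10.10.10.10 (override global/cluster definitions)
 |
 ^- pgsql                    # cluster members / instance-level entries (this directory name is arbitrary)
       ^------ pg-meta.yml   # pg-meta members and instance entries (override global/cluster definitions)
       ^------ pg-test.yml   # pg-test members and instance entries (override global/cluster definitions)
       ^------ <cluster>.yml # <pg_cluster> members and instance entries (override global/cluster definitions)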

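6.2 - Configuration entries

An introduction to Pigsty's configuration entries and their categories

Categories

Pigsty has 168 configuration entries in total, divided by domain into the following 10 categories.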

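No  Category               Name          Group           Count  Function
1   Connection parameters  connect       Infrastructure  1      Proxy server configuration, connection information of managed objects
2   Local repo             repo          Infrastructure  11     Customize the local Yum repo and offline installation package
3   Node provisioning      node          Infrastructure  30     Configure infrastructure on ordinary nodes
4   Infrastructure         meta          Infrastructure  23     Install and enable infrastructure services on the meta node
5   DCS                    dcs           Infrastructure  8      Configure the DCS service (consul/etcd) on all nodes
6   PG install             pg-install    Database        11     Install the PostgreSQL database
7   PG provisioning        pg-provision  Database        30     Bring up PostgreSQL database clusters
8   PG template            pg-template   Database        19     Customize the content of PostgreSQL databases
9   Monitoring             monitor       Database        18     Install the Pigsty database monitoring system
10  Service provisioning   service       Database        17     Expose database services through Haproxy or VIP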

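Granularity

Pigsty parameters can be configured at different granularities.

Pigsty provides three granularities by default: global, cluster, and instance.

In a Pigsty configuration file, configuration entries can appear in three places.

Granularity  Scope     Priority  Description                                   Location
Global       Global    low       consistent within one deployment environment  all.vars.xxx
Cluster      Cluster   medium    consistent within one cluster                 all.children.<cls>.vars.xxx
Instance     Instance  high      the finest-grained configuration level        all.children.<cls>.hosts.<ins>.xxx

Every configuration entry is a key-value pair. The key is the name of the entry and the value is its content. Value types vary.

Entries defined in a cluster's vars override global entries with the same key, and entries defined on an instance in turn override both cluster and global entries. This lets you configure specific clusters and specific instances precisely, at different levels and granularities.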

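Besides the three granularities described above, Pigsty configuration entries have two additional priority levels.

  • Default: when an entry appears at none of the global, cluster, or instance levels, its default value is used. Defaults have the lowest priority, and every entry has one.
  • Argument: when the user passes an entry on the command line, it has the highest priority and overrides configuration at every other level. Some entries can only be specified and used through command-line arguments.

Level     Source    Priority  Description                                   Location
Default   Default   lowest    default values defined by the code logic      roles/<role>/defaults/main.yml
Global    Global    low       consistent within one deployment environment  all.vars.xxx
Cluster   Cluster   medium    consistent within one cluster                 all.children.<cls>.vars.xxx
Instance  Instance  high      the finest-grained configuration level        all.children.<cls>.hosts.<ins>.xxx
Argument  Argument  highest   passed in as a command-line argument          -e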
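The five-level override order can be illustrated with a small Python sketch. It only models the order described in the table (default, global, cluster, instance, argument) and is not how Ansible resolves variables internally, which involves more precedence levels; the function name resolve and the sample values are hypothetical.

```python
from collections import ChainMap


def resolve(defaults, global_vars, cluster_vars, instance_vars, cli_args):
    """Merge the five levels; the leftmost mapping in the ChainMap wins on key conflicts."""
    return dict(ChainMap(cli_args, instance_vars, cluster_vars, global_vars, defaults))


if __name__ == "__main__":
    effective = resolve(
        defaults={"pg_port": 5432, "pg_weight": 100},
        global_vars={"pg_port": 5433},
        cluster_vars={"pg_cluster": "pg-test", "pg_weight": 50},
        instance_vars={"pg_seq": 1, "pg_weight": 10},
        cli_args={"pg_port": 6000},
    )
    print(effective)
```

Here pg_port ends up with the command-line value and pg_weight with the instance value, while pg_cluster and pg_seq pass through from the only level that defines them.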

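List of configuration entries

Category Name Type Level Description
connect proxy_env dict G Proxy server configuration
repo repo_enabled bool G Whether to enable the local repo
repo repo_name string G Local repo name
repo repo_address string G External access address of the local repo
repo repo_port number G Local repo port
repo repo_home string G Root directory of local repo files
repo repo_rebuild bool A Whether to rebuild the Yum repo
repo repo_remove bool A Whether to remove existing Yum repos
repo repo_upstreams object[] G Upstream sources of the Yum repo
repo repo_packages string[] G Packages to download into the Yum repo
repo repo_url_packages string[] G Packages downloaded directly by URL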
node nodename string I If set, overrides the machine HOSTNAME
node node_dns_hosts string[] G Static DNS records written to the machine
node node_dns_server enum G How to configure DNS servers
node node_dns_servers string[] G Dynamic DNS servers to configure
node node_dns_options string[] G Options written to /etc/resolv.conf
node node_repo_method enum G How the node uses Yum repos
node node_repo_remove bool G Whether to remove the node's existing Yum repos
node node_local_repo_url string[] G URLs of the local repo
node node_packages string[] G Packages installed on the node
node node_extra_packages string[] C/I/A Extra packages installed on the node
node node_meta_packages string[] G Packages required on the meta node
node node_disable_numa bool G Disable NUMA on the node
node node_disable_swap bool G Disable swap on the node
node node_disable_firewall bool G Disable the node firewall
node node_disable_selinux bool G Disable SELinux on the node
node node_static_network bool G Whether to use static DNS servers
node node_disk_prefetch string G Whether to enable disk prefetch
node node_kernel_modules string[] G Kernel modules to enable
node node_tune enum G Node tuning mode
node node_sysctl_params dict G Operating system kernel parameters
node node_admin_setup bool G Whether to create the admin user
node node_admin_uid number G Admin user UID
node node_admin_username string G Admin user name
node node_admin_ssh_exchange bool G Exchange admin SSH keys among instances
node node_admin_current_pk bool A Add the current user's public key to the admin account
node node_admin_pks string[] G Public keys allowed to log in as admin
node node_ntp_service enum G NTP service type: ntp or chrony
node node_ntp_config bool G Whether to configure the NTP service
node node_timezone string G NTP time zone setting
node node_ntp_servers string[] G NTP server list
meta ca_method enum G How the CA is created
meta ca_subject string G Self-signed CA subject
meta ca_homedir string G CA certificate root directory
meta ca_cert string G CA certificate
meta ca_key string G CA private key name
meta nginx_upstream object[] G Nginx upstream servers
meta dns_records string[] G Dynamic DNS records
meta prometheus_data_dir string G Prometheus data directory
meta prometheus_options string G Prometheus command-line options
meta prometheus_reload bool A Reload instead of recreate
meta prometheus_sd_method enum G Service discovery mechanism: static|consul
meta prometheus_scrape_interval interval G Prometheus scrape interval
meta prometheus_scrape_timeout interval G Prometheus scrape timeout
meta prometheus_sd_interval interval G Prometheus service discovery refresh interval
meta grafana_url string G Grafana URL
meta grafana_admin_password string G Grafana admin password
meta grafana_plugin enum G How to install Grafana plugins
meta grafana_cache string G Grafana plugin cache location
meta grafana_customize bool G Whether to customize Grafana
meta grafana_plugins string[] G Grafana plugins to install
meta grafana_git_plugins string[] G Grafana plugins installed from Git
meta loki_clean bool A Whether to clean the data directory when installing Loki
meta loki_data_dir string G Loki data directory
dcs service_registry enum G/C/I Where services are registered
dcs dcs_type enum G DCS type in use
dcs dcs_name string G DCS cluster name
dcs dcs_servers dict G DCS servers as name:IP pairs
dcs dcs_exists_action enum G/A What to do if a DCS instance already exists
dcs dcs_disable_purge bool G/C/I Completely forbid purging DCS instances
dcs consul_data_dir string G Consul data directory
dcs etcd_data_dir string G Etcd data directory
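pg-install pg_dbsu string G/C PG operating system superuser
pg-install pg_dbsu_uid number G/C Superuser UID
pg-install pg_dbsu_sudo enum G/C Superuser sudo privileges
pg-install pg_dbsu_home string G/C Superuser home directory
pg-install pg_dbsu_ssh_exchange bool G/C Whether to exchange superuser SSH keys
pg-install pg_version string G/C Major PG version to install
pg-install pgdg_repo bool G/C Whether to add the official PG repo
pg-install pg_add_repo bool G/C Whether to add PG-related repos
pg-install pg_bin_dir string G/C PG binary directory
pg-install pg_packages string[] G/C PG packages to install
pg-install pg_extensions string[] G/C PG extensions to install
pg-provision pg_cluster string C PG cluster name
pg-provision pg_seq number I PG instance sequence number
pg-provision pg_role enum I PG instance role
pg-provision pg_hostname bool G/C Set the PG instance name as HOSTNAME
pg-provision pg_nodename bool G/C Set the PG instance name as the Consul node name
pg-provision pg_exists bool A Flag: whether PG already exists
pg-provision pg_exists_action enum G/A What to do when PG already exists
pg-provision pg_disable_purge enum G/C/I Forbid purging existing PG instances
pg-provision pg_data string G PG data directory
pg-provision pg_fs_main string G Mount point of the main PG data disk
pg-provision pg_fs_bkup path G Mount point of the PG backup disk
pg-provision pg_listen ip G IP address PG listens on
pg-provision pg_port number G Port PG listens on
pg-provision pg_localhost string G/C Unix socket address used by PG
pg-provision pg_upstream string I Replication upstream of the instance
pg-provision pg_backup bool I Whether to store backups on the instance
pg-provision pg_delay interval I Delay applied if the instance is a delayed replica
pg-provision patroni_mode enum G/C Patroni configuration mode
pg-provision pg_namespace string G/C DCS namespace used by Patroni
pg-provision patroni_port string G/C Patroni service port
pg-provision patroni_watchdog_mode enum G/C Patroni watchdog mode
pg-provision pg_conf enum G/C Configuration template used by Patroni
pg-provision pg_encoding string G/C PG character set encoding
pg-provision pg_locale enum G/C PG locale
pg-provision pg_lc_collate enum G/C PG collation locale
pg-provision pg_lc_ctype enum G/C PG character classification locale
pg-provision pgbouncer_port number G/C Pgbouncer port
pg-provision pgbouncer_poolmode enum G/C Pgbouncer pooling mode
pg-provision pgbouncer_max_db_conn number G/C Pgbouncer maximum connections per database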
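pg-template pg_init string G/C Custom PG initialization script
pg-template pg_replication_username string G PG replication user
pg-template pg_replication_password string G PG replication user password
pg-template pg_monitor_username string G PG monitoring user
pg-template pg_monitor_password string G PG monitoring user password
pg-template pg_admin_username string G PG admin user
pg-template pg_admin_password string G PG admin user password
pg-template pg_default_roles role[] G Roles and users created by default
pg-template pg_default_privilegs string[] G Default database privileges
pg-template pg_default_schemas string[] G Schemas created by default
pg-template pg_default_extensions extension[] G Extensions installed by default
pg-template pg_offline_query bool I Whether offline queries are allowed
pg-template pg_reload bool A Whether to reload database configuration (HBA)
pg-template pg_hba_rules rule[] G Global HBA rules
pg-template pg_hba_rules_extra rule[] C/I Cluster/instance-specific HBA rules
pg-template pgbouncer_hba_rules rule[] G/C Pgbouncer global HBA rules
pg-template pgbouncer_hba_rules_extra rule[] G/C Pgbouncer-specific HBA rules
pg-template pg_databases database[] G/C Business database definitions
pg-template pg_users user[] G/C Business user definitions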
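monitor exporter_install enum G/C How monitoring components are installed
monitor exporter_repo_url string G/C Yum repo for monitoring components
monitor exporter_metrics_path string G/C URL path where metrics are exposed
monitor node_exporter_enabled bool G/C Enable the node metrics collector
monitor node_exporter_port number G/C Port exposing node metrics
monitor node_exporter_options string G/C Node metrics collection options
monitor pg_exporter_config string G/C PG metrics definition file
monitor pg_exporter_enabled bool G/C Enable the PG metrics collector
monitor pg_exporter_port number G/C Port exposing PG metrics
monitor pg_exporter_url string G/C Connection string of the monitored database (override)
monitor pgbouncer_exporter_enabled bool G/C Enable the Pgbouncer metrics collector
monitor pgbouncer_exporter_port number G/C Port exposing Pgbouncer metrics
monitor pgbouncer_exporter_url string G/C Connection string of the monitored connection pool
monitor promtail_enabled bool G/C Whether to enable the Promtail log collection service
monitor promtail_clean bool G/C/A Whether to remove existing state when installing Promtail
monitor promtail_port number G/C Default port used by Promtail
monitor promtail_status_path string G/C File where Promtail keeps its status
monitor promtail_send_url string G/C Loki endpoint that receives logs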
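service pg_weight number I Relative weight of the instance in load balancing
service pg_services service[] G Global service definitions
service pg_services_extra service[] C Cluster-specific service definitions
service haproxy_enabled bool G/C/I Whether to enable Haproxy
service haproxy_reload bool A Whether to reload Haproxy configuration
service haproxy_admin_auth_enabled bool G/C Whether to enable authentication for the Haproxy admin UI
service haproxy_admin_username string G/C Haproxy admin user name
service haproxy_admin_password string G/C Haproxy admin password
service haproxy_exporter_port number G/C Haproxy metrics exporter port
service haproxy_client_timeout interval G/C Haproxy client timeout
service haproxy_server_timeout interval G/C Haproxy server timeout
service vip_mode enum G/C VIP mode: none
service vip_reload bool G/C Whether to reload VIP configuration
service vip_address string G/C VIP address used by the cluster
service vip_cidrmask number G/C CIDR mask length of the VIP address
service vip_interface string G/C Network interface used by the VIP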

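6.3 - Connection parameters

Parameters related to connections and proxies in Pigsty

Parameter overview

Name Type Level Description
proxy_env dict G Proxy server configuration

Parameter details

proxy_env

In regions subject to Internet blocking, downloads of some software are affected.

For example, when accessing the official PostgreSQL repo from mainland China, the download speed may be only a few KB per second, while a suitable HTTP proxy can bring it to several MB per second. If you have a proxy server, configure it through proxy_env, as in this example: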

proxy_env: # global proxy env when downloading packages
  http_proxy: 'http://username:password@proxy.address.com'
  https_proxy: 'http://username:password@proxy.address.com'
  all_proxy: 'http://username:password@proxy.address.com'
  no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"

ansible_host

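If your environment uses a jump host, or has been customized so that machines cannot be reached with a plain ssh <ip>, consider using Ansible's connection parameters. ansible_host is the most typical of them.

Ansible parameters related to SSH connections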

  • ansible_host

    The name of the host to connect to, if different from the alias you wish to give to it.

  • ansible_port

    The ssh port number, if not 22

  • ansible_user

    The default ssh user name to use.

  • ansible_ssh_pass

    The ssh password to use (never store this variable in plain text; always use a vault. See Variables and Vaults)

  • ansible_ssh_private_key_file

    Private key file used by ssh. Useful if using multiple keys and you don’t want to use SSH agent.

  • ansible_ssh_common_args

    This setting is always appended to the default command line for sftp, scp, and ssh. Useful to configure a ProxyCommand for a certain host (or group).

  • ansible_sftp_extra_args

    This setting is always appended to the default sftp command line.

  • ansible_scp_extra_args

    This setting is always appended to the default scp command line.

  • ansible_ssh_extra_args

    This setting is always appended to the default ssh command line.

  • ansible_ssh_pipelining

    Determines whether or not to use SSH pipelining. This can override the pipelining setting in ansible.cfg.

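The simplest usage is to set ansible_host to an SSH alias: as long as you can reach the target machine with ssh <name>, setting ansible_host to <name> is enough.

Note that these are all instance-level variables.

Caveat

Note that the default sandbox configuration uses SSH aliases as connection parameters, because the vagrant host reaches the virtual machines through SSH alias configuration. For production environments, connecting directly by IP is recommended.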

pg-meta:
  hosts:
    10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}

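6.4 - Local repo

Configuration entries for the local Yum repo in Pigsty

Pigsty is a complex software system. To keep it stable, Pigsty downloads all the packages it depends on from the Internet during initialization and builds a local Yum repo.

All dependencies total about 1GB, and the download speed depends on your network. Although Pigsty uses mirrors wherever possible to speed things up, a few packages may still be hindered by firewalls and download very slowly. You can set a download proxy through the proxy_env entry to complete the first download, or download the pre-built offline installation package directly.

When building the local Yum repo, if the directory {{ repo_home }}/{{ repo_name }} already exists and contains a marker file named repo_complete, Pigsty considers the local repo already initialized and skips the download stage, which speeds things up considerably. The offline installation package is simply that {{ repo_home }}/{{ repo_name }} directory packed into an archive.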
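The skip rule described above, combined with the repo_rebuild flag documented later in this section, amounts to a simple check. The sketch below is hypothetical (should_download is not a Pigsty function; Pigsty implements this logic in its Ansible playbooks) and only illustrates the decision.

```python
from pathlib import Path


def should_download(repo_home, repo_name, rebuild=False, marker="repo_complete"):
    """Decide whether packages must be downloaded.

    Download is skipped only when the marker file exists under
    <repo_home>/<repo_name> and a rebuild has not been forced.
    """
    marker_file = Path(repo_home) / repo_name / marker
    return rebuild or not marker_file.is_file()
```

With the default parameters of this section, the check would be should_download("/www", "pigsty"), which inspects /www/pigsty/repo_complete.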

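Parameter overview

Name Type Level Description
repo_enabled bool G Whether to enable the local repo
repo_name string G Local repo name
repo_address string G External access address of the local repo
repo_port number G Local repo port
repo_home string G Root directory of local repo files
repo_rebuild bool A Whether to rebuild the Yum repo
repo_remove bool A Whether to remove existing Yum repos
repo_upstreams object[] G Upstream sources of the Yum repo
repo_packages string[] G Packages to download into the Yum repo
repo_url_packages string[] G Packages downloaded directly by URL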

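Default parameters

repo_enabled: true                            # whether to enable the local repo
repo_name: pigsty                             # local repo name
repo_address: yum.pigsty                      # externally reachable repo address (ip:port or url)
repo_port: 80                                 # port the repo HTTP server listens on
repo_home: /www                               # default root directory
repo_rebuild: false                           # force re-downloading packages
repo_remove: true                             # remove existing yum repos
repo_upstreams: [...]                         # upstream Yum repos
repo_packages: [...]                          # packages to download
repo_url_packages: [...]                      # packages downloaded by URL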

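Parameter details

repo_enabled

If true (the default), the normal local Yum repo build is performed; otherwise the build is skipped.

repo_name

Name of the local Yum repo, pigsty by default. You can change it to any name you like, for example pgsql-rhel7.

repo_address

Address at which the local Yum repo is served, either a domain name or an IP address; the default is yum.pigsty.

If you use a domain name, you must make sure that in the current environment it resolves to the server hosting the local repo, that is, the meta node.

If your local Yum repo does not use the standard port 80, include the port in the address and keep it consistent with the repo_port variable.

You can use the static DNS settings in the node parameters to write the local repo's domain name to every node in the environment; the sandbox resolves the default yum.pigsty domain this way.

repo_port

HTTP port used by the local Yum repo, 80 by default.

repo_home

Root directory of the local Yum repo, /www by default.

This directory is exposed as the root of the HTTP server.

repo_rebuild

If false (the default), nothing happens; if true, the repo is rebuilt regardless of circumstances.

repo_remove

Whether to remove all existing repos in /etc/yum.repos.d while initializing the local repo. The default is true.

The original repo files are backed up to /etc/yum.repos.d/backup.

Because the content of repos already present on the operating system is not under your control, it is recommended to remove them and configure repos explicitly through repo_upstreams.

repo_upstreams

All the Yum repos added to /etc/yum.repos.d; Pigsty downloads packages from these repos.

By default Pigsty uses the Alibaba Cloud CentOS 7 mirror, the Tsinghua University Grafana mirror, the PackageCloud Prometheus repo, the official PostgreSQL repo, and the SCLo, Harbottle, Nginx, and Haproxy repos.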

- name: base
  description: CentOS-$releasever - Base - Aliyun Mirror
  baseurl:
    - http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
    - http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
    - http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
  gpgcheck: no
  failovermethod: priority

- name: updates
  description: CentOS-$releasever - Updates - Aliyun Mirror
  baseurl:
    - http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
    - http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
    - http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
  gpgcheck: no
  failovermethod: priority

- name: extras
  description: CentOS-$releasever - Extras - Aliyun Mirror
  baseurl:
    - http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
    - http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
    - http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
  gpgcheck: no
  failovermethod: priority

- name: epel
  description: CentOS $releasever - EPEL - Aliyun Mirror
  baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
  gpgcheck: no
  failovermethod: priority

- name: grafana
  description: Grafana - TsingHua Mirror
  gpgcheck: no
  baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm

- name: prometheus
  description: Prometheus and exporters
  gpgcheck: no
  baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch

- name: pgdg-common
  description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
  gpgcheck: no
  baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch

- name: pgdg13
  description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch - Updates testing
  gpgcheck: no
  baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch

- name: centos-sclo
  description: CentOS-$releasever - SCLo
  gpgcheck: no
  mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo

- name: centos-sclo-rh
  description: CentOS-$releasever - SCLo rh
  gpgcheck: no
  mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh

- name: nginx
  description: Nginx Official Yum Repo
  skip_if_unavailable: true
  gpgcheck: no
  baseurl: http://nginx.org/packages/centos/$releasever/$basearch/

- name: haproxy
  description: Copr repo for haproxy
  skip_if_unavailable: true
  gpgcheck: no
  baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/

# for latest consul & kubernetes
- name: harbottle
  description: Copr repo for main owned by harbottle
  skip_if_unavailable: true
  gpgcheck: no
  baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/

repo_packages

The list of rpm packages to download. The packages downloaded by default are:

# - what to download - #
repo_packages:
  # repo bootstrap packages
  - epel-release nginx wget yum-utils yum createrepo                                      # bootstrap packages

  # node basic packages
  - ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned   # basic system util
  - readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates     # basic pg dependency
  - numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet  # system utils

  # dcs & monitor packages
  - grafana prometheus2 pushgateway alertmanager                                          # monitor and ui
  - node_exporter postgres_exporter nginx_exporter blackbox_exporter                      # exporter
  - consul consul_exporter consul-template etcd                                           # dcs

  # python3 dependencies
  - ansible python python-pip python-psycopg2                                             # ansible & python
  - python3 python3-psycopg2 python36-requests python3-etcd python3-consul                # python3
  - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography               # python3 patroni extra deps

  # proxy and load balancer
  - haproxy keepalived dnsmasq                                                            # proxy and dns

  # postgres common Packages
  - patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity               # major components
  - pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel                # other common utils
  - barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail

  # postgres 13 packages
  - postgresql13* postgis31* citus_13 pgrouting_13                                          # postgres 13 and postgis 31
  - pg_repack13 pg_squeeze13                                                                # maintenance extensions
  - pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13                        # stats extensions
  - plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13                                    # PL extensions
  - hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13             # FDW extensions
  - wal2json13 count_distinct13 ddlx_13 geoip13 orafce13                                    # MISC extensions
  - rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
  - pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
  - pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
  - pguint13 pguri13 prefix13  safeupdate_13 semver13  table_version13 tdigest13

repo_url_packages

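Packages downloaded directly by URL rather than through yum. You can add links to your own packages here.

By default, Pigsty downloads three packages by URL:

  • pg_exporter (required; core component of the monitoring system)
  • vip-manager (optional; required when VIP is enabled)
  • polysh (optional; a convenience tool for managing multiple machines)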
repo_url_packages:
  - https://github.com/Vonng/pg_exporter/releases/download/v0.3.1/pg_exporter-0.3.1-1.el7.x86_64.rpm
  - https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
  - http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm

6.5 - Node provisioning

Configuration parameters in Pigsty for machines, the operating system, and infrastructure

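Parameter overview

Name | Type | Level | Description
nodename | string | I | If set, overrides the machine's HOSTNAME
node_dns_hosts | string[] | G | Static DNS records written to the machine
node_dns_server | enum | G | How DNS servers are configured
node_dns_servers | string[] | G | Dynamic DNS servers to configure
node_dns_options | string[] | G | Options written to /etc/resolv.conf
node_repo_method | enum | G | How the node uses Yum repos
node_repo_remove | bool | G | Whether to remove the node's existing Yum repos
node_local_repo_url | string[] | G | URLs of the local repo files
node_packages | string[] | G | Packages installed on nodes
node_extra_packages | string[] | C/I/A | Extra packages installed on nodes
node_meta_packages | string[] | G | Packages required on meta nodes
node_disable_numa | bool | G | Disable NUMA on the node
node_disable_swap | bool | G | Disable swap on the node
node_disable_firewall | bool | G | Disable the node firewall
node_disable_selinux | bool | G | Disable SELinux on the node
node_static_network | bool | G | Whether to use static DNS server settings
node_disk_prefetch | string | G | Whether to enable disk prefetch
node_kernel_modules | string[] | G | Kernel modules to enable
node_tune | enum | G | Node tuning profile
node_sysctl_params | dict | G | Operating system kernel parameters
node_admin_setup | bool | G | Whether to create an admin user
node_admin_uid | number | G | Admin user UID
node_admin_username | string | G | Admin user name
node_admin_ssh_exchange | bool | G | Exchange admin SSH keys among instances
node_admin_pks | string[] | G | Public keys allowed to log in as admin
node_admin_pk_current | bool | A | Whether to add the current user's public key to the admin account
node_ntp_service | enum | G | NTP service type: ntp or chrony
node_ntp_config | bool | G | Whether to configure the NTP service
node_timezone | string | G | Time zone setting
node_ntp_servers | string[] | G | NTP server list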

Default configuration

#------------------------------------------------------------------------------
# NODE PROVISION
#------------------------------------------------------------------------------
# this section defines how to provision nodes
# nodename:                                   # if defined, node's hostname will be overwritten

# - node dns - #
node_dns_hosts: # static dns records in /etc/hosts
  - 10.10.10.10 yum.pigsty
node_dns_server: add                          # add (default) | none (skip) | overwrite (remove old settings)
node_dns_servers:                             # dynamic nameserver in /etc/resolv.conf
  - 10.10.10.10
node_dns_options:                             # dns resolv options
  - options single-request-reopen timeout:1 rotate
  - domain service.consul

# - node repo - #
node_repo_method: local                       # none|local|public (use local repo for production env)
node_repo_remove: true                        # whether remove existing repo
node_local_repo_url:                          # local repo url (if method=local, make sure firewall is configured or disabled)
  - http://yum.pigsty/pigsty.repo

# - node packages - #
node_packages:                                # common packages for all nodes
  - wget,yum-utils,sshpass,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
  - numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
  - python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
  - python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
  - node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
node_extra_packages:                          # extra packages for all nodes
  - patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
node_meta_packages:                           # packages for meta nodes only
  - grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
  - dnsmasq,nginx,ansible,pgbadger,polysh

# build & devel packages (add to repo_packages too if you want build database & extensions from source)
# - gcc,gcc-c++,clang,coreutils,diffutils,rpm-build,rpm-devel,rpmlint,rpmdevtools
# - zlib-devel,openssl-libs,openssl-devel,pam-devel,libxml2-devel,libxslt-devel,openldap-devel,systemd-devel,tcl-devel,python-devel


# - node features - #
node_disable_numa: false                      # disable numa, important for production database, reboot required
node_disable_swap: false                      # disable swap, important for production database
node_disable_firewall: true                   # disable firewall (required if using kubernetes)
node_disable_selinux: true                    # disable selinux  (required if using kubernetes)
node_static_network: true                     # keep dns resolver settings after reboot
node_disk_prefetch: false                     # setup disk prefetch on HDD to increase performance

# - node kernel modules - #
node_kernel_modules:
  - softdog
  - br_netfilter
  - ip_vs
  - ip_vs_rr
  - ip_vs_wrr
  - ip_vs_sh
  - nf_conntrack_ipv4

# - node tuned - #
node_tune: tiny                               # install and activate tuned profile: none|oltp|olap|crit|tiny
node_sysctl_params: {}                        # set additional sysctl parameters, k:v format
# net.bridge.bridge-nf-call-iptables: 1     # example kv parameters

# - node user - #
node_admin_setup: true                        # setup an default admin user ?
node_admin_uid: 88                            # uid and gid for admin user
node_admin_username: dba                      # default admin user: dba
node_admin_ssh_exchange: true                 # exchange admin's ssh key among cluster ?
node_admin_pk_current: false                  # add current user's ~/.ssh/id_rsa.pub to admin pk
node_admin_pks:                               # ssh public keys to be added to admin user
  - 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'

# - node ntp - #
node_ntp_service: ntp                         # ntp or chrony
node_ntp_config: true                         # overwrite existing ntp config?
node_timezone: Asia/Shanghai                  # default node timezone
node_ntp_servers:                             # default NTP servers
  - pool cn.pool.ntp.org iburst
  - pool pool.ntp.org iburst
  - pool time.pool.aliyun.com iburst
  - server 10.10.10.10 iburst

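Parameter details

nodename

If this parameter is set, the instance's HOSTNAME is overwritten with this name.

This option explicitly assigns a name to a node. To use the PG instance name as the node name instead, use the pg_hostname option.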

node_dns_hosts

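Default static DNS records for machine nodes. Each record is written to /etc/hosts when the node is initialized, which makes this parameter well suited to configuring infrastructure addresses.

node_dns_hosts is an array. Each element is a string of the form ip domain_name and represents one DNS record.

By default, Pigsty writes 10.10.10.10 yum.pigsty to /etc/hosts, so that the local yum repo can be reached by domain name before the DNS nameserver is up.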
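As a minimal sketch, additional infrastructure addresses can be listed alongside the default record. The second entry below (both its IP address and its domain name) is a hypothetical placeholder, not a Pigsty default:

```yaml
node_dns_hosts:                     # static dns records written to /etc/hosts
  - 10.10.10.10 yum.pigsty          # default record for the local yum repo
  - 10.10.10.20 nas.example.local   # hypothetical extra record (assumption)
```

Each element keeps the ip domain_name form described above.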

node_dns_server

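How the dynamic DNS servers of a machine node are configured. There are three modes:

  • add: append the records in node_dns_servers to /etc/resolv.conf and keep the existing DNS servers (default).
  • overwrite: overwrite /etc/resolv.conf with the records in node_dns_servers.
  • none: skip DNS server configuration.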

node_dns_servers

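If node_dns_server is set to add or overwrite, the records in node_dns_servers are appended to, or overwrite, /etc/resolv.conf. See the Linux documentation on /etc/resolv.conf for the exact format.

By default, Pigsty adds the meta node as a DNS server; DNSMASQ on the meta node answers DNS requests in the environment.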

node_dns_servers: # dynamic nameserver in /etc/resolv.conf
  - 10.10.10.10

node_dns_options

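If node_dns_server is set to add or overwrite, the records in node_dns_options are appended to, or overwrite, /etc/resolv.conf. See the Linux documentation on /etc/resolv.conf for the exact format.

The resolver options Pigsty adds by default are: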

- options single-request-reopen timeout:1 rotate
- domain service.consul

node_repo_method

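How the Yum repos of a machine node are configured. There are three modes:

  • local: use the local Yum repo on the meta node. This is the default and recommended behavior.
  • public: install directly from Internet repos; the public repos in repo_upstream are written to /etc/yum.repos.d/.
  • none: leave the node's repo configuration untouched.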

node_repo_remove

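Whether to remove the Yum repos that already exist on the node.

By default, Pigsty removes the existing configuration files from /etc/yum.repos.d and backs them up to /etc/yum.repos.d/backup.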

node_local_repo_url

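If node_repo_method is set to local, the repo files at the URLs listed here are downloaded to /etc/yum.repos.d.

This is an array of repo file URLs. By default, Pigsty adds the local Yum repo on the meta node to the machine's repo configuration.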

node_local_repo_url:
  - http://yum.pigsty/pigsty.repo

node_packages

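The list of packages installed through yum.

The list is an array, but each element may contain several package names separated by commas. The packages Pigsty installs by default are: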

node_packages: # common packages for all nodes
  - wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
  - numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
  - python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
  - python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
  - node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager

node_extra_packages

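The list of extra packages installed through yum.

Similar to node_packages, but node_packages is usually configured once globally, whereas node_extra_packages handles exceptions for specific nodes. For example, you can install additional tool packages on nodes that run PG. This variable is usually overridden at the cluster or instance level.

The extra packages Pigsty installs by default are: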

- patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
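A cluster-level override might look like the sketch below. It assumes the inventory layout used for pg-test elsewhere in this document; the added packages (pgcenter, pgloader) are picked from repo_packages purely for illustration. Because Ansible variables replace rather than merge by default, the defaults are repeated in the override.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
  vars:
    pg_cluster: pg-test
    node_extra_packages:             # overrides the global value for this cluster only
      - patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity   # the defaults, repeated
      - pgcenter,pgloader            # illustrative additions (assumption)
```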

node_meta_packages

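The list of meta-node packages installed through yum.

Similar to node_packages and node_extra_packages, but the packages listed in node_meta_packages are installed only on meta nodes. They are therefore usually monitoring software, management tools, build tools, and the like.

The meta-node packages Pigsty installs by default are: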

node_meta_packages:                           # packages for meta nodes only
  - grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
  - dnsmasq,nginx,ansible,pgbadger,polysh

node_disable_numa

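Whether to disable NUMA. Note that this option takes effect only after a reboot. NUMA is not disabled by default, but disabling it is strongly recommended in production.

node_disable_swap

Whether to disable swap. If you have enough memory and the database is deployed on dedicated machines, disabling swap is recommended for better performance. The default configuration above sets this to false, so swap is left enabled unless you change it.

node_disable_firewall

Whether to disable the firewall. The firewall tends to get in the way, so disabling it is recommended. It is disabled by default.

node_disable_selinux

Whether to disable SELinux. SELinux tends to get in the way, so disabling it is recommended. It is disabled by default.

node_static_network

Whether to use static network configuration. Enabled by default.

With static networking enabled, your DNS resolver configuration is not overwritten when the machine reboots or its network interfaces change. Enabling it is recommended.

node_disk_prefetch

Whether to enable disk prefetch.

It can improve throughput for instances deployed on HDDs. Disabled by default.

node_kernel_modules

The kernel modules to enable.

By default, Pigsty enables the following kernel modules:

node_kernel_modules: [softdog, br_netfilter, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh, nf_conntrack_ipv4]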

node_tune

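The predefined tuning profile applied to the machine. The default configuration lists the options none, oltp, olap, crit, and tiny.

node_sysctl_params

Additional sysctl kernel parameters to set.

A dictionary in key: value form.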
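A minimal sketch of the dictionary form follows. The first key is the example given in the comment of the default configuration; the second key and its value are illustrative assumptions, not Pigsty recommendations.

```yaml
node_sysctl_params:
  net.bridge.bridge-nf-call-iptables: 1   # example from the default configuration comment
  vm.swappiness: 1                        # illustrative value (assumption)
```

The net.bridge.* keys only exist once the br_netfilter kernel module is loaded, and that module is part of the default node_kernel_modules list.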

node_admin_setup

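Whether to create an admin user (with passwordless sudo and ssh) on every node. The user is created by default.

With the default configuration above, Pigsty creates an admin user named dba (uid=88), who can reach the other nodes in the environment from the meta node over passwordless SSH and run passwordless sudo there.

node_admin_uid

The uid of the admin user. Defaults to 88.

node_admin_username

The name of the admin user. The default configuration above sets it to dba.

node_admin_ssh_exchange

Whether to exchange the admin user's SSH keys among the machines targeted by the current run.

The exchange is performed by default, so that the admin can hop quickly between machines.

node_admin_pks

The public keys written to the admin user's ~/.ssh/authorized_keys.

Users who hold the corresponding private keys can log in as the admin.

node_admin_pk_current

A boolean, usually passed as a command-line argument. It copies the current user's SSH public key (~/.ssh/id_rsa.pub) into the admin user's authorized_keys. Not copied by default.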

node_ntp_service

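The type of NTP service the system uses:

  • ntp: the traditional NTP service
  • chrony: the default time service on CentOS 7/8

node_ntp_config

Whether to overwrite the existing NTP configuration.

A boolean option. Overwrites by default.

node_timezone

The time zone used by default.

Pigsty defaults to Asia/Shanghai; adjust it to your actual situation.

node_ntp_servers

NTP server addresses.

By default, Pigsty uses the following NTP servers: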

- pool cn.pool.ntp.org iburst
- pool pool.ntp.org iburst
- pool time.pool.aliyun.com iburst
- server 10.10.10.10 iburst

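6.6 - Infrastructure

Configuration parameters in Pigsty for infrastructure: CA, DNS, Nginx, Prometheus, Grafana

This section defines the infrastructure deployed on the meta node, including the CA, DNS, Nginx, Prometheus, and Grafana.

Parameter overview

Name | Type | Level | Description
ca_method | enum | G | How the CA is created
ca_subject | string | G | Subject of the self-signed CA
ca_homedir | string | G | Root directory for CA certificates
ca_cert | string | G | CA certificate
ca_key | string | G | CA private key name
nginx_upstream | object[] | G | Nginx upstream servers
dns_records | string[] | G | Dynamic DNS records
prometheus_data_dir | string | G | Prometheus data directory
prometheus_options | string | G | Prometheus command-line options
prometheus_reload | bool | A | Reload instead of recreate
prometheus_sd_method | enum | G | Service discovery mechanism: static or consul
prometheus_scrape_interval | interval | G | Prometheus scrape interval
prometheus_scrape_timeout | interval | G | Prometheus scrape timeout
prometheus_sd_interval | interval | G | Prometheus service discovery refresh interval
grafana_url | string | G | Grafana URL
grafana_admin_password | string | G | Grafana admin password
grafana_plugin | enum | G | How Grafana plugins are installed
grafana_cache | string | G | Path of the Grafana plugin cache
grafana_customize | bool | G | Whether to customize Grafana
grafana_plugins | string[] | G | Grafana plugins to install
grafana_git_plugins | string[] | G | Grafana plugins installed from Git
loki_clean | bool | A | Whether to clean the data directory when installing Loki
loki_data_dir | string | G | Loki data directory

Default parameters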

#------------------------------------------------------------------------------
# META PROVISION
#------------------------------------------------------------------------------
# - ca - #
ca_method: create                             # create|copy|recreate
ca_subject: "/CN=root-ca"                     # self-signed CA subject
ca_homedir: /ca                               # ca cert directory
ca_cert: ca.crt                               # ca public key/cert
ca_key: ca.key                                # ca private key

# - nginx - #
nginx_upstream:
  - { name: home,          host: pigsty,   url: "127.0.0.1:3000"}
  - { name: consul,        host: c.pigsty, url: "127.0.0.1:8500" }
  - { name: grafana,       host: g.pigsty, url: "127.0.0.1:3000" }
  - { name: prometheus,    host: p.pigsty, url: "127.0.0.1:9090" }
  - { name: alertmanager,  host: a.pigsty, url: "127.0.0.1:9093" }
  - { name: haproxy,       host: h.pigsty, url: "127.0.0.1:9091" }

# - nameserver - #
dns_records: # dynamic dns record resolved by dnsmasq
  - 10.10.10.2  pg-meta                       # sandbox vip for pg-meta
  - 10.10.10.3  pg-test                       # sandbox vip for pg-test
  - 10.10.10.10 meta-1                        # sandbox node meta-1 (node-0)
  - 10.10.10.11 node-1                        # sandbox node node-1
  - 10.10.10.12 node-2                        # sandbox node node-2
  - 10.10.10.13 node-3                        # sandbox node node-3
  - 10.10.10.10 pigsty
  - 10.10.10.10 y.pigsty yum.pigsty
  - 10.10.10.10 c.pigsty consul.pigsty
  - 10.10.10.10 g.pigsty grafana.pigsty
  - 10.10.10.10 p.pigsty prometheus.pigsty
  - 10.10.10.10 a.pigsty alertmanager.pigsty
  - 10.10.10.10 n.pigsty ntp.pigsty
  - 10.10.10.10 h.pigsty haproxy.pigsty

# - prometheus - #
prometheus_data_dir: /export/prometheus/data  # prometheus data dir
prometheus_options: '--storage.tsdb.retention=30d'
prometheus_reload: false                      # reload prometheus instead of recreate it
prometheus_sd_method: consul                  # service discovery method: static|consul|etcd
prometheus_scrape_interval: 5s                # global scrape & evaluation interval
prometheus_scrape_timeout: 4s                 # scrape timeout
prometheus_sd_interval: 5s                    # service discovery refresh interval

# - grafana - #
grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
grafana_admin_password: admin                  # default grafana admin user password
grafana_plugin: install                        # none|install|reinstall
grafana_cache: /www/pigsty/grafana/plugins.tar.gz # path to grafana plugins tarball
grafana_customize: false                       # customize grafana resources
grafana_plugins: # default grafana plugins list
  - redis-datasource
  - simpod-json-datasource
  - fifemon-graphql-datasource
  - sbueringer-consul-datasource
  - camptocamp-prometheus-alertmanager-datasource
  - ryantxu-ajax-panel
  - marcusolsson-hourly-heatmap-panel
  - michaeldmoore-multistat-panel
  - marcusolsson-treemap-panel
  - pr0ps-trackmap-panel
  - dalvany-image-panel
  - magnesium-wordcloud-panel
  - cloudspout-button-panel
  - speakyourcode-button-panel
  - jdbranham-diagram-panel
  - grafana-piechart-panel
  - snuids-radar-panel
  - digrich-bubblechart-panel
grafana_git_plugins:
  - https://github.com/Vonng/grafana-echarts

# - loki - #
loki_clean: false                 # whether remove existing loki data
loki_data_dir: /export/loki       # default loki data dir

参数详解

ca_method

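  • create: create a new key pair for the CA
  • copy: copy an existing CA key pair to build the CA

(The open-source edition of Pigsty does not yet use the advanced security features of the CA infrastructure.)

ca_subject

The subject of the self-signed CA.

The default subject is:

"/CN=root-ca"

ca_homedir

The root directory for CA files.

Defaults to /ca.

ca_cert

The file name of the CA public-key certificate.

Defaults to ca.crt.

ca_key

The file name of the CA private key.

Defaults to ca.key.

nginx_upstream

The URLs and domain names of the Nginx upstream services.

Nginx forwards traffic based on the Host header, so make sure the correct domain names are configured when you access Pigsty infrastructure services.

Do not modify the name fields.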

nginx_upstream:
- { name: home,          host: pigsty,   url: "127.0.0.1:3000"}
- { name: consul,        host: c.pigsty, url: "127.0.0.1:8500" }
- { name: grafana,       host: g.pigsty, url: "127.0.0.1:3000" }
- { name: prometheus,    host: p.pigsty, url: "127.0.0.1:9090" }
- { name: alertmanager,  host: a.pigsty, url: "127.0.0.1:9093" }
- { name: haproxy,       host: h.pigsty, url: "127.0.0.1:9091" }

dns_records

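Dynamic DNS records.

Each record is written to /etc/hosts on the meta node and resolved by the name server running on the meta node.

prometheus_data_dir

The Prometheus data directory.

Defaults to /export/prometheus/data.

prometheus_options

Prometheus command-line options.

The default is --storage.tsdb.retention=30d, which keeps 30 days of monitoring data.

This parameter supersedes prometheus_retention, which has been deprecated since v0.6.

prometheus_reload

If true, running the Prometheus tasks does not wipe the existing data directory.

Defaults to false, meaning that running the prometheus playbook wipes existing monitoring data.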

prometheus_sd_method

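The service discovery mechanism used by Prometheus. Defaults to consul. Options:

  • consul: service discovery based on Consul
  • static: service discovery based on local configuration files

Pigsty recommends consul service discovery: when a failover occurs, the monitoring system automatically corrects the identity registered for the target instances.

static service discovery relies on the configuration in /etc/prometheus/targets/*.yml. Its advantage is that it does not depend on Consul, so it is less intrusive when the Pigsty monitoring system is integrated with an external provisioning solution. The trade-off is that you must maintain instance role information yourself whenever a primary/replica switchover happens in a cluster.

When maintaining targets manually, you can generate the target files Prometheus needs from the configuration file with the following command:

./infra.yml --tags=prometheus_targets,prometheus_reload

For details, see: Service Discovery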
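For orientation, a target file under /etc/prometheus/targets/ could look roughly like the sketch below, assuming these files follow Prometheus's standard file-based service discovery format (a list of target groups, each with targets and optional labels). The file name, the exporter address, and the label names are hypothetical placeholders; the files Pigsty actually generates may differ.

```yaml
# /etc/prometheus/targets/pg-test.yml   (hypothetical file name)
- targets: ['10.10.10.11:9630']   # hypothetical exporter address and port
  labels:                         # label names below are assumptions
    cls: pg-test                  # cluster name
    ins: pg-test-1                # instance name
    role: replica                 # must be updated by hand after a switchover
```

The role label is the part that consul discovery would keep current automatically and that static discovery leaves to you.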

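prometheus_sd_target (obsolete)

Prometheus service discovery targets in Pigsty are now managed uniformly per cluster, and this option is no longer provided.

It used to control how target definition files were managed when prometheus_sd_method == 'static':

  • batch: a single, batch-managed file: /etc/prometheus/targets/all.yml
  • single: one file per instance: /etc/prometheus/targets/{{ pg_instance }}.yml

The single batch-managed file is simple to manage, but it can only be used with the default single-configuration-file layout (where all database clusters are defined in one configuration file).

With separate configuration files (one per cluster), the single mode is required. Each new database instance creates a new definition file under /etc/prometheus/targets/ on the meta node.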

prometheus_scrape_interval

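The Prometheus scrape interval.

The default configuration above uses 5s; 15s is recommended for production.

prometheus_scrape_timeout

The Prometheus scrape timeout.

The default configuration above uses 4s; 10s is recommended for production, or configure it according to your needs.

prometheus_sd_interval

How often Prometheus refreshes its service discovery list.

Defaults to 5s; a longer interval is recommended for production, or configure it according to your needs.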
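Put together, the production recommendations above could be sketched as follows. The 15s and 10s values come from the text; the 30s service discovery interval is only an assumed example of "a longer interval".

```yaml
prometheus_scrape_interval: 15s   # recommended for production in the text above
prometheus_scrape_timeout: 10s    # recommended for production; keep it below the scrape interval
prometheus_sd_interval: 30s       # assumption: one possible longer refresh interval
```

Prometheus rejects a configuration whose scrape timeout is greater than the scrape interval, so adjust the two values together.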

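prometheus_metrics_path (deprecated)

The URL path from which Prometheus scrapes exporters. Defaults to /metrics.

It has been replaced by the externally referenced variable exporter_metrics_path and no longer takes effect.

prometheus_retention (deprecated)

The Prometheus data retention period. Defaults to 30 days.

This parameter has been superseded by prometheus_options and has been deprecated since v0.6.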

grafana_url

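The endpoint at which Grafana serves requests. It must include the user name and password.

This URL is used to call the Grafana API during Grafana provisioning.

grafana_admin_password

The password of the Grafana admin user.

Defaults to admin.

grafana_plugin

How Grafana plugins are provisioned:

  • none: do not install plugins
  • install: install Grafana plugins (default)
  • reinstall: force reinstallation of Grafana plugins

Grafana needs Internet access to download a number of plugins. If your meta node has no Internet access, the offline installation package already contains all the downloaded Grafana plugins. After the plugins have been downloaded, Pigsty rebuilds the plugin cache tarball.

grafana_cache

The path of the Grafana plugin cache file.

The offline installation package already contains all Grafana plugins, downloaded and packaged. If the plugin directory already exists, Pigsty does not try to download the plugins from the Internet again.

The default offline plugin cache path is /www/pigsty/grafana/plugins.tar.gz (assuming the local Yum repo is named pigsty).

grafana_customize

A flag indicating whether to customize Grafana.

If enabled, the Grafana logo is replaced with the Pigsty logo.

grafana_plugins

The list of Grafana plugins.

An array; each element is a plugin name.

Plugins are installed with grafana-cli plugins install.

The plugins installed by default are: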

grafana_plugins: # default grafana plugins list
  - redis-datasource
  - simpod-json-datasource
  - fifemon-graphql-datasource
  - sbueringer-consul-datasource
  - camptocamp-prometheus-alertmanager-datasource
  - ryantxu-ajax-panel
  - marcusolsson-hourly-heatmap-panel
  - michaeldmoore-multistat-panel
  - marcusolsson-treemap-panel
  - pr0ps-trackmap-panel
  - dalvany-image-panel
  - magnesium-wordcloud-panel
  - cloudspout-button-panel
  - speakyourcode-button-panel
  - jdbranham-diagram-panel
  - grafana-piechart-panel
  - snuids-radar-panel
  - digrich-bubblechart-panel

grafana_git_plugins

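Grafana plugins installed from Git.

Some plugins cannot be downloaded with the official command-line tool but can be fetched with git clone. Use this parameter for those.

An array; each element is the URL of a plugin's Git repository.

Plugins are installed with cd /var/lib/grafana/plugins && git clone.

One visualization plugin is downloaded by default: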

grafana_git_plugins:
  - https://github.com/Vonng/grafana-echarts

loki_clean

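A bool, used as a command-line argument. It indicates whether the Loki data directory is cleaned before Loki is installed.

Loki is not among the monitoring components installed by default; this parameter is currently used only by the infra-loki.yml playbook.

loki_data_dir

A string holding a file system path. It specifies the location of the Loki data directory.

Defaults to /export/loki.

Loki is not among the monitoring components installed by default; this parameter is currently used only by the infra-loki.yml playbook.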

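6.7 - Metadatabase

Configuration parameters in Pigsty for the metadatabase (Consul/Etcd)

Pigsty uses a DCS (Distributed Configuration Store) as its metadatabase. The DCS serves three important purposes:

  • Leader election: Patroni relies on the DCS for election and switchover.
  • Configuration management: Patroni uses the DCS to manage Postgres configuration.
  • Identity management: the monitoring system uses the DCS to manage and maintain the identity of database instances.

The DCS is critical to database stability. For demonstration purposes, Pigsty provides basic Consul and Etcd support and deploys the DCS service on the meta node. In production, deploying a dedicated DCS cluster on dedicated machines is recommended.

Parameter overview

Name | Type | Level | Description
service_registry | enum | G/C/I | Where services are registered
dcs_type | enum | G | DCS type to use
dcs_name | string | G | DCS cluster name
dcs_servers | dict | G | DCS servers, as a name: IP dictionary
dcs_exists_action | enum | G/A | What to do if a DCS instance already exists
dcs_disable_purge | bool | G/C/I | Completely disable purging of DCS instances
consul_data_dir | string | G | Consul data directory
etcd_data_dir | string | G | Etcd data directory

Default parameters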

#------------------------------------------------------------------------------
# DCS PROVISION
#------------------------------------------------------------------------------
service_registry: consul                      # where to register services: none | consul | etcd | both
dcs_type: consul                              # consul | etcd | both
dcs_name: pigsty                              # consul dc name | etcd initial cluster token
dcs_servers:                                  # dcs server dict in name:ip format
  meta-1: 10.10.10.10                         # you could use existing dcs cluster
  # meta-2: 10.10.10.11                       # host which have their IP listed here will be init as server
  # meta-3: 10.10.10.12                       # 3 or 5 dcs nodes are recommend for production environment
dcs_exists_action: skip                       # abort|skip|clean if dcs server already exists
dcs_disable_purge: false                      # set to true to disable purge functionality for good (force dcs_exists_action = abort)
consul_data_dir: /var/lib/consul              # consul data dir (/var/lib/consul by default)
etcd_data_dir: /var/lib/etcd                  # etcd data dir (/var/lib/etcd by default)

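Parameter details

service_registry

Where services are registered. Referenced by several components.

  • none: do not register services (required when performing a monitor-only deployment)
  • consul: register services in Consul
  • etcd: register services in Etcd (not yet supported)

dcs_type

The DCS type. There are two options:

  • Consul

  • Etcd (support is not yet complete)

dcs_name

The DCS cluster name.

Defaults to pigsty.

In Consul, it is the datacenter name.

dcs_servers

The names and addresses of the DCS servers, as a dictionary: each key is a DCS server instance name and each value is the corresponding IP address.

You can use existing external DCS servers, or initialize new DCS servers on the target machines.

If you initialize new DCS instances, it is recommended to complete DCS initialization on all DCS servers (usually the meta nodes) first.

You can also initialize all DCS servers and DCS agents in one pass, but that full initialization must then include all servers. In that case, every target machine whose IP address matches an entry in dcs_servers is initialized as a DCS server.

An odd number of DCS servers is strongly recommended. A demo environment can use a single DCS server; for production, 3 to 5 are recommended to keep the DCS available.

You must configure the DCS servers explicitly for your situation. In the sandbox environment, for example, you can choose to enable either 1 or 3 DCS nodes.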

dcs_servers:
  meta-1: 10.10.10.10
  meta-2: 10.10.10.11 
  meta-3: 10.10.10.12 

dcs_exists_action

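A safety guard: the action the system takes when a Consul instance already exists.

  • abort: abort the entire playbook run. This is documented as the default, although the default configuration above sets skip.
  • clean: wipe the existing DCS instance and continue (extremely dangerous)
  • skip: end the play on targets that already have a DCS instance and continue on the other target machines.

If you really need to force-remove existing DCS instances, it is recommended to first take the clusters and instances offline and destroy them with pgsql-rm.yml, and then run initialization again. Otherwise, you have to override the setting with the command-line argument -e dcs_exists_action=clean to force existing instances to be wiped during initialization.

dcs_disable_purge

A second safety guard. Defaults to false. If true, dcs_exists_action is forced to abort.

This is equivalent to turning off the cleanup capability of dcs_exists_action, ensuring that DCS instances are never wiped under any circumstances.

consul_data_dir

The Consul data directory.

Defaults to /var/lib/consul.

etcd_data_dir

The Etcd data directory.

Defaults to /var/lib/etcd.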

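6.8 - PG installation

Parameters in Pigsty related to installing Postgres

The PG Install part installs all PostgreSQL dependencies on a machine that already has the basic software. You can configure the name, ID, privileges, and access of the database superuser, the repos used for installation, the installation path, the version to install, and the required packages and extensions.

Most of these parameters only need to change when you upgrade the database major version as a whole. You can specify the version to install with pg_version and override it at the cluster level to install different database versions for different clusters.

Parameter overview

Name | Type | Level | Description
pg_dbsu | string | G/C | Operating system superuser for PG
pg_dbsu_uid | number | G/C | Superuser UID
pg_dbsu_sudo | enum | G/C | Superuser's sudo privileges
pg_dbsu_home | string | G/C | Superuser's home directory
pg_dbsu_ssh_exchange | bool | G/C | Whether to exchange superuser SSH keys
pg_version | string | G/C | Database major version to install
pgdg_repo | bool | G/C | Whether to add the official PG repo
pg_add_repo | bool | G/C | Whether to add PG-related repos
pg_bin_dir | string | G/C | PG binary directory
pg_packages | string[] | G/C | PG packages to install
pg_extensions | string[] | G/C | PG extensions to install

Default parameters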

#------------------------------------------------------------------------------
# POSTGRES INSTALLATION
#------------------------------------------------------------------------------
# - dbsu - #
pg_dbsu: postgres                             # os user for database, postgres by default (change it is not recommended!)
pg_dbsu_uid: 26                               # os dbsu uid and gid, 26 for default postgres users and groups
pg_dbsu_sudo: limit                           # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
pg_dbsu_home: /var/lib/pgsql                  # dbsu home directory
pg_dbsu_ssh_exchange: false                   # exchange ssh key among same cluster

# - postgres packages - #
pg_version: 13                                # default postgresql version
pgdg_repo: false                              # use official pgdg yum repo (disable if you have local mirror)
pg_add_repo: false                            # add postgres related repo before install (useful if you want a simple install)
pg_bin_dir: /usr/pgsql/bin                    # postgres binary dir
pg_packages:
  - postgresql${pg_version}*
  - postgis31_${pg_version}*
  - pgbouncer patroni pg_exporter pgbadger
  - patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
  - python3 python3-psycopg2 python36-requests python3-etcd python3-consul
  - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography

pg_extensions:
  - pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
  # - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
  # - count_distinct${pg_version}  ddlx_${pg_version}  geoip${pg_version}  orafce${pg_version}                                   # popular features
  # - hypopg_${pg_version}  ip4r${pg_version}  jsquery_${pg_version}  logerrors_${pg_version}  periods_${pg_version}  pg_auto_failover_${pg_version}  pg_catcheck${pg_version}
  # - pg_fkpart${pg_version}  pg_jobmon${pg_version}  pg_partman${pg_version}  pg_prioritize_${pg_version}  pg_track_settings${pg_version}  pgaudit15_${pg_version}
  # - pgcryptokey${pg_version}  pgexportdoc${pg_version}  pgimportdoc${pg_version}  pgmemcache-${pg_version}  pgmp${pg_version}  pgq-${pg_version}  pgquarrel pgrouting_${pg_version}
  # - pguint${pg_version}  pguri${pg_version}  prefix${pg_version}   safeupdate_${pg_version}  semver${pg_version}   table_version${pg_version}  tdigest${pg_version}


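Parameter details

pg_dbsu

The name of the operating system user (superuser) the database runs as. Defaults to postgres; changing it is not recommended.

pg_dbsu_uid

The UID of the operating system user (superuser) the database runs as. Defaults to 26.

This matches the configuration of the official PostgreSQL RPM packages on CentOS; changing it is not recommended.

pg_dbsu_sudo

The default sudo privileges of the database superuser:

  • none: no sudo privileges
  • limit: limited sudo privileges, enough to run systemctl commands for database-related components (default)
  • all: full sudo privileges, password required
  • nopass: full sudo privileges without a password (not recommended)

pg_dbsu_home

The home directory of the database superuser. Defaults to /var/lib/pgsql.

pg_dbsu_ssh_exchange

Whether to exchange the superuser's SSH keys among the machines targeted by the run.

pg_version

The PostgreSQL version to install. Defaults to 13.

Overriding this variable at the cluster level as needed is recommended.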
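A cluster-level override could be sketched as follows, using the inventory layout shown for pg-test elsewhere in this document. The version number 12 is a hypothetical example; the default repo_packages list only downloads PostgreSQL 13 packages, so the matching packages for another version would have to be available in your repo first.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
  vars:
    pg_cluster: pg-test
    pg_version: 12        # hypothetical: this cluster installs a different major version
```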

pgdg_repo

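A flag indicating whether to use the official PostgreSQL repo. Not used by default.

With this option, PostgreSQL packages can be downloaded and installed directly from the official Internet repo when no local repo is available.

pg_add_repo

If enabled, the official PGDG repo is added before PostgreSQL is installed.

With this option enabled, you can run database initialization directly without first running infrastructure initialization. This may be slow, but it is particularly useful in scenarios that lack the infrastructure.

pg_bin_dir

The PostgreSQL binary directory.

Defaults to /usr/pgsql/bin/. /usr/pgsql is a symbolic link created during installation that points to the directory of the specific Postgres version installed.

For example: /usr/pgsql -> /usr/pgsql-13

pg_packages

The PostgreSQL packages installed by default.

${pg_version} in package names is replaced with the PostgreSQL version actually being installed.

pg_extensions

The PostgreSQL extension packages to install.

${pg_version} in package names is replaced with the PostgreSQL version actually being installed.

The extensions installed by default are: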

pg_repack${pg_version}
pg_qualstats${pg_version}
pg_stat_kcache${pg_version}
wal2json${pg_version}

Enable extensions as needed, but installing the pg_repack extension is strongly recommended.
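As a sketch, enabling a few more extensions means adding entries in the same ${pg_version} form. The two extra packages below are taken from the commented-out candidates in the default configuration and are chosen only for illustration.

```yaml
pg_extensions:
  - pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}   # defaults
  - pg_partman${pg_version} hypopg_${pg_version}    # illustrative additions (assumption)
```

Package naming is not uniform: some names join the version directly (pg_partman13) and others use an underscore (hypopg_13), so copy each name from the repo package list rather than guessing the pattern.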

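6.9 - PG provisioning

Parameters in Pigsty that define how a database cluster is brought up

PG provisioning is the process of creating and bringing up a database on a machine where Postgres has already been installed. It includes:

  • Defining the cluster identity, cleaning up existing instances, creating the directory structure, copying tools and scripts, and configuring environment variables
  • Rendering the Patroni configuration template, bringing up the primary with Patroni, and bringing up replicas with Patroni
  • Configuring Pgbouncer, initializing business users and databases, and registering the database and data source services in the DCS

Parameter overview

Name | Type | Level | Description
pg_cluster | string | C | PG cluster name (identity parameter)
pg_seq | number | I | PG instance sequence number (identity parameter)
pg_role | enum | I | PG instance role (identity parameter)
pg_shard | string | C | Name of the PG sharding cluster (identity parameter)
pg_sindex | number | C | Index within the PG sharding cluster (identity parameter)
pg_hostname | bool | G/C | Set the PG instance name as HOSTNAME
pg_nodename | bool | G/C | Set the PG instance name as the Consul node name
pg_exists | bool | A | Flag: whether PG already exists
pg_exists_action | enum | G/A | What to do when PG already exists
pg_disable_purge | bool | G/C/I | Forbid purging existing PG instances
pg_data | string | G | PG data directory
pg_fs_main | string | G | Mount point of the PG main data disk
pg_fs_bkup | path | G | Mount point of the PG backup disk
pg_listen | ip | G | IP address PG listens on
pg_port | number | G | Port PG listens on
pg_localhost | string | G/C | Unix socket address used by PG
pg_upstream | string | I | Replication upstream node of the instance
pg_backup | bool | I | Whether to store backups on the instance
pg_delay | interval | I | Replication delay, if the instance is a delayed replica
patroni_mode | enum | G/C | Patroni configuration mode
pg_namespace | string | G/C | DCS namespace used by Patroni
patroni_port | string | G/C | Patroni service port
patroni_watchdog_mode | enum | G/C | Patroni watchdog mode
pg_conf | enum | G/C | Configuration template used by Patroni
pg_encoding | string | G/C | PG character set encoding
pg_locale | enum | G/C | PG locale
pg_lc_collate | enum | G/C | PG collation locale
pg_lc_ctype | enum | G/C | PG character classification locale
pgbouncer_port | number | G/C | Pgbouncer port
pgbouncer_poolmode | enum | G/C | Pgbouncer pooling mode
pgbouncer_max_db_conn | number | G/C | Pgbouncer maximum connections per database

Default parameters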

#------------------------------------------------------------------------------
# POSTGRES PROVISION
#------------------------------------------------------------------------------
# - identity - #
# pg_cluster:                                 # [REQUIRED] cluster name (validated during pg_preflight)
# pg_seq: 0                                   # [REQUIRED] instance seq (validated during pg_preflight)
# pg_role: replica                            # [REQUIRED] service role (validated during pg_preflight)
pg_hostname: false                            # overwrite node hostname with pg instance name
pg_nodename: true                             # overwrite consul nodename with pg instance name

# - retention - #
# pg_exists_action, available options: abort|clean|skip
#  - abort: abort entire play's execution (default)
#  - clean: remove existing cluster (dangerous)
#  - skip: end current play for this host
# pg_exists: false                            # auxiliary flag variable (DO NOT SET THIS)
pg_exists_action: clean
pg_disable_purge: false                       # set to true to disable pg purge functionality for good (force pg_exists_action = abort)

# - storage - #
pg_data: /pg/data                             # postgres data directory
pg_fs_main: /export                           # data disk mount point     /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
pg_fs_bkup: /var/backups                      # backup disk mount point   /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*

# - connection - #
pg_listen: '0.0.0.0'                          # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
pg_port: 5432                                 # postgres port (5432 by default)
pg_localhost: /var/run/postgresql             # localhost unix socket dir for connection

# - patroni - #
# patroni_mode, available options: default|pause|remove
#   - default: default ha mode
#   - pause:   into maintenance mode
#   - remove:  remove patroni after bootstrap
patroni_mode: default                         # pause|default|remove
pg_namespace: /pg                             # top level key namespace in dcs
patroni_port: 8008                            # default patroni port
patroni_watchdog_mode: automatic              # watchdog mode: off|automatic|required
pg_conf: tiny.yml                             # user provided patroni config template path

# - localization - #
pg_encoding: UTF8                             # default to UTF8
pg_locale: C                                  # default to C
pg_lc_collate: C                              # default to C
pg_lc_ctype: en_US.UTF8                       # default to en_US.UTF8

# - pgbouncer - #
pgbouncer_port: 6432                          # pgbouncer port (6432 by default)
pgbouncer_poolmode: transaction               # pooling mode: (transaction pooling by default)
pgbouncer_max_db_conn: 100                    # important! do not set this larger than postgres max conn or conn limit

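Identity parameters

Name | Type | Level | Description
pg_cluster | string | C | PG cluster name
pg_seq | number | I | PG instance sequence number
pg_role | enum | I | PG instance role
pg_shard | string | C | Name of the PG sharding cluster (optional)
pg_sindex | number | C | Index within the PG sharding cluster (optional)

pg_cluster, pg_role, and pg_seq are identity parameters.

Apart from the IP address, these three parameters are the minimum set required to define a new database cluster, as the configuration below shows.

All other parameters can be inherited from the global or default configuration, but identity parameters must be specified explicitly and assigned by hand.

  • pg_cluster names the cluster and is configured at the cluster level.
  • pg_role is configured at the instance level and identifies the role of the instance. Only the primary role receives special handling; if omitted, the role defaults to replica. The special roles delayed and offline also exist.
  • pg_seq identifies an instance within the cluster. It is usually an integer increasing from 0 or 1 and is never changed once assigned.
  • {{ pg_cluster }}-{{ pg_seq }} uniquely identifies an instance; this is pg_instance.
  • {{ pg_cluster }}-{{ pg_role }} identifies a service within the cluster; this is pg_service.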
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: replica}
    10.10.10.12: {pg_seq: 2, pg_role: primary}
    10.10.10.13: {pg_seq: 3, pg_role: replica}
  vars:
    pg_cluster: pg-test
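
The naming rules above can be illustrated with a few lines of Python. This is only a sketch of how pg_instance and pg_service follow from the identity parameters, not Pigsty's actual implementation; the function name identity_names is hypothetical.

```python
def identity_names(pg_cluster: str, pg_seq: int, pg_role: str = "replica") -> dict:
    """Derive instance and service identifiers from the identity parameters."""
    return {
        "pg_instance": f"{pg_cluster}-{pg_seq}",  # {{ pg_cluster }}-{{ pg_seq }}
        "pg_service": f"{pg_cluster}-{pg_role}",  # {{ pg_cluster }}-{{ pg_role }}
    }


# The pg-test inventory from this section, expressed as plain data.
hosts = {
    "10.10.10.11": {"pg_seq": 1, "pg_role": "replica"},
    "10.10.10.12": {"pg_seq": 2, "pg_role": "primary"},
    "10.10.10.13": {"pg_seq": 3, "pg_role": "replica"},
}
names = {ip: identity_names("pg-test", **params) for ip, params in hosts.items()}
```

Because the role is part of the service name but not of the instance name, a failover changes which service an instance belongs to without changing the instance's identity.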

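Parameter Details

pg_cluster

The name of the PG database cluster; it serves as the namespace for resources within the cluster.

Cluster names must follow the pattern [a-z][a-z0-9-]* so that they satisfy the identifier constraints imposed by the different components.

Identity parameter, required, cluster level.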
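
A cluster name can be checked against the pattern [a-z][a-z0-9-]* before it is written into the inventory. The helper below is a minimal sketch; the name is_valid_cluster_name is hypothetical and is not part of Pigsty.

```python
import re

# Pattern quoted from the pg_cluster naming rule.
CLUSTER_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_cluster_name(name: str) -> bool:
    """Return True when the whole name matches the naming rule."""
    return CLUSTER_NAME.fullmatch(name) is not None
```

fullmatch is used instead of match so that a valid prefix followed by illegal characters (for example an underscore) is rejected.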

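pg_seq

The sequence number of the database instance. It is unique within the cluster, distinguishes and identifies the instances in the cluster, and is assigned starting from 0 or 1.

Identity parameter, required, instance level.

pg_role

The role of the database instance. The default roles are primary and replica.

Additional optional roles are offline and delayed.

Identity parameter, required, instance level.

pg_shard

Only sharded clusters need to set this parameter.

When several database clusters serve the same business through horizontal sharding, Pigsty calls this group of clusters a Sharding Cluster. pg_shard is the name of the sharding cluster that a database cluster belongs to. A sharding cluster may be given any name, but Pigsty recommends a meaningful naming rule.

For example, a cluster that takes part in a sharding cluster can build its name from the sharding cluster name pg_shard + shard + the cluster's shard index pg_sindex: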

shard:  test
pg-testshard1
pg-testshard2
pg-testshard3
pg-testshard4
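
The recommended naming scheme can be written as a small helper. This is a sketch only: the function name shard_cluster_name is hypothetical, and the pg- prefix is an assumption read off the example names above rather than a documented rule.

```python
def shard_cluster_name(pg_shard: str, pg_sindex: int, prefix: str = "pg-") -> str:
    """Build a cluster name as prefix + pg_shard + 'shard' + pg_sindex."""
    return f"{prefix}{pg_shard}shard{pg_sindex}"


# Names for a four-way sharding cluster called "test".
shard_names = [shard_cluster_name("test", i) for i in range(1, 5)]
```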

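Identity parameter, optional, cluster level.

pg_sindex

The index of the cluster within its sharding cluster, usually assigned sequentially from 0 or 1.

Only sharded clusters need to set this parameter.

Identity parameter, optional, cluster level.

pg_hostname

Whether to register the PG instance name pg_instance as the hostname. Disabled by default.

pg_nodename

Whether to register the PG instance name as the node name in Consul. Enabled by default.

pg_exists

A flag indicating whether the PG instance exists. Not configurable.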

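pg_exists_action

A safety guard: the action the system takes when a PostgreSQL instance already exists.

  • abort: abort the entire playbook run (default behavior)
  • clean: wipe the existing instance and continue (extremely dangerous)
  • skip: skip targets where an instance already exists (abort on them) and continue on the other target machines.

If you really need to force-remove an existing database instance, it is recommended to first use pgsql-rm.yml to take the cluster and its instances offline and destroy them, and then rerun initialization. Otherwise, you have to override the setting with the command-line argument -e pg_exists_action=clean, which forces existing instances to be wiped during initialization.

pg_disable_purge

A second safety guard, false by default. If true, the pg_exists_action variable is forced to abort.

This is equivalent to turning off the clean behavior of pg_exists_action, ensuring that a Postgres instance is never wiped under any circumstances.

It means you must use the dedicated decommissioning playbook pgsql-rm.yml to clean up existing instances before you can reinitialize the database on the cleaned nodes.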
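
The interaction between pg_exists, pg_exists_action, and pg_disable_purge can be summarized as a small decision function. This is a sketch of the logic described above, not the playbook's actual code; the function name resolve_exists_action and the return value "init" for the no-instance case are hypothetical.

```python
def resolve_exists_action(pg_exists: bool,
                          pg_exists_action: str = "abort",
                          pg_disable_purge: bool = False) -> str:
    """Return what should happen on a target node during initialization."""
    if pg_exists_action not in ("abort", "clean", "skip"):
        raise ValueError(f"unknown pg_exists_action: {pg_exists_action}")
    if pg_disable_purge:
        # Second safety guard: the action is forced to abort.
        pg_exists_action = "abort"
    if not pg_exists:
        # Nothing to protect, so initialization proceeds normally.
        return "init"
    return pg_exists_action
```

With pg_disable_purge enabled, even an explicit -e pg_exists_action=clean resolves to abort, which is the point of the second guard.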

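pg_data

The default data directory, /pg/data by default.

pg_fs_main

The main data disk directory, /export by default.

Pigsty's default directory layout assumes that the system has a main data disk mount point that holds the database directory.

pg_fs_bkup

The archive and backup disk directory, /var/backups by default.

Pigsty's default directory layout assumes that the system has a backup disk mount point that holds backup and archive data. The backup disk is not mandatory: if the system has none, users can specify a subdirectory on the main data disk as the root mount point of the backup disk.

pg_listen

The IP address the database listens on, all IPv4 addresses (0.0.0.0) by default. To include all IPv6 addresses as well, use *.

pg_port

The port the database listens on, 5432 by default. Changing it is not recommended.

pg_localhost

The Unix socket directory, which holds the Unix socket files of PostgreSQL and Pgbouncer.

Defaults to /var/run/postgresql.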

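pg_upstream

Instance-level setting whose value is an IP address or hostname, indicating the upstream node for streaming replication.

When this parameter is set on a replica of the cluster, the IP address must belong to another node in the same cluster. The instance replicates from that node, which can be used to build cascading replication.

When this parameter is set on the primary of the cluster, the whole cluster runs as a Standby Cluster and receives changes from the upstream node. The primary in the cluster then acts as the standby leader.

pg_backup

A flag, instance-level setting. Instances carrying this flag are used to store base backups (not provided in the open-source edition of Pigsty).

pg_delay

The delay applied when the instance is a delayed replica (not provided in the open-source edition of Pigsty).

Uses the interval string format accepted by PG, such as 1h30min.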

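patroni_mode

The working mode of Patroni:

  • default: enable Patroni
  • pause: enable Patroni, but automatically enter maintenance mode once initialization completes (no automatic failover)
  • remove: still use Patroni to initialize the cluster, but remove Patroni once initialization completes

pg_namespace

The top-level namespace in the KV store that Patroni uses in the DCS.

Defaults to /pg.

patroni_port

The port the Patroni API server listens on.

Defaults to 8008.

patroni_watchdog_mode

When a failover occurs, Patroni tries to shut down the primary before promoting a replica. If the primary has still not shut down within the specified timeout, Patroni, depending on this setting, uses the Linux kernel feature softdog to fence the node by powering it off.

  • off: do not use watchdog
  • automatic: enable watchdog if the kernel has softdog enabled; not enforced. This is the default behavior.
  • required: enforce watchdog; refuse to start if softdog is not enabled on the system.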
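
The three watchdog modes differ only in how they react to a missing softdog device. The function below is a sketch of that decision as described above, not Patroni's code; the name watchdog_decision and its return strings are hypothetical.

```python
def watchdog_decision(mode: str, softdog_available: bool) -> str:
    """Decide whether the watchdog is used for a given mode."""
    if mode == "off":
        return "disabled"
    if mode == "automatic":
        # Use the watchdog when it is there, carry on without it otherwise.
        return "enabled" if softdog_available else "disabled"
    if mode == "required":
        if not softdog_available:
            # Mirrors "refuse to start" in the description above.
            raise RuntimeError("watchdog is required but softdog is not available")
        return "enabled"
    raise ValueError(f"unknown patroni_watchdog_mode: {mode}")
```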

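pg_conf

The Patroni template used to bring up the Postgres cluster. Pigsty ships with four predefined templates:

  • oltp.yml: regular OLTP template, the default configuration
  • olap.yml: OLAP template that raises parallelism and is optimized for throughput and long-running queries.
  • crit.yml: template for critical business, based on the OLTP template and tuned for safety and data integrity; it uses synchronous replication and forces data checksums on.
  • tiny.yml: tiny database template, optimized for low-resource scenarios such as a demo database cluster running in virtual machines.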

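pg_encoding

The character set encoding used when the PostgreSQL instance is initialized.

Defaults to UTF8. Changing this parameter is not recommended unless you have special requirements.

pg_locale

The locale used when the PostgreSQL instance is initialized.

Defaults to C. Changing this parameter is not recommended unless you have special requirements.

pg_lc_collate

The string collation used when the PostgreSQL instance is initialized.

Defaults to C. Changing this parameter is strongly discouraged unless you have special requirements. Users can always get locale-aware sorting through the COLLATE expression, and a wrong collation can make some operations several times slower, so change this parameter only when you really need localization.

pg_lc_ctype

The character classification (LC_CTYPE) used when the PostgreSQL instance is initialized.

Defaults to en_US.UTF8. Some PG extensions (pg_trgm) need extra character classification definitions to work properly with international characters, so Pigsty uses en_US.UTF8 by default. Changing this parameter is not recommended.

pgbouncer_port

The port the Pgbouncer connection pool listens on.

Defaults to 6432.

pgbouncer_poolmode

The pool mode the Pgbouncer connection pool uses.

Defaults to transaction, that is, transaction-level pooling. The other options are session and statement.

pgbouncer_max_db_conn

The maximum number of connections allowed between the connection pool and a single database.

Defaults to 100.

With transaction pooling, the number of active server connections usually stays in the single digits. With session pooling, this parameter can be raised as appropriate.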

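6.10 - PG Template

Parameters for customizing the Postgres template in Pigsty

PG Provision brings up a brand-new Postgres cluster, while PG Template builds on PG Provision and creates the default objects in that new cluster, including:

  • Basic roles: read-only role, read-write role, admin role
  • Basic users: replication user, superuser, monitor user, admin user
  • Default privileges in the template database
  • Default schemas
  • Default extensions
  • HBA allow/deny rules

For more on customizing database clusters, see Customize Cluster.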

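Parameter Overview

| Name | Type | Level | Description |
|------|------|-------|-------------|
| pg_init | string | G/C | Custom PG initialization script |
| pg_replication_username | string | G | PG replication user |
| pg_replication_password | string | G | Password of the PG replication user |
| pg_monitor_username | string | G | PG monitor user |
| pg_monitor_password | string | G | Password of the PG monitor user |
| pg_admin_username | string | G | PG admin user |
| pg_admin_password | string | G | Password of the PG admin user |
| pg_default_roles | role[] | G | Roles and users created by default |
| pg_default_privileges | string[] | G | Default privileges in the database |
| pg_default_schemas | string[] | G | Schemas created by default |
| pg_default_extensions | extension[] | G | Extensions installed by default |
| pg_offline_query | bool | I | Whether offline queries are allowed |
| pg_reload | bool | A | Whether to reload the database configuration (HBA) |
| pg_hba_rules | rule[] | G | Global HBA rules |
| pg_hba_rules_extra | rule[] | C/I | Cluster/instance-specific HBA rules |
| pgbouncer_hba_rules | rule[] | G/C | Global Pgbouncer HBA rules |
| pgbouncer_hba_rules_extra | rule[] | G/C | Specific Pgbouncer HBA rules |
| pg_databases | database[] | C | Business database definitions |
| pg_users | user[] | C | Business user definitions |

Default Parameters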

#------------------------------------------------------------------------------
# POSTGRES TEMPLATE
#------------------------------------------------------------------------------
# - template - #
pg_init: pg-init                              # init script for cluster template

# - system roles - #
pg_replication_username: replicator           # system replication user
pg_replication_password: DBUser.Replicator    # system replication password
pg_monitor_username: dbuser_monitor           # system monitor user
pg_monitor_password: DBUser.Monitor           # system monitor password
pg_admin_username: dbuser_admin               # system admin user
pg_admin_password: DBUser.Admin               # system admin password

# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
pg_default_roles:

  # common production readonly user
  - name: dbrole_readonly                 # production read-only roles
    login: false
    comment: role for global readonly access

  # common production read-write user
  - name: dbrole_readwrite                # production read-write roles
    login: false
    roles: [dbrole_readonly]             # read-write includes read-only access
    comment: role for global read-write access

  # offline have same privileges as readonly, but with limited hba access on offline instance only
  # for the purpose of running slow queries, interactive queries and perform ETL tasks
  - name: dbrole_offline
    login: false
    comment: role for restricted read-only access (offline instance)

  # admin have the privileges to issue DDL changes
  - name: dbrole_admin
    login: false
    bypassrls: true
    comment: role for object creation
    roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]

  # dbsu, name is designated by `pg_dbsu`. It's not recommended to set password for dbsu
  - name: postgres
    superuser: true
    comment: system superuser

  # default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
  - name: replicator
    replication: true
    roles: [pg_monitor, dbrole_readonly]
    comment: system replicator

  # default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
  - name: dbuser_monitor
    connlimit: 16
    comment: system monitor user
    roles: [pg_monitor, dbrole_readonly]

  # default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
  - name: dbuser_admin
    bypassrls: true
    comment: system admin user
    roles: [dbrole_admin]

  # default stats user, for ETL and slow queries
  - name: dbuser_stats
    password: DBUser.Stats
    comment: business offline user for offline queries and ETL
    roles: [dbrole_offline]


# - privileges - #
# object created by dbsu and admin will have their privileges properly set
pg_default_privileges:
  - GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly
  - GRANT SELECT                        ON TABLES    TO dbrole_readonly
  - GRANT SELECT                        ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly
  - GRANT USAGE                         ON SCHEMAS   TO dbrole_offline
  - GRANT SELECT                        ON TABLES    TO dbrole_offline
  - GRANT SELECT                        ON SEQUENCES TO dbrole_offline
  - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline
  - GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite
  - GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin
  - GRANT CREATE                        ON SCHEMAS   TO dbrole_admin

# - schemas - #
pg_default_schemas: [monitor]                 # default schemas to be created

# - extension - #
pg_default_extensions:                        # default extensions to be created
  - { name: 'pg_stat_statements',  schema: 'monitor' }
  - { name: 'pgstattuple',         schema: 'monitor' }
  - { name: 'pg_qualstats',        schema: 'monitor' }
  - { name: 'pg_buffercache',      schema: 'monitor' }
  - { name: 'pageinspect',         schema: 'monitor' }
  - { name: 'pg_prewarm',          schema: 'monitor' }
  - { name: 'pg_visibility',       schema: 'monitor' }
  - { name: 'pg_freespacemap',     schema: 'monitor' }
  - { name: 'pg_repack',           schema: 'monitor' }
  - name: postgres_fdw
  - name: file_fdw
  - name: btree_gist
  - name: btree_gin
  - name: pg_trgm
  - name: intagg
  - name: intarray

# - hba - #
pg_offline_query: false                       # set to true to enable offline query on instance
pg_hba_rules:                                 # postgres host-based authentication rules
  - title: allow meta node password access
    role: common
    rules:
      - host    all     all                         10.10.10.10/32      md5

  - title: allow intranet admin password access
    role: common
    rules:
      - host    all     +dbrole_admin               10.0.0.0/8          md5
      - host    all     +dbrole_admin               172.16.0.0/12       md5
      - host    all     +dbrole_admin               192.168.0.0/16      md5

  - title: allow intranet password access
    role: common
    rules:
      - host    all             all                 10.0.0.0/8          md5
      - host    all             all                 172.16.0.0/12       md5
      - host    all             all                 192.168.0.0/16      md5

  - title: allow local read/write (local production user via pgbouncer)
    role: common
    rules:
      - local   all     +dbrole_readonly                                md5
      - host    all     +dbrole_readonly           127.0.0.1/32         md5

  - title: allow offline query (ETL,SAGA,Interactive) on offline instance
    role: offline
    rules:
      - host    all     +dbrole_offline               10.0.0.0/8        md5
      - host    all     +dbrole_offline               172.16.0.0/12     md5
      - host    all     +dbrole_offline               192.168.0.0/16    md5

pg_hba_rules_extra: []                        # extra hba rules (for cluster/instance overwrite)

pgbouncer_hba_rules:                          # pgbouncer host-based authentication rules
  - title: local password access
    role: common
    rules:
      - local  all          all                                     md5
      - host   all          all                     127.0.0.1/32    md5

  - title: intranet password access
    role: common
    rules:
      - host   all          all                     10.0.0.0/8      md5
      - host   all          all                     172.16.0.0/12   md5
      - host   all          all                     192.168.0.0/16  md5

pgbouncer_hba_rules_extra: []                 # extra pgbouncer hba rules (for cluster/instance overwrite)

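Parameter Details

pg_init

The location of the shell script used to initialize the database template, pg-init by default. The script is copied to /pg/bin/pg-init and then executed.

The default pg-init is just a wrapper around pre-rendered SQL commands: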

# system default roles
psql postgres -qAXwtf /pg/tmp/pg-init-roles.sql

# system default template
psql template1 -qAXwtf /pg/tmp/pg-init-template.sql

# make postgres same as templated database (optional)
psql postgres  -qAXwtf /pg/tmp/pg-init-template.sql

Users can add their own cluster initialization logic in a custom pg-init script.

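pg_replication_username

The database username used for PostgreSQL streaming replication.

Defaults to replicator.

pg_replication_password

The password of the database user used for PostgreSQL streaming replication. It must be given in plain text.

Defaults to DBUser.Replicator. Changing it is strongly recommended!

pg_monitor_username

The database username used for PostgreSQL and Pgbouncer monitoring tasks.

Defaults to dbuser_monitor.

pg_monitor_password

The password of the database user used for PostgreSQL and Pgbouncer monitoring tasks. It must be given in plain text.

Defaults to DBUser.Monitor. Changing it is strongly recommended!

pg_admin_username

The database username used for PostgreSQL administration tasks (DDL changes).

Defaults to dbuser_admin.

pg_admin_password

The password of the database user used for PostgreSQL administration tasks (DDL changes). It must be given in plain text.

Defaults to DBUser.Admin. Changing it is strongly recommended!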

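pg_default_roles

Defines the default roles and users in PostgreSQL as an array of objects, each defining one user or role.

Every user or role must specify name; all other fields are optional.

  • password is optional. If left empty, no password is set. An MD5-hashed password may be used.

  • login, superuser, createdb, createrole, inherit, replication, and bypassrls are booleans that set user attributes. If unset, the system defaults apply.

  • Users are created with CREATE USER, so they have the login attribute by default. To create a role, specify login: false.

  • expire_at and expire_in control when a user expires. expire_at takes a date in the form YYYY-mm-DD. expire_in takes a number of days from now; if expire_in is present, it overrides expire_at.

  • New users are not added to the Pgbouncer user list by default. pgbouncer: true must be set explicitly for a user to be added to the Pgbouncer user list.

  • Users and roles are created in order, so a user defined later can belong to a role defined earlier.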

pg_users:
  # complete example of user/role definition for production user
  - name: dbuser_meta               # example production user have read-write access
    password: DBUser.Meta           # example user's password, can be encrypted
    login: true                     # can login, true by default (should be false for role)
    superuser: false                # is superuser? false by default
    createdb: false                 # can create database? false by default
    createrole: false               # can create role? false by default
    inherit: true                   # can this role use inherited privileges?
    replication: false              # can this role do replication? false by default
    bypassrls: false                # can this role bypass row level security? false by default
    connlimit: -1                   # connection limit, -1 disable limit
    expire_at: '2030-12-31'         # 'timestamp' when this role is expired
    expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
    roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
    pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
    parameters:                     # user's default search path
      search_path: public
    comment: test user

Pigsty defines a basic access control system made up of four default roles and four default users. See Access Control for details.
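
The precedence between expire_at and expire_in described in this section can be sketched as follows. This only illustrates the rule that expire_in, when present, overrides expire_at; it is not how Pigsty computes the value internally, and the function name resolve_expiry is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_expiry(expire_at: Optional[str] = None,
                   expire_in: Optional[int] = None,
                   today: Optional[date] = None) -> Optional[date]:
    """Return the effective expiry date of a user, or None if it never expires."""
    today = today or date.today()
    if expire_in is not None:
        # expire_in (days from now) takes precedence over expire_at.
        return today + timedelta(days=expire_in)
    if expire_at is not None:
        # expire_at is a date written as YYYY-mm-DD.
        return date.fromisoformat(expire_at)
    return None
```

For the dbuser_meta example, which sets both fields, the effective expiry is therefore 365 days after the day the definition is applied, not 2030-12-31.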

pg_default_privileges

Defines the default privileges in the database template.

Any object created by {{ dbsu }} or {{ pg_admin_username }} has the following default privileges:

pg_default_privileges:
  - GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly
  - GRANT SELECT                        ON TABLES    TO dbrole_readonly
  - GRANT SELECT                        ON SEQUENCES TO dbrole_readonly
  - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly
  - GRANT USAGE                         ON SCHEMAS   TO dbrole_offline
  - GRANT SELECT                        ON TABLES    TO dbrole_offline
  - GRANT SELECT                        ON SEQUENCES TO dbrole_offline
  - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline
  - GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite
  - GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite
  - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin
  - GRANT CREATE                        ON SCHEMAS   TO dbrole_admin

See Access Control for details.
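
Each entry in pg_default_privileges has the shape of the abbreviated grant accepted by PostgreSQL's ALTER DEFAULT PRIVILEGES command. The sketch below shows one plausible way such entries map onto SQL; that Pigsty renders them exactly like this is an assumption, and the function name render_default_privileges is hypothetical.

```python
from typing import List


def render_default_privileges(privileges: List[str], owners: List[str]) -> List[str]:
    """Turn abbreviated GRANT entries into ALTER DEFAULT PRIVILEGES statements."""
    statements = []
    for owner in owners:
        for priv in privileges:
            # Collapse the column-aligned padding used in the YAML list.
            grant = " ".join(priv.split())
            statements.append(f"ALTER DEFAULT PRIVILEGES FOR ROLE {owner} {grant};")
    return statements


sql = render_default_privileges(
    ["GRANT SELECT                        ON TABLES    TO dbrole_readonly",
     "GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite"],
    ["postgres", "dbuser_admin"],
)
```

Default privileges are attached to the role that creates the objects, which is why the statements are repeated per owner.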

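pg_default_schemas

The default schemas created in the template database.

By default Pigsty creates a schema named monitor in which the monitoring extensions are installed.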

pg_default_schemas: [monitor]                 # default schemas to be created

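pg_default_extensions

The extensions installed in the template database by default, as an array of objects.

If the schema field is not specified, the extension is installed into a schema according to the current search_path.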

pg_default_extensions:
  - { name: 'pg_stat_statements',  schema: 'monitor' }
  - { name: 'pgstattuple',         schema: 'monitor' }
  - { name: 'pg_qualstats',        schema: 'monitor' }
  - { name: 'pg_buffercache',      schema: 'monitor' }
  - { name: 'pageinspect',         schema: 'monitor' }
  - { name: 'pg_prewarm',          schema: 'monitor' }
  - { name: 'pg_visibility',       schema: 'monitor' }
  - { name: 'pg_freespacemap',     schema: 'monitor' }
  - { name: 'pg_repack',           schema: 'monitor' }
  - name: postgres_fdw
  - name: file_fdw
  - name: btree_gist
  - name: btree_gin
  - name: pg_trgm
  - name: intagg
  - name: intarray

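pg_offline_query

Instance-level variable, boolean, false by default.

When set to true, members of the dbrole_offline group can connect to this instance and run offline queries regardless of the instance's role.

This is handy when there are few instances (for example one primary and one replica): users can mark the only replica with pg_offline_query = true so that it accepts ETL, slow queries, and interactive access. See Access Control - Offline Users for details.

pg_reload

Command-line parameter, boolean, true by default.

When set to true, Pigsty runs pg_ctl reload immediately after generating the HBA rules to apply them.

If you want to generate the pg_hba.conf file and compare it by hand before it takes effect, pass -e pg_reload=false to disable the reload.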

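pg_hba_rules

Sets the client IP allow/deny rules of the database. An array of objects, each representing one rule.

Each rule has three parts:

  • title: the rule title, rendered as a comment in the HBA file
  • role: the role the rule applies to. common applies it to all instances; other values (such as replica, offline) install it only on instances with the matching role. For example, role='replica' means the rule is applied only to instances with pg_role == 'replica'.
  • rules: an array of strings, each one a rule line that is eventually written to pg_hba.conf.

As a special case, HBA rules with role == 'offline' are additionally installed on instances with pg_offline_query == true.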

pg_hba_rules:
  - title: allow meta node password access
    role: common
    rules:
      - host    all     all                         10.10.10.10/32      md5

  - title: allow intranet admin password access
    role: common
    rules:
      - host    all     +dbrole_admin               10.0.0.0/8          md5
      - host    all     +dbrole_admin               172.16.0.0/12       md5
      - host    all     +dbrole_admin               192.168.0.0/16      md5

  - title: allow intranet password access
    role: common
    rules:
      - host    all             all                 10.0.0.0/8          md5
      - host    all             all                 172.16.0.0/12       md5
      - host    all             all                 192.168.0.0/16      md5

  - title: allow local read-write access (local production user via pgbouncer)
    role: common
    rules:
      - local   all     +dbrole_readwrite                               md5
      - host    all     +dbrole_readwrite           127.0.0.1/32        md5

  - title: allow read-only user (stats, personal) password directly access
    role: replica
    rules:
      - local   all     +dbrole_readonly                               md5
      - host    all     +dbrole_readonly           127.0.0.1/32        md5

It is recommended to configure a unified pg_hba_rules globally and to use pg_hba_rules_extra for additional customization of specific clusters.
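
The role matching described for pg_hba_rules, including the special case for offline rules, can be sketched as a filter over the rule list. This is an illustration of the selection logic only; the function name select_hba_rules is hypothetical and the real rendering is done by Pigsty's templates.

```python
from typing import Dict, List


def select_hba_rules(hba_rules: List[Dict], pg_role: str,
                     pg_offline_query: bool = False) -> List[str]:
    """Return the pg_hba.conf lines that apply to one instance."""
    lines = []
    for group in hba_rules:
        role = group["role"]
        applies = (
            role == "common"                              # installed on every instance
            or role == pg_role                            # installed on matching roles
            or (role == "offline" and pg_offline_query)   # special case for offline rules
        )
        if applies:
            lines.append(f"# {group['title']}")  # the title becomes a comment
            lines.extend(group["rules"])
    return lines


example_rules = [
    {"title": "allow meta node password access", "role": "common",
     "rules": ["host all all 10.10.10.10/32 md5"]},
    {"title": "allow offline query", "role": "offline",
     "rules": ["host all +dbrole_offline 10.0.0.0/8 md5"]},
]
replica_lines = select_hba_rules(example_rules, "replica")
marked_replica_lines = select_hba_rules(example_rules, "replica", pg_offline_query=True)
```

A replica marked with pg_offline_query therefore receives the offline rules in addition to the common ones, while an unmarked replica does not.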

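pg_hba_rules_extra

Similar to pg_hba_rules, but usually used for HBA rules at the cluster level.

pg_hba_rules_extra is appended to pg_hba.conf in the same way.

If you need to completely override a cluster's HBA rules, that is, not inherit the global HBA configuration, configure pg_hba_rules at the cluster level to override the global setting.

pgbouncer_hba_rules

Similar to pg_hba_rules, used for Pgbouncer HBA rules.

The default Pgbouncer HBA rules are simple and fairly permissive, and users can customize them as needed:

  1. Allow password login from the local host
  2. Allow password login from intranet network segments
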
pgbouncer_hba_rules:
  - title: local password access
    role: common
    rules:
      - local  all          all                                     md5
      - host   all          all                     127.0.0.1/32    md5

  - title: intranet password access
    role: common
    rules:
      - host   all          all                     10.0.0.0/8      md5
      - host   all          all                     172.16.0.0/12   md5
      - host   all          all                     192.168.0.0/16  md5

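pgbouncer_hba_rules_extra

Similar to pg_hba_rules_extra, used for extra Pgbouncer HBA rules at the cluster level.

Business Template

The following two parameters belong to the business template; users should define the business users and business databases they need here.

Users and databases defined here are applied in the following two steps, which cover not only the users and databases inside the database but also the corresponding configuration in the Pgbouncer connection pool.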

./pgsql.yml --tags=pg_biz_init,pg_biz_pgbouncer

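pg_users

Usually used to define business users at the database cluster level, in the same form as pg_default_roles.

An array of objects, each defining one business user. The name field is required; the password may be an MD5-hashed password.

Users can add default privilege groups to a business user through the roles field:

  • dbrole_readonly: default production read-only role with global read-only privileges (read-only production access)
  • dbrole_offline: default offline read-only role with read-only privileges on specific instances (offline queries, personal accounts, ETL)
  • dbrole_readwrite: default production read-write role with global CRUD privileges (regular production use)
  • dbrole_admin: default production admin role with the privilege to run DDL changes (administrators)

Production accounts should set pgbouncer: true so that they can connect through the connection pool; ordinary users should not access the database through the connection pool.

Here is an example of creating business accounts: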

pg_users:
  # complete example of user/role definition for production user
  - name: dbuser_meta               # example production user have read-write access
    password: DBUser.Meta           # example user's password, can be encrypted
    login: true                     # can login, true by default (should be false for role)
    superuser: false                # is superuser? false by default
    createdb: false                 # can create database? false by default
    createrole: false               # can create role? false by default
    inherit: true                   # can this role use inherited privileges?
    replication: false              # can this role do replication? false by default
    bypassrls: false                # can this role bypass row level security? false by default
    connlimit: -1                   # connection limit, -1 disable limit
    expire_at: '2030-12-31'         # 'timestamp' when this role is expired
    expire_in: 365                  # now + n days when this role is expired (OVERWRITE expire_at)
    roles: [dbrole_readwrite]       # dbrole_admin|dbrole_readwrite|dbrole_readonly
    pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
    parameters:                     # user's default search path
      search_path: public
    comment: test user

  # simple example for personal user definition
  - name: dbuser_vonng2              # personal user example which only have limited access to offline instance
    password: DBUser.Vonng          # or instance with explicit mark `pg_offline_query = true`
    roles: [dbrole_offline]         # personal/stats/ETL user should be grant with dbrole_offline
    expire_in: 365                  # expire in 365 days since creation
    pgbouncer: false                # personal user should NOT be allowed to login with pgbouncer
    comment: example personal user for interactive queries

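pg_databases

An array of objects, each defining one business database. In each database definition, the database name is required and all other fields are optional.

  • name: database name, required
  • owner: database owner, postgres by default
  • template: template used when the database is created, template1 by default
  • encoding: default character encoding of the database, UTF8 by default, matching the instance. It is recommended not to set or change it.
  • locale: default locale of the database, C by default. It is recommended not to set it and to stay consistent with the instance.
  • lc_collate: default string collation of the database, same as the instance by default. It must match the template database. It is strongly recommended to leave it unset or to set it to C.
  • lc_ctype: default character classification (LC_CTYPE) of the database, same as the instance by default. It must match the template database. It is recommended to leave it unset or to set it to C or en_US.UTF8.
  • allowconn: whether connections to the database are allowed, true by default. Changing it is not recommended.
  • revokeconn: whether to revoke the privilege to connect to the database, false by default. If true, the PUBLIC CONNECT privilege on the database is revoked and only the default users (dbsu|monitor|admin|replicator|owner) can connect. In addition, admin|owner hold GRANT OPTION and can grant the connect privilege to other users.
  • tablespace: tablespace associated with the database, pg_default by default
  • connlimit: connection limit of the database, -1 by default, meaning no limit.
  • extensions: an array of objects, each defining an extension in the database and the schema it is installed into
  • parameters: a key-value object; each pair defines a parameter to be changed for the database through ALTER DATABASE.
  • pgbouncer: boolean, whether to add the database to Pgbouncer. All databases are added to Pgbouncer unless pgbouncer: false is set explicitly.
  • comment: database comment.
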
pg_databases:
  - name: meta                      # name is the only required field for a database
    owner: postgres                 # optional, database owner
    template: template1             # optional, template1 by default
    encoding: UTF8                  # optional, UTF8 by default
    locale: C                       # optional, C by default
    lc_collate: C                   # optional, C by default , must same as template database, leave blank to set to db default
    lc_ctype: C                     # optional, C by default , must same as template database, leave blank to set to db default
    allowconn: true                 # optional, true by default, false disable connect at all
    revokeconn: false               # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
    tablespace: pg_default          # optional, 'pg_default' is the default tablespace
    connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
    extensions:                     # optional, extension name and where to create
      - {name: postgis, schema: public}
    parameters:                     # optional, extra parameters with ALTER DATABASE
      enable_partitionwise_join: true
    pgbouncer: true                 # optional, add this database to pgbouncer list? true by default
    comment: pigsty meta database   # optional, comment string for database

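6.11 - Monitoring System

Parameters related to the monitoring system in Pigsty

Pigsty's monitoring system contains two components: Node Exporter and PG Exporter.

Node Exporter exposes the metrics of machine nodes, and PG Exporter pulls the metrics of the database and the Pgbouncer connection pool. In addition, Haproxy exposes its metrics directly through its admin port.

By default, all monitoring exporters are registered in Consul, and Prometheus manages these jobs through service discovery. Users can instead set prometheus_sd_method to static to switch to static service discovery and manage all exporters through configuration files. This approach is recommended when monitoring existing database instances.

Promtail collects Postgres, Patroni, and Pgbouncer logs. It is currently in beta and is an optional extra component.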

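Parameter Overview

| Name | Type | Level | Description |
|------|------|-------|-------------|
| exporter_install | enum | G/C | How monitoring components are installed |
| exporter_repo_url | string | G/C | Yum repo for monitoring components |
| exporter_metrics_path | string | G/C | URL path where metrics are exposed |
| node_exporter_enabled | bool | G/C | Enable the node metrics collector |
| node_exporter_port | number | G/C | Port exposing node metrics |
| node_exporter_options | string | G/C | Node metrics collection options |
| pg_exporter_config | string | G/C | PG metrics definition file |
| pg_exporter_enabled | bool | G/C | Enable the PG metrics collector |
| pg_exporter_port | number | G/C | Port exposing PG metrics |
| pg_exporter_url | string | G/C | Connection string of the target database (override) |
| pgbouncer_exporter_enabled | bool | G/C | Enable the PGB metrics collector |
| pgbouncer_exporter_port | number | G/C | Port exposing PGB metrics |
| pgbouncer_exporter_url | string | G/C | Connection string of the target connection pool |
| promtail_enabled | bool | G/C | Whether to enable the Promtail log collection service |
| promtail_clean | bool | G/C/A | Whether to remove existing status information when installing promtail |
| promtail_port | number | G/C | Default port used by promtail |
| promtail_status_file | string | G/C | File location where Promtail status information is saved |
| promtail_send_url | string | G/C | Loki service endpoint that receives logs |

Default Parameters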

#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none                        # none|yum|binary, none by default
exporter_repo_url: ''                         # if set, repo will be added to /etc/yum.repos.d/ before yum installation

# - collect - #
exporter_metrics_path: /metrics               # default metric path for pg related exporter

# - node exporter - #
node_exporter_enabled: true                   # setup node_exporter on instance
node_exporter_port: 9100                      # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml     # default config files for pg_exporter
pg_exporter_enabled: true                     # setup pg_exporter on instance
pg_exporter_port: 9630                        # default port for pg exporter
pg_exporter_url: ''                           # optional, if not set, generate from reference parameters

# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true              # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631                 # default port for pgbouncer exporter
pgbouncer_exporter_url: ''                    # optional, if not set, generate from reference parameters

# - promtail - #                              # promtail is a beta feature which requires manual deployment
promtail_enabled: true                        # enable promtail logging collector?
promtail_clean: false                         # remove promtail status file? false by default
promtail_port: 9080                           # default listen port for promtail
promtail_status_file: /tmp/promtail-status.yml
promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push  # loki url to receive logs

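Parameter Details

exporter_install

Specifies how exporters are installed:

  • none: do not install (default behavior; the exporters were already installed earlier by the node.pkgs task)
  • yum: install with yum (if yum installation is enabled, yum installs node_exporter and pg_exporter before the exporters are deployed)
  • binary: install by copying binaries (the node_exporter and pg_exporter binaries are copied directly from files)

When installing with yum, if exporter_repo_url is specified (non-empty), the repo file at that URL is first installed into /etc/yum.repos.d during installation. This makes it possible to install the exporters directly in environments where node infrastructure initialization has not been run.

When installing with binary, users must make sure the Linux binaries of node_exporter and pg_exporter have been placed in the files directory.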

<meta>:<pigsty>/files/node_exporter ->  <target>:/usr/bin/node_exporter
<meta>:<pigsty>/files/pg_exporter   ->  <target>:/usr/bin/pg_exporter

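exporter_binary_install (deprecated)

This parameter has been superseded by the exporter_install parameter.

Whether to install Node Exporter and PG Exporter by copying binary files, false by default.

This option is mainly used when integrating with an external provisioning solution, to reduce assumptions about the existing system. Enabling it copies the Linux binaries directly to the target machine.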

<meta>:<pigsty>/files/node_exporter ->  <target>:/usr/bin/node_exporter
<meta>:<pigsty>/files/pg_exporter   ->  <target>:/usr/bin/pg_exporter

Users must download the Linux binaries from Github into the files directory with files/download-exporter.sh before enabling this option.

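exporter_metrics_path

The URL path on which all exporters expose metrics, /metrics by default.

This variable is referenced by the external role prometheus; Prometheus applies it to monitoring targets with job = pg.

node_exporter_enabled

Whether to install and configure node_exporter, true by default.

node_exporter_port

The port node_exporter listens on.

Defaults to 9100.

node_exporter_options

Extra command-line options for node_exporter.

This option is mainly used to customize the collectors enabled in node_exporter. For the list of collectors supported by Node Exporter, see Node Exporter Collectors.

The default value of this option is: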

node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

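pg_exporter_config

The default configuration file used by pg_exporter, which defines the metrics in Pigsty.

Pigsty provides two configuration files by default:

  • pg_exporter-demo.yaml: for the sandbox demo environment, with a lower cache TTL (1s). Monitoring is closer to real time, but the performance impact is larger.

  • pg_exporter.yaml: for production, with a normal cache TTL (10s), which significantly reduces the load when several Prometheus instances scrape at the same time.

If you use a different Prometheus architecture, it is recommended to review and adjust the pg_exporter configuration file.

The PG Exporter configuration file used by Pigsty supports PostgreSQL 10.0 and later, currently up to the latest PG 13.

pg_exporter_enabled

Whether to install and configure pg_exporter, true by default.

pg_exporter_url

The PGURL that PG Exporter uses to connect to the database.

Optional, empty string by default.

By default Pigsty generates the monitoring target URL with the following rule. If pg_exporter_url is configured, that URL is used directly as the connection string.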

PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pg_port }}/{{ pg_default_database }}?host={{ pg_localhost }}&sslmode=disable'

This option is configured as an environment variable in /etc/default/pg_exporter.
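
The URL rule for pg_exporter can be reproduced with a short function. This is a sketch of the template shown in this section, not the code Pigsty runs; the function name build_pg_exporter_url is hypothetical, and the value postgres used for pg_default_database in the usage line is an assumption, since that parameter is not defined in this section.

```python
def build_pg_exporter_url(pg_monitor_username: str, pg_monitor_password: str,
                          pg_port: int, pg_default_database: str,
                          pg_localhost: str, pg_exporter_url: str = "") -> str:
    """Return the connection string pg_exporter would use."""
    if pg_exporter_url:
        # An explicit pg_exporter_url overrides the generated URL.
        return pg_exporter_url
    # The host part is empty and the unix socket directory is passed as ?host=...
    return (f"postgres://{pg_monitor_username}:{pg_monitor_password}"
            f"@:{pg_port}/{pg_default_database}"
            f"?host={pg_localhost}&sslmode=disable")


url = build_pg_exporter_url("dbuser_monitor", "DBUser.Monitor", 5432,
                            "postgres", "/var/run/postgresql")
```

The sketch performs no URL encoding, so a password containing reserved characters such as @ or / would need to be percent-encoded before it is used this way.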

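pgbouncer_exporter_enabled

Whether to install and configure pgbouncer_exporter, true by default.

pg_exporter_port

The port pg_exporter listens on.

Defaults to 9630.

pgbouncer_exporter_port

The port pgbouncer_exporter listens on.

Defaults to 9631.

pgbouncer_exporter_url

The URL that PGBouncer Exporter uses to connect to the connection pool.

Optional, empty string by default.

By default Pigsty generates the monitoring target URL with the following rule. If pgbouncer_exporter_url is configured, that URL is used directly as the connection string.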

PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pgbouncer_port }}/pgbouncer?host={{ pg_localhost }}&sslmode=disable'

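This option is configured as an environment variable in /etc/default/pgbouncer_exporter.

promtail_enabled

Boolean, global|cluster variable: whether to enable the Promtail log collection service. Enabled by default. Note, however, that Loki and Promtail are currently optional add-on modules. They are not installed by the Monitor part of pgsql.yml and are currently used only in the pgsql-promtail.yml playbook.

promtail_clean

Boolean, command-line parameter.

Whether to remove existing status information when installing promtail. The status file is located at promtail_status_file and records the consumption offsets of all logs. It is not cleaned by default.

promtail_port

The default port used by promtail, 9080 by default.

promtail_status_file

String, cluster|global variable: the file location where Promtail status information is saved, /tmp/promtail-status.yml by default.

promtail_send_url

HTTP URL: the Loki service endpoint that receives logs.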

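6.12 - Service Provision

Parameters for traffic proxying and load balancing in Pigsty

Parameter Overview

| Name | Type | Level | Description |
|------|------|-------|-------------|
| pg_weight | number | I | Relative weight of the instance in load balancing |
| pg_services | service[] | G | Global common service definitions |
| pg_services_extra | service[] | C | Cluster-specific service definitions |
| haproxy_enabled | bool | G/C/I | Whether to enable Haproxy |
| haproxy_reload | bool | A | Whether to reload the Haproxy configuration |
| haproxy_admin_auth_enabled | bool | G/C | Whether to enable authentication for the Haproxy admin page |
| haproxy_admin_username | string | G/C | Haproxy admin username |
| haproxy_admin_password | string | G/C | Haproxy admin password |
| haproxy_exporter_port | number | G/C | Haproxy metrics exporter port |
| haproxy_client_timeout | interval | G/C | Haproxy client timeout |
| haproxy_server_timeout | interval | G/C | Haproxy server timeout |
| vip_mode | enum | G/C | VIP mode: none |
| vip_reload | bool | G/C | Whether to reload the VIP configuration |
| vip_address | string | G/C | VIP address used by the cluster |
| vip_cidrmask | number | G/C | Network CIDR mask of the VIP address |
| vip_interface | string | G/C | Network interface used by the VIP |

Default Parameters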

#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100              # default load balance weight (instance level)

# - service - #
pg_services:                                  # how to expose postgres service in cluster?
  # primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
  - name: primary           # service name {{ pg_cluster }}_primary
    src_ip: "*"
    src_port: 5433
    dst_port: pgbouncer     # 5433 route to pgbouncer
    check_url: /primary     # primary health check, success when instance is primary
    selector: "[]"          # select all instance as primary service candidate

  # replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
  - name: replica           # service name {{ pg_cluster }}_replica
    src_ip: "*"
    src_port: 5434
    dst_port: pgbouncer
    check_url: /read-only   # read-only health check. (including primary)
    selector: "[]"          # select all instance as replica service candidate
    selector_backup: "[? pg_role == `primary`]"   # primary are used as backup server in replica service

  # default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
  - name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
    src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
    src_port: 5436          # bind port, mandatory
    dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
    check_method: http      # health check method: only http is available for now
    check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
    check_url: /primary     # health check url path, / as default
    check_code: 200         # health check http code, 200 as default
    selector: "[]"          # instance selector
    haproxy:                # haproxy specific fields
      maxconn: 3000         # default front-end connection
      balance: roundrobin   # load balance algorithm (roundrobin by default)
      default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

  # offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
  - name: offline           # service name {{ pg_cluster }}_offline
    src_ip: "*"
    src_port: 5438
    dst_port: postgres
    check_url: /replica     # offline MUST be a replica
    selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
    selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replica are used as backup server in offline service

pg_services_extra: []        # extra services to be added

# - haproxy - #
haproxy_enabled: true                         # enable haproxy among every cluster members
haproxy_reload: true                          # reload haproxy after config
haproxy_admin_auth_enabled: false             # enable authentication for haproxy admin?
haproxy_admin_username: admin                 # default haproxy admin username
haproxy_admin_password: admin                 # default haproxy admin password
haproxy_exporter_port: 9101                   # default admin/exporter port
haproxy_client_timeout: 3h                    # client side connection timeout
haproxy_server_timeout: 3h                    # server side connection timeout

# - vip - #
vip_mode: none                                # none | l2 | l4
vip_reload: true                              # whether reload service after config
# vip_address: 127.0.0.1                      # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24                            # virtual ip address cidr mask (l2 only)
# vip_interface: eth0                         # virtual ip network interface (l2 only)

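Parameter Details

pg_weight

Relative weight of the database instance when load balancing is performed. Defaults to 100.

pg_services

An array of service definition objects that defines the services each database cluster exposes.

Each cluster can define multiple services. Each service contains any number of cluster members, and services are distinguished by port.

The structure of a service definition is shown in the following example: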

- name: default           # service's actual name is {{ pg_cluster }}-{{ service.name }}
  src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
  src_port: 5436          # bind port, mandatory
  dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
  check_method: http      # health check method: only http is available for now
  check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
  check_url: /primary     # health check url path, / as default
  check_code: 200         # health check http code, 200 as default
  selector: "[]"          # instance selector
  haproxy:                # haproxy specific fields
    maxconn: 3000         # default front-end connection
    balance: roundrobin   # load balance algorithm (roundrobin by default)
    default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

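Required fields

  • Name (service.name)

    Service name. The full service name uses the database cluster name as prefix and service.name as suffix, joined by -. For example, the service with name=primary in the pg-test cluster has the full name pg-test-primary.

  • Port (service.src_port)

    In Pigsty, services are exposed in NodePort fashion by default, so the exposed port is required. If you use an external load balancer for access, you can also distinguish services in other ways.

  • Selector (service.selector)

    The selector specifies the instance members of the service. It is a JMESPath expression that filters the cluster's instance members. The default [] selector picks all cluster members.

Optional fields

  • Backup selector (service.selector_backup)

    The optional backup selector service.selector_backup selects or marks the instances used as backups for the service: backup instances take over only when all other members of the service have failed. For example, the primary instance can be added to the backup set of the replica service, so that the primary can still carry the cluster's read-only traffic after all replicas have failed.

  • Source IP (service.src_ip)

    The IP address the service binds to. Defaults to *, meaning all IP addresses on the host. vip uses the value of the vip_address variable; a specific IP address supported by the NIC can also be given.

  • Destination port (service.dst_port)

    Which port on the target instance the service traffic is sent to. postgres points at the port the database listens on, pgbouncer points at the port the connection pool listens on; a fixed port number can also be given.

  • Health check method (service.check_method)

    How the service checks instance health. Only HTTP is supported for now.

  • Health check port (service.check_port)

    Which port the service probes to obtain instance health. patroni probes Patroni (8008 by default), pg_exporter probes PG Exporter (9630 by default); a custom port number can also be given.

  • Health check path (service.check_url)

    The URL path used for the HTTP check. / is used by default. PG Exporter and Patroni offer several health check endpoints that can be used to separate primary and replica traffic. For example, /primary succeeds only on the primary, /replica succeeds only on replicas, and /read-only succeeds on any instance that can serve read-only queries (including the primary).

  • Health check code (service.check_code)

    The HTTP status code the health check expects. Defaults to 200.

  • Haproxy-specific configuration (service.haproxy)

    Options specific to the service provisioning software (Haproxy).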
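As a rough illustration of how the fields above fit together, the sketch below derives a service's full name and a connection URL from the default definitions. The helper names (service_port, service_fullname, service_url) and the host, user, and database values are made up for this example; only the port numbers and the <cluster>-<name> rule come from the defaults shown earlier.

```shell
#!/bin/sh
# Hypothetical helpers; only the ports and the naming rule come from the defaults above.

# Map a default service name to its src_port.
service_port() {
    case "$1" in
        primary) echo 5433 ;;   # read-write, routed to pgbouncer on the primary
        replica) echo 5434 ;;   # read-only, routed to pgbouncer on replicas
        default) echo 5436 ;;   # routed directly to postgres on the primary
        offline) echo 5438 ;;   # routed directly to postgres on offline instances
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Full service name: <cluster>-<service.name>
service_fullname() {
    echo "$1-$2"
}

# service_url <host> <service> <user> <db>; host, user and db are placeholders.
service_url() {
    _port=$(service_port "$2") || return 1
    echo "postgres://$3@$1:${_port}/$4"
}

service_fullname pg-test primary
service_url 10.10.10.11 replica dbuser_admin test
```

Because src_ip defaults to *, any cluster member's address (or the VIP, if one is configured) could stand in for the host argument.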

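pg_services_extra

An array of service definition objects, defined at the cluster level and appended to the global service definitions.

Use this option to create special services for a particular database cluster, for example a dedicated service for a cluster that has a delayed replica.

haproxy_enabled

Whether to enable the Haproxy component.

By default Pigsty deploys Haproxy on all database nodes. By overriding this variable at the instance level, you can enable the Haproxy load balancer only on specific instances/nodes.

haproxy_admin_auth_enabled

Whether to enable basic authentication for the Haproxy admin page.

Disabled by default. Enabling it in production is recommended, or add access control in Nginx or another access layer.

haproxy_admin_username

Username for Haproxy admin page authentication, when enabled. Defaults to admin.

haproxy_admin_password

Password for Haproxy admin page authentication, when enabled. Defaults to admin.

haproxy_client_timeout

Haproxy client connection timeout. Defaults to 3 hours.

haproxy_server_timeout

Haproxy server connection timeout. Defaults to 3 hours.

haproxy_exporter_port

Port on which the Haproxy admin page and the metrics endpoint listen.

Defaults to 9101.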

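vip_mode

VIP mode, an enum with the following values:

  • none: no VIP is configured
  • l2: a layer-2 VIP bound to the primary (requires all members to be in the same layer-2 broadcast domain)
  • l4: traffic is distributed by an external L4 load balancer (not part of Pigsty's current implementation)

The VIP keeps the load balancer of the read-write service highly available. With an L2 VIP, the VIP is managed by vip-manager and bound to the cluster primary.

This means you can always reach the cluster primary through the VIP, or reach the load balancer on the primary through the VIP (if the primary is under heavy load, this may add performance pressure).

Note that all VIP candidate instances must be in the same layer-2 network (VLAN, switch).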
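One quick sanity check before enabling an L2 VIP is to confirm that the VIP and every candidate member fall in the same IP subnet. The sketch below does this with plain shell arithmetic. It is a necessary but not sufficient check (sharing a subnet does not prove the hosts share a broadcast domain), and the VIP 10.10.10.2 and the /24 mask are assumed values for illustration; the member addresses are the sandbox pg-test nodes.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip2int() {
    _old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into $1..$4
    IFS=$_old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet <ip1> <ip2> <prefix-length>: succeed if both addresses share the prefix.
same_subnet() {
    _a=$(ip2int "$1")
    _b=$(ip2int "$2")
    _mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( _a & _mask )) -eq $(( _b & _mask )) ]
}

# Assumed values: vip_address=10.10.10.2, vip_cidrmask=24
for member in 10.10.10.11 10.10.10.12 10.10.10.13; do
    if same_subnet 10.10.10.2 "$member" 24; then
        echo "$member: in VIP subnet"
    else
        echo "$member: NOT in VIP subnet"
    fi
done
```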

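vip_address

The VIP address, used for either an L2 or an L4 VIP.

vip_address has no default value. Users must explicitly assign a VIP address to each cluster.

vip_cidrmask

CIDR prefix length of the VIP network. Required only for an L2 VIP.

vip_cidrmask has no default value. Users must explicitly specify the VIP network CIDR for each cluster.

vip_interface

Name of the network interface for the VIP. Required only for an L2 VIP.

Defaults to eth0. Users should specify the interface used by the VIP for each cluster/instance.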

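Obsolete Parameters

These parameters are now defined inside services and are no longer used.

haproxy_policy

Load balancing algorithm used by Haproxy. Available policies are roundrobin and leastconn.

Defaults to roundrobin.

haproxy_check_port

Port on which Haproxy runs health checks against the backend PostgreSQL process.

Defaults to 8008, the Patroni port.

Another option is 9630, which uses pg_exporter as the health check port.

haproxy_primary_port

Default port of the cluster read-write service in Haproxy. All client connections to this port are forwarded to the corresponding port on the primary instance.

The default read-write service port is 5433.

haproxy_replica_port

Default port of the cluster read-only service in Haproxy. All client connections to this port are forwarded to the corresponding port on replica instances.

The default read-only service port is 5434.

haproxy_backend_port

Backend port to which Haproxy forwards client connections. Options: 5432/6432.

Defaults to 6432, meaning Haproxy forwards traffic to the connection pool on port 6432. Changing it to 5432 forwards traffic directly to the database.

haproxy_weight

Standard weight used by Haproxy for load balancing. Defaults to 100; overriding it at the instance level is recommended.

haproxy_weight_fallback

Controls the weight with which the primary carries read-only traffic.

If haproxy_weight_fallback is 0, the primary carries no read-only traffic (traffic sent to haproxy_replica_port).

If haproxy_weight_fallback is 1 (or higher), the primary carries a small share of 1/total weight within the replica service while the cluster works normally, and read-only traffic can drift to the primary when all read-only instances in the replica set fail.

This setting is very useful for a one-primary-one-replica cluster. If you have multiple replicas, setting it to 0 is recommended.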

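7 - Tasks

High availability drills, trying out the database, and other tasks you can explore in Pigsty

Once Pigsty is configured, you can use it for some interesting exploration and experiments.

7.1 - Database Migration Based on Logical Replication

This article uses the Pigsty sandbox to demonstrate, with a working example, database migration based on PostgreSQL logical replication.

Using instances in the Pigsty sandbox, it covers the principles, details, and caveats of primary switchover and database migration via logical replication.

For background on logical replication, see the article Postgres逻辑复制详解.

0 Migration with logical replication

Logical replication is commonly used to upgrade PostgreSQL online across major versions and across operating systems, for example from PG 10 to PG 13, or from Windows to Linux.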

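0.1 Advantages of logical migration

Compared with an in-place pg_upgrade or a pg_dump upgrade, logical replication has these benefits:

  • Online: the migration runs online and needs no downtime or only a very small downtime window.
  • Flexible: the target schema may differ from the source, for example turning a plain table into a partitioned table or adding columns. It works across major versions.
  • Safe: unlike physical replication, the target database is writable, so it can be freely tested and rebuilt before the final switch.
  • Fast: the downtime window is short and can be kept within seconds to minutes.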

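0.2 Limitations of logical migration

The main limitations of logical replication are that setup is relatively tedious, the initial data copy is slower than with physical replication, and an instance with multiple databases must be migrated once per database. Large objects and sequences must be synchronized manually during the migration.

  • DDL changes are not replicated.

  • Sequences are not replicated.

  • TRUNCATE can conflict: if a table truncated on the logical replica is referenced by a foreign key from a table that is outside the subscription set, the truncate cannot be applied. It could only proceed by also truncating the referencing table, and truncating a table outside the subscription set would violate the semantics.

  • Large objects cannot be replicated.

  • Only plain tables, including partitioned tables, can be replicated. Views, materialized views, and foreign tables are not supported.

Overall, these are all problems that can be solved or tolerated.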

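0.3 Basic workflow of logical migration

Overall, a migration based on logical replication goes through three phases, covered in the sections below: preparation, migration of existing data, and switchover.

Preparation and the migration of existing data take a long time but need no downtime and do not affect production traffic.

The switchover needs a short downtime window. With automated scripts, downtime can be kept within seconds to minutes.

The following sections use the Pigsty sandbox to walk through the details of these steps.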

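1 Preparation

1.1 Preparing the source and destination clusters

Before migrating, first make sure the source cluster and the destination cluster are configured correctly.

The standard Pigsty sandbox consists of four nodes and two database clusters.

The two clusters, pg-meta and pg-test, serve as the source (SRC) and the destination (DST) of logical replication respectively.

In this example pg-meta-1 is the publisher and pg-test-1 is the subscriber, and the pgbench tables are migrated from pg-meta to pg-test.

1.1.1 Users

A migration usually needs two users on both the source and the destination, one for administration and one for replication.

CREATE USER dbuser_admin SUPERUSER;              -- superuser, used to create publications and subscriptions
CREATE USER replicator REPLICATION BYPASSRLS;    -- replication user, used to subscribe to changes

1.1.2 HBA rules

The corresponding HBA rules are also needed, so that the replication user can connect between the source and destination clusters.

In addition, the migration is usually driven from the control node, so the admin user should be allowed to access the source and destination clusters from there.

Because creating a subscription requires superuser privileges, granting the admin user SUPERUSER (permanently or temporarily) is recommended.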

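1.1.3 Configuration

The only mandatory setting is wal_level: you must set wal_level to logical on the source to enable logical replication.

Other replication-related parameters should also be set sensibly, but apart from wal_level their default values do not prevent logical replication from working, so they are all optional.

Using the same settings on source and destination is recommended. Below are reference values for the relevant settings on a 64-core machine: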

wal_level: logical                      # MANDATORY!	
max_worker_processes: 64                # default 8 -> 64, set to CPU CORE 64
max_parallel_workers: 32                # default 8 -> 32, limit by max_worker_processes
max_parallel_maintenance_workers: 16    # default 2 -> 16, limit by parallel worker
max_parallel_workers_per_gather: 0      # default 2 -> 0,  disable parallel query on OLTP instance
# max_parallel_workers_per_gather: 16   # default 2 -> 16, enable parallel query on OLAP instance

max_wal_senders: 24                     # 10 -> 24
max_replication_slots: 16               # 10 -> 16 
max_logical_replication_workers: 8      # 4 -> 8, 6 sync worker + 1~2 apply worker
max_sync_workers_per_subscription: 6    # 2 -> 6, 6 sync worker

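For the database itself, also check that the encoding and locale settings are correct. Using C.UTF8 consistently is generally recommended.

1.1.4 Connection information

To run administrative commands, you need connection strings for the primaries of the source and destination clusters.

Avoid plaintext passwords in connection strings. Passwords can be managed through ~/.pgpass, ~/.pg_service.conf, environment variables, and so on; they are not shown in the commands below.

PGSRC='postgres://dbuser_admin@10.10.10.10/meta'        # source publisher (SU)
PGDST='postgres://dbuser_admin@10.10.10.11/test'        # destination subscriber (SU)

Running the migration commands on the control node/meta node is recommended, with the two variables above kept in effect throughout the procedure.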
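A minimal sketch of keeping the passwords out of the connection strings with a password file follows. The pgpass_line helper, the port 5432, and the CHANGE_ME placeholder are assumptions for illustration. The example writes to a scratch file selected through PGPASSFILE; in practice the entries would usually live in ~/.pgpass.

```shell
#!/bin/sh
# Print one password-file entry in libpq's format:
#   hostname:port:database:username:password
# (a literal ':' or '\' inside a field would need a backslash escape; not handled here)
pgpass_line() {
    printf '%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Scratch file for the demo; PGPASSFILE tells libpq where to look.
PGPASSFILE=$(mktemp)
export PGPASSFILE
{
    pgpass_line 10.10.10.10 5432 meta dbuser_admin 'CHANGE_ME'   # source
    pgpass_line 10.10.10.11 5432 test dbuser_admin 'CHANGE_ME'   # destination
} > "$PGPASSFILE"
chmod 600 "$PGPASSFILE"     # libpq ignores password files with looser permissions
cat "$PGPASSFILE"
```

With the file in place, the -w flag used by the psql commands in this article (never prompt for a password) works as long as a matching entry exists.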

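1.2 Deciding what to migrate

Compared with physical replication, logical replication gives finer control over what is replicated and how. You can replicate just a subset of a database. In this example, however, we replicate the whole database.

Here we use the tables provided by pgbench as the migration subject, so the tables can be initialized on the source cluster with pgbench.

pgbench -is64 ${PGSRC}

To widen test coverage, we also create an extra test table (used to test sequence migration).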

psql ${PGSRC} -qAXtw <<-EOF
DROP TABLE IF EXISTS pgbench_extras;
CREATE TABLE IF NOT EXISTS pgbench_extras
  (id BIGSERIAL PRIMARY KEY,v  TIMESTAMP NOT NULL UNIQUE);
EOF

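Note that only base tables (including partitioned tables) can take part in logical replication. Other object types, including views, materialized views, foreign tables, indexes, and sequences, cannot be added to logical replication. The following query lists the fully qualified names of the tables in the current database that can be added to logical replication.

SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast');

During preparation, pick out the tables you want to replicate. During the migration of existing data, copy the definitions of these tables to the destination cluster and set up logical replication on them.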

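1.3 Fixing replica identity

Not every table can be put into logical replication and work as is. Before migrating, check every table to be migrated and confirm that it has a properly configured replica identity.

Identity mode \ Constraint on table | Primary key (p) | NOT NULL unique index (u) | Neither (n)
default | valid | x | x
index | x | valid | x
full | inefficient | inefficient | inefficient
nothing | x | x | x

  • If the table has a primary key, REPLICA IDENTITY default is used automatically. This is the best case and needs no change.

  • If the table has no primary key, create one if you can. If you cannot, a unique index on a set of NOT NULL columns serves the same purpose. In that case you must explicitly configure the table with REPLICA IDENTITY USING INDEX <tbl_unique_key_idx_name>.

  • If the table has neither a primary key nor a unique index, you can configure REPLICA IDENTITY FULL, which uses the entire row as the replica identity.

    FULL identity performs very poorly: updates and deletes cause sequential table scans on both the publisher and the subscriber. Use it only as a last resort.

    Another option is REPLICA IDENTITY NOTHING, in which case any UPDATE|DELETE on this table on the publisher fails with an error.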
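The decision rules above can be condensed into a small helper that prints the statement a table needs. This is only a sketch: the function name identity_fix and its argument convention are invented here, some_table and some_table_v_key are placeholder names, and the helper cannot verify that a unique index is NOT NULL, non-partial, and non-deferrable, so that check remains manual.

```shell
#!/bin/sh
# identity_fix <table> <keys> [unique_index]
#   keys: constraint types present on the table, e.g. "p", "pu", "u", or "" for none
identity_fix() {
    case "$2" in
        *p*)
            echo "-- $1: primary key present, REPLICA IDENTITY DEFAULT is fine" ;;
        *u*)
            if [ -n "$3" ]; then
                echo "ALTER TABLE $1 REPLICA IDENTITY USING INDEX $3;"
            else
                echo "-- $1: unique index name required" >&2
                return 1
            fi ;;
        *)
            echo "ALTER TABLE $1 REPLICA IDENTITY FULL;" ;;
    esac
}

identity_fix public.pgbench_accounts p
identity_fix public.some_table u some_table_v_key
identity_fix public.pgbench_history ""
```

Falling back to FULL rather than NOTHING is a choice made for this sketch, since NOTHING would make updates and deletes fail on the publisher.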

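The following query lists the fully qualified name of every table, its replica identity setting, and whether it has a primary key or unique constraint: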

SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name, con.ri AS keys,
       CASE relreplident WHEN 'd' THEN 'default' WHEN 'n' THEN 'nothing' WHEN 'f' THEN 'full' WHEN 'i' THEN 'index' END AS identity
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid, LATERAL (SELECT array_agg(contype) AS ri FROM pg_constraint WHERE conrelid = c.oid) con
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
ORDER BY 2,3;

Taking the test scenario from 1.2 as an example:

          name           | keys  | identity
-------------------------+-------+----------
 public.spatial_ref_sys  | {c,p} | default
 public.pgbench_accounts | {p}   | default
 public.pgbench_branches | {p}   | default
 public.pgbench_tellers  | {p}   | default
 public.pgbench_extras   | {p,u} | default
 public.pgbench_history  | NULL  | default

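If the table has only a unique index, check that the index qualifies: all of its columns are NOT NULL, and it is neither deferrable nor partial. If it does, the following command switches the table's replica identity to index mode.

-- An example: pgbench_extras has a primary key, but a unique index can still be used as its replica identity
ALTER TABLE pgbench_extras REPLICA IDENTITY USING INDEX pgbench_extras_v_key;

If the table has neither a primary key nor a unique constraint, like pgbench_history above, set its replica identity to FULL (or NOTHING) with the following command:

ALTER TABLE pgbench_history REPLICA IDENTITY FULL;

After the fix, every table should have a suitable replica identity: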

          name           | keys  | identity
-------------------------+-------+----------
 public.spatial_ref_sys  | {c,p} | default
 public.pgbench_accounts | {p}   | default
 public.pgbench_branches | {p}   | default
 public.pgbench_tellers  | {p}   | default
 public.pgbench_extras   | {p,u} | index
 public.pgbench_history  | NULL  | full

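2 Migrating existing data

2.1 Synchronizing the database schema

2.1.1 Dump

Use the following command to dump all object definitions and apply them on the destination.

pg_dump ${PGSRC} --schema-only -n public | psql ${PGDST}

The -n and -t options of pg_dump give flexible control over which objects are dumped. For example, if only the pgbench tables in the public schema are needed, dump them with:

pg_dump ${PGSRC} --schema-only -n public -t 'pgbench_*' | psql ${PGDST}

2.1.2 Verification

After synchronization, the schema usually needs to be verified:

  • Have all target tables and their indexes and sequences been created?
  • Do functions, types, schemas, users, and privileges all match expectations?

The schema has to be synchronized and verified according to your own requirements; there is no universal method.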

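2.2 Creating the publication on the source

The primary of the source cluster acts as the publisher. A publication must be created there and the required tables added to it.

2.2.1 How to create a publication

The syntax for creating a publication is: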

CREATE PUBLICATION name
    [ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
      | FOR ALL TABLES ]
    [ WITH ( publication_parameter [= value] [, ... ] ) ]

Create a publication for all tables (requires superuser privileges):

CREATE PUBLICATION "pg_meta_pub" FOR ALL TABLES;

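Note that for both publications and subscriptions, names should follow the PostgreSQL identifier naming rules ([a-z][0-9a-z_]+). In particular, do not use - in names, to avoid needless trouble such as the replication slot named after the subscription failing to be created because of a non-conforming name.

If you need to control which event types are published (uncommon), use the publish parameter. The default is insert, update, delete, truncate.

If the source has partitioned tables, the publish_via_partition_root parameter controls how they are replicated: as a single table (using the replica identity of the partition root) or as multiple child tables (using the replica identity of each partition). Enabling this option treats a partitioned table logically as one table (the partition root) rather than a series of partitions, so the subscriber only needs a table with the same name as the partition root for replication to work. This option was introduced in version 13. It defaults to false, which means that when a partitioned table is replicated, every partition on the source must exist on the subscriber.

Extra parameters can be passed as follows: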

CREATE PUBLICATION "pg_meta_pub" FOR ALL TABLES 
WITH(publish = 'insert', publish_via_partition_root = true);

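2.2.2 Contents of the publication

If you do not want to publish all tables, name the required tables explicitly in the publication.

In this example, spatial_ref_sys is a constant table used by the postgis extension and does not need to be migrated, so we can exclude it. The following SQL builds the CREATE PUBLICATION command inside the database:

-- In psql, \gexec runs the query and then executes each returned value as an SQL statement
SELECT E'CREATE PUBLICATION pg_meta_pub FOR TABLE\n' ||
       string_agg(quote_ident(nspname) || '.' || quote_ident(relname), E',\n') || ';' AS sql
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
AND relname ~ 'pgbench' -- only replicate tables whose name matches pgbench*
\gexec

In this example, the command actually generated and executed is: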

psql ${PGSRC} -Xtw <<-EOF
  CREATE PUBLICATION pg_meta_pub FOR TABLE
    public.pgbench_accounts,
    public.pgbench_branches,
    public.pgbench_tellers,
    public.pgbench_history,
    public.pgbench_extras;
EOF

2.2.3 Checking the publication

Once the publication is created, it is visible in the pg_publication catalog.

$ psql ${PGSRC} -Xxwc 'table pg_publication;'
-[ RECORD 1 ]+------------
oid          | 24679
pubname      | pg_meta_pub
pubowner     | 10
puballtables | f
pubinsert    | t
pubupdate    | t
pubdelete    | t
pubtruncate  | t
pubviaroot   | f

pg_publication_tables shows which tables are included in the publication.

$ psql ${PGSRC} -Xwc 'table pg_publication_tables;'
   pubname   | schemaname |    tablename
-------------+------------+------------------
 pg_meta_pub | public     | pgbench_history
 pg_meta_pub | public     | pgbench_tellers
 pg_meta_pub | public     | pgbench_accounts
 pg_meta_pub | public     | pgbench_branches
 pg_meta_pub | public     | pgbench_extras

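Once this is confirmed, the work on the publisher side is done. Next, create a subscription on the primary of the destination cluster that subscribes to this publication on the primary of the source cluster.

2.3 Creating the subscription on the destination

The primary of the destination cluster acts as the subscriber. A subscription must be created there to receive the required changes from the publisher.

2.3.1 Creating the subscription

Creating a subscription requires SUPERUSER privileges. The syntax is: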

CREATE SUBSCRIPTION subscription_name
    CONNECTION 'conninfo'
    PUBLICATION publication_name [, ...]
    [ WITH ( subscription_parameter [= value] [, ... ] ) ]

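A subscription must specify the publisher's connection information with the CONNECTION clause and the publication name with the PUBLICATION clause. Here the replicator user connects to the publisher. Its password has already been written to ~/.pgpass on the destination instance, so it can be omitted from the connection string.

Creating a subscription takes a few other parameters, which usually need changing only when replication slots are managed manually:

  • copy_data, default true: whether to copy the existing data when replication starts.
  • create_slot, default true: whether the subscription creates a replication slot on the publisher instance.
  • enabled, default true: whether the subscription starts replicating immediately.
  • connect, default true: whether to connect to the publisher instance. If not, the options above are all reset to false.

Here, the actual command for creating the subscription is: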

psql ${PGDST} -Xtw <<-EOF
    CREATE SUBSCRIPTION "pg_test_sub" 
      CONNECTION 'host=10.10.10.10 user=replicator dbname=meta' 
      PUBLICATION "pg_meta_pub";
EOF

2.3.2 Checking the subscription

Once the subscription is created, it is visible in the pg_subscription catalog.

$ psql ${PGDST} -Xxwc 'TABLE pg_subscription;'
-[ RECORD 1 ]---+---------------------------------------------
oid             | 20759
subdbid         | 19351
subname         | pg_test_sub
subowner        | 16390
subenabled      | t
subconninfo     | host=10.10.10.10 user=replicator dbname=meta
subslotname     | pg_test_sub
subsynccommit   | off
subpublications | {pg_meta_pub}

pg_subscription_rel shows which tables are covered by the subscription, along with their replication state.

$ psql ${PGDST} -Xwc 'table pg_subscription_rel;'
 srsubid | srrelid | srsubstate |  srsublsn
---------+---------+------------+------------
   20759 |   20742 | r          | 0/B0BC1FB8
   20759 |   20734 | r          | 0/B0BC20B0
   20759 |   20737 | r          | 0/B0BC20B0
   20759 |   20745 | r          | 0/B0BC20B0
   20759 |   20731 | r          | 0/B0BC20B0

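2.4 Waiting for logical replication to sync

After creating the subscription, first watch the database logs on both the publisher and the subscriber to make sure no errors occur.

2.4.1 The logical replication state machine

If all goes well, logical replication starts automatically and runs the replication state machine for every table in the subscription, as shown in the diagram below.

When all tables have finished copying and entered the r (ready) state, the initial sync phase of logical replication is complete, and the publisher and subscriber as a whole are in sync.

stateDiagram-v2
    [*] --> init : table is added to the subscription set
    init --> data : start copying the table's initial snapshot
    data --> sync : existing data copied
    sync --> ready : changes made during the copy are applied, table is ready

When a subscription is created or refreshed, tables are added to the subscription set. Every table in the subscription set has a row in pg_subscription_rel showing its current replication state. A table that has just been added starts in state i, the initialize state.

If the subscription's copy_data option is true (the default) and the worker pool has an idle worker, PostgreSQL assigns a sync worker to the table to copy its existing data, and the table enters state d, meaning data is being copied. Syncing a table is similar to running a basebackup of a database cluster: the sync worker creates a temporary replication slot on the publisher, takes a snapshot of the table, and copies the base data with COPY.

When the base data copy is finished, the table enters sync mode, and the sync worker catches up on the incremental changes made during the copy. Once it has caught up, the sync worker marks the table as r (ready) and hands change management over to the main logical replication apply process, meaning the table is now replicating normally.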
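To make the state codes easier to read in scripts, a small sketch like the following can translate the srsubstate letters and test whether every table has reached r. The function names are invented for this example, and only the four states described above are mapped.

```shell
#!/bin/sh
# Translate a pg_subscription_rel.srsubstate code into a readable name.
substate_name() {
    case "$1" in
        i) echo "initialize" ;;
        d) echo "data copy" ;;
        s) echo "synchronized" ;;
        r) echo "ready" ;;
        *) echo "unknown($1)" ;;
    esac
}

# all_ready: read one state code per line on stdin;
# succeed only if there is at least one line and every line is "r".
all_ready() {
    _n=0
    while read -r _state; do
        [ "$_state" = "r" ] || return 1
        _n=$((_n + 1))
    done
    [ "$_n" -gt 0 ]
}

# Demo with canned input; against a live subscriber the input could come from:
#   psql ${PGDST} -AXtwc 'SELECT srsubstate FROM pg_subscription_rel'
for code in i d s r; do
    echo "$code = $(substate_name "$code")"
done
printf 'r\nr\nd\n' | all_ready || echo "initial sync still in progress"
```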

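2.4.2 Tracking sync progress

The data copy (d) phase can take a while, depending on the NIC, the network, the disks, table sizes and distribution, the number of logical replication sync workers, and other factors.

For reference, a 1 TB database with 20 tables, including one 250 GB table, on dual 10 GbE NICs takes roughly 6 to 8 hours to copy with 6 sync workers.

During the data copy, each table sync task creates a temporary replication slot on the source database. Make sure the source primary is not put under unnecessary heavy write load during the initial sync, or the retained WAL may fill up the disk.

pg_stat_replication and pg_replication_slots on the publisher, and pg_stat_subscription and pg_subscription_rel on the subscriber, provide information on logical replication state and should be watched.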

psql ${PGDST} -Xxw <<-'EOF'
    SELECT subname, json_object_agg(srsubstate, cnt) FROM
    pg_subscription s JOIN
      (SELECT srsubid, srsubstate, count(*) AS cnt FROM pg_subscription_rel 
       GROUP BY srsubid, srsubstate) sr
    ON s.oid = sr.srsubid GROUP BY subname;
EOF

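The SQL above reports the state of the tables in the subscription. If every table shows r, logical replication has been established successfully and the subscriber is ready for the switchover.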

   subname   | json_object_agg
-------------+-----------------
 pg_test_sub | { "r" : 5 }

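Of course, the best approach is always to track replication state through the monitoring system.

3 Switchover

3.1 Preparation

A good engineering practice is to run a few checkpoints on both the source and the destination before doing anything, so that later steps are not slowed down by flushing dirty pages to disk.

You can also run ANALYZE to refresh statistics, which makes it quicker to compare and verify data integrity later.

psql ${PGSRC} -Xxwc 'CHECKPOINT;ANALYZE;CHECKPOINT;'
psql ${PGDST} -Xxwc 'CHECKPOINT;ANALYZE;CHECKPOINT;'

From this point on the service is unavailable, so proceed as quickly as possible. Finishing within a few minutes is usually appropriate.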

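3.2 Stopping write traffic on the source

3.2.1 Choosing how to stop traffic

There are several ways to pause writes on the source. Choose and combine them according to your actual workload:

  • Ask the application owners to stop traffic
  • Stop resolving the domain name of the source primary
  • Stop or pause traffic forwarding on the load balancer (Haproxy | VIP)
  • Stop or pause the Pgbouncer connection pool
  • Stop or pause the Postgres instance
  • Change parameters on the primary to make the default transaction mode read-only
  • Change the HBA rules on the primary to reject application access

Stopping write traffic on the primary by changing HBA rules, the connection pool, or the load balancer is generally recommended.

Whichever method you use, keep PostgreSQL alive and make sure the admin user and the replication user can still connect to the source primary.

3.2.2 Confirming that write traffic on the source has stopped

Once the source primary has stopped accepting writes, first run a confirmation check: watch pg_stat_replication to confirm that the logical subscriber has caught up with the publisher.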

psql ${PGSRC} -Xxw <<-EOF
    SELECT application_name AS name,
           pg_current_wal_lsn() AS lsn,
           pg_current_wal_lsn() - replay_lsn AS lag 
    FROM pg_stat_replication;
EOF

-[ RECORD 1 ]-----
name | pg_test_sub
lsn  | 0/B0C24918
lag  | 0

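Run the command above repeatedly. If the lsn field stays the same and lag is always 0, write traffic on the primary has stopped correctly and the logical replica has no replication lag, so it is ready for the switchover.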
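The "repeat and compare" check can be automated along these lines. This is a sketch under assumptions: is_quiesced is a made-up helper that reads one "lsn lag" sample per line, and the sampling loop shown in the comment (three samples, one second apart) is illustrative only.

```shell
#!/bin/sh
# is_quiesced: read "lsn lag" samples (one per line) from stdin.
# Succeeds only if at least two samples were given, every LSN equals
# the first one, and every lag is 0.
is_quiesced() {
    awk '
        NR == 1 { first = $1 }
        $1 != first || $2 != 0 { bad = 1 }
        END { exit (bad || NR < 2) ? 1 : 0 }
    '
}

# Against a live source the samples could be collected like this (not run here):
#   for i in 1 2 3; do
#       psql ${PGSRC} -AXtw -F' ' -c "SELECT pg_current_wal_lsn(), pg_current_wal_lsn() - replay_lsn
#                                     FROM pg_stat_replication WHERE application_name = 'pg_test_sub'"
#       sleep 1
#   done | is_quiesced && echo "safe to switch"

# Demo with canned samples:
printf '0/B0C24918 0\n0/B0C24918 0\n0/B0C24918 0\n' | is_quiesced && echo "quiesced"
printf '0/B0C24918 0\n0/B0C24A00 0\n' | is_quiesced || echo "LSN still moving"
```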

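3.2.3 Setting up reverse logical replication (optional)

If the application must be able to roll back at any time after a failed migration, you can set up reverse logical replication after stopping write traffic on the source, so that subsequent changes on the subscriber (the new primary) are replicated back to the original publisher (the old primary). This requires re-synchronizing the data, which takes a long time. It is usually suitable only when the data is very important and either the data volume is small or the downtime window is long enough.

First stop the existing logical subscription on the destination. The existing logical replication must be stopped before continuing, otherwise a replication loop is formed.

Once write traffic on the source has stopped, keeping logical replication running serves no purpose, so the subscription on the destination can be stopped. It is best to keep the subscription and only disable it, in case the migration fails and has to be rolled back.

psql ${PGDST} -qAXtwc 'ALTER SUBSCRIPTION pg_test_sub DISABLE;'

Then set up the reverse logical replication following the same procedure as above. Only the commands are given here: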

# Create the publication on the destination: pg_test_pub
psql ${PGDST} -Xtw <<-EOF
  CREATE PUBLICATION pg_test_pub FOR TABLE
    public.pgbench_accounts,
    public.pgbench_branches,
    public.pgbench_tellers,
    public.pgbench_history,
    public.pgbench_extras;
  TABLE pg_publication;
EOF

# Create the subscription on the source
psql ${PGSRC} -Xtw <<-EOF
    CREATE SUBSCRIPTION "pg_meta_sub" 
      CONNECTION 'host=10.10.10.11 user=replicator dbname=test' 
      PUBLICATION "pg_test_pub";
    TABLE pg_subscription;
EOF

# Truncate all related tables on the source (dangerous), with or without waiting for the sync to finish
psql ${PGSRC} -Xtw <<-EOF
  TRUNCATE TABLE
    public.pgbench_accounts,
    public.pgbench_branches,
    public.pgbench_tellers,
    public.pgbench_history,
    public.pgbench_extras;
  TABLE pg_publication;
EOF

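3.3 Synchronizing sequences and other objects

Logical replication does not replicate sequences, so when failing over via logical replication, sequence values must be synchronized manually before the switch.

3.3.1 Synchronizing sequence values from the source

If all your sequences were created automatically from SERIAL columns, and the destination database subscribes only from the source, synchronizing sequences is fairly simple. Find all sequences that need to be synchronized on the subscriber:

PGSRC='postgres://dbuser_admin@10.10.10.10/meta'        # source publisher (SU)
PGDST='postgres://dbuser_admin@10.10.10.11/test'        # destination subscriber (SU)

# Query the subscriber to generate the shell command that synchronizes sequences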
psql ${PGDST} -qAXtw <<-'EOF'
    SELECT 'pg_dump ${PGSRC} -a ' ||
    string_agg('-t ' || quote_ident(schemaname) || '.' || quote_ident(sequencename), ' ') ||
    ' | grep setval | psql -qAXtw ${PGDST}'
    FROM pg_sequences;
EOF

In this example, only pgbench_extras.id has a corresponding sequence, pgbench_extras_id_seq. The generated synchronization command is:

pg_dump ${PGSRC} -a -t public.pgbench_extras_id_seq | grep setval | psql -qAXtw ${PGDST}

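In more complex cases you need to build this command by hand, listing each sequence to dump with -t.

3.3.2 Setting sequence values from the data

Another way to handle sequences is to set their values directly from the data in the tables, with no need to synchronize from the source.

For example, if the maximum value of pgbench_extras.id is 100, setting pgbench_extras_id_seq on the subscriber to a sufficiently large value, such as 100 + 10000 = 10100, guarantees that new ids allocated from the sequence after the migration do not collide with existing data.

With this approach the sequences can be set before the switchover, which shortens the downtime needed for the migration. It may, however, leave gaps in the allocated ids, and some edge cases and unusual uses of sequences need special care, for example a sequence that has never been used, a sequence with a negative increment, or an id generator function that calls the sequence.

The command for setting a sequence directly is: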

psql ${PGDST} -qAXtw <<-'EOF'
  SELECT pg_catalog.setval('public.pgbench_extras_id_seq', (SELECT max(id) + 10000 FROM pgbench_extras));
EOF

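3.3.3 Synchronizing other objects

Objects that logical replication cannot handle also need to be synchronized at this point.

Examples include refreshing materialized views and manually migrating large objects. Few people use these features, so they are not covered in detail here.

3.4 Verifying data consistency

If logical replication is working properly, the data usually does not need to be verified. You can run comparisons several times during phase 2 to build confidence in the logical replication.

During the downtime window, only simple basic checks are recommended, such as comparing row counts and the maximum and minimum primary key values.

The following function performs this check: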

function compare_relation(){
	local relname=$1
	local identity=${2-'id'}
	psql ${3-${PGSRC}} -AXtwc "SELECT count(*) AS cnt, max($identity) AS max, min($identity) AS min FROM ${relname};"
	psql ${4-${PGDST}} -AXtwc "SELECT count(*) AS cnt, max($identity) AS max, min($identity) AS min FROM ${relname};"
}
compare_relation pgbench_accounts aid
compare_relation pgbench_branches bid
compare_relation pgbench_history  tid
compare_relation pgbench_tellers  tid

A variant that compares only row counts and iterates over every table:

function compare_relation() {
    local src_url=${1}
    local dst_url=${2}
    local relname=${3}
    res1=$(psql "${src_url}" -AXtwc "SELECT count(*) AS cnt FROM ${relname};")
    res2=$(psql "${dst_url}" -AXtwc "SELECT count(*) AS cnt FROM ${relname};")
    if [[ "${res1}" == "${res2}" ]]; then
        echo -e "[ok] ${relname}\t\t\t${res1}\t${res2}"
    else
        echo -e "[xx] ${relname}\t\t\t${res1}\t${res2}"
    fi
}

function compare_all() {
    local src_url=${1}
    local dst_url=${2}
    tables=$(psql ${src_url} -AXtwc "SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')")
    for tbl in $tables; do
        result=$(compare_relation "${src_url}" "${dst_url}" ${tbl})
        echo ${result}
    done
}

compare_all ${PGSRC} ${PGDST}

You can also go over the sequences synchronized in 3.3 and confirm that their values match.

psql ${PGSRC} -qwXtc "SELECT schemaname || '.' || sequencename AS name, last_value AS v FROM pg_sequences;"
psql ${PGDST} -qwXtc "SELECT schemaname || '.' || sequencename AS name, last_value AS v FROM pg_sequences;"

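Verify any other objects synchronized manually in 3.3.3 as needed. Any application-side verification also belongs here. Keep in mind that the downtime window is precious: the longer you spend here, the longer the service is unavailable.

Once verification is done, the final traffic switch can take place.

3.5 Traffic switch and cleanup

After data verification, traffic can be switched.

How traffic is switched depends on the access method you use, and usually mirrors the way traffic was stopped in 3.2. For example:

  • Change the application's connection string and put it into effect
  • Point the domain name of the source primary at the new primary
  • Forward traffic on the load balancer (Haproxy | VIP) to the new primary
  • Forward traffic from the Pgbouncer connection pool on the old primary to the new primary

Once the monitoring system or other means confirm that write traffic is correctly reaching the new primary on the subscriber side, the migration based on logical replication is complete.

Do not forget the cleanup: disable and drop the subscription on the subscriber, and drop the publication on the publisher.

Also keep making sure the old primary rejects new writes, so that stray traffic left over from a configuration mistake cannot still reach it.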

# Drop the subscription on the subscriber side
psql ${PGDST} -qAXtw <<-'EOF'
    ALTER SUBSCRIPTION pg_test_sub DISABLE;
    DROP SUBSCRIPTION pg_test_sub;
EOF

# Drop the publication on the publisher side
psql ${PGSRC} -qAXtw <<-'EOF'
    DROP PUBLICATION pg_meta_pub;
EOF

This completes the full migration based on logical replication.

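7.2 - Slow Query Optimization

An example of optimizing a slow query with Pigsty

The following uses Pigsty's built-in sandbox environment to walk through handling a slow query with the Pigsty monitoring system.

Slow query: simulation

Since there is no real business system, we simulate a slow query in a quick and simple way, using the TPC-B-like workload built into pgbench.

Run the following command on the primary: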

ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey ;

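This command removes the primary key on the pgbench_accounts table, which slows down the related queries and instantly overloads the system.

Figure 1: QPS of a single replica instance drops from 500 to 7, and query RT rises to about 300 ms

Figure 2: System load reaches 200%, triggering the alert rules for excessive machine load and long query response time.

Slow query: locating

First, use the PG Cluster dashboard to locate the specific instance with the slow query; here we take pg-test-2 as an example.

Then use the PG Query dashboard to locate the specific slow query: ID -6041100154778468427

Figure 3: Spotting the abnormal slow query in the query overview

The query shows:

  • Response time rises significantly: from 17us to 280ms
  • QPS drops significantly: from 500 to 7
  • The share of time spent on this query rises significantly

So this is the query that got slower!

Next, use the PG Stat Statements dashboard or PG Query Detail to find the statement text for this query ID.

Figure 4: The query found is SELECT abalance FROM pgbench_accounts WHERE aid = $1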

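Slow query: hypothesis

Next, we need to work out why the query is slow.

SELECT abalance FROM pgbench_accounts WHERE aid = $1

This query looks up the pgbench_accounts table with aid as the filter condition. When a query this simple slows down, the most likely cause is a problem with an index on the table.

It is obvious that an index is missing, because we dropped it ourselves!

After analyzing the query we form a hypothesis: the query is slow because the aid column of pgbench_accounts lacks an index.

Next, open the PG Table Detail dashboard and check the access pattern on pgbench_accounts to verify the hypothesis.

Figure 5: Access pattern on the pgbench_accounts table

We observe that index scans on the table have dropped to zero, while sequential scans have risen correspondingly. This confirms our hypothesis!

Slow query: fix

Having found the root cause, we set about fixing it.

Try adding an index on the aid column of pgbench_accounts and see whether that solves the problem.

After the index is added, something magical happens.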
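The original walkthrough does not show the command used for the fix. One possible way to put the index back without blocking writes is sketched below: build a unique index concurrently, then promote it to the primary key. The helper name restore_pkey_sql and the PGURL variable are placeholders for this example; the constraint name matches the one dropped earlier.

```shell
#!/bin/sh
# Print the SQL that rebuilds the primary key dropped in the simulation step.
# CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the two
# statements are meant to be sent one by one (psql's default autocommit does this).
restore_pkey_sql() {
    cat <<'EOF'
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS pgbench_accounts_pkey ON pgbench_accounts (aid);
ALTER TABLE pgbench_accounts ADD CONSTRAINT pgbench_accounts_pkey PRIMARY KEY USING INDEX pgbench_accounts_pkey;
EOF
}

# Print the statements; to apply them on the primary, pipe into psql, e.g.:
#   restore_pkey_sql | psql "${PGURL}"
restore_pkey_sql
```

A plain ALTER TABLE pgbench_accounts ADD PRIMARY KEY (aid) would also work, at the cost of locking the table while the index is built.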

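Figure 6: Query response time and QPS are back to normal.

Figure 7: System load is back to normal as well

Slow query: example

With this tutorial, you have learned the general method for optimizing slow queries.

Figure 8: A real-world slow query optimization that brought system saturation down from 40% to 4%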

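7.3 - High Availability Drills

Simulate several common production failures to test the self-healing ability of Pigsty's highly available database clusters.

Patroni quick start

Use patronictl to control the database cluster. Pigsty has already created the shortcut pt: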

alias pt='patronictl -c /pg/bin/patroni.yml'

alias pt-up='sudo systemctl start patroni'     # start Patroni
alias pt-dw='sudo systemctl stop  patroni'     # stop Patroni
alias pt-st='systemctl status patroni'         # report Patroni status
alias pt-ps='ps aux | grep patroni'            # show Patroni processes
alias pt-log='tail -f /pg/log/patroni.log'     # follow the Patroni log

Patroni commands must be run as the database superuser (dbsu = postgres).

$ pt --help
Usage: patronictl [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT  Configuration file
  -d, --dcs TEXT          Use this DCS
  -k, --insecure          Allow connections to SSL sites without certs
  --help                  Show this message and exit.

Commands:
  configure    Create configuration file
  dsn          Generate a dsn for the provided member,...
  edit-config  Edit cluster configuration
  failover     Failover to a replica
  flush        Discard scheduled events
  history      Show the history of failovers/switchovers
  list         List the Patroni members for a given Patroni
  pause        Disable auto failover
  query        Query a Patroni PostgreSQL member
  reinit       Reinitialize cluster member
  reload       Reload cluster member configuration
  remove       Remove cluster from DCS
  restart      Restart cluster member
  resume       Resume auto failover
  scaffold     Create a structure for the cluster in DCS
  show-config  Show cluster configuration
  switchover   Switchover to a replica
  topology     Prints ASCII topology for given cluster
  version      Output version of patronictl command or a...

Scenario 1: Switchover

A switchover is a deliberate, operator-initiated change of the cluster leader.

$ pt switchover
Master [pg-test-3]: pg-test-3
Candidate ['pg-test-1', 'pg-test-2'] []: pg-test-1
When should the switchover take place (e.g. 2020-10-23T17:06 )  [now]: now
Current cluster topology
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Leader  | running |  2 |           | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to switchover cluster pg-test, demoting current master pg-test-3? [y/N]: y
2020-10-23 16:06:11.76252 Successfully switched over to "pg-test-1"

Scenario 2: Failover

# run as postgres @ any member of cluster `pg-test`
$ pt failover
Candidate ['pg-test-2', 'pg-test-3'] []: pg-test-3
Current cluster topology
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Leader  | running |  1 |           | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running |  1 |         0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Replica | running |  1 |         0 | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to failover cluster pg-test, demoting current master pg-test-1? [y/N]: y
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Leader  | running |  2 |           | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
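When scripting such drills, it helps to pull the current leader out of the topology table. The sketch below parses output in the format shown above; leader_of is a made-up helper, and the parsing relies on the column layout seen in this article, which may differ in other Patroni versions.

```shell
#!/bin/sh
# leader_of: read a patronictl topology table on stdin and print the
# name of the member whose Role column is "Leader".
leader_of() {
    awk -F'|' '$4 ~ /Leader/ { gsub(/ /, "", $2); print $2 }'
}

# Demo with the table printed after the failover above; against a live
# cluster the input could come from:
#   patronictl -c /pg/bin/patroni.yml list | leader_of
leader_of <<'EOF'
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member    | Host        | Role    | State   | TL | Lag in MB | Tags            |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running |  2 |         0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Leader  | running |  2 |           | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
EOF
```

Comparing the leader before and after a switchover or failover gives a simple pass/fail signal for the drill.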

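Scenario 3: Patroni/Postgres goes down on a replica

Scenario 4: Patroni/Postgres goes down on the primary

Scenario 5: DCS unavailable

Scenario 6: Maintenance mode

Discussion

Key question: how is the SLA of the DCS guaranteed?

In automatic failover mode, if the DCS goes down, the current primary demotes itself to a replica after retry_timeout, leaving every cluster unwritable.

As distributed consensus stores, Consul/Etcd are quite robust, but you must still make sure the SLA of the DCS is higher than that of the database.

Solution: configure a sufficiently large retry_timeout, and address the problem operationally in one of the following ways.

  1. The SLA ensures that the DCS is unavailable for less than this duration per year.
  2. Operations staff can ensure that a DCS service outage is resolved within retry_timeout.
  3. DBAs can ensure that automatic failover is turned off for the cluster (maintenance mode enabled) within retry_timeout.

A possible improvement: add a P2P check that bypasses the DCS, so that the primary takes no action if it sees that it is still in the majority partition.

Key question: HA policy, RPO first or RTO first?

Which comes first, availability or consistency? For example, ordinary databases favor RTO, while financial and payment databases favor RPO.

Ordinary databases may lose a very small amount of data during an emergency failover (the threshold is configurable, for example the most recent 1 MB of writes).

Databases that deal with money must not lose data, so failover requires more careful checks or manual intervention.

Key question: fencing, is shutting down the machine allowed?

Under normal conditions, Patroni performs primary fencing first when a leader change happens, by killing the PG process.

In some extreme cases, such as a paused VM, a software bug, or extremely high load, this may not succeed. The machine then has to be rebooted to settle the matter. Is that acceptable? How does the system behave under extreme conditions?

Key operation: after leader election

After a new leader is elected, remember to flush to disk: run a manual CHECKPOINT to be safe.

Key question: how to switch traffic, at layer 2, 4, or 7

  • Layer 2: VIP failover
  • Layer 4: Haproxy distribution
  • Layer 7: DNS resolution

Key question: the special case of one primary and one replica

  • Layer 2: VIP failover
  • Layer 4: Haproxy distribution
  • Layer 7: DNS resolution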

HA Procedure

Failure Detection

https://patroni.readthedocs.io/en/latest/SETTINGS.html#dynamic-configuration-settings

Fencing

Configure Watchdog

https://patroni.readthedocs.io/en/latest/watchdog.html

Bad Cases

Traffic Routing

DNS

VIP

HAproxy

Pgbouncer

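7.4 - Database Application

Using the ISD dataset as an example of loading data into the database

If you have a database and are not sure what to do with it, take a look at another open-source project by the author: Vonng/isd

You can reuse the monitoring system's Grafana to browse interactively the sub-hourly weather data of nearly 30,000 surface weather stations over the past 120 years.

ISD: Integrated Surface Data

It contains all the tools needed to download, parse, process, and visualize the NOAA ISD dataset. It lets you browse the sub-hourly weather data of nearly 30,000 surface weather stations over the past 120 years, and experience the data analysis and processing power of PostgreSQL.

SYNOPSIS

Download, parse, and visualize the Integrated Surface Dataset.

Includes 30000 meteorology stations and sub-hourly observation records from 1900 to 2020.

Quick Start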

  1. Clone repo

    git clone https://github.com/Vonng/isd && cd isd 
    
  2. Prepare a postgres database

    (Connect via something like isd or postgres://user:pass@host/dbname)

    # skip this if you already have a viable database
    PGURL=postgres
    psql ${PGURL} -c 'CREATE DATABASE isd;'
    
    # database connection string, something like `isd` or `postgres://user:pass@host/dbname`
    PGURL='isd'
    psql ${PGURL} -AXtwc 'CREATE EXTENSION postgis;'
    
    # create tables, partitions, functions
    psql ${PGURL} -AXtwf 'sql/schema.sql'
    
  3. Download data

    • ISD Station: Station metadata: id, name, location, country, etc.
    • ISD History: Station observation records: observation count per month
    • ISD Hourly: Yearly archived station (sub-)hourly observation records
    • ISD Daily: Yearly archived station daily aggregated summary

    bin/get-isd-station.sh         # download isd station from noaa (proxy makes it faster)
    bin/get-isd-history.sh         # download isd history observation from noaa
    bin/get-isd-hourly.sh <year>   # download isd hourly data (yearly tarball 1901-2020)
    bin/get-isd-daily.sh <year>    # download isd daily data  (yearly tarball 1929-2020) 
    
  4. Build Parser

    There are two ISD dataset parsers written in Golang: isdh for the ISD hourly dataset and isdd for the ISD daily dataset.

    Running make isdh and make isdd builds them and copies the binaries to bin/. These parsers are required for loading data into the database.

    Alternatively, download pre-compiled binaries into the bin/ directory to skip this step.

  5. Load data

    Metadata includes world_fences, china_fences, isd_elements, isd_mwcode, isd_station, and isd_history. These are gzipped CSV files located in data/meta/. world_fences, china_fences, isd_elements, and isd_mwcode are constant dictionary tables, but isd_station and isd_history are updated frequently, so you have to download them from NOAA before loading them.

    # load metadata: fences, dicts, station, history,...
    bin/load-meta.sh 
    
    # load a year's daily data to database 
    bin/load-isd-daily <year> 
    
    # load a year's hourly data to database
    bin/load-isd-hourly <year>
    

    Note that the original isd_daily dataset contains some uncleansed data; see the caveats for details.

Data

Dataset

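| Dataset     | Sample                | Document                | Notes                            |
|-------------|-----------------------|-------------------------|----------------------------------|
| ISD Hourly  | isd-hourly-sample.csv | isd-hourly-document.pdf | (Sub-)hourly observation records |
| ISD Daily   | isd-daily-sample.csv  | isd-daily-format.txt    | Daily summary                    |
| ISD Monthly | N/A                   | isd-gsom-document.pdf   | Not used, generated from daily   |
| ISD Yearly  | N/A                   | isd-gsoy-document.pdf   | Not used, generated from monthly |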

Hourly Data: original tarball size 105 GB, table size 1 TB (+600 GB indexes).

Daily Data: original tarball size 3.2 GB, table size 24 GB.

At least 2 TB of storage is recommended for a full installation, and at least 40 GB for a daily-data-only installation.
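
The recommendation can be sanity-checked by adding up the quoted sizes. A small sketch, assuming 1 TB is counted as 1000 GB and that the downloaded tarballs stay on the same disk as the tables; the helper name sum_gb is illustrative:

```shell
# Sizes in GB, taken from the figures quoted above.
HOURLY_TARBALL=105; HOURLY_TABLE=1000; HOURLY_INDEX=600
DAILY_TARBALL=3.2;  DAILY_TABLE=24

# Sum any number of sizes (decimals allowed) and print the total.
sum_gb() {
    printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

full=$(sum_gb "$HOURLY_TARBALL" "$HOURLY_TABLE" "$HOURLY_INDEX" "$DAILY_TARBALL" "$DAILY_TABLE")
daily=$(sum_gb "$DAILY_TARBALL" "$DAILY_TABLE")
echo "full installation:  ${full} GB"
echo "daily data only:    ${daily} GB"
```

The totals come to about 1.7 TB and 27 GB, so the 2 TB and 40 GB figures leave some headroom for temporary files and growth.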

Schema

Data schema definition

Station

CREATE TABLE public.isd_station
(
    station    VARCHAR(12) PRIMARY KEY,
    usaf       VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
    wban       VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
    name       VARCHAR(32),
    country    VARCHAR(2),
    province   VARCHAR(2),
    icao       VARCHAR(4),
    location   GEOMETRY(POINT),
    longitude  NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
    latitude   NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
    elevation  NUMERIC,
    period     daterange,
    begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
    end_date   DATE GENERATED ALWAYS AS (upper(period)) STORED
);
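
The generated columns usaf and wban are plain substrings of the station identifier: characters 1 to 6 and characters 7 to 11. The same split can be sketched in the shell with cut, which counts characters from 1 just like SQL's substring(). The function name and the sample identifier are illustrative only:

```shell
# Split an ISD station identifier the same way the generated columns do:
#   usaf = substring(station, 1, 6)   -> characters 1..6
#   wban = substring(station, 7, 5)   -> characters 7..11
split_station() {
    local station="$1" usaf wban
    usaf=$(printf '%s\n' "$station" | cut -c1-6)
    wban=$(printf '%s\n' "$station" | cut -c7-11)
    echo "$usaf $wban"
}

# Sample identifier: a 6-character USAF part followed by a 5-character WBAN part.
split_station 54511099999
```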

Hourly Data

CREATE TABLE public.isd_hourly
(
    station    VARCHAR(11) NOT NULL,
    ts         TIMESTAMP   NOT NULL,
    temp       NUMERIC(3, 1),
    dewp       NUMERIC(3, 1),
    slp        NUMERIC(5, 1),
    stp        NUMERIC(5, 1),
    vis        NUMERIC(6),
    wd_angle   NUMERIC(3),
    wd_speed   NUMERIC(4, 1),
    wd_gust    NUMERIC(4, 1),
    wd_code    VARCHAR(1),
    cld_height NUMERIC(5),
    cld_code   VARCHAR(2),
    sndp       NUMERIC(5, 1),
    prcp       NUMERIC(5, 1),
    prcp_hour  NUMERIC(2),
    prcp_code  VARCHAR(1),
    mw_code    VARCHAR(2),
    aw_code    VARCHAR(2),
    pw_code    VARCHAR(1),
    pw_hour    NUMERIC(2),
    data       JSONB
) PARTITION BY RANGE (ts);

Daily Data

CREATE TABLE public.isd_daily
(
    station     VARCHAR(12) NOT NULL,
    ts          DATE        NOT NULL,
    temp_mean   NUMERIC(3, 1),
    temp_min    NUMERIC(3, 1),
    temp_max    NUMERIC(3, 1),
    dewp_mean   NUMERIC(3, 1),
    slp_mean    NUMERIC(5, 1),
    stp_mean    NUMERIC(5, 1),
    vis_mean    NUMERIC(6),
    wdsp_mean   NUMERIC(4, 1),
    wdsp_max    NUMERIC(4, 1),
    gust        NUMERIC(4, 1),
    prcp_mean   NUMERIC(5, 1),
    prcp        NUMERIC(5, 1),
    sndp        NUMERIC(5, 1),
    is_foggy    BOOLEAN,
    is_rainy    BOOLEAN,
    is_snowy    BOOLEAN,
    is_hail     BOOLEAN,
    is_thunder  BOOLEAN,
    is_tornado  BOOLEAN,
    temp_count  SMALLINT,
    dewp_count  SMALLINT,
    slp_count   SMALLINT,
    stp_count   SMALLINT,
    wdsp_count  SMALLINT,
    visib_count SMALLINT,
    temp_min_f  BOOLEAN,
    temp_max_f  BOOLEAN,
    prcp_flag   CHAR,
    PRIMARY KEY (ts, station)
) PARTITION BY RANGE (ts);
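
Because isd_daily is declared with PARTITION BY RANGE (ts), a row can only be inserted once a partition covering its date exists. The project's sql/schema.sql is responsible for creating the partitions; the sketch below only shows what a yearly partition could look like by printing the DDL from a shell function. The function name and the isd_daily_YEAR naming scheme are assumptions, not taken from the repository:

```shell
# Print DDL for one yearly partition of public.isd_daily.
# The partition name isd_daily_YEAR is a hypothetical naming scheme.
# Range partition bounds are inclusive at FROM and exclusive at TO.
gen_daily_partition() {
    local year="$1" next
    next=$((year + 1))
    printf "CREATE TABLE IF NOT EXISTS public.isd_daily_%s PARTITION OF public.isd_daily\n" "$year"
    printf "    FOR VALUES FROM ('%s-01-01') TO ('%s-01-01');\n" "$year" "$next"
}

gen_daily_partition 2020
```

The printed statement can be reviewed first and then piped into psql with the same connection string used in the Quick Start steps.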

Update

The ISD Daily and ISD Hourly datasets are updated on a rolling basis each day. Run the following scripts to load the latest data into the database.

# download, clean, reload latest hourly dataset
bin/get-isd-hourly.sh
bin/load-isd-hourly.sh

# download, clean, reload latest daily dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh

# recalculate latest partition of monthly and yearly
bin/refresh-latest.sh

Parser

There are two parsers, isdd and isdh. Each takes an original NOAA yearly tarball as input and generates CSV as output, which can be consumed directly by the PostgreSQL COPY command.

NAME
	isdh -- Integrated Surface Dataset Hourly Parser

SYNOPSIS
	isdh [-i <input|stdin>] [-o <output|st>] -p -d -c -v

DESCRIPTION
	The isdh program takes isd hourly (yearly tarball file) as input.
	And generate csv format as output

OPTIONS
	-i	<input>		input file, stdin by default
	-o	<output>	output file, stdout by default
	-p	<profpath>	pprof file path (disable by default)	
	-v                verbose progress report
	-d                de-duplicate rows (raw, ts-first, hour-first)
	-c                add comma separated extra columns

UI

ISD Station

ISD Monthly

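8 - Reference

Detailed reference information for Pigsty

8.1 - Configuration File

Detailed description of the configuration parameters

Below is the default configuration file used for the sandbox environment: pigsty.yml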

---
######################################################################
# File      :   pigsty.yml
# Desc      :   Pigsty Configuration Example
# Note      :   Pigsty Sandbox Demo
# Link      :   https://pigsty.cc/zh/docs/config/
# Ctime     :   2020-05-22
# Mtime     :   2021-04-19
# Copyright (C) 2018-2021 Ruohang Feng
######################################################################


######################################################################
#               Development Environment Inventory                    #
######################################################################


all: # top-level namespace

  #==================================================================#
  #                           Clusters                               #
  #==================================================================#
  # postgres database clusters are defined as kv pairs in `all.children`
  # where the key is the cluster name and the value is an object consisting
  # of cluster members (hosts) and cluster-specific variables (vars)
  # meta nodes are defined in the special group "meta" with `meta_node=true`

  children:

    #-----------------------------
    # meta controller
    #-----------------------------
    meta:      # special group 'meta' defines the main controller machine
      vars:
        meta_node: true                    # mark node as meta controller
        ansible_group_priority: 99         # meta group has top priority
      hosts:
        10.10.10.10: {}

    #-----------------------------
    # cluster: pg-meta
    #-----------------------------
    # pg-meta is a single-node pgsql cluster deployed on meta node (10.10.10.10)
    pg-meta:
      # - cluster members - #
      hosts:
        10.10.10.10: {pg_seq: 1, pg_role: primary, pg_offline_query: true}

      # - cluster configs - #
      vars:
        pg_cluster: pg-meta                 # define actual cluster name
        pg_version: 13                      # define installed pgsql version
        node_tune: tiny                     # tune node into oltp|olap|crit|tiny mode
        pg_conf: tiny.yml                   # tune pgsql into oltp|olap|crit|tiny mode
        patroni_mode: pause                 # enter maintenance mode, {default|pause|remove}
        patroni_watchdog_mode: off          # disable watchdog (require|automatic|off)
        pg_lc_ctype: en_US.UTF8             # enabled pg_trgm i18n char support

        # - defining business users - #
        pg_users:
          # default production read-write user dbuser_meta
          - name: dbuser_meta                              # user's name is required
            password: md5d3d10d8cad606308bdb180148bf663e1  # md5 password is acceptable
            pgbouncer: true                                # add user to pgbouncer userlist
            roles: [dbrole_readwrite]                      # grant roles to user
            comment: default production read-write user for meta database

          # default production read-only user for grafana direct access
          - name: dbuser_grafana
            password: DBUser.Grafana
            pgbouncer: true
            roles: [dbrole_readonly]
            comment: default readonly access for grafana datasource

          # complete example of user/role definition
          - name: dbuser_pigsty             # pigsty user has admin access (DDL|DML)
            password: DBUser.Pigsty         # example user's password, can be md5 encrypted
            login: true                     # can login, true by default (should be false for role)
            superuser: false                # is superuser? false by default
            createdb: false                 # can create database? false by default
            createrole: false               # can create role? false by default
            inherit: true                   # can this role use inherited privileges?
            replication: false              # can this role do replication? false by default
            bypassrls: false                # can this role bypass row level security? false by default
            pgbouncer: true                 # add this user to pgbouncer? false by default (true for production user)
            connlimit: -1                   # connection limit, -1 disable limit
            expire_in: 3650                 # now + n days when this role is expired (OVERWRITE expire_at)
            expire_at: '2030-12-31'         # 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
            comment: pigsty admin user      # comment on user/role
            roles: [dbrole_admin]           # dbrole_{admin,readonly,readwrite,offline}
            parameters:                     # additional role level parameters with ALTER ROLE SET
              search_path: pigsty,public    # add pigsty schema into search_path

        # - defining business databases - #
        pg_databases:
          - name: meta                      # name is the only required field for a database
            # baseline: metadb/schema.sql   # pigsty meta database baseline
            # owner: postgres               # optional, database owner
            # template: template1           # optional, template1 by default
            # encoding: UTF8                # optional, UTF8 by default, must match the template database; leave blank to use the db default
            # locale: C                     # optional, C by default, must match the template database; leave blank to use the db default
            # lc_collate: C                 # optional, C by default, must match the template database; leave blank to use the db default
            # lc_ctype: C                   # optional, C by default, must match the template database; leave blank to use the db default
            # tablespace: pg_default        # optional, 'pg_default' is the default tablespace
            # allowconn: true               # optional, true by default, false disable connect at all
            # revokeconn: false             # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
            # pgbouncer: true               # optional, add this database to pgbouncer list? true by default
            comment: pigsty meta database   # optional, comment string for database
            connlimit: -1                   # optional, connection limit, -1 or none disable limit (default)
            schemas: [pigsty]               # optional, create additional schema
            extensions:                     # optional, extension name and which schema to create
              - {name: adminpack, schema: pg_catalog}
            parameters:                       # optional, extra parameters with ALTER DATABASE
              search_path: 'pigsty,public'    # add pigsty to search_path

        pg_default_database: meta           # default database will be used as primary monitor target
        vip_mode: l2                        # none|l2|l4, l2 vip are used in sandbox demo
        vip_address: 10.10.10.2             # virtual ip address
        vip_cidrmask: 8                     # cidr network mask length
        vip_interface: eth1                 # interface to add virtual ip


    #-----------------------------
    # cluster: pg-test
    #-----------------------------
    # uncomment this for complete 4-node sandbox demo environment

    #pg-test: # define cluster named 'pg-test'
    #  # - cluster members - #
    #  hosts:
    #    10.10.10.11: {pg_seq: 1, pg_role: primary}
    #    10.10.10.12: {pg_seq: 2, pg_role: replica}
    #    10.10.10.13: {pg_seq: 3, pg_role: offline}
    #
    #  # - cluster configs - #
    #  vars:
    #    # basic settings
    #    pg_cluster: pg-test                 # define actual cluster name
    #    pg_version: 13                      # define installed pgsql version
    #    node_tune: tiny                     # tune node into oltp|olap|crit|tiny mode
    #    pg_conf: tiny.yml                   # tune pgsql into oltp|olap|crit|tiny mode
    #    pg_users:
    #      - name: test                      # admin user for pg-test, has DDL privileges
    #        password: test
    #        roles: [dbrole_admin]
    #        pgbouncer: true
    #        comment: default admin user for test database
    #
    #      - name: dbuser_test               # production rw-user
    #        password: DBUser.Test
    #        roles: [dbrole_readwrite]
    #        pgbouncer: true
    #        comment: default test user for production usage
    #
    #    pg_databases:                       # create a business database 'test'
    #      - name: test                      # use the simplest form
    #        extensions:                     # install postgis to test database
    #          - {name: postgis, schema: public}
    #    pg_default_database: test           # default database will be used as primary monitor target
    #
    #    # extra service settings
    #    pg_services_extra:                  # extra services to be added
    #      - name: standby                   # service name pg-test-standby
    #        src_ip: "*"
    #        src_port: 5435                  # 5435 routes to sync replica
    #        dst_port: postgres
    #        check_url: /sync                # use /sync health check
    #        selector: "[]"                  # jmespath to filter instances
    #        selector_backup: "[? pg_role == `primary`]"  # primary used as backup server for standby service
    #
    #    # proxy settings
    #    vip_mode: l2                        # enable/disable vip (require members in same LAN)
    #    vip_address: 10.10.10.3             # virtual ip address
    #    vip_cidrmask: 8                     # cidr network mask length
    #    vip_interface: eth1                 # interface to add virtual ip


  #==================================================================#
  #                           Globals                                #
  #==================================================================#
  vars:

    #------------------------------------------------------------------------------
    # CONNECTION PARAMETERS
    #------------------------------------------------------------------------------
    # this section defines connection parameters

    # ansible_user: vagrant                       # admin user with ssh access and sudo privilege

    proxy_env: # global proxy env when downloading packages
      no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn,*.myqcloud.com"
      # http_proxy: ''
      # https_proxy: ''
      # all_proxy: ''


    #------------------------------------------------------------------------------
    # REPO PROVISION
    #------------------------------------------------------------------------------
    # this section defines how to build a local repo

    # - repo basic - #
    repo_enabled: true                            # build local yum repo on meta nodes?
    repo_name: pigsty                             # local repo name
    repo_address: yum.pigsty                      # repo external address (ip:port or url)
    repo_port: 80                                 # listen port, must be consistent with repo_address
    repo_home: /www                               # default repo dir location
    repo_rebuild: false                           # force re-download packages
    repo_remove: true                             # remove existing repos

    # - where to download - #
    repo_upstreams:
      - name: base
        description: CentOS-$releasever - Base - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: updates
        description: CentOS-$releasever - Updates - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: extras
        description: CentOS-$releasever - Extras - Aliyun Mirror
        baseurl:
          - http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
          - http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
          - http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
        gpgcheck: no
        failovermethod: priority

      - name: epel
        description: CentOS $releasever - EPEL - Aliyun Mirror
        baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
        gpgcheck: no
        failovermethod: priority

      - name: grafana
        description: Grafana - TsingHua Mirror
        gpgcheck: no
        baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm

      - name: prometheus
        description: Prometheus and exporters
        gpgcheck: no
        baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch

      # consider using ZJU PostgreSQL mirror in mainland china
      - name: pgdg-common
        description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
        gpgcheck: no
        baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch
        # baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch

      - name: pgdg13
        description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch
        gpgcheck: no
        baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch
        # baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch

      - name: centos-sclo
        description: CentOS-$releasever - SCLo
        gpgcheck: no
        mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo

      - name: centos-sclo-rh
        description: CentOS-$releasever - SCLo rh
        gpgcheck: no
        mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh

      - name: nginx
        description: Nginx Official Yum Repo
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: http://nginx.org/packages/centos/$releasever/$basearch/

      - name: haproxy
        description: Copr repo for haproxy
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/

      # for latest consul & kubernetes
      - name: harbottle
        description: Copr repo for main owned by harbottle
        skip_if_unavailable: true
        gpgcheck: no
        baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/

    # - what to download - #
    repo_packages:
      # repo bootstrap packages
      - epel-release nginx wget yum-utils yum createrepo sshpass unzip                        # bootstrap packages

      # node basic packages
      - ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget git tuned         # basic system util
      - readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates     # basic pg dependency
      - numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet  # system utils

      # dcs & monitor packages
      - grafana prometheus2 pushgateway alertmanager                                          # monitor and ui
      - node_exporter postgres_exporter nginx_exporter blackbox_exporter                      # exporter
      - consul consul_exporter consul-template etcd                                           # dcs

      # python3 dependencies
      - ansible python python-pip python-psycopg2 audit                                       # ansible & python
      - python3 python3-psycopg2 python36-requests python3-etcd python3-consul                # python3
      - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography               # patroni extra deps

      # proxy and load balancer
      - haproxy keepalived dnsmasq                                                            # proxy and dns

      # postgres common Packages
      - patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity               # major components
      - pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel                # other common utils
      - barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail

      # postgres 13 packages
      - postgresql13* postgis31* citus_13 timescaledb_13 # pgrouting_13                         # postgres 13 and postgis 31
      - pg_repack13 pg_squeeze13                                                                # maintenance extensions
      - pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13                        # stats extensions
      - plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13                                    # PL extensions
      - hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13             # FDW extensions
      - wal2json13 count_distinct13 ddlx_13 geoip13 orafce13                                    # MISC extensions
      - rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
      - pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
      - pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
      - pguint13 pguri13 prefix13  safeupdate_13 semver13  table_version13 tdigest13

    repo_url_packages:
      # additional rpm packages
      - https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm
      - https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
      - http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm

      # tar.gz and zip binary packages
      - https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz # monitor binary
      - https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter_v0.3.2_linux-amd64.tar.gz
      - https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip           # loki binary
      - https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip
      - https://github.com/grafana/loki/releases/download/v2.2.1/logcli-linux-amd64.zip
      - https://github.com/grafana/loki/releases/download/v2.2.1/loki-canary-linux-amd64.zip

      # mirror in mainland china (use commented packages to install from official site)
      # - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/pg_exporter-0.3.2-1.el7.x86_64.rpm
      # - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/vip-manager_0.6-1_amd64.rpm
      # - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/polysh-0.4-1.noarch.rpm

    #------------------------------------------------------------------------------
    # NODE PROVISION
    #------------------------------------------------------------------------------
    # this section defines how to provision nodes
    # nodename:                                   # if defined, node's hostname will be overwritten

    # - node dns - #
    node_dns_hosts: # static dns records in /etc/hosts
      - 10.10.10.10 yum.pigsty
    node_dns_server: add                          # add (default) | none (skip) | overwrite (remove old settings)
    node_dns_servers:                             # dynamic nameserver in /etc/resolv.conf
      - 10.10.10.10
    node_dns_options:                             # dns resolv options
      - options single-request-reopen timeout:1 rotate
      - domain service.consul

    # - node repo - #
    node_repo_method: local                       # none|local|public (use local repo for production env)
    node_repo_remove: true                        # whether remove existing repo
    node_local_repo_url:                          # local repo url (if method=local, make sure firewall is configured or disabled)
      - http://yum.pigsty/pigsty.repo

    # - node packages - #
    node_packages:                                # common packages for all nodes
      - wget,yum-utils,sshpass,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
      - numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
      - python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
      - python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
      - node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
    node_extra_packages:                          # extra packages for all nodes
      - patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
    node_meta_packages:                           # packages for meta nodes only
      - grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
      - dnsmasq,nginx,ansible,pgbadger,polysh

    # build & devel packages (add to repo_packages too if you want build database & extensions from source)
    # - gcc,gcc-c++,clang,coreutils,diffutils,rpm-build,rpm-devel,rpmlint,rpmdevtools
    # - zlib-devel,openssl-libs,openssl-devel,pam-devel,libxml2-devel,libxslt-devel,openldap-devel,systemd-devel,tcl-devel,python-devel


    # - node features - #
    node_disable_numa: false                      # disable numa, important for production database, reboot required
    node_disable_swap: false                      # disable swap, important for production database
    node_disable_firewall: true                   # disable firewall (required if using kubernetes)
    node_disable_selinux: true                    # disable selinux  (required if using kubernetes)
    node_static_network: true                     # keep dns resolver settings after reboot
    node_disk_prefetch: false                     # setup disk prefetch on HDD to increase performance

    # - node kernel modules - #
    node_kernel_modules:
      - softdog
      - br_netfilter
      - ip_vs
      - ip_vs_rr
      - ip_vs_wrr
      - ip_vs_sh
      - nf_conntrack_ipv4

    # - node tuned - #
    node_tune: tiny                               # install and activate tuned profile: none|oltp|olap|crit|tiny
    node_sysctl_params: {}                        # set additional sysctl parameters, k:v format
    # net.bridge.bridge-nf-call-iptables: 1     # example kv parameters

    # - node user - #
    node_admin_setup: true                        # set up a default admin user?
    node_admin_uid: 88                            # uid and gid for admin user
    node_admin_username: dba                      # default admin user: dba
    node_admin_ssh_exchange: true                 # exchange admin's ssh key among cluster ?
    node_admin_pk_current: false                  # add current user's ~/.ssh/id_rsa.pub to admin pk
    node_admin_pks:                               # ssh public keys to be added to admin user
      - 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'

    # - node ntp - #
    node_ntp_service: ntp                         # ntp or chrony
    node_ntp_config: true                         # overwrite existing ntp config?
    node_timezone: Asia/Shanghai                  # default node timezone
    node_ntp_servers:                             # default NTP servers
      - pool cn.pool.ntp.org iburst
      - pool pool.ntp.org iburst
      - pool time.pool.aliyun.com iburst
      - server 10.10.10.10 iburst


    #------------------------------------------------------------------------------
    # META PROVISION
    #------------------------------------------------------------------------------
    # - ca - #
    ca_method: create                             # create|copy|recreate
    ca_subject: "/CN=root-ca"                     # self-signed CA subject
    ca_homedir: /ca                               # ca cert directory
    ca_cert: ca.crt                               # ca public key/cert
    ca_key: ca.key                                # ca private key

    # - nginx - #
    nginx_upstream:
      - { name: home,          host: pigsty,   url: "127.0.0.1:3000"}
      - { name: consul,        host: c.pigsty, url: "127.0.0.1:8500" }
      - { name: grafana,       host: g.pigsty, url: "127.0.0.1:3000" }
      - { name: prometheus,    host: p.pigsty, url: "127.0.0.1:9090" }
      - { name: alertmanager,  host: a.pigsty, url: "127.0.0.1:9093" }
      - { name: haproxy,       host: h.pigsty, url: "127.0.0.1:9091" }

    # - nameserver - #
    dns_records: # dynamic dns record resolved by dnsmasq
      - 10.10.10.2  pg-meta                       # sandbox vip for pg-meta
      - 10.10.10.3  pg-test                       # sandbox vip for pg-test
      - 10.10.10.10 meta-1                        # sandbox node meta-1 (node-0)
      - 10.10.10.11 node-1                        # sandbox node node-1
      - 10.10.10.12 node-2                        # sandbox node node-2
      - 10.10.10.13 node-3                        # sandbox node node-3
      - 10.10.10.10 pigsty
      - 10.10.10.10 y.pigsty yum.pigsty
      - 10.10.10.10 c.pigsty consul.pigsty
      - 10.10.10.10 g.pigsty grafana.pigsty
      - 10.10.10.10 p.pigsty prometheus.pigsty
      - 10.10.10.10 a.pigsty alertmanager.pigsty
      - 10.10.10.10 n.pigsty ntp.pigsty
      - 10.10.10.10 h.pigsty haproxy.pigsty

    # - prometheus - #
    prometheus_data_dir: /export/prometheus/data  # prometheus data dir
    prometheus_options: '--storage.tsdb.retention=30d'
    prometheus_reload: false                      # reload prometheus instead of recreate it
    prometheus_sd_method: consul                  # service discovery method: static|consul|etcd
    prometheus_scrape_interval: 5s                # global scrape & evaluation interval
    prometheus_scrape_timeout: 4s                 # scrape timeout
    prometheus_sd_interval: 5s                    # service discovery refresh interval

    # - grafana - #
    grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
    grafana_admin_password: admin                    # default grafana admin user password
    grafana_plugin: install                          # none|install|reinstall
    grafana_cache: /www/pigsty/grafana/plugins.tgz   # path to grafana plugins tarball
    grafana_customize: false                         # customize grafana resources
    grafana_plugins: # default grafana plugins list
      - redis-datasource
      - simpod-json-datasource
      - fifemon-graphql-datasource
      - sbueringer-consul-datasource
      - camptocamp-prometheus-alertmanager-datasource
      - ryantxu-ajax-panel
      - marcusolsson-hourly-heatmap-panel
      - michaeldmoore-multistat-panel
      - marcusolsson-treemap-panel
      - pr0ps-trackmap-panel
      - dalvany-image-panel
      - magnesium-wordcloud-panel
      - cloudspout-button-panel
      - speakyourcode-button-panel
      - jdbranham-diagram-panel
      - grafana-piechart-panel
      - snuids-radar-panel
      - digrich-bubblechart-panel
    grafana_git_plugins:
      - https://github.com/Vonng/grafana-echarts

    # - loki - #
    loki_clean: false                 # whether to remove existing loki data
    loki_data_dir: /export/loki       # default loki data dir


    #------------------------------------------------------------------------------
    # DCS PROVISION
    #------------------------------------------------------------------------------
    service_registry: consul                      # where to register services: none | consul | etcd | both
    dcs_type: consul                              # consul | etcd | both
    dcs_name: pigsty                              # consul dc name | etcd initial cluster token
    dcs_servers:                                  # dcs server dict in name:ip format
      meta-1: 10.10.10.10                         # you could use existing dcs cluster
      # meta-2: 10.10.10.11                       # hosts whose IPs are listed here will be initialized as servers
      # meta-3: 10.10.10.12                       # 3 or 5 dcs nodes are recommended for production environments
    dcs_exists_action: clean                      # abort|skip|clean if dcs server already exists
    dcs_disable_purge: false                      # set to true to disable purge functionality for good (force dcs_exists_action = abort)
    consul_data_dir: /var/lib/consul              # consul data dir (/var/lib/consul by default)
    etcd_data_dir: /var/lib/etcd                  # etcd data dir (/var/lib/etcd by default)


    #------------------------------------------------------------------------------
    # POSTGRES INSTALLATION
    #------------------------------------------------------------------------------
    # - dbsu - #
    pg_dbsu: postgres                             # os user for database, postgres by default (changing it is not recommended!)
    pg_dbsu_uid: 26                               # os dbsu uid and gid, 26 for default postgres users and groups
    pg_dbsu_sudo: limit                           # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
    pg_dbsu_home: /var/lib/pgsql                  # postgresql home directory
    pg_dbsu_ssh_exchange: false                   # exchange ssh key among same cluster

    # - postgres packages - #
    pg_version: 13                                # default postgresql version
    pgdg_repo: false                              # use official pgdg yum repo (disable if you have local mirror)
    pg_add_repo: false                            # add postgres related repo before install (useful if you want a simple install)
    pg_bin_dir: /usr/pgsql/bin                    # postgres binary dir
    pg_packages:
      - postgresql${pg_version}*
      - postgis31_${pg_version}*
      - pgbouncer patroni pg_exporter pgbadger
      - patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
      - python3 python3-psycopg2 python36-requests python3-etcd python3-consul
      - python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography

    pg_extensions:
      - pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
      # - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
      # - count_distinct${version}  ddlx_${version}  geoip${version}  orafce${version}                                   # popular features
      # - hypopg_${version}  ip4r${version}  jsquery_${version}  logerrors_${version}  periods_${version}  pg_auto_failover_${version}  pg_catcheck${version}
      # - pg_fkpart${version}  pg_jobmon${version}  pg_partman${version}  pg_prioritize_${version}  pg_track_settings${version}  pgaudit15_${version}
      # - pgcryptokey${version}  pgexportdoc${version}  pgimportdoc${version}  pgmemcache-${version}  pgmp${version}  pgq-${version}  pgquarrel pgrouting_${version}
      # - pguint${version}  pguri${version}  prefix${version}   safeupdate_${version}  semver${version}   table_version${version}  tdigest${version}



    #------------------------------------------------------------------------------
    # POSTGRES PROVISION
    #------------------------------------------------------------------------------
    # - identity - #
    # pg_cluster:                                 # [REQUIRED] cluster name (cluster level,  validated during pg_preflight)
    # pg_seq: 0                                   # [REQUIRED] instance seq (instance level, validated during pg_preflight)
    # pg_role: replica                            # [REQUIRED] service role (instance level, validated during pg_preflight)
    # pg_shard:                                   # [OPTIONAL] shard name  (cluster level)
    # pg_sindex:                                  # [OPTIONAL] shard index (cluster level)

    # - identity option -#
    pg_hostname: false                            # overwrite node hostname with pg instance name
    pg_nodename: true                             # overwrite consul nodename with pg instance name

    # - retention - #
    # pg_exists_action, available options: abort|clean|skip
    #  - abort: abort entire play's execution (default)
    #  - clean: remove existing cluster (dangerous)
    #  - skip: end current play for this host
    # pg_exists: false                            # auxiliary flag variable (DO NOT SET THIS)
    pg_exists_action: clean
    pg_disable_purge: false                       # set to true to disable pg purge functionality for good (force pg_exists_action = abort)

    # - storage - #
    pg_data: /pg/data                             # postgres data directory
    pg_fs_main: /export                           # data disk mount point     /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
    pg_fs_bkup: /var/backups                      # backup disk mount point   /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*

    # - connection - #
    pg_listen: '0.0.0.0'                          # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
    pg_port: 5432                                 # postgres port (5432 by default)
    pg_localhost: /var/run/postgresql             # localhost unix socket dir for connection
    # pg_upstream:                                # [OPTIONAL] specify replication upstream (setting it on the primary turns the cluster into a standby cluster)


    # - patroni - #
    # patroni_mode, available options: default|pause|remove
    #   - default: default ha mode
    #   - pause:   into maintenance mode
    #   - remove:  remove patroni after bootstrap
    patroni_mode: default                         # pause|default|remove
    pg_namespace: /pg                             # top level key namespace in dcs
    patroni_port: 8008                            # default patroni port
    patroni_watchdog_mode: automatic              # watchdog mode: off|automatic|required
    pg_conf: tiny.yml                             # user provided patroni config template path

    # - flags - #
    pg_backup: false                              # store base backup on this node
    pg_delay: 0                                   # apply delay for offline|delayed instance

    # - localization - #
    pg_encoding: UTF8                             # default to UTF8
    pg_locale: C                                  # default to C
    pg_lc_collate: C                              # default to C
    pg_lc_ctype: en_US.UTF8                       # default to en_US.UTF8

    # - pgbouncer - #
    pgbouncer_port: 6432                          # pgbouncer port (6432 by default)
    pgbouncer_poolmode: transaction               # pooling mode: (transaction pooling by default)
    pgbouncer_max_db_conn: 100                    # important! do not set this larger than postgres max conn or conn limit


    #------------------------------------------------------------------------------
    # POSTGRES TEMPLATE
    #------------------------------------------------------------------------------
    # - template - #
    pg_init: pg-init                              # init script for cluster template

    # - system roles - #
    pg_replication_username: replicator           # system replication user
    pg_replication_password: DBUser.Replicator    # system replication password
    pg_monitor_username: dbuser_monitor           # system monitor user
    pg_monitor_password: DBUser.Monitor           # system monitor password
    pg_admin_username: dbuser_dba                 # system admin user
    pg_admin_password: DBUser.DBA                 # system admin password

    # - default roles - #
    # check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
    pg_default_roles:

      # common production readonly user
      - name: dbrole_readonly                 # production read-only roles
        login: false
        comment: role for global readonly access

      # common production read-write user
      - name: dbrole_readwrite                # production read-write roles
        login: false
        roles: [dbrole_readonly]             # read-write includes read-only access
        comment: role for global read-write access

      # offline has the same privileges as readonly, but its hba access is limited to offline instances only,
      # for the purpose of running slow queries, interactive queries and ETL tasks
      - name: dbrole_offline
        login: false
        comment: role for restricted read-only access (offline instance)

      # admin has the privileges to issue DDL changes
      - name: dbrole_admin
        login: false
        bypassrls: true
        comment: role for object creation
        roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]

      # dbsu, name is designated by `pg_dbsu`. It's not recommended to set a password for dbsu
      - name: postgres
        superuser: true
        comment: system superuser

      # default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
      - name: replicator
        replication: true                          # for replication user
        bypassrls: true                            # logical replication requires bypassrls
        roles: [pg_monitor, dbrole_readonly]       # logical replication requires select privileges
        comment: system replicator

      # default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
      - name: dbuser_monitor
        connlimit: 16
        comment: system monitor user
        roles: [pg_monitor, dbrole_readonly]
        parameters:
          log_min_duration_statement: 1000

      # default admin super user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
      - name: dbuser_dba
        superuser: true
        comment: system admin user
        roles: [dbrole_admin]

      # default stats user, for ETL and slow queries
      - name: dbuser_stats
        password: DBUser.Stats
        comment: business offline user for offline queries and ETL
        roles: [dbrole_offline]


    # - privileges - #
    # objects created by dbsu and admin will have their privileges properly set
    pg_default_privileges:
      - GRANT USAGE                         ON SCHEMAS   TO dbrole_readonly
      - GRANT SELECT                        ON TABLES    TO dbrole_readonly
      - GRANT SELECT                        ON SEQUENCES TO dbrole_readonly
      - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_readonly
      - GRANT USAGE                         ON SCHEMAS   TO dbrole_offline
      - GRANT SELECT                        ON TABLES    TO dbrole_offline
      - GRANT SELECT                        ON SEQUENCES TO dbrole_offline
      - GRANT EXECUTE                       ON FUNCTIONS TO dbrole_offline
      - GRANT INSERT, UPDATE, DELETE        ON TABLES    TO dbrole_readwrite
      - GRANT USAGE,  UPDATE                ON SEQUENCES TO dbrole_readwrite
      - GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES    TO dbrole_admin
      - GRANT CREATE                        ON SCHEMAS   TO dbrole_admin

    # - schemas - #
    pg_default_schemas: [monitor]                 # default schemas to be created

    # - extension - #
    pg_default_extensions:                        # default extensions to be created
      - { name: 'pg_stat_statements',  schema: 'monitor' }
      - { name: 'pgstattuple',         schema: 'monitor' }
      - { name: 'pg_qualstats',        schema: 'monitor' }
      - { name: 'pg_buffercache',      schema: 'monitor' }
      - { name: 'pageinspect',         schema: 'monitor' }
      - { name: 'pg_prewarm',          schema: 'monitor' }
      - { name: 'pg_visibility',       schema: 'monitor' }
      - { name: 'pg_freespacemap',     schema: 'monitor' }
      - { name: 'pg_repack',           schema: 'monitor' }
      - name: postgres_fdw
      - name: file_fdw
      - name: btree_gist
      - name: btree_gin
      - name: pg_trgm
      - name: intagg
      - name: intarray

    # - hba - #
    pg_offline_query: false                       # set to true to enable offline query on instance
    pg_reload: true                               # reload postgres after hba changes
    pg_hba_rules:                                 # postgres host-based authentication rules
      - title: allow meta node password access
        role: common
        rules:
          - host    all     all                         10.10.10.10/32      md5

      - title: allow intranet admin password access
        role: common
        rules:
          - host    all     +dbrole_admin               10.0.0.0/8          md5
          - host    all     +dbrole_admin               172.16.0.0/12       md5
          - host    all     +dbrole_admin               192.168.0.0/16      md5

      - title: allow intranet password access
        role: common
        rules:
          - host    all             all                 10.0.0.0/8          md5
          - host    all             all                 172.16.0.0/12       md5
          - host    all             all                 192.168.0.0/16      md5

      - title: allow local read/write (local production user via pgbouncer)
        role: common
        rules:
          - local   all     +dbrole_readonly                                md5
          - host    all     +dbrole_readonly           127.0.0.1/32         md5

      - title: allow offline query (ETL,SAGA,Interactive) on offline instance
        role: offline
        rules:
          - host    all     +dbrole_offline               10.0.0.0/8        md5
          - host    all     +dbrole_offline               172.16.0.0/12     md5
          - host    all     +dbrole_offline               192.168.0.0/16    md5

    pg_hba_rules_extra: []                        # extra hba rules (for cluster/instance overwrite)

    pgbouncer_hba_rules:                          # pgbouncer host-based authentication rules
      - title: local password access
        role: common
        rules:
          - local  all          all                                     md5
          - host   all          all                     127.0.0.1/32    md5

      - title: intranet password access
        role: common
        rules:
          - host   all          all                     10.0.0.0/8      md5
          - host   all          all                     172.16.0.0/12   md5
          - host   all          all                     192.168.0.0/16  md5

    pgbouncer_hba_rules_extra: []                 # extra pgbouncer hba rules (for cluster/instance overwrite)
    # pg_users: []                                # business users
    # pg_databases: []                            # business databases

    #------------------------------------------------------------------------------
    # MONITOR PROVISION
    #------------------------------------------------------------------------------
    # - install - #
    exporter_install: none                        # none|yum|binary, none by default
    exporter_repo_url: ''                         # if set, repo will be added to /etc/yum.repos.d/ before yum installation

    # - collect - #
    exporter_metrics_path: /metrics               # default metric path for pg related exporter

    # - node exporter - #
    node_exporter_enabled: true                   # setup node_exporter on instance
    node_exporter_port: 9100                      # default port for node exporter
    node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'

    # - pg exporter - #
    pg_exporter_config: pg_exporter-demo.yaml     # default config files for pg_exporter
    pg_exporter_enabled: true                     # setup pg_exporter on instance
    pg_exporter_port: 9630                        # default port for pg exporter
    pg_exporter_url: ''                           # optional, if not set, generate from reference parameters

    # - pgbouncer exporter - #
    pgbouncer_exporter_enabled: true              # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
    pgbouncer_exporter_port: 9631                 # default port for pgbouncer exporter
    pgbouncer_exporter_url: ''                    # optional, if not set, generate from reference parameters

    # - promtail - #                              # promtail is a beta feature which requires manual deployment
    promtail_enabled: true                        # enable promtail logging collector?
    promtail_clean: false                         # remove promtail status file? false by default
    promtail_port: 9080                           # default listen port for promtail
    promtail_status_file: /tmp/promtail-status.yml
    promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push  # loki url to receive logs

    #------------------------------------------------------------------------------
    # SERVICE PROVISION
    #------------------------------------------------------------------------------
    pg_weight: 100              # default load balance weight (instance level)

    # - service - #
    pg_services:                                  # how to expose postgres service in cluster?
      # primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
      - name: primary           # service name {{ pg_cluster }}-primary
        src_ip: "*"
        src_port: 5433
        dst_port: pgbouncer     # 5433 route to pgbouncer
        check_url: /primary     # primary health check, success when instance is primary
        selector: "[]"          # select all instance as primary service candidate

      # replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
      - name: replica           # service name {{ pg_cluster }}-replica
        src_ip: "*"
        src_port: 5434
        dst_port: pgbouncer
        check_url: /read-only   # read-only health check. (including primary)
        selector: "[]"          # select all instances as replica service candidates
        selector_backup: "[? pg_role == `primary`]"   # the primary is used as backup server in replica service

      # default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
      - name: default           # service's actual name is {{ pg_cluster }}-default
        src_ip: "*"             # service bind ip address, * for all, vip for cluster virtual ip address
        src_port: 5436          # bind port, mandatory
        dst_port: postgres      # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
        check_method: http      # health check method: only http is available for now
        check_port: patroni     # health check port:  patroni|pg_exporter|port_number , patroni by default
        check_url: /primary     # health check url path, / as default
        check_code: 200         # health check http code, 200 as default
        selector: "[]"          # instance selector
        haproxy:                # haproxy specific fields
          maxconn: 3000         # max front-end connections
          balance: roundrobin   # load balance algorithm (roundrobin by default)
          default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'

      # offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
      - name: offline           # service name {{ pg_cluster }}-offline
        src_ip: "*"
        src_port: 5438
        dst_port: postgres
        check_url: /replica     # offline MUST be a replica
        selector: "[? pg_role == `offline` || pg_offline_query ]"         # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
        selector_backup: "[? pg_role == `replica` && !pg_offline_query]"  # replicas are used as backup servers in offline service

    pg_services_extra: []        # extra services to be added

    # - haproxy - #
    haproxy_enabled: true                         # enable haproxy on every cluster member
    haproxy_reload: true                          # reload haproxy after config
    haproxy_admin_auth_enabled: false             # enable authentication for haproxy admin?
    haproxy_admin_username: admin                 # default haproxy admin username
    haproxy_admin_password: admin                 # default haproxy admin password
    haproxy_exporter_port: 9101                   # default admin/exporter port
    haproxy_client_timeout: 3h                    # client side connection timeout
    haproxy_server_timeout: 3h                    # server side connection timeout

    # - vip - #
    vip_mode: none                                # none | l2 | l4
    vip_reload: true                              # whether to reload service after config
    # vip_address: 127.0.0.1                      # virtual ip address (l2 or l4)
    # vip_cidrmask: 24                            # virtual ip address cidr mask (l2 only)
    # vip_interface: eth0                         # virtual ip network interface (l2 only)

    # - dns - #                                   # NOT IMPLEMENTED
    # dns_mode: vip                               # vip|all|selector: how to resolve cluster DNS?
    # dns_selector: '[]'                          # if dns_mode == vip, filter the instances to be resolved

...
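
The four default entries in pg_services each bind one port on every cluster member. As a rough illustration of how a client might address them, the sketch below maps each service name to its port and builds a connection URL. The helpers service_port and service_url are hypothetical and not part of Pigsty; pg-test is the sandbox cluster name from dns_records, dbuser_dba and dbuser_stats are the default users from pg_default_roles, and the database name postgres is an assumption.

```shell
#!/bin/sh
# Sketch only: map the default Pigsty service names to their haproxy ports.
# service_port and service_url are hypothetical helpers, not part of Pigsty.

service_port() {
  case "$1" in
    primary) echo 5433 ;;   # read-write, routed to pgbouncer on the primary
    replica) echo 5434 ;;   # read-only, routed to pgbouncer on replicas
    default) echo 5436 ;;   # direct postgres connection to the primary
    offline) echo 5438 ;;   # direct postgres connection to offline instances
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Usage: service_url <cluster> <service> <user> [dbname]
# dbname falls back to "postgres" (an assumption made for this sketch).
service_url() {
  svc_port="$(service_port "$2")" || return 1
  echo "postgres://${3}@${1}:${svc_port}/${4:-postgres}"
}

service_url pg-test primary dbuser_dba      # postgres://dbuser_dba@pg-test:5433/postgres
service_url pg-test offline dbuser_stats    # postgres://dbuser_stats@pg-test:5438/postgres
```

The resulting URL can be passed to any libpq-based client, for example psql, which will then prompt for the password.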

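8.2 - Kernel Tuning

Kernel parameter adjustments that Pigsty applies to the operating system

Pigsty uses tuned to adjust operating system settings; tuned is the tuning tool bundled with CentOS 7.

Pigsty Tuned Profiles

By default, Pigsty installs four tuned profiles on the operating system:

  • OLTP: for regular business databases, optimized for latency
  • OLAP: for analytical databases, optimized for throughput
  • CRIT: for core business databases, optimized for RPO
  • TINY: for tiny instances and virtual machines

tuned-adm profile oltp    # enable OLTP mode
tuned-adm profile olap    # enable OLAP mode
tuned-adm profile crit    # enable CRIT mode
tuned-adm profile tiny    # enable TINY mode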

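Basic Tuned Operations

# To start tuned, run the following command as root:
systemctl start tuned

# To activate tuned on every boot, run:
systemctl enable tuned

# For other tuned controls, such as profile selection, use:
tuned-adm

# To list the available installed profiles (requires the tuned service to be running):
tuned-adm list

# To show the currently active profile:
tuned-adm active

# To select or activate a profile:
tuned-adm profile profile_name
# For example:
tuned-adm profile powersave

# To let tuned recommend the profile best suited to your system, without changing any existing profile or reusing the logic applied during installation:
tuned-adm recommend

# To disable all tuning:
tuned-adm off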

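To list all available profiles and identify the currently active one, run:
tuned-adm list
To show only the currently active profile, run:
tuned-adm active
To switch to one of the available profiles, run:
tuned-adm profile profile_name
For example:
tuned-adm profile server-powersave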
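
The commands above can be combined into a small guard that only accepts the four Pigsty profiles and then confirms the switch took effect. This is a minimal sketch, not something Pigsty ships: is_pigsty_profile and switch_profile are hypothetical helper names, and the check relies on `tuned-adm active` printing a line of the form "Current active profile: NAME".

```shell
#!/bin/sh
# Sketch only: switch to one of the four Pigsty tuned profiles and verify it.
# is_pigsty_profile and switch_profile are hypothetical helper names.

# Succeeds when the argument is one of the profiles Pigsty installs.
is_pigsty_profile() {
  case "$1" in
    oltp|olap|crit|tiny) return 0 ;;
    *) return 1 ;;
  esac
}

# Applies the profile, then confirms it with `tuned-adm active`.
# Must run as root on a node where tuned is installed and running.
switch_profile() {
  is_pigsty_profile "$1" || { echo "not a Pigsty profile: $1" >&2; return 2; }
  command -v tuned-adm >/dev/null 2>&1 || { echo "tuned-adm not found" >&2; return 3; }
  tuned-adm profile "$1" || return 1
  # Match the profile name at the end of the "Current active profile: ..." line
  tuned-adm active | grep -q ": $1\$"
}

# Example (run as root):
#   switch_profile oltp && echo "oltp profile is active"
```

Rejecting unknown names before calling tuned-adm keeps a typo from silently activating an unrelated stock profile.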

OLTP Configuration

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to oltp mode
# Path      :   /etc/tuned/oltp/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL OLTP System
include=network-latency

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shmem size in pages:  $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), raised to 5ms, depending on your typical query (e.g. < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80

# vm.dirty_background_bytes=67108864 # 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_ratio=3       # latency-performance default
vm.dirty_ratio=10                 # latency-performance default

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be larger than 1M.  65536->1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# idle time before the first probe = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
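
Several values in the profile above are derived rather than arbitrary. The sketch below reproduces that arithmetic, assuming a Linux host where getconf is available. The function names are hypothetical; the formulas come from the comments in the config, with kernel.shmall counted in pages and kernel.shmmax in bytes as the kernel defines them.

```shell
#!/bin/sh
# Sketch only: reproduce the arithmetic behind a few values in the profile.
# Function names are hypothetical; the formulas come from the config comments.

# Half of physical memory, in pages (the unit kernel.shmall uses).
calc_shmall() {
  echo $(( $(getconf _PHYS_PAGES) / 2 ))
}

# Half of physical memory, in bytes (the unit kernel.shmmax uses).
calc_shmmax() {
  echo $(( $(getconf _PHYS_PAGES) / 2 * $(getconf PAGE_SIZE) ))
}

# kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI, where SEMMNS = SEMMSL * SEMMNI.
# Usage: calc_semmns <semmsl> <semmni>
calc_semmns() {
  echo $(( $1 * $2 ))
}

# Idle time before the first probe, plus probes * interval, gives the
# time after which an unresponsive peer is dropped.
# Usage: calc_keepalive_timeout <time> <probes> <intvl>
calc_keepalive_timeout() {
  echo $(( $1 + $2 * $3 ))
}

echo "kernel.shmall = $(calc_shmall)"
echo "kernel.shmmax = $(calc_shmmax)"
echo "semmns        = $(calc_semmns 2048 65536)"           # 134217728
echo "dead peer after $(calc_keepalive_timeout 60 3 20)s"  # 120s
```

The first two values depend on the host's memory, which is presumably why the template takes them as param_shmall and param_shmmax instead of hard-coding them.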

OLAP Configuration

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-09-18
# Desc      :   Tune operating system to olap mode
# Path      :   /etc/tuned/olap/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL OLAP System
include=network-throughput

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shmem size in pages:  $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), raised to 5ms, depending on your typical query (e.g. < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
# vm.swappiness=10

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80

vm.dirty_background_ratio = 10    # throughput-performance default
vm.dirty_ratio=80                 # throughput-performance default 40 -> 80

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be at least 1M: 65536 -> 1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
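
The `kernel.sem` line in the profile above packs four limits into one value. As a minimal sketch that is not part of Pigsty, the Python below unpacks them in the order the kernel uses (SEMMSL, SEMMNS, SEMOPM, SEMMNI) and checks the relationship stated in the comment, namely that the system-wide total equals semaphores per set times the number of sets. The names `SemLimits` and `parse_kernel_sem` are hypothetical.

```python
from typing import NamedTuple


class SemLimits(NamedTuple):
    semmsl: int  # max semaphores per set
    semmns: int  # max semaphores system-wide
    semopm: int  # max operations per semop() call
    semmni: int  # max number of semaphore sets


def parse_kernel_sem(value: str) -> SemLimits:
    """Split a kernel.sem value into its four integer fields."""
    fields = [int(f) for f in value.split()]
    if len(fields) != 4:
        raise ValueError(f"kernel.sem needs 4 fields, got {len(fields)}")
    return SemLimits(*fields)


limits = parse_kernel_sem("2048 134217728 2048 65536")
# The system-wide total should equal semaphores-per-set times the number of sets.
print(limits.semmns == limits.semmsl * limits.semmni)
```

A check like this may be handy when changing one field, since the second field is usually kept in step with the first and fourth.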

CRIT Configuration

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to crit mode
# Path      :   /etc/tuned/crit/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL CRIT System
include=network-latency

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shmem size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# total shmem size in pages:  $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), raised to 5ms, depending on your typical query (e.g. < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 100%
vm.overcommit_memory=2
vm.overcommit_ratio=100

# 64MB of dirty memory (2x RAID cache) wakes background writeback
vm.dirty_background_bytes=67108864
# vm.dirty_background_ratio=3       # latency-performance default
vm.dirty_ratio=6                    # latency-performance default

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be at least 1M: 65536 -> 1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
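
With `vm.overcommit_memory=2` the kernel refuses allocations beyond a commit limit derived from `vm.overcommit_ratio`. The sketch below compares the OLAP setting (80) with the CRIT setting (100). It assumes no huge pages are reserved, the helper name `commit_limit` is hypothetical, and the 64 GiB RAM / 4 GiB swap host is made up for illustration.

```python
GIB = 1024 ** 3


def commit_limit(ram_bytes: int, swap_bytes: int, overcommit_ratio: int) -> int:
    """CommitLimit = swap + RAM * overcommit_ratio / 100 (huge pages ignored)."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


ram, swap = 64 * GIB, 4 * GIB  # illustrative sizes, not a recommendation
for profile, ratio in (("olap", 80), ("crit", 100)):
    print(profile, round(commit_limit(ram, swap, ratio) / GIB, 1), "GiB")
```

On a running host the result can be compared against the `CommitLimit` line in `/proc/meminfo`.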

TINY Configuration

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to tiny mode
# Path      :   /etc/tuned/tiny/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL TINY System
# include=virtual-guest

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up.  Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 40

# Filesystem I/O is usually much more efficient than swapping, so try to keep
# swapping low.  It's usually safe to go even lower than this on systems with
# server-grade storage.
vm.swappiness = 30

#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
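
The three keepalive settings above combine into one number: how long an idle, dead peer can go unnoticed. As a small sketch (the function name is hypothetical), the idle time before the first probe is added to the time spent on unanswered probes:

```python
def keepalive_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Idle time before the first probe plus all unanswered probe intervals."""
    return time_s + intvl_s * probes


# Values from the profiles above: 60s idle, then 3 probes 20s apart.
print(keepalive_detection_seconds(time_s=60, intvl_s=20, probes=3))  # 120
```

So the "1m" in the comments covers only the probing phase; counted from the last traffic, detection takes about two minutes with these values.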

Database Kernel Tuning Reference

# Database kernel optimisation
fs.aio-max-nr = 1048576 # limit on concurrent outstanding async I/O requests; should be no less than 1M
fs.file-max = 16777216  # allow up to 16M open files

# kernel
kernel.shmmax = 1986797568      # max size of a single shared memory segment, in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
kernel.shmall = 485058          # total shared memory, in pages: $(expr $(getconf _PHYS_PAGES) / 2)
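kernel.shmmni = 16384           # max number of shared memory segments system-wide: 4096 -> 16384
kernel.msgmni = 32768           # number of message queues in the system; affects how many agent processes can start; set to memory size in MB
kernel.msgmnb = 65536           # max size of a message queue
kernel.msgmax = 65536           # max size of a single message in a queue
kernel.numa_balancing = 0       # disable automatic NUMA balancing
kernel.sched_migration_cost_ns = 5000000 # the scheduler treats a process as cache-hot for 5ms
kernel.sem = 2048 134217728 2048 65536   # max 2048 semaphores per set, 134217728 semaphores system-wide, max 2048 operations per call, 65536 semaphore sets

# vm
vm.dirty_ratio = 80                       # hard limit: above 80% dirty memory, writers block and flush to disk
vm.dirty_background_bytes = 268435456     # 256MB of dirty data wakes the background flusher
vm.dirty_expire_centisecs = 6000          # data dirty for more than 1 minute is eligible for flushing
vm.dirty_writeback_centisecs= 500         # the flusher runs every 5 seconds
vm.mmap_min_addr = 65536                  # deny access to memory below 0x10000
vm.zone_reclaim_mode = 0                  # disable NUMA zone reclaim

# vm swap
vm.swappiness = 0                         # avoid swapping, though it can still occur at the high watermark
vm.overcommit_memory = 2                  # allow a bounded degree of overcommit
vm.overcommit_ratio = 50                  # allowed overcommit: $((($mem - $swap) * 100 / $mem))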

# tcp memory
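net.ipv4.tcp_rmem = 8192 65536 16777216     # tcp read buffer in bytes (min/default/max): 8K/64K/16M
net.ipv4.tcp_wmem = 8192 65536 16777216     # tcp write buffer in bytes (min/default/max): 8K/64K/16M
net.ipv4.tcp_mem = 131072 262144 16777216   # tcp memory usage in pages (low/pressure/high): 512M/1G/64G with 4K pages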
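net.core.rmem_default = 262144              # default receive buffer size: 256K
net.core.rmem_max = 4194304                 # max receive buffer size: 4M
net.core.wmem_default = 262144              # default send buffer size: 256K
net.core.wmem_max = 4194304                 # max send buffer size: 4M
# tcp keepalive
net.ipv4.tcp_keepalive_intvl = 20           # interval between probes when a probe goes unacknowledged: default 75s -> 20s
net.ipv4.tcp_keepalive_probes = 3           # 3 * 20s = 1 minute of failed probes before disconnect
net.ipv4.tcp_keepalive_time = 60            # probe period: 1 minute
# tcp port reuse
net.ipv4.tcp_tw_reuse = 1                   # allow reusing TIME_WAIT sockets for new TCP connections (default 0)
net.ipv4.tcp_tw_recycle = 0                 # fast TIME_WAIT recycling, deprecated
net.ipv4.tcp_fin_timeout = 5                # seconds to stay in FIN-WAIT-2
net.ipv4.tcp_timestamps = 1
# tcp anti-flood
net.ipv4.tcp_syncookies = 1                 # send SYN cookies once the SYN_RECV queue is full, to guard against malicious floods
net.ipv4.tcp_synack_retries = 1             # number of SYN+ACK retransmissions for an incoming connection: 5 -> 1
net.ipv4.tcp_syn_retries = 1                # number of SYN packets sent before the kernel gives up on connecting
# tcp load-balancer
net.ipv4.ip_forward = 1                     # IP forwarding
net.ipv4.ip_nonlocal_bind = 1               # allow binding to non-local addresses
net.netfilter.nf_conntrack_max = 1048576    # max number of tracked connections
net.ipv4.ip_local_port_range = 10000 65535  # local port range
net.ipv4.tcp_max_tw_buckets = 262144        # 256k TIME_WAIT sockets
net.core.somaxconn = 65535                  # max length of the LISTEN queue; overflow triggers retransmission
net.ipv4.tcp_max_syn_backlog = 8192         # SYN queue size: 1024 -> 8192
net.core.netdev_max_backlog = 8192          # queue length allowed when the NIC receives packets faster than the kernel handles them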
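
The two `expr` formulas quoted in the `kernel.shmmax` / `kernel.shmall` comments above both start from half of physical memory, once counted in pages and once in bytes. A minimal Python sketch of the same integer arithmetic follows. The helper name `half_ram_limits` is hypothetical, and the 970116-page host in the note after the block is made up because it reproduces the two numbers listed above. The kernel documentation defines `kernel.shmall` in pages and `kernel.shmmax` in bytes, so it is worth checking which number goes to which key.

```python
import os
from typing import Tuple


def half_ram_limits(phys_pages: int, page_size: int) -> Tuple[int, int]:
    """Return half of physical memory as (pages, bytes), using integer math like expr."""
    half_pages = phys_pages // 2
    return half_pages, half_pages * page_size


if __name__ == "__main__":
    # Same inputs as `getconf _PHYS_PAGES` and `getconf PAGE_SIZE` on Linux.
    pages, size = os.sysconf("SC_PHYS_PAGES"), os.sysconf("SC_PAGE_SIZE")
    print(half_ram_limits(pages, size))
```

For example, `half_ram_limits(970116, 4096)` returns `(485058, 1986797568)`.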

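8.3 - Metric List

List of monitoring metrics available in Pigsty

The monitoring metrics currently available in Pigsty are listed below.

For the rules that define derived metrics, see the Derived Metrics section.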
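
The names in the list fall into two visible shapes: underscore-separated names such as `pg_db_blks_hit`, and colon-separated names such as `pg:ins:xacts` or `node:uptime`. The sketch below assumes, without confirmation from this page, that the colon-separated names are the derived metrics and that a three-part name reads as namespace, aggregation level, metric. The function name `split_metric_name` is hypothetical.

```python
from typing import Optional, Tuple


def split_metric_name(name: str) -> Optional[Tuple[str, Optional[str], str]]:
    """Split a colon-separated name into (namespace, level, metric).

    Returns None for names without a colon (assumed to be raw exporter metrics).
    The level is None when the name has only two parts, e.g. node:uptime.
    """
    if ":" not in name:
        return None
    parts = name.split(":")
    level = parts[1] if len(parts) == 3 else None
    return parts[0], level, parts[-1]


for metric in ("pg:ins:xacts", "node:uptime", "pg_db_blks_hit"):
    print(metric, "->", split_metric_name(metric))
```

Grouping the list by the second element is one way to see which metrics exist at each level, for instance `cls`, `ins`, `db`, and `svc` under `pg`.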

Metric list

name
go_gc_duration_seconds
go_gc_duration_seconds_count
go_gc_duration_seconds_sum
go_goroutines
go_info
go_memstats_alloc_bytes
go_memstats_alloc_bytes_total
go_memstats_buck_hash_sys_bytes
go_memstats_frees_total
go_memstats_gc_cpu_fraction
go_memstats_gc_sys_bytes
go_memstats_heap_alloc_bytes
go_memstats_heap_idle_bytes
go_memstats_heap_inuse_bytes
go_memstats_heap_objects
go_memstats_heap_released_bytes
go_memstats_heap_sys_bytes
go_memstats_last_gc_time_seconds
go_memstats_lookups_total
go_memstats_mallocs_total
go_memstats_mcache_inuse_bytes
go_memstats_mcache_sys_bytes
go_memstats_mspan_inuse_bytes
go_memstats_mspan_sys_bytes
go_memstats_next_gc_bytes
go_memstats_other_sys_bytes
go_memstats_stack_inuse_bytes
go_memstats_stack_sys_bytes
go_memstats_sys_bytes
go_threads
haproxy_backend_active_servers
haproxy_backend_backup_servers
haproxy_backend_bytes_in_total
haproxy_backend_bytes_out_total
haproxy_backend_check_last_change_seconds
haproxy_backend_check_up_down_total
haproxy_backend_client_aborts_total
haproxy_backend_connect_time_average_seconds
haproxy_backend_connection_attempts_total
haproxy_backend_connection_errors_total
haproxy_backend_connection_reuses_total
haproxy_backend_current_queue
haproxy_backend_current_sessions
haproxy_backend_downtime_seconds_total
haproxy_backend_failed_header_rewriting_total
haproxy_backend_http_cache_hits_total
haproxy_backend_http_cache_lookups_total
haproxy_backend_http_comp_bytes_bypassed_total
haproxy_backend_http_comp_bytes_in_total
haproxy_backend_http_comp_bytes_out_total
haproxy_backend_http_comp_responses_total
haproxy_backend_http_requests_total
haproxy_backend_http_responses_total
haproxy_backend_internal_errors_total
haproxy_backend_last_session_seconds
haproxy_backend_limit_sessions
haproxy_backend_loadbalanced_total
haproxy_backend_max_connect_time_seconds
haproxy_backend_max_queue
haproxy_backend_max_queue_time_seconds
haproxy_backend_max_response_time_seconds
haproxy_backend_max_session_rate
haproxy_backend_max_sessions
haproxy_backend_max_total_time_seconds
haproxy_backend_queue_time_average_seconds
haproxy_backend_redispatch_warnings_total
haproxy_backend_requests_denied_total
haproxy_backend_response_errors_total
haproxy_backend_response_time_average_seconds
haproxy_backend_responses_denied_total
haproxy_backend_retry_warnings_total
haproxy_backend_server_aborts_total
haproxy_backend_sessions_total
haproxy_backend_status
haproxy_backend_total_time_average_seconds
haproxy_backend_weight
haproxy_frontend_bytes_in_total
haproxy_frontend_bytes_out_total
haproxy_frontend_connections_rate_max
haproxy_frontend_connections_total
haproxy_frontend_current_sessions
haproxy_frontend_denied_connections_total
haproxy_frontend_denied_sessions_total
haproxy_frontend_failed_header_rewriting_total
haproxy_frontend_http_cache_hits_total
haproxy_frontend_http_cache_lookups_total
haproxy_frontend_http_comp_bytes_bypassed_total
haproxy_frontend_http_comp_bytes_in_total
haproxy_frontend_http_comp_bytes_out_total
haproxy_frontend_http_comp_responses_total
haproxy_frontend_http_requests_rate_max
haproxy_frontend_http_requests_total
haproxy_frontend_http_responses_total
haproxy_frontend_intercepted_requests_total
haproxy_frontend_internal_errors_total
haproxy_frontend_limit_session_rate
haproxy_frontend_limit_sessions
haproxy_frontend_max_session_rate
haproxy_frontend_max_sessions
haproxy_frontend_request_errors_total
haproxy_frontend_requests_denied_total
haproxy_frontend_responses_denied_total
haproxy_frontend_sessions_total
haproxy_frontend_status
haproxy_process_active_peers
haproxy_process_busy_polling_enabled
haproxy_process_connected_peers
haproxy_process_connections_total
haproxy_process_current_backend_ssl_key_rate
haproxy_process_current_connection_rate
haproxy_process_current_connections
haproxy_process_current_frontend_ssl_key_rate
haproxy_process_current_run_queue
haproxy_process_current_session_rate
haproxy_process_current_ssl_connections
haproxy_process_current_ssl_rate
haproxy_process_current_tasks
haproxy_process_current_zlib_memory
haproxy_process_dropped_logs_total
haproxy_process_frontent_ssl_reuse
haproxy_process_hard_max_connections
haproxy_process_http_comp_bytes_in_total
haproxy_process_http_comp_bytes_out_total
haproxy_process_idle_time_percent
haproxy_process_jobs
haproxy_process_limit_connection_rate
haproxy_process_limit_http_comp
haproxy_process_limit_session_rate
haproxy_process_limit_ssl_rate
haproxy_process_listeners
haproxy_process_max_backend_ssl_key_rate
haproxy_process_max_connection_rate
haproxy_process_max_connections
haproxy_process_max_fds
haproxy_process_max_frontend_ssl_key_rate
haproxy_process_max_memory_bytes
haproxy_process_max_pipes
haproxy_process_max_session_rate
haproxy_process_max_sockets
haproxy_process_max_ssl_connections
haproxy_process_max_ssl_rate
haproxy_process_max_zlib_memory
haproxy_process_nbproc
haproxy_process_nbthread
haproxy_process_pipes_free_total
haproxy_process_pipes_used_total
haproxy_process_pool_allocated_bytes
haproxy_process_pool_failures_total
haproxy_process_pool_used_bytes
haproxy_process_relative_process_id
haproxy_process_requests_total
haproxy_process_ssl_cache_lookups_total
haproxy_process_ssl_cache_misses_total
haproxy_process_ssl_connections_total
haproxy_process_start_time_seconds
haproxy_process_stopping
haproxy_process_unstoppable_jobs
haproxy_server_bytes_in_total
haproxy_server_bytes_out_total
haproxy_server_check_code
haproxy_server_check_duration_seconds
haproxy_server_check_failures_total
haproxy_server_check_last_change_seconds
haproxy_server_check_status
haproxy_server_check_up_down_total
haproxy_server_client_aborts_total
haproxy_server_connect_time_average_seconds
haproxy_server_connection_attempts_total
haproxy_server_connection_errors_total
haproxy_server_connection_reuses_total
haproxy_server_current_queue
haproxy_server_current_sessions
haproxy_server_current_throttle
haproxy_server_downtime_seconds_total
haproxy_server_failed_header_rewriting_total
haproxy_server_internal_errors_total
haproxy_server_last_session_seconds
haproxy_server_limit_sessions
haproxy_server_loadbalanced_total
haproxy_server_max_connect_time_seconds
haproxy_server_max_queue
haproxy_server_max_queue_time_seconds
haproxy_server_max_response_time_seconds
haproxy_server_max_session_rate
haproxy_server_max_sessions
haproxy_server_max_total_time_seconds
haproxy_server_queue_limit
haproxy_server_queue_time_average_seconds
haproxy_server_redispatch_warnings_total
haproxy_server_response_errors_total
haproxy_server_response_time_average_seconds
haproxy_server_responses_denied_total
haproxy_server_retry_warnings_total
haproxy_server_server_aborts_total
haproxy_server_server_idle_connections_current
haproxy_server_server_idle_connections_limit
haproxy_server_sessions_total
haproxy_server_status
haproxy_server_total_time_average_seconds
haproxy_server_weight
node:cls:cpu_count
node:cls:cpu_mode
node:cls:cpu_usage
node:cls:cpu_usage_avg5m
node:cls:disk_io_rate
node:cls:disk_iops
node:cls:disk_read_iops
node:cls:disk_read_rate
node:cls:disk_write_iops
node:cls:disk_write_rate
node:cls:mem_usage
node:cls:network_io
node:cls:network_rx
node:cls:network_tx
node:cls:ntp_offset_range
node:cls:sched_timeslicesa
node:cpu:cpu_mode
node:cpu:cpu_usage
node:cpu:cpu_usage_avg5m
node:cpu:sched_timeslices
node:dev:disk_io_rate
node:dev:disk_iops
node:dev:disk_read_iops
node:dev:disk_read_rate
node:dev:disk_read_rt
node:dev:disk_read_time
node:dev:disk_write_iops
node:dev:disk_write_rate
node:dev:disk_write_rt
node:dev:disk_write_time
node:dev:network_io_rate
node:dev:network_rx
node:dev:network_tx
node:fs:avail_bytes
node:fs:free_bytes
node:fs:free_inode
node:fs:inode_usage
node:fs:size_bytes
node:fs:space_deriv_1h
node:fs:space_exhaust
node:fs:space_usage
node:fs:total_inode
node:ins:cpu_count
node:ins:cpu_mode
node:ins:cpu_usage
node:ins:cpu_usage_avg5m
node:ins:ctx_switch
node:ins:disk_io_rate
node:ins:disk_iops
node:ins:disk_read_iops
node:ins:disk_read_rate
node:ins:disk_write_iops
node:ins:disk_write_rate
node:ins:fd_usage
node:ins:forks
node:ins:intrrupt
node:ins:mem_app
node:ins:mem_free
node:ins:mem_usage
node:ins:network_io
node:ins:network_rx
node:ins:network_tx
node:ins:pagefault
node:ins:pagein
node:ins:pageout
node:ins:sched_timeslices
node:ins:stdload1
node:ins:stdload15
node:ins:stdload5
node:ins:swap_usage
node:ins:swapin
node:ins:swapout
node:ins:tcp_active_opens
node:ins:tcp_dropped
node:ins:tcp_insegs
node:ins:tcp_outsegs
node:ins:tcp_overflow
node:ins:tcp_overflow_rate
node:ins:tcp_passive_opens
node:ins:tcp_retrans_rate
node:ins:tcp_retranssegs
node:ins:tcp_segs
node:uptime
node_arp_entries
node_boot_time_seconds
node_context_switches_total
node_cooling_device_cur_state
node_cooling_device_max_state
node_cpu_guest_seconds_total
node_cpu_seconds_total
node_disk_io_now
node_disk_io_time_seconds_total
node_disk_io_time_weighted_seconds_total
node_disk_read_bytes_total
node_disk_read_time_seconds_total
node_disk_reads_completed_total
node_disk_reads_merged_total
node_disk_write_time_seconds_total
node_disk_writes_completed_total
node_disk_writes_merged_total
node_disk_written_bytes_total
node_entropy_available_bits
node_exporter_build_info
node_filefd_allocated
node_filefd_maximum
node_filesystem_avail_bytes
node_filesystem_device_error
node_filesystem_files
node_filesystem_files_free
node_filesystem_free_bytes
node_filesystem_readonly
node_filesystem_size_bytes
node_forks_total
node_intr_total
node_ipvs_connections_total
node_ipvs_incoming_bytes_total
node_ipvs_incoming_packets_total
node_ipvs_outgoing_bytes_total
node_ipvs_outgoing_packets_total
node_load1
node_load15
node_load5
node_memory_Active_anon_bytes
node_memory_Active_bytes
node_memory_Active_file_bytes
node_memory_AnonHugePages_bytes
node_memory_AnonPages_bytes
node_memory_Bounce_bytes
node_memory_Buffers_bytes
node_memory_Cached_bytes
node_memory_CmaFree_bytes
node_memory_CmaTotal_bytes
node_memory_CommitLimit_bytes
node_memory_Committed_AS_bytes
node_memory_DirectMap2M_bytes
node_memory_DirectMap4k_bytes
node_memory_Dirty_bytes
node_memory_HardwareCorrupted_bytes
node_memory_HugePages_Free
node_memory_HugePages_Rsvd
node_memory_HugePages_Surp
node_memory_HugePages_Total
node_memory_Hugepagesize_bytes
node_memory_Inactive_anon_bytes
node_memory_Inactive_bytes
node_memory_Inactive_file_bytes
node_memory_KernelStack_bytes
node_memory_Mapped_bytes
node_memory_MemAvailable_bytes
node_memory_MemFree_bytes
node_memory_MemTotal_bytes
node_memory_Mlocked_bytes
node_memory_NFS_Unstable_bytes
node_memory_PageTables_bytes
node_memory_Percpu_bytes
node_memory_SReclaimable_bytes
node_memory_SUnreclaim_bytes
node_memory_Shmem_bytes
node_memory_Slab_bytes
node_memory_SwapCached_bytes
node_memory_SwapFree_bytes
node_memory_SwapTotal_bytes
node_memory_Unevictable_bytes
node_memory_VmallocChunk_bytes
node_memory_VmallocTotal_bytes
node_memory_VmallocUsed_bytes
node_memory_WritebackTmp_bytes
node_memory_Writeback_bytes
node_netstat_Icmp6_InErrors
node_netstat_Icmp6_InMsgs
node_netstat_Icmp6_OutMsgs
node_netstat_Icmp_InErrors
node_netstat_Icmp_InMsgs
node_netstat_Icmp_OutMsgs
node_netstat_Ip6_InOctets
node_netstat_Ip6_OutOctets
node_netstat_IpExt_InOctets
node_netstat_IpExt_OutOctets
node_netstat_Ip_Forwarding
node_netstat_TcpExt_ListenDrops
node_netstat_TcpExt_ListenOverflows
node_netstat_TcpExt_SyncookiesFailed
node_netstat_TcpExt_SyncookiesRecv
node_netstat_TcpExt_SyncookiesSent
node_netstat_TcpExt_TCPSynRetrans
node_netstat_Tcp_ActiveOpens
node_netstat_Tcp_CurrEstab
node_netstat_Tcp_InErrs
node_netstat_Tcp_InSegs
node_netstat_Tcp_OutSegs
node_netstat_Tcp_PassiveOpens
node_netstat_Tcp_RetransSegs
node_netstat_Udp6_InDatagrams
node_netstat_Udp6_InErrors
node_netstat_Udp6_NoPorts
node_netstat_Udp6_OutDatagrams
node_netstat_Udp6_RcvbufErrors
node_netstat_Udp6_SndbufErrors
node_netstat_UdpLite6_InErrors
node_netstat_UdpLite_InErrors
node_netstat_Udp_InDatagrams
node_netstat_Udp_InErrors
node_netstat_Udp_NoPorts
node_netstat_Udp_OutDatagrams
node_netstat_Udp_RcvbufErrors
node_netstat_Udp_SndbufErrors
node_network_address_assign_type
node_network_carrier
node_network_carrier_changes_total
node_network_device_id
node_network_dormant
node_network_flags
node_network_iface_id
node_network_iface_link
node_network_iface_link_mode
node_network_info
node_network_mtu_bytes
node_network_net_dev_group
node_network_protocol_type
node_network_receive_bytes_total
node_network_receive_compressed_total
node_network_receive_drop_total
node_network_receive_errs_total
node_network_receive_fifo_total
node_network_receive_frame_total
node_network_receive_multicast_total
node_network_receive_packets_total
node_network_transmit_bytes_total
node_network_transmit_carrier_total
node_network_transmit_colls_total
node_network_transmit_compressed_total
node_network_transmit_drop_total
node_network_transmit_errs_total
node_network_transmit_fifo_total
node_network_transmit_packets_total
node_network_transmit_queue_length
node_network_up
node_nf_conntrack_entries
node_nf_conntrack_entries_limit
node_ntp_leap
node_ntp_offset_seconds
node_ntp_reference_timestamp_seconds
node_ntp_root_delay_seconds
node_ntp_root_dispersion_seconds
node_ntp_rtt_seconds
node_ntp_sanity
node_ntp_stratum
node_power_supply_capacity
node_power_supply_cyclecount
node_power_supply_energy_full
node_power_supply_energy_full_design
node_power_supply_energy_watthour
node_power_supply_info
node_power_supply_online
node_power_supply_power_watt
node_power_supply_present
node_power_supply_voltage_min_design
node_power_supply_voltage_volt
node_processes_max_processes
node_processes_max_threads
node_processes_pids
node_processes_state
node_processes_threads
node_procs_blocked
node_procs_running
node_schedstat_running_seconds_total
node_schedstat_timeslices_total
node_schedstat_waiting_seconds_total
node_scrape_collector_duration_seconds
node_scrape_collector_success
node_sockstat_FRAG6_inuse
node_sockstat_FRAG6_memory
node_sockstat_FRAG_inuse
node_sockstat_FRAG_memory
node_sockstat_RAW6_inuse
node_sockstat_RAW_inuse
node_sockstat_TCP6_inuse
node_sockstat_TCP_alloc
node_sockstat_TCP_inuse
node_sockstat_TCP_mem
node_sockstat_TCP_mem_bytes
node_sockstat_TCP_orphan
node_sockstat_TCP_tw
node_sockstat_UDP6_inuse
node_sockstat_UDPLITE6_inuse
node_sockstat_UDPLITE_inuse
node_sockstat_UDP_inuse
node_sockstat_UDP_mem
node_sockstat_UDP_mem_bytes
node_sockstat_sockets_used
node_systemd_socket_accepted_connections_total
node_systemd_socket_current_connections
node_systemd_system_running
node_systemd_timer_last_trigger_seconds
node_systemd_unit_state
node_systemd_units
node_systemd_version
node_tcp_connection_states
node_textfile_scrape_error
node_time_seconds
node_timex_estimated_error_seconds
node_timex_frequency_adjustment_ratio
node_timex_loop_time_constant
node_timex_maxerror_seconds
node_timex_offset_seconds
node_timex_pps_calibration_total
node_timex_pps_error_total
node_timex_pps_frequency_hertz
node_timex_pps_jitter_seconds
node_timex_pps_jitter_total
node_timex_pps_shift_seconds
node_timex_pps_stability_exceeded_total
node_timex_pps_stability_hertz
node_timex_status
node_timex_sync_status
node_timex_tai_offset_seconds
node_timex_tick_seconds
node_udp_queues
node_uname_info
node_vmstat_pgfault
node_vmstat_pgmajfault
node_vmstat_pgpgin
node_vmstat_pgpgout
node_vmstat_pswpin
node_vmstat_pswpout
node_xfs_allocation_btree_compares_total
node_xfs_allocation_btree_lookups_total
node_xfs_allocation_btree_records_deleted_total
node_xfs_allocation_btree_records_inserted_total
node_xfs_block_map_btree_compares_total
node_xfs_block_map_btree_lookups_total
node_xfs_block_map_btree_records_deleted_total
node_xfs_block_map_btree_records_inserted_total
node_xfs_block_mapping_extent_list_compares_total
node_xfs_block_mapping_extent_list_deletions_total
node_xfs_block_mapping_extent_list_insertions_total
node_xfs_block_mapping_extent_list_lookups_total
node_xfs_block_mapping_reads_total
node_xfs_block_mapping_unmaps_total
node_xfs_block_mapping_writes_total
node_xfs_directory_operation_create_total
node_xfs_directory_operation_getdents_total
node_xfs_directory_operation_lookup_total
node_xfs_directory_operation_remove_total
node_xfs_extent_allocation_blocks_allocated_total
node_xfs_extent_allocation_blocks_freed_total
node_xfs_extent_allocation_extents_allocated_total
node_xfs_extent_allocation_extents_freed_total
node_xfs_read_calls_total
node_xfs_vnode_active_total
node_xfs_vnode_allocate_total
node_xfs_vnode_get_total
node_xfs_vnode_hold_total
node_xfs_vnode_reclaim_total
node_xfs_vnode_release_total
node_xfs_vnode_remove_total
node_xfs_write_calls_total
pg:all:active_backends
pg:all:age
pg:all:backends
pg:all:buf_alloc
pg:all:buf_flush
pg:all:commits
pg:all:commits_realtime
pg:all:ixact_backends
pg:all:lag_bytes
pg:all:lag_seconds
pg:all:qps_realtime
pg:all:rollbacks
pg:all:rollbacks_realtime
pg:all:sessions
pg:all:tps_realtime
pg:all:tup_deleted
pg:all:tup_inserted
pg:all:tup_modified
pg:all:tup_selected
pg:all:tup_touched
pg:all:tup_updated
pg:all:wal_rate
pg:all:xacts
pg:all:xacts_avg30m
pg:all:xacts_mu
pg:all:xacts_realtime
pg:all:xacts_sigma
pg:cls:active_backends
pg:cls:age
pg:cls:backends
pg:cls:buf_alloc
pg:cls:buf_flush
pg:cls:ckpt_1h
pg:cls:commits
pg:cls:commits_realtime
pg:cls:ixact_backends
pg:cls:lag_bytes
pg:cls:lag_seconds
pg:cls:leader
pg:cls:load0
pg:cls:load1
pg:cls:load15
pg:cls:load5
pg:cls:lock_count
pg:cls:locks
pg:cls:primarys
pg:cls:qps_realtime
pg:cls:replicas
pg:cls:rlock
pg:cls:rollbacks
pg:cls:rollbacks_realtime
pg:cls:saturation0
pg:cls:saturation1
pg:cls:saturation15
pg:cls:saturation5
pg:cls:sessions
pg:cls:size
pg:cls:synchronous
pg:cls:temp_bytes
pg:cls:temp_files
pg:cls:timeline
pg:cls:tps_realtime
pg:cls:tup_deleted
pg:cls:tup_inserted
pg:cls:tup_modified
pg:cls:tup_selected
pg:cls:tup_touched
pg:cls:tup_updated
pg:cls:wal_rate
pg:cls:wlock
pg:cls:xacts
pg:cls:xacts_avg30m
pg:cls:xacts_mu
pg:cls:xacts_realtime
pg:cls:xacts_sigma
pg:cls:xlock
pg:db:age_deriv_1h
pg:db:age_exhaust
pg:db:backends
pg:db:blks_access_1m
pg:db:blks_hit_1m
pg:db:blks_read_1m
pg:db:buffer_hit_rate
pg:db:commits
pg:db:commits_realtime
pg:db:io_time_usage
pg:db:lock_count
pg:db:locks
pg:db:pool_current_conn
pg:db:pool_disabled
pg:db:pool_max_conn
pg:db:pool_paused
pg:db:pool_reserve_size
pg:db:pool_size
pg:db:qps_realtime
pg:db:read_time_usage
pg:db:rlock
pg:db:rollbacks
pg:db:rollbacks_realtime
pg:db:sessions
pg:db:temp_bytes
pg:db:temp_files
pg:db:tps_realtime
pg:db:tup_deleted
pg:db:tup_inserted
pg:db:tup_modified
pg:db:tup_selected
pg:db:tup_touched
pg:db:tup_updated
pg:db:wlock
pg:db:write_time_usage
pg:db:xacts
pg:db:xacts_avg30m
pg:db:xacts_mu
pg:db:xacts_realtime
pg:db:xacts_sigma
pg:db:xlock
pg:ins:active_backends
pg:ins:age
pg:ins:backends
pg:ins:buf_alloc
pg:ins:buf_flush
pg:ins:buf_flush_backend
pg:ins:buf_flush_checkpoint
pg:ins:checkpoint_lsn
pg:ins:ckpt_req
pg:ins:ckpt_timed
pg:ins:commits
pg:ins:commits_realtime
pg:ins:free_clients
pg:ins:free_servers
pg:ins:hit_rate
pg:ins:ixact_backends
pg:ins:lag_bytes
pg:ins:lag_seconds
pg:ins:last_ckpt
pg:ins:load0
pg:ins:load1
pg:ins:load15
pg:ins:load5
pg:ins:lock_count
pg:ins:locks
pg:ins:login_clients
pg:ins:pool_databases
pg:ins:pool_users
pg:ins:pools
pg:ins:qps_realtime
pg:ins:query_rt
pg:ins:query_rt_avg30m
pg:ins:query_rt_mu
pg:ins:query_rt_sigma
pg:ins:query_time_rate15m
pg:ins:query_time_rate1m
pg:ins:query_time_rate5m
pg:ins:recv_init_lsn
pg:ins:recv_init_tli
pg:ins:recv_last_lsn
pg:ins:recv_last_tli
pg:ins:redo_lsn
pg:ins:rlock
pg:ins:rollbacks
pg:ins:rollbacks_realtime
pg:ins:saturation0
pg:ins:saturation1
pg:ins:saturation15
pg:ins:saturation5
pg:ins:sessions
pg:ins:slot_retained_bytes
pg:ins:temp_bytes
pg:ins:temp_files
pg:ins:tps_realtime
pg:ins:tup_deleted
pg:ins:tup_inserted
pg:ins:tup_modified
pg:ins:tup_selected
pg:ins:tup_touched
pg:ins:tup_updated
pg:ins:used_clients
pg:ins:wal_rate
pg:ins:wlock
pg:ins:xact_rt
pg:ins:xact_rt_avg30m
pg:ins:xact_rt_mu
pg:ins:xact_rt_sigma
pg:ins:xact_time_rate15m
pg:ins:xact_time_rate1m
pg:ins:xact_time_rate5m
pg:ins:xacts
pg:ins:xacts_avg30m
pg:ins:xacts_mu
pg:ins:xacts_realtime
pg:ins:xacts_sigma
pg:ins:xlock
pg:query:call
pg:query:rt
pg:svc:active_backends
pg:svc:backends
pg:svc:buf_alloc
pg:svc:buf_flush
pg:svc:commits
pg:svc:commits_realtime
pg:svc:ixact_backends
pg:svc:load0
pg:svc:load1
pg:svc:load15
pg:svc:load5
pg:svc:lock_count
pg:svc:locks
pg:svc:qps_realtime
pg:svc:query_rt
pg:svc:query_rt_avg30m
pg:svc:query_rt_mu
pg:svc:query_rt_sigma
pg:svc:rlock
pg:svc:rollbacks
pg:svc:rollbacks_realtime
pg:svc:sessions
pg:svc:temp_bytes
pg:svc:temp_files
pg:svc:tps_realtime
pg:svc:tup_deleted
pg:svc:tup_inserted
pg:svc:tup_modified
pg:svc:tup_selected
pg:svc:tup_touched
pg:svc:tup_updated
pg:svc:wlock
pg:svc:xact_rt
pg:svc:xact_rt_avg30m
pg:svc:xact_rt_mu
pg:svc:xact_rt_sigma
pg:svc:xacts
pg:svc:xacts_avg30m
pg:svc:xacts_mu
pg:svc:xacts_realtime
pg:svc:xacts_sigma
pg:svc:xlock
pg_activity_count
pg_activity_max_conn_duration
pg_activity_max_duration
pg_activity_max_tx_duration
pg_backend_count
pg_backup_time
pg_bgwriter_buffers_alloc
pg_bgwriter_buffers_backend
pg_bgwriter_buffers_backend_fsync
pg_bgwriter_buffers_checkpoint
pg_bgwriter_buffers_clean
pg_bgwriter_checkpoint_sync_time
pg_bgwriter_checkpoint_write_time
pg_bgwriter_checkpoints_req
pg_bgwriter_checkpoints_timed
pg_bgwriter_maxwritten_clean
pg_bgwriter_stats_reset
pg_boot_time
pg_checkpoint_checkpoint_lsn
pg_checkpoint_elapse
pg_checkpoint_full_page_writes
pg_checkpoint_newest_commit_ts_xid
pg_checkpoint_next_multi_offset
pg_checkpoint_next_multixact_id
pg_checkpoint_next_oid
pg_checkpoint_next_xid
pg_checkpoint_next_xid_epoch
pg_checkpoint_oldest_active_xid
pg_checkpoint_oldest_commit_ts_xid
pg_checkpoint_oldest_multi_dbid
pg_checkpoint_oldest_multi_xid
pg_checkpoint_oldest_xid
pg_checkpoint_oldest_xid_dbid
pg_checkpoint_prev_tli
pg_checkpoint_redo_lsn
pg_checkpoint_time
pg_checkpoint_tli
pg_class_relage
pg_class_relpages
pg_class_relsize
pg_class_reltuples
pg_conf_reload_time
pg_database_age
pg_database_allow_conn
pg_database_conn_limit
pg_database_frozen_xid
pg_database_is_template
pg_db_blk_read_time
pg_db_blk_write_time
pg_db_blks_access
pg_db_blks_hit
pg_db_blks_read
pg_db_checksum_failures
pg_db_checksum_last_failure
pg_db_confl_bufferpin
pg_db_confl_deadlock
pg_db_confl_lock
pg_db_confl_snapshot
pg_db_confl_tablespace
pg_db_conflicts
pg_db_deadlocks
pg_db_numbackends
pg_db_stats_reset
pg_db_temp_bytes
pg_db_temp_files
pg_db_tup_deleted
pg_db_tup_fetched
pg_db_tup_inserted
pg_db_tup_modified
pg_db_tup_returned
pg_db_tup_updated
pg_db_xact_commit
pg_db_xact_rollback
pg_db_xact_total
pg_downstream_count
pg_exporter_last_scrape_time
pg_exporter_query_cache_ttl
pg_exporter_query_scrape_duration
pg_exporter_query_scrape_error_count
pg_exporter_query_scrape_hit_count
pg_exporter_query_scrape_metric_count
pg_exporter_query_scrape_total_count
pg_exporter_scrape_duration
pg_exporter_scrape_error_count
pg_exporter_scrape_total_count
pg_exporter_server_scrape_duration
pg_exporter_server_scrape_total_count
pg_exporter_server_scrape_total_seconds
pg_exporter_up
pg_exporter_uptime
pg_flush_lsn
pg_func_calls
pg_func_self_time
pg_func_total_time
pg_in_recovery
pg_index_bloat_ratio
pg_index_bloat_size
pg_index_idx_blks_hit
pg_index_idx_blks_read
pg_index_idx_scan
pg_index_idx_tup_fetch
pg_index_idx_tup_read
pg_insert_lsn
pg_is_in_backup
pg_is_in_recovery
pg_is_primary
pg_is_replica
pg_is_wal_replay_paused
pg_lag
pg_last_replay_time
pg_lock_count
pg_lsn
pg_meta_info
pg_query_blk_io_time
pg_query_calls
pg_query_max_time
pg_query_mean_time
pg_query_min_time
pg_query_rows
pg_query_stddev_time
pg_query_total_time
pg_query_wal_bytes
pg_receive_lsn
pg_replay_lsn
pg_setting_block_size
pg_setting_data_checksums
pg_setting_max_connections
pg_setting_max_locks_per_transaction
pg_setting_max_prepared_transactions
pg_setting_max_replication_slots
pg_setting_max_wal_senders
pg_setting_max_worker_processes
pg_setting_wal_log_hints
pg_shmem_allocated_size
pg_shmem_offset
pg_shmem_size
pg_size_bytes
pg_slru_blks_exists
pg_slru_blks_hit
pg_slru_blks_read
pg_slru_blks_written
pg_slru_blks_zeroed
pg_slru_flushes
pg_slru_stats_reset
pg_slru_truncates
pg_status
pg_sync_standby_disabled
pg_sync_standby_enabled
pg_table_analyze_count
pg_table_autoanalyze_count
pg_table_autovacuum_count
pg_table_bloat_ratio
pg_table_bloat_size
pg_table_heap_blks_hit
pg_table_heap_blks_read
pg_table_idx_blks_hit
pg_table_idx_blks_read
pg_table_idx_scan
pg_table_idx_tup_fetch
pg_table_last_analyze
pg_table_last_autoanalyze
pg_table_last_autovacuum
pg_table_last_vacuum
pg_table_n_dead_tup
pg_table_n_live_tup
pg_table_n_mod_since_analyze
pg_table_n_tup_del
pg_table_n_tup_hot_upd
pg_table_n_tup_ins
pg_table_n_tup_mod
pg_table_n_tup_upd
pg_table_seq_scan
pg_table_seq_tup_read
pg_table_size_bytes
pg_table_size_indexsize
pg_table_size_relsize
pg_table_size_toastsize
pg_table_tbl_scan
pg_table_tidx_blks_hit
pg_table_tidx_blks_read
pg_table_toast_blks_hit
pg_table_toast_blks_read
pg_table_tup_read
pg_table_vacuum_count
pg_timeline
pg_timestamp
pg_up
pg_uptime
pg_version
pg_write_lsn
pg_xact_xmax
pg_xact_xmin
pg_xact_xnum
pgbouncer_database_current_connections
pgbouncer_database_disabled
pgbouncer_database_max_connections
pgbouncer_database_paused
pgbouncer_database_pool_size
pgbouncer_database_reserve_pool
pgbouncer_exporter_last_scrape_time
pgbouncer_exporter_query_cache_ttl
pgbouncer_exporter_query_scrape_duration
pgbouncer_exporter_query_scrape_error_count
pgbouncer_exporter_query_scrape_hit_count
pgbouncer_exporter_query_scrape_metric_count
pgbouncer_exporter_query_scrape_total_count
pgbouncer_exporter_scrape_duration
pgbouncer_exporter_scrape_error_count
pgbouncer_exporter_scrape_total_count
pgbouncer_exporter_server_scrape_duration
pgbouncer_exporter_server_scrape_total_count
pgbouncer_exporter_server_scrape_total_seconds
pgbouncer_exporter_up
pgbouncer_exporter_uptime
pgbouncer_in_recovery
pgbouncer_list_items
pgbouncer_pool_active_clients
pgbouncer_pool_active_servers
pgbouncer_pool_idle_servers
pgbouncer_pool_login_servers
pgbouncer_pool_maxwait
pgbouncer_pool_maxwait_us
pgbouncer_pool_tested_servers
pgbouncer_pool_used_servers
pgbouncer_pool_waiting_clients
pgbouncer_stat_avg_query_count
pgbouncer_stat_avg_query_time
pgbouncer_stat_avg_recv
pgbouncer_stat_avg_sent
pgbouncer_stat_avg_wait_time
pgbouncer_stat_avg_xact_count
pgbouncer_stat_avg_xact_time
pgbouncer_stat_total_query_count
pgbouncer_stat_total_query_time
pgbouncer_stat_total_received
pgbouncer_stat_total_sent
pgbouncer_stat_total_wait_time
pgbouncer_stat_total_xact_count
pgbouncer_stat_total_xact_time
pgbouncer_up
pgbouncer_version
process_cpu_seconds_total
process_max_fds
process_open_fds
process_resident_memory_bytes
process_start_time_seconds
process_virtual_memory_bytes
process_virtual_memory_max_bytes
promhttp_metric_handler_errors_total
promhttp_metric_handler_requests_in_flight
promhttp_metric_handler_requests_total
scrape_duration_seconds
scrape_samples_post_metric_relabeling
scrape_samples_scraped
scrape_series_added
up

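8.4 - Derived Metrics

Detailed definitions of Pigsty's derived monitoring metrics

This page lists the recording rules that define all of Pigsty's derived metrics.

Node aggregate metrics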

---
  - name: node-rules
    rules:
      #==============================================================#
      #                         Aliveness                            #
      #==============================================================#
      # TODO: change this to your node exporter port
      - record: node_exporter_up
        expr: up{instance=~".*:9099"}
      - record: node:uptime
        expr: time() - node_boot_time_seconds{}


      #==============================================================#
      #                             CPU                              #
      #==============================================================#
      # cpu mode time ratio
      - record: node:cpu:cpu_mode
        expr: irate(node_cpu_seconds_total{}[1m])
      - record: node:ins:cpu_mode
        expr: sum without (cpu) (node:cpu:cpu_mode)
      - record: node:cls:cpu_mode
        expr: sum by (cls, mode) (node:ins:cpu_mode)

      # cpu schedule time-slices
      - record: node:cpu:sched_timeslices
        expr: irate(node_schedstat_timeslices_total{}[1m])
      - record: node:ins:sched_timeslices
        expr: sum without (cpu) (node:cpu:sched_timeslices)
      - record: node:cls:sched_timeslices
        expr: sum by (cls) (node:ins:sched_timeslices)

      # cpu count
      - record: node:ins:cpu_count
        expr: count without (cpu) (node:cpu:cpu_usage)
      - record: node:cls:cpu_count
        expr: sum by (cls) (node:ins:cpu_count)

      # cpu usage
      - record: node:cpu:cpu_usage
        expr: 1 - sum without (mode) (node:cpu:cpu_mode{mode="idle"})
      - record: node:ins:cpu_usage
        expr: sum without (cpu) (node:cpu:cpu_usage) / node:ins:cpu_count
      - record: node:cls:cpu_usage
        expr: sum by (cls) (node:ins:cpu_usage * node:ins:cpu_count) / sum by (cls) (node:ins:cpu_count)

      # cpu usage avg5m
      - record: node:cpu:cpu_usage_avg5m
        expr: avg_over_time(node:cpu:cpu_usage[5m])
      - record: node:ins:cpu_usage_avg5m
        expr: avg_over_time(node:ins:cpu_usage[5m])
      - record: node:cls:cpu_usage_avg5m
        expr: avg_over_time(node:cls:cpu_usage[5m])

      #==============================================================#
      #                            Memory                            #
      #==============================================================#
      # mem usage
      - record: node:ins:mem_app
        expr: node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes - node_memory_Slab_bytes - node_memory_PageTables_bytes - node_memory_SwapCached_bytes
      - record: node:ins:mem_free
        expr: node_memory_MemFree_bytes{} + node_memory_Cached_bytes{}
      - record: node:ins:mem_usage
        expr: node:ins:mem_app / node_memory_MemTotal_bytes
      - record: node:cls:mem_usage
        expr: sum by (cls) (node:ins:mem_app) / sum by (cls) (node_memory_MemTotal_bytes)
      - record: node:ins:swap_usage
        expr: 1 - node_memory_SwapFree_bytes{} / node_memory_SwapTotal_bytes{}


      #==============================================================#
      #                            Disk                              #
      #==============================================================#
      # disk read iops
      - record: node:dev:disk_read_iops
        expr: irate(node_disk_reads_completed_total{device=~"[a-zA-Z-_]+"}[1m])
      - record: node:ins:disk_read_iops
        expr: sum without (device) (node:dev:disk_read_iops)
      - record: node:cls:disk_read_iops
        expr: sum by (cls) (node:ins:disk_read_iops)

      # disk write iops
      - record: node:dev:disk_write_iops
        expr: irate(node_disk_writes_completed_total{device=~"[a-zA-Z-_]+"}[1m])
      - record: node:ins:disk_write_iops
        expr: sum without (device) (node:dev:disk_write_iops)
      - record: node:cls:disk_write_iops
        expr: sum by (cls) (node:ins:disk_write_iops)

      # disk iops
      - record: node:dev:disk_iops
        expr: node:dev:disk_read_iops + node:dev:disk_write_iops
      - record: node:ins:disk_iops
        expr: node:ins:disk_read_iops + node:ins:disk_write_iops
      - record: node:cls:disk_iops
        expr: node:cls:disk_read_iops + node:cls:disk_write_iops

      # read bandwidth (rate1m)
      - record: node:dev:disk_read_rate
        expr: rate(node_disk_read_bytes_total{device=~"[a-zA-Z-_]+"}[1m])
      - record: node:ins:disk_read_rate
        expr: sum without (device) (node:dev:disk_read_rate)
      - record: node:cls:disk_read_rate
        expr: sum by (cls) (node:ins:disk_read_rate)

      # write bandwidth (rate1m)
      - record: node:dev:disk_write_rate
        expr: rate(node_disk_written_bytes_total{device=~"[a-zA-Z-_]+"}[1m])
      - record: node:ins:disk_write_rate
        expr: sum without (device) (node:dev:disk_write_rate)
      - record: node:cls:disk_write_rate
        expr: sum by (cls) (node:ins:disk_write_rate)

      # io bandwidth (rate1m)
      - record: node:dev:disk_io_rate
        expr: node:dev:disk_read_rate + node:dev:disk_write_rate
      - record: node:ins:disk_io_rate
        expr: node:ins:disk_read_rate + node:ins:disk_write_rate
      - record: node:cls:disk_io_rate
        expr: node:cls:disk_read_rate + node:cls:disk_write_rate

      # read/write total time
      - record: node:dev:disk_read_time
        expr: rate(node_disk_read_time_seconds_total{device=~"[a-zA-Z-_]+"}[1m])
      - record: node:dev:disk_write_time
        expr: rate(node_disk_write_time_seconds_total{device=~"[a-zA-Z-_]+"}[1m])

      # read/write response time
      - record: node:dev:disk_read_rt
        expr: node:dev:disk_read_time / node:dev:disk_read_iops
      - record: node:dev:disk_write_rt
        expr: node:dev:disk_write_time / node:dev:disk_write_iops
      - record: node:dev:disk_rt
        expr: (node:dev:disk_read_time + node:dev:disk_write_time) / node:dev:disk_iops


      #==============================================================#
      #                            Network                           #
      #==============================================================#
      # transmit bandwidth (out)
      - record: node:dev:network_tx
        expr: irate(node_network_transmit_bytes_total{}[1m])
      - record: node:ins:network_tx
        expr: sum without (device) (node:dev:network_tx{device!~"lo|bond.*"})
      - record: node:cls:network_tx
        expr: sum by (cls) (node:ins:network_tx)

      # receive bandwidth (in)
      - record: node:dev:network_rx
        expr: irate(node_network_receive_bytes_total{}[1m])
      - record: node:ins:network_rx
        expr: sum without (device) (node:dev:network_rx{device!~"lo|bond.*"})
      - record: node:cls:network_rx
        expr: sum by (cls) (node:ins:network_rx)

      # io bandwidth
      - record: node:dev:network_io_rate
        expr: node:dev:network_tx + node:dev:network_rx
      - record: node:ins:network_io
        expr: node:ins:network_tx + node:ins:network_rx
      - record: node:cls:network_io
        expr: node:cls:network_tx + node:cls:network_rx


      #==============================================================#
      #                           Schedule                           #
      #==============================================================#
      # normalized load
      - record: node:ins:stdload1
        expr: node_load1 / node:ins:cpu_count
      - record: node:ins:stdload5
        expr: node_load5 / node:ins:cpu_count
      - record: node:ins:stdload15
        expr: node_load15 / node:ins:cpu_count

      # process
      - record: node:ins:forks
        expr: irate(node_forks_total[1m])
      # interrupt & context switch
      - record: node:ins:interrupt
        expr: irate(node_intr_total[1m])
      - record: node:ins:ctx_switch
        expr: irate(node_context_switches_total{}[1m])


      #==============================================================#
      #                              VM                              #
      #==============================================================#
      - record: node:ins:pagefault
        expr: irate(node_vmstat_pgfault[1m])
      - record: node:ins:pagein
        expr: irate(node_vmstat_pgpgin[1m])
      - record: node:ins:pageout
        expr: irate(node_vmstat_pgpgout[1m])
      - record: node:ins:swapin
        expr: irate(node_vmstat_pswpin[1m])
      - record: node:ins:swapout
        expr: irate(node_vmstat_pswpout[1m])


      #==============================================================#
      #                              FS                              #
      #==============================================================#
      # filesystem space usage
      - record: node:fs:free_bytes
        expr: max without(device, fstype) (node_filesystem_free_bytes{fstype!~"(n|root|tmp)fs.*"})
      - record: node:fs:avail_bytes
        expr: max without(device, fstype) (node_filesystem_avail_bytes{fstype!~"(n|root|tmp)fs.*"})
      - record: node:fs:size_bytes
        expr: max without(device, fstype) (node_filesystem_size_bytes{fstype!~"(n|root|tmp)fs.*"})
      - record: node:fs:space_usage
        expr: 1 - (node:fs:avail_bytes{} / node:fs:size_bytes{})
      - record: node:fs:free_inode
        expr: max without(device, fstype) (node_filesystem_files_free{fstype!~"(n|root|tmp)fs.*"})
      - record: node:fs:total_inode
        expr: max without(device, fstype) (node_filesystem_files{fstype!~"(n|root|tmp)fs.*"})

      # space delta and prediction
      - record: node:fs:space_deriv_1h
        expr: 0 - deriv(node_filesystem_avail_bytes{}[1h])
      - record: node:fs:space_exhaust
        expr: (node_filesystem_avail_bytes{} / node:fs:space_deriv_1h{}) > 0

      # fs inode usage
      - record: node:fs:inode_usage
        expr: 1 - (node:fs:free_inode / node:fs:total_inode)
      # file descriptor usage
      - record: node:ins:fd_usage
        expr: node_filefd_allocated / node_filefd_maximum


      #==============================================================#
      #                             TCP                              #
      #==============================================================#
      # tcp segments (rate1m)
      - record: node:ins:tcp_insegs
        expr: rate(node_netstat_Tcp_InSegs{}[1m])
      - record: node:ins:tcp_outsegs
        expr: rate(node_netstat_Tcp_OutSegs{}[1m])
      - record: node:ins:tcp_retranssegs
        expr: rate(node_netstat_Tcp_RetransSegs{}[1m])
      - record: node:ins:tcp_segs
        expr: node:ins:tcp_insegs + node:ins:tcp_outsegs
      # retransmit
      - record: node:ins:tcp_retrans_rate
        expr: node:ins:tcp_retranssegs / node:ins:tcp_outsegs
      # overflow
      - record: node:ins:tcp_overflow_rate
        expr: rate(node_netstat_TcpExt_ListenOverflows[1m])


      #==============================================================#
      #                           Netstat                            #
      #==============================================================#
      # tcp open (rate1m)
      - record: node:ins:tcp_passive_opens
        expr: rate(node_netstat_Tcp_PassiveOpens[1m])
      - record: node:ins:tcp_active_opens
        expr: rate(node_netstat_Tcp_ActiveOpens[1m])
      # tcp close
      - record: node:ins:tcp_attempt_fails
        expr: rate(node_netstat_Tcp_AttemptFails[1m])
      - record: node:ins:tcp_estab_resets
        expr: rate(node_netstat_Tcp_EstabResets[1m])
      # tcp drop
      - record: node:ins:tcp_overflow
        expr: rate(node_netstat_TcpExt_ListenOverflows[1m])
      - record: node:ins:tcp_dropped
        expr: rate(node_netstat_TcpExt_ListenDrops[1m])


      #==============================================================#
      #                             NTP                              #
      #==============================================================#
      - record: node:cls:ntp_offset_range
        expr: max by (cls)(node_ntp_offset_seconds) - min by (cls)(node_ntp_offset_seconds)

...
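
As a worked illustration of two of the node rules above, the following Python sketch reproduces the arithmetic behind `node:cls:cpu_usage` and `node:fs:space_exhaust`. It is not part of Pigsty: the function names and sample numbers are hypothetical, and it assumes that PromQL's `deriv()` fits a least-squares line through the samples in the window, as the Prometheus documentation describes.

```python
import math


def cluster_cpu_usage(instances):
    """Core-weighted CPU usage, mirroring node:cls:cpu_usage.

    instances: list of (cpu_usage, cpu_count) pairs, one per node.
    """
    busy_cores = sum(usage * count for usage, count in instances)
    total_cores = sum(count for _, count in instances)
    return busy_cores / total_cores


def deriv(samples):
    """Least-squares slope of (timestamp_seconds, value) samples, per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var


def space_exhaust(samples):
    """Seconds until available space reaches zero, mirroring node:fs:space_exhaust.

    samples: (timestamp_seconds, avail_bytes) pairs from the last hour.
    Returns None when the estimate is not positive (free space is growing),
    which corresponds to the `> 0` filter dropping the series.
    """
    avail = samples[-1][1]
    consumption = 0 - deriv(samples)  # node:fs:space_deriv_1h
    if consumption == 0:
        return math.inf if avail > 0 else None
    seconds = avail / consumption
    return seconds if seconds > 0 else None
```

Weighting by core count means that a 2-core node at 50% and an 8-core node at 10% give a cluster usage of 18%, not the 30% a plain average of the two nodes would report.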

Database and connection pool aggregate metrics
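
Two rules in the file below pack more into a single number than their names suggest: `pg_status` combines liveness and role, and `pg:db:age_exhaust` estimates how long it will take the database age to reach 2147483648 (2^31), the constant used in the rule. A minimal Python sketch of both calculations follows. The function names and inputs are hypothetical and only mirror the expressions in the rules.

```python
import math

# Encoding taken from the comment next to the pg_status rule.
PG_STATUS = {
    0: "replica[DOWN]",
    1: "primary[DOWN]",
    2: "replica",
    3: "primary",
}


def pg_status(pg_up, pg_in_recovery):
    """Mirror of: (pg_up * 2) + (1 - pg_in_recovery). Both inputs are 0 or 1."""
    return pg_up * 2 + (1 - pg_in_recovery)


def age_exhaust(database_age, age_deriv_1h):
    """Mirror of: (2147483648 - pg_database_age) / pg:db:age_deriv_1h.

    database_age: current pg_database_age value.
    age_deriv_1h: age growth per second over the last hour.
    Returns the estimated seconds remaining; infinite when the age is not growing.
    """
    remaining = 2147483648 - database_age
    if age_deriv_1h == 0:
        return math.inf
    return remaining / age_deriv_1h
```

For example, `PG_STATUS[pg_status(1, 1)]` describes a live instance in recovery, that is, a replica.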

---
#==============================================================#
# File      :   pgsql.yml
# Ctime     :   2020-04-22
# Mtime     :   2020-12-03
# Desc      :   Record and alert rules for postgres
# Path      :   /etc/prometheus/rules/pgsql.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

groups:

  ################################################################
  #                         PgSQL Rules                          #
  ################################################################
  - name: pgsql-rules
    rules:

      #==============================================================#
      #                        Aliveness                             #
      #==============================================================#
      # TODO: change these to your pg_exporter & pgbouncer_exporter port
      - record: pg_exporter_up
        expr: up{instance=~".*:9185"}

      - record: pgbouncer_exporter_up
        expr: up{instance=~".*:9127"}


      #==============================================================#
      #                        Identity                              #
      #==============================================================#
      - record: pg_is_primary
        expr: 1 - pg_in_recovery
      - record: pg_is_replica
        expr: pg_in_recovery
      - record: pg_status
        expr: (pg_up{} * 2) +  (1 - pg_in_recovery{})
      # encoded: 0:replica[DOWN] 1:primary[DOWN] 2:replica 3:primary


      #==============================================================#
      #                            Age                               #
      #==============================================================#
      # age
      - record: pg:ins:age
        expr: max without (datname) (pg_database_age{datname!~"template[0-9]"})
      - record: pg:cls:age
        expr: max by (cls) (pg:ins:age)
      - record: pg:all:age
        expr: max(pg:cls:age)

      # age derive and prediction
      - record: pg:db:age_deriv_1h
        expr: deriv(pg_database_age{}[1h])
      - record: pg:db:age_exhaust
        expr: (2147483648 - pg_database_age{}) / pg:db:age_deriv_1h



      #==============================================================#
      #                         Sessions                             #
      #==============================================================#
      # session count (by state)
      - record: pg:db:sessions
        expr: pg_activity_count
      - record: pg:ins:sessions
        expr: sum without (datname) (pg:db:sessions)
      - record: pg:svc:sessions
        expr: sum by (cls, role, state) (pg:ins:sessions)
      - record: pg:cls:sessions
        expr: sum by (cls, state) (pg:ins:sessions)
      - record: pg:all:sessions
        expr: sum by (state) (pg:cls:sessions)

      # backends
      - record: pg:db:backends
        expr: pg_db_numbackends
      - record: pg:ins:backends
        expr: sum without (datname) (pg_db_numbackends)
      - record: pg:svc:backends
        expr: sum by (cls, role) (pg:ins:backends)
      - record: pg:cls:backends
        expr: sum by (cls) (pg:ins:backends)
      - record: pg:all:backends
        expr: sum(pg:cls:backends)

      # active backends
      - record: pg:ins:active_backends
        expr: pg:ins:sessions{state="active"}
      - record: pg:svc:active_backends
        expr: sum by (cls, role) (pg:ins:active_backends)
      - record: pg:cls:active_backends
        expr: sum by (cls) (pg:ins:active_backends)
      - record: pg:all:active_backends
        expr: sum(pg:cls:active_backends)

      # idle in xact backends (including abort)
      - record: pg:ins:ixact_backends
        expr: pg:ins:sessions{state=~"idle in.*"}
      - record: pg:svc:ixact_backends
        expr: sum by (cls, role) (pg:ins:ixact_backends)
      - record: pg:cls:ixact_backends
        expr: sum by (cls) (pg:ins:ixact_backends)
      - record: pg:all:ixact_backends
        expr: sum(pg:cls:ixact_backends)


      #==============================================================#
      #                    Servers (Pgbouncer)                       #
      #==============================================================#

      # active servers
      - record: pg:pool:active_servers
        expr: pgbouncer_pool_active_servers{datname!="pgbouncer"}
      - record: pg:db:active_servers
        expr: sum without(user) (pg:pool:active_servers)
      - record: pg:ins:active_servers
        expr: sum without(user, datname) (pg:pool:active_servers)
      - record: pg:svc:active_servers
        expr: sum by (cls, role) (pg:ins:active_servers)
      - record: pg:cls:active_servers
        expr: sum by (cls) (pg:ins:active_servers)
      - record: pg:all:active_servers
        expr: sum(pg:cls:active_servers)

      # idle servers
      - record: pg:pool:idle_servers
        expr: pgbouncer_pool_idle_servers{datname!="pgbouncer"}
      - record: pg:db:idle_servers
        expr: sum without(user) (pg:pool:idle_servers)
      - record: pg:ins:idle_servers
        expr: sum without(user, datname) (pg:pool:idle_servers)
      - record: pg:svc:idle_servers
        expr: sum by (cls, role) (pg:ins:idle_servers)
      - record: pg:cls:idle_servers
        expr: sum by (cls) (pg:ins:idle_servers)
      - record: pg:all:idle_servers
        expr: sum(pg:cls:idle_servers)

      # used servers
      - record: pg:pool:used_servers
        expr: pgbouncer_pool_used_servers{datname!="pgbouncer"}
      - record: pg:db:used_servers
        expr: sum without(user) (pg:pool:used_servers)
      - record: pg:ins:used_servers
        expr: sum without(user, datname) (pg:pool:used_servers)
      - record: pg:svc:used_servers
        expr: sum by (cls, role) (pg:ins:used_servers)
      - record: pg:cls:used_servers
        expr: sum by (cls) (pg:ins:used_servers)
      - record: pg:all:used_servers
        expr: sum(pg:cls:used_servers)

      # tested servers
      - record: pg:pool:tested_servers
        expr: pgbouncer_pool_tested_servers{datname!="pgbouncer"}
      - record: pg:db:tested_servers
        expr: sum without(user) (pg:pool:tested_servers)
      - record: pg:ins:tested_servers
        expr: sum without(user, datname) (pg:pool:tested_servers)
      - record: pg:svc:tested_servers
        expr: sum by (cls, role) (pg:ins:tested_servers)
      - record: pg:cls:tested_servers
        expr: sum by (cls) (pg:ins:tested_servers)
      - record: pg:all:tested_servers
        expr: sum(pg:cls:tested_servers)

      # login servers
      - record: pg:pool:login_servers
        expr: pgbouncer_pool_login_servers{datname!="pgbouncer"}
      - record: pg:db:login_servers
        expr: sum without(user) (pg:pool:login_servers)
      - record: pg:ins:login_servers
        expr: sum without(user, datname) (pg:pool:login_servers)
      - record: pg:svc:login_servers
        expr: sum by (cls, role) (pg:ins:login_servers)
      - record: pg:cls:login_servers
        expr: sum by (cls) (pg:ins:login_servers)
      - record: pg:all:login_servers
        expr: sum(pg:cls:login_servers)



      #==============================================================#
      #                   Clients (Pgbouncer)                        #
      #==============================================================#
      # active clients
      - record: pg:pool:active_clients
        expr: pgbouncer_pool_active_clients{datname!="pgbouncer"}
      - record: pg:db:active_clients
        expr: sum without(user) (pg:pool:active_clients)
      - record: pg:ins:active_clients
        expr: sum without(user, datname) (pg:pool:active_clients)
      - record: pg:svc:active_clients
        expr: sum by (cls, role) (pg:ins:active_clients)
      - record: pg:cls:active_clients
        expr: sum by (cls) (pg:ins:active_clients)
      - record: pg:all:active_clients
        expr: sum(pg:cls:active_clients)

      # waiting clients
      - record: pg:pool:waiting_clients
        expr: pgbouncer_pool_waiting_clients{datname!="pgbouncer"}
      - record: pg:db:waiting_clients
        expr: sum without(user) (pg:pool:waiting_clients)
      - record: pg:ins:waiting_clients
        expr: sum without(user, datname) (pg:pool:waiting_clients)
      - record: pg:svc:waiting_clients
        expr: sum by (cls, role) (pg:ins:waiting_clients)
      - record: pg:cls:waiting_clients
        expr: sum by (cls) (pg:ins:waiting_clients)
      - record: pg:all:waiting_clients
        expr: sum(pg:cls:waiting_clients)


      #==============================================================#
      #                       Transactions                           #
      #==============================================================#
      # commits (realtime)
      - record: pg:db:commits_realtime
        expr: irate(pg_db_xact_commit{}[1m])
      - record: pg:ins:commits_realtime
        expr: sum without (datname) (pg:db:commits_realtime)
      - record: pg:svc:commits_realtime
        expr: sum by (cls, role) (pg:ins:commits_realtime)
      - record: pg:cls:commits_realtime
        expr: sum by (cls) (pg:ins:commits_realtime)
      - record: pg:all:commits_realtime
        expr: sum(pg:cls:commits_realtime)

      # commits (rate1m)
      - record: pg:db:commits
        expr: rate(pg_db_xact_commit{}[1m])
      - record: pg:ins:commits
        expr: sum without (datname) (pg:db:commits)
      - record: pg:svc:commits
        expr: sum by (cls, role) (pg:ins:commits)
      - record: pg:cls:commits
        expr: sum by (cls) (pg:ins:commits)
      - record: pg:all:commits
        expr: sum(pg:cls:commits)

      # rollbacks realtime
      - record: pg:db:rollbacks_realtime
        expr: irate(pg_db_xact_rollback{}[1m])
      - record: pg:ins:rollbacks_realtime
        expr: sum without (datname) (pg:db:rollbacks_realtime)
      - record: pg:svc:rollbacks_realtime
        expr: sum by (cls, role) (pg:ins:rollbacks_realtime)
      - record: pg:cls:rollbacks_realtime
        expr: sum by (cls) (pg:ins:rollbacks_realtime)
      - record: pg:all:rollbacks_realtime
        expr: sum(pg:cls:rollbacks_realtime)
      # rollbacks
      - record: pg:db:rollbacks
        expr: rate(pg_db_xact_rollback{}[1m])
      - record: pg:ins:rollbacks
        expr: sum without (datname) (pg:db:rollbacks)
      - record: pg:svc:rollbacks
        expr: sum by (cls, role) (pg:ins:rollbacks)
      - record: pg:cls:rollbacks
        expr: sum by (cls) (pg:ins:rollbacks)
      - record: pg:all:rollbacks
        expr: sum(pg:cls:rollbacks)

      # xacts (realtime)
      - record: pg:db:xacts_realtime
        expr: irate(pg_db_xact_total{}[1m])
      - record: pg:ins:xacts_realtime
        expr: sum without (datname) (pg:db:xacts_realtime)
      - record: pg:svc:xacts_realtime
        expr: sum by (cls, role) (pg:ins:xacts_realtime)
      - record: pg:cls:xacts_realtime
        expr: sum by (cls) (pg:ins:xacts_realtime)
      - record: pg:all:xacts_realtime
        expr: sum(pg:cls:xacts_realtime)
      # xacts (rate1m)
      - record: pg:db:xacts
        expr: rate(pg_db_xact_total{}[1m])
      - record: pg:ins:xacts
        expr: sum without (datname) (pg:db:xacts)
      - record: pg:svc:xacts
        expr: sum by (cls, role) (pg:ins:xacts)
      - record: pg:cls:xacts
        expr: sum by (cls) (pg:ins:xacts)
      - record: pg:all:xacts
        expr: sum(pg:cls:xacts)
      # xacts avg30m
      - record: pg:db:xacts_avg30m
        expr: avg_over_time(pg:db:xacts[30m])
      - record: pg:ins:xacts_avg30m
        expr: avg_over_time(pg:ins:xacts[30m])
      - record: pg:svc:xacts_avg30m
        expr: avg_over_time(pg:svc:xacts[30m])
      - record: pg:cls:xacts_avg30m
        expr: avg_over_time(pg:cls:xacts[30m])
      - record: pg:all:xacts_avg30m
        expr: avg_over_time(pg:all:xacts[30m])
      # xacts µ
      - record: pg:db:xacts_mu
        expr: avg_over_time(pg:db:xacts_avg30m[30m])
      - record: pg:ins:xacts_mu
        expr: avg_over_time(pg:ins:xacts_avg30m[30m])
      - record: pg:svc:xacts_mu
        expr: avg_over_time(pg:svc:xacts_avg30m[30m])
      - record: pg:cls:xacts_mu
        expr: avg_over_time(pg:cls:xacts_avg30m[30m])
      - record: pg:all:xacts_mu
        expr: avg_over_time(pg:all:xacts_avg30m[30m])
      # xacts σ: sigma
      - record: pg:db:xacts_sigma
        expr: stddev_over_time(pg:db:xacts[30m])
      - record: pg:ins:xacts_sigma
        expr: stddev_over_time(pg:ins:xacts[30m])
      - record: pg:svc:xacts_sigma
        expr: stddev_over_time(pg:svc:xacts[30m])
      - record: pg:cls:xacts_sigma
        expr: stddev_over_time(pg:cls:xacts[30m])
      - record: pg:all:xacts_sigma
        expr: stddev_over_time(pg:all:xacts[30m])


      #==============================================================#
      #                      TPS (Pgbouncer)                         #
      #==============================================================#
      # TPS realtime (irate1m)
      - record: pg:db:tps_realtime
        expr: irate(pgbouncer_stat_total_xact_count{}[1m])
      - record: pg:ins:tps_realtime
        expr: sum without(datname) (pg:db:tps_realtime{})
      - record: pg:svc:tps_realtime
        expr: sum by(cls, role) (pg:ins:tps_realtime{})
      - record: pg:cls:tps_realtime
        expr: sum by(cls) (pg:ins:tps_realtime{})
      - record: pg:all:tps_realtime
        expr: sum(pg:cls:tps_realtime{})

      # TPS (rate1m)
      - record: pg:db:tps
        expr: pgbouncer_stat_avg_xact_count{datname!="pgbouncer"}
      - record: pg:ins:tps
        expr: sum without(datname) (pg:db:tps)
      - record: pg:svc:tps
        expr: sum by (cls, role) (pg:ins:tps)
      - record: pg:cls:tps
        expr: sum by(cls) (pg:ins:tps)
      - record: pg:all:tps
        expr: sum(pg:cls:tps)
      # tps : avg30m
      - record: pg:db:tps_avg30m
        expr: avg_over_time(pg:db:tps[30m])
      - record: pg:ins:tps_avg30m
        expr: avg_over_time(pg:ins:tps[30m])
      - record: pg:svc:tps_avg30m
        expr: avg_over_time(pg:svc:tps[30m])
      - record: pg:cls:tps_avg30m
        expr: avg_over_time(pg:cls:tps[30m])
      - record: pg:all:tps_avg30m
        expr: avg_over_time(pg:all:tps[30m])
      # tps µ
      - record: pg:db:tps_mu
        expr: avg_over_time(pg:db:tps_avg30m[30m])
      - record: pg:ins:tps_mu
        expr: avg_over_time(pg:ins:tps_avg30m[30m])
      - record: pg:svc:tps_mu
        expr: avg_over_time(pg:svc:tps_avg30m[30m])
      - record: pg:cls:tps_mu
        expr: avg_over_time(pg:cls:tps_avg30m[30m])
      - record: pg:all:tps_mu
        expr: avg_over_time(pg:all:tps_avg30m[30m])
      # tps σ
      - record: pg:db:tps_sigma
        expr: stddev_over_time(pg:db:tps[30m])
      - record: pg:ins:tps_sigma
        expr: stddev_over_time(pg:ins:tps[30m])
      - record: pg:svc:tps_sigma
        expr: stddev_over_time(pg:svc:tps[30m])
      - record: pg:cls:tps_sigma
        expr: stddev_over_time(pg:cls:tps[30m])
      - record: pg:all:tps_sigma
        expr: stddev_over_time(pg:all:tps[30m])

      # xact rt (rate1m)
      - record: pg:db:xact_rt
        expr: pgbouncer_stat_avg_xact_time{datname!="pgbouncer"} / 1000000
      - record: pg:ins:xact_rt
        expr: sum without(datname) (rate(pgbouncer_stat_total_xact_time[1m])) / sum without(datname) (rate(pgbouncer_stat_total_xact_count[1m])) / 1000000
      - record: pg:svc:xact_rt
        expr: sum by (cls, role) (rate(pgbouncer_stat_total_xact_time[1m])) / sum by (cls, role) (rate(pgbouncer_stat_total_xact_count[1m])) / 1000000
      # xact_rt avg30m
      - record: pg:db:xact_rt_avg30m
        expr: avg_over_time(pg:db:xact_rt[30m])
      - record: pg:ins:xact_rt_avg30m
        expr: avg_over_time(pg:ins:xact_rt[30m])
      - record: pg:svc:xact_rt_avg30m
        expr: avg_over_time(pg:svc:xact_rt[30m])
      # xact_rt µ
      - record: pg:db:xact_rt_mu
        expr: avg_over_time(pg:db:xact_rt_avg30m[30m])
      - record: pg:ins:xact_rt_mu
        expr: avg_over_time(pg:ins:xact_rt_avg30m[30m])
      - record: pg:svc:xact_rt_mu
        expr: avg_over_time(pg:svc:xact_rt_avg30m[30m])

      # xact_rt σ: stddev30m
      - record: pg:db:xact_rt_sigma
        expr: stddev_over_time(pg:db:xact_rt[30m])
      - record: pg:ins:xact_rt_sigma
        expr: stddev_over_time(pg:ins:xact_rt[30m])
      - record: pg:svc:xact_rt_sigma
        expr: stddev_over_time(pg:svc:xact_rt[30m])



      #==============================================================#
      #                     QPS (Pgbouncer)                          #
      #==============================================================#
      # QPS realtime (irate1m)
      - record: pg:db:qps_realtime
        expr: irate(pgbouncer_stat_total_query_count{}[1m])
      - record: pg:ins:qps_realtime
        expr: sum without(datname) (pg:db:qps_realtime{})
      - record: pg:svc:qps_realtime
        expr: sum by(cls, role) (pg:ins:qps_realtime{})
      - record: pg:cls:qps_realtime
        expr: sum by(cls) (pg:ins:qps_realtime{})
      - record: pg:all:qps_realtime
        expr: sum(pg:cls:qps_realtime{})
      # qps (rate1m)
      - record: pg:db:qps
        expr: pgbouncer_stat_avg_query_count{datname!="pgbouncer"}
      - record: pg:ins:qps
        expr: sum without(datname) (pg:db:qps)
      - record: pg:svc:qps
        expr: sum by (cls, role) (pg:ins:qps)
      - record: pg:cls:qps
        expr: sum by(cls) (pg:ins:qps)
      - record: pg:all:qps
        expr: sum(pg:cls:qps)

      # qps avg30m
      - record: pg:db:qps_avg30m
        expr: avg_over_time(pg:db:qps[30m])
      - record: pg:ins:qps_avg30m
        expr: avg_over_time(pg:ins:qps[30m])
      - record: pg:svc:qps_avg30m
        expr: avg_over_time(pg:svc:qps[30m])
      - record: pg:cls:qps_avg30m
        expr: avg_over_time(pg:cls:qps[30m])
      - record: pg:all:qps_avg30m
        expr: avg_over_time(pg:all:qps[30m])
      # qps µ
      - record: pg:db:qps_mu
        expr: avg_over_time(pg:db:qps_avg30m[30m])
      - record: pg:ins:qps_mu
        expr: avg_over_time(pg:ins:qps_avg30m[30m])
      - record: pg:svc:qps_mu
        expr: avg_over_time(pg:svc:qps_avg30m[30m])
      - record: pg:cls:qps_mu
        expr: avg_over_time(pg:cls:qps_avg30m[30m])
      - record: pg:all:qps_mu
        expr: avg_over_time(pg:all:qps_avg30m[30m])
      # qps σ: stddev30m qps
      - record: pg:db:qps_sigma
        expr: stddev_over_time(pg:db:qps[30m])
      - record: pg:ins:qps_sigma
        expr: stddev_over_time(pg:ins:qps[30m])
      - record: pg:svc:qps_sigma
        expr: stddev_over_time(pg:svc:qps[30m])
      - record: pg:cls:qps_sigma
        expr: stddev_over_time(pg:cls:qps[30m])
      - record: pg:all:qps_sigma
        expr: stddev_over_time(pg:all:qps[30m])
      # query rt (1m avg)
      - record: pg:db:query_rt
        expr: pgbouncer_stat_avg_query_time{datname!="pgbouncer"} / 1000000
      - record: pg:ins:query_rt
        expr: sum without(datname) (rate(pgbouncer_stat_total_query_time[1m])) / sum without(datname) (rate(pgbouncer_stat_total_query_count[1m])) / 1000000
      - record: pg:svc:query_rt
        expr: sum by (cls, role) (rate(pgbouncer_stat_total_query_time[1m])) / sum by (cls, role) (rate(pgbouncer_stat_total_query_count[1m])) / 1000000
      # query_rt avg30m
      - record: pg:db:query_rt_avg30m
        expr: avg_over_time(pg:db:query_rt[30m])
      - record: pg:ins:query_rt_avg30m
        expr: avg_over_time(pg:ins:query_rt[30m])
      - record: pg:svc:query_rt_avg30m
        expr: avg_over_time(pg:svc:query_rt[30m])
      # query_rt µ
      - record: pg:db:query_rt_mu
        expr: avg_over_time(pg:db:query_rt_avg30m[30m])
      - record: pg:ins:query_rt_mu
        expr: avg_over_time(pg:ins:query_rt_avg30m[30m])
      - record: pg:svc:query_rt_mu
        expr: avg_over_time(pg:svc:query_rt_avg30m[30m])
      # query_rt σ: stddev30m
      - record: pg:db:query_rt_sigma
        expr: stddev_over_time(pg:db:query_rt[30m])
      - record: pg:ins:query_rt_sigma
        expr: stddev_over_time(pg:ins:query_rt[30m])
      - record: pg:svc:query_rt_sigma
        expr: stddev_over_time(pg:svc:query_rt[30m])


      #==============================================================#
      #                        PG Load                               #
      #==============================================================#
      # seconds spent in transactions per second (rate over 1m, 5m, 15m)
      - record: pg:ins:xact_time_rate1m
        expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[1m])) / 1000000
      - record: pg:ins:xact_time_rate5m
        expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[5m])) / 1000000
      - record: pg:ins:xact_time_rate15m
        expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[15m])) / 1000000

      # seconds spent in queries per second (rate over 1m, 5m, 15m)
      - record: pg:ins:query_time_rate1m
        expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[1m])) / 1000000
      - record: pg:ins:query_time_rate5m
        expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[5m])) / 1000000
      - record: pg:ins:query_time_rate15m
        expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[15m])) / 1000000

      # instance level load
      - record: pg:ins:load0
        expr: sum without (datname) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (ip) group_left()  node:ins:cpu_count / 1000000
      - record: pg:ins:load1
        expr: pg:ins:xact_time_rate1m  / on (ip) group_left()  node:ins:cpu_count
      - record: pg:ins:load5
        expr: pg:ins:xact_time_rate5m  / on (ip) group_left()  node:ins:cpu_count
      - record: pg:ins:load15
        expr: pg:ins:xact_time_rate15m  / on (ip) group_left()  node:ins:cpu_count

      # service level load
      - record: pg:svc:load0
        expr: sum by (svc, cls, role) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (svc) group_left() sum by (svc) (node:ins:cpu_count{}) / 1000000
      - record: pg:svc:load1
        expr: sum by (svc, cls, role) (pg:ins:xact_time_rate1m)  / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})
      - record: pg:svc:load5
        expr: sum by (svc, cls, role) (pg:ins:xact_time_rate5m)  / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})
      - record: pg:svc:load15
        expr: sum by (svc, cls, role) (pg:ins:xact_time_rate15m)  / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})

      # cluster level load
      - record: pg:cls:load0
        expr: sum by (cls) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (cls) node:cls:cpu_count{} / 1000000
      - record: pg:cls:load1
        expr: sum by (cls) (pg:ins:xact_time_rate1m)  / on (cls) node:cls:cpu_count
      - record: pg:cls:load5
        expr: sum by (cls) (pg:ins:xact_time_rate5m)  / on (cls) node:cls:cpu_count
      - record: pg:cls:load15
        expr: sum by (cls) (pg:ins:xact_time_rate15m)  / on (cls) node:cls:cpu_count


      #==============================================================#
      #                     PG Saturation                            #
      #==============================================================#
      # max value of pg_load and cpu_usage

      # instance level saturation
      - record: pg:ins:saturation0
        expr: pg:ins:load0 > node:ins:cpu_usage or node:ins:cpu_usage
      - record: pg:ins:saturation1
        expr: pg:ins:load1 > node:ins:cpu_usage or node:ins:cpu_usage
      - record: pg:ins:saturation5
        expr: pg:ins:load5 > node:ins:cpu_usage or node:ins:cpu_usage
      - record: pg:ins:saturation15
        expr: pg:ins:load15 > node:ins:cpu_usage or node:ins:cpu_usage

      # cluster level saturation
      - record: pg:cls:saturation0
        expr: pg:cls:load0 > node:cls:cpu_usage or node:cls:cpu_usage
      - record: pg:cls:saturation1
        expr: pg:cls:load1 > node:cls:cpu_usage or node:cls:cpu_usage
      - record: pg:cls:saturation5
        expr: pg:cls:load5 > node:cls:cpu_usage or node:cls:cpu_usage
      - record: pg:cls:saturation15
        expr: pg:cls:load15 > node:cls:cpu_usage or node:cls:cpu_usage


      #==============================================================#
      #                          CRUD                                #
      #==============================================================#
      # rows touched
      - record: pg:db:tup_touched
        expr: irate(pg_db_tup_fetched{}[1m])
      - record: pg:ins:tup_touched
        expr: sum without(datname) (pg:db:tup_touched)
      - record: pg:svc:tup_touched
        expr: sum by (cls, role) (pg:ins:tup_touched)
      - record: pg:cls:tup_touched
        expr: sum by (cls) (pg:ins:tup_touched)
      - record: pg:all:tup_touched
        expr: sum(pg:cls:tup_touched)

      # selected
      - record: pg:db:tup_selected
        expr: irate(pg_db_tup_returned{}[1m])
      - record: pg:ins:tup_selected
        expr: sum without(datname) (pg:db:tup_selected)
      - record: pg:svc:tup_selected
        expr: sum by (cls, role) (pg:ins:tup_selected)
      - record: pg:cls:tup_selected
        expr: sum by (cls) (pg:ins:tup_selected)
      - record: pg:all:tup_selected
        expr: sum(pg:cls:tup_selected)

      # inserted
      - record: pg:db:tup_inserted
        expr: irate(pg_db_tup_inserted{}[1m])
      - record: pg:ins:tup_inserted
        expr: sum without(datname) (pg:db:tup_inserted)
      - record: pg:svc:tup_inserted
        expr: sum by (cls, role) (pg:ins:tup_inserted)
      - record: pg:cls:tup_inserted
        expr: sum by (cls) (pg:ins:tup_inserted{role="primary"})
      - record: pg:all:tup_inserted
        expr: sum(pg:cls:tup_inserted)

      # updated
      - record: pg:db:tup_updated
        expr: irate(pg_db_tup_updated{}[1m])
      - record: pg:ins:tup_updated
        expr: sum without(datname) (pg:db:tup_updated)
      - record: pg:svc:tup_updated
        expr: sum by (cls, role) (pg:ins:tup_updated)
      - record: pg:cls:tup_updated
        expr: sum by (cls) (pg:ins:tup_updated{role="primary"})
      - record: pg:all:tup_updated
        expr: sum(pg:cls:tup_updated)

      # deleted
      - record: pg:db:tup_deleted
        expr: irate(pg_db_tup_deleted{}[1m])
      - record: pg:ins:tup_deleted
        expr: sum without(datname) (pg:db:tup_deleted)
      - record: pg:svc:tup_deleted
        expr: sum by (cls, role) (pg:ins:tup_deleted)
      - record: pg:cls:tup_deleted
        expr: sum by (cls) (pg:ins:tup_deleted{role="primary"})
      - record: pg:all:tup_deleted
        expr: sum(pg:cls:tup_deleted)

      # modified
      - record: pg:db:tup_modified
        expr: irate(pg_db_tup_modified{}[1m])
      - record: pg:ins:tup_modified
        expr: sum without(datname) (pg:db:tup_modified)
      - record: pg:svc:tup_modified
        expr: sum by (cls, role) (pg:ins:tup_modified)
      - record: pg:cls:tup_modified
        expr: sum by (cls) (pg:ins:tup_modified{role="primary"})
      - record: pg:all:tup_modified
        expr: sum(pg:cls:tup_modified)


      #==============================================================#
      #                      Object Access                           #
      #==============================================================#
      # table access
      - record: pg:table:idx_scan
        expr: rate(pg_table_idx_scan{}[1m])
      - record: pg:table:seq_scan
        expr: rate(pg_table_seq_scan{}[1m])
      - record: pg:table:qps_realtime
        expr: irate(pg_table_idx_scan{}[1m])

      # index access
      - record: pg:index:idx_scan
        expr: rate(pg_index_idx_scan{}[1m])
      - record: pg:index:qps_realtime
        expr: irate(pg_index_idx_scan{}[1m])

      # func access
      - record: pg:func:call
        expr: rate(pg_func_calls{}[1m])
      - record: pg:func:rt
        expr: rate(pg_func_total_time{}[1m]) / pg:func:call

      # query access
      - record: pg:query:call
        expr: rate(pg_query_calls{}[1m])
      - record: pg:query:rt
        expr: rate(pg_query_total_time{}[1m]) / pg:query:call / 1000



      #==============================================================#
      #                        Blocks IO                             #
      #==============================================================#
      # blocks read/hit/access in 1min
      - record: pg:db:blks_read_1m
        expr: increase(pg_db_blks_read{}[1m])
      - record: pg:db:blks_hit_1m
        expr: increase(pg_db_blks_hit{}[1m])
      - record: pg:db:blks_access_1m
        expr: increase(pg_db_blks_access{}[1m])

      # buffer hit rate (1m)
      - record: pg:db:buffer_hit_rate
        expr: pg:db:blks_hit_1m / pg:db:blks_access_1m
      - record: pg:ins:hit_rate
        expr: sum without(datname) (pg:db:blks_hit_1m) / sum without(datname) (pg:db:blks_access_1m)

      # read/write time usage
      - record: pg:db:read_time_usage
        expr: rate(pg_db_blk_read_time[1m])
      - record: pg:db:write_time_usage
        expr: rate(pg_db_blk_write_time[1m])
      - record: pg:db:io_time_usage
        expr: pg:db:read_time_usage + pg:db:write_time_usage



      #==============================================================#
      #                  Traffic IO (Pgbouncer)                      #
      #==============================================================#
      # transmit bandwidth (sent, out)
      - record: pg:db:tx
        expr: irate(pgbouncer_stat_total_sent{datname!="pgbouncer"}[1m])
      - record: pg:ins:tx
        expr: sum without (user, datname) (pg:db:tx)
      - record: pg:svc:tx
        expr: sum by (cls, role) (pg:ins:tx)
      - record: pg:cls:tx
        expr: sum by (cls) (pg:ins:tx)
      - record: pg:all:tx
        expr: sum(pg:cls:tx)

      # receive bandwidth (received, in)
      - record: pg:db:rx
        expr: irate(pgbouncer_stat_total_received{datname!="pgbouncer"}[1m])
      - record: pg:ins:rx
        expr: sum without (datname) (pg:db:rx)
      - record: pg:svc:rx
        expr: sum by (cls, role) (pg:ins:rx)
      - record: pg:cls:rx
        expr: sum by (cls) (pg:ins:rx)
      - record: pg:all:rx
        expr: sum(pg:cls:rx)



      #==============================================================#
      #                          Lock                                #
      #==============================================================#
      # lock count by mode
      - record: pg:db:locks
        expr: pg_lock_count
      - record: pg:ins:locks
        expr: sum without(datname) (pg:db:locks)
      - record: pg:svc:locks
        expr: sum by (cls, role, mode) (pg:ins:locks)
      - record: pg:cls:locks
        expr: sum by (cls, mode) (pg:ins:locks)

      # total lock count
      - record: pg:db:lock_count
        expr: sum without (mode) (pg_lock_count{})
      - record: pg:ins:lock_count
        expr: sum without(datname) (pg:db:lock_count)
      - record: pg:svc:lock_count
        expr: sum by (cls, role) (pg:ins:lock_count)
      - record: pg:cls:lock_count
        expr: sum by (cls) (pg:ins:lock_count)

      # read category lock
      - record: pg:db:rlock
        expr: sum without (mode) (pg_lock_count{mode="AccessShareLock"})
      - record: pg:ins:rlock
        expr: sum without(datname) (pg:db:rlock)
      - record: pg:svc:rlock
        expr: sum by (cls, role) (pg:ins:rlock)
      - record: pg:cls:rlock
        expr: sum by (cls) (pg:ins:rlock)

      # write category lock (insert|update|delete)
      - record: pg:db:wlock
        expr: sum without (mode) (pg_lock_count{mode=~"RowShareLock|RowExclusiveLock"})
      - record: pg:ins:wlock
        expr: sum without(datname) (pg:db:wlock)
      - record: pg:svc:wlock
        expr: sum by (cls, role) (pg:ins:wlock)
      - record: pg:cls:wlock
        expr: sum by (cls) (pg:ins:wlock)

      # exclusive category lock
      - record: pg:db:xlock
        expr: sum without (mode) (pg_lock_count{mode=~"AccessExclusiveLock|ExclusiveLock|ShareRowExclusiveLock|ShareLock|ShareUpdateExclusiveLock"})
      - record: pg:ins:xlock
        expr: sum without(datname) (pg:db:xlock)
      - record: pg:svc:xlock
        expr: sum by (cls, role) (pg:ins:xlock)
      - record: pg:cls:xlock
        expr: sum by (cls) (pg:ins:xlock)


      #==============================================================#
      #                          Temp                                #
      #==============================================================#
      # temp files and bytes
      - record: pg:db:temp_bytes
        expr: rate(pg_db_temp_bytes{}[1m])
      - record: pg:ins:temp_bytes
        expr: sum without(datname) (pg:db:temp_bytes)
      - record: pg:svc:temp_bytes
        expr: sum by (cls, role) (pg:ins:temp_bytes)
      - record: pg:cls:temp_bytes
        expr: sum by (cls) (pg:ins:temp_bytes)

      # temp file count in last 1m
      - record: pg:db:temp_files
        expr: increase(pg_db_temp_files{}[1m])
      - record: pg:ins:temp_files
        expr: sum without(datname) (pg:db:temp_files)
      - record: pg:svc:temp_files
        expr: sum by (cls, role) (pg:ins:temp_files)
      - record: pg:cls:temp_files
        expr: sum by (cls) (pg:ins:temp_files)



      #==============================================================#
      #                           Size                               #
      #==============================================================#
      # database size
      - record: pg:ins:db_size
        expr: pg_size_database
      - record: pg:cls:db_size
        expr: sum by (cls) (pg:ins:db_size)
      # wal size
      - record: pg:ins:wal_size
        expr: pg_size_wal
      - record: pg:cls:wal_size
        expr: sum by (cls) (pg:ins:wal_size)
      # log size
      - record: pg:ins:log_size
        expr: pg_size_log
      - record: pg:cls:log_size
        expr: sum by (cls) (pg_size_log)



      #==============================================================#
      #                        Checkpoint                            #
      #==============================================================#
      # checkpoint stats
      - record: pg:ins:last_ckpt
        expr: pg_checkpoint_elapse
      - record: pg:ins:ckpt_timed
        expr: increase(pg_bgwriter_checkpoints_timed{}[30s])
      - record: pg:ins:ckpt_req
        expr: increase(pg_bgwriter_checkpoints_req{}[30s])
      - record: pg:cls:ckpt_1h
        expr: increase(pg_bgwriter_checkpoints_timed{}[1h]) + increase(pg_bgwriter_checkpoints_req{}[1h])

      # buffer flush & alloc
      - record: pg:ins:buf_flush_backend
        expr: irate(pg_bgwriter_buffers_backend{}[1m]) * 8192
      - record: pg:ins:buf_flush_checkpoint
        expr: irate(pg_bgwriter_buffers_checkpoint{}[1m]) * 8192

      - record: pg:ins:buf_flush
        expr: pg:ins:buf_flush_backend + pg:ins:buf_flush_checkpoint
      - record: pg:svc:buf_flush
        expr: sum by (cls, role) (pg:ins:buf_flush)
      - record: pg:cls:buf_flush
        expr: sum by (cls) (pg:ins:buf_flush)
      - record: pg:all:buf_flush
        expr: sum(pg:cls:buf_flush)

      - record: pg:ins:buf_alloc
        expr: irate(pg_bgwriter_buffers_alloc{}[1m]) * 8192
      - record: pg:svc:buf_alloc
        expr: sum by (cls, role) (pg:ins:buf_alloc)
      - record: pg:cls:buf_alloc
        expr: sum by (cls) (pg:ins:buf_alloc)
      - record: pg:all:buf_alloc
        expr: sum(pg:cls:buf_alloc)




      #==============================================================#
      #                           LSN                                #
      #==============================================================#
      # timeline & LSN
      - record: pg_timeline
        expr: pg_checkpoint_tli
      - record: pg:ins:redo_lsn
        expr: pg_checkpoint_redo_lsn
      - record: pg:ins:checkpoint_lsn
        expr: pg_checkpoint_checkpoint_lsn

      # wal rate
      - record: pg:ins:wal_rate
        expr: rate(pg_lsn[1m])
      - record: pg:cls:wal_rate
        expr: max by (cls) (pg:ins:wal_rate{role="primary"})
      - record: pg:all:wal_rate
        expr: sum(pg:cls:wal_rate)



      #==============================================================#
      #                       Replication                            #
      #==============================================================#
      # lag time from replica's view
      - record: pg:ins:lag_seconds
        expr: pg_lag
      - record: pg:cls:lag_seconds
        expr: max by (cls) (pg:ins:lag_seconds)
      - record: pg:all:lag_seconds
        expr: max(pg:cls:lag_seconds)

      # sync status
      - record: pg:ins:sync_status # application_name must be set to the replica's ins name
        expr: max by (ins, svc, cls) (label_replace(pg_replication_sync_status, "ins", "$1", "application_name", "(.+)"))

      # lag of self (application_name must be set to the standby's ins name)
      - record: pg:ins:lag_bytes
        expr: max by (ins, svc, cls, role) (label_replace(pg_replication_lsn{} - pg_replication_replay_lsn{}, "ins", "$1", "application_name", "(.+)"))
      - record: pg:cls:lag_bytes
        expr: max by (cls) (pg:ins:lag_bytes)
      - record: pg:all:lag_bytes
        expr: max(pg:cls:lag_bytes)

      # replication slot retained bytes
      - record: pg:ins:slot_retained_bytes
        expr: pg_slot_retained_bytes

      # replica walreceiver
      - record: pg:ins:recv_init_lsn
        expr: pg_walreceiver_init_lsn
      - record: pg:ins:recv_last_lsn
        expr: pg_walreceiver_last_lsn
      - record: pg:ins:recv_init_tli
        expr: pg_walreceiver_init_tli
      - record: pg:ins:recv_last_tli
        expr: pg_walreceiver_last_tli




      #==============================================================#
      #                    Cluster Level Metrics                     #
      #==============================================================#
      # cluster member count
      - record: pg:cls:leader
        expr: count by (cls, ins) (max by (cls, ins) (pg_status{}) == 3)
      - record: pg:cls:size
        expr: count by (cls) (max by (cls, ins) (pg_up{}))
      - record: pg:cls:timeline
        expr: max by (cls) (pg_checkpoint_tli{})
      - record: pg:cls:primarys
        expr: count by (cls) (max by (cls, ins) (pg_in_recovery{}) == 0)
      - record: pg:cls:replicas
        expr: count by (cls) (max by (cls, ins) (pg_in_recovery{}) == 1)
      - record: pg:cls:synchronous
        expr: max by (cls) (pg_sync_standby_enabled) > bool 0
      - record: pg:cls:bridging_instances
        expr: count by (cls, role, ins, ip) (pg_replication_lsn{state="streaming", role!="primary"} > 0)
      - record: pg:cls:bridging
        expr: count by (cls) (pg:cls:bridging_instances)
      - record: pg:cls:cascading
        expr: count by (cls) (pg_replication_lsn{state="streaming", role!="primary"})





      #==============================================================#
      #                    Pgbouncer List                            #
      #==============================================================#
      # object list
      - record: pg:ins:pools
        expr: pgbouncer_list_items{list="pools"}
      - record: pg:ins:pool_databases
        expr: pgbouncer_list_items{list="databases"}
      - record: pg:ins:pool_users
        expr: pgbouncer_list_items{list="users"}
      - record: pg:ins:login_clients
        expr: pgbouncer_list_items{list="login_clients"}
      - record: pg:ins:free_clients
        expr: pgbouncer_list_items{list="free_clients"}
      - record: pg:ins:used_clients
        expr: pgbouncer_list_items{list="used_clients"}
      - record: pg:ins:free_servers
        expr: pgbouncer_list_items{list="free_servers"}



      #==============================================================#
      #                  DBConfig (Pgbouncer)                        #
      #==============================================================#
      - record: pg:db:pool_max_conn
        expr: pgbouncer_database_pool_size{datname!="pgbouncer"} + pgbouncer_database_reserve_pool{datname!="pgbouncer"}
      - record: pg:db:pool_size
        expr: pgbouncer_database_pool_size{datname!="pgbouncer"}
      - record: pg:db:pool_reserve_size
        expr: pgbouncer_database_reserve_pool{datname!="pgbouncer"}
      - record: pg:db:pool_current_conn
        expr: pgbouncer_database_current_connections{datname!="pgbouncer"}
      - record: pg:db:pool_paused
        expr: pgbouncer_database_paused{datname!="pgbouncer"}
      - record: pg:db:pool_disabled
        expr: pgbouncer_database_disabled{datname!="pgbouncer"}



      #==============================================================#
      #                  Waiting (Pgbouncer)                         #
      #==============================================================#
      # average wait time
      - record: pg:db:wait_rt
        expr: pgbouncer_stat_avg_wait_time{datname!="pgbouncer"} / 1000000

      # max wait time among all clients
      - record: pg:pool:maxwait
        expr: pgbouncer_pool_maxwait{datname!="pgbouncer"} + pgbouncer_pool_maxwait_us{datname!="pgbouncer"} / 1000000
      - record: pg:db:maxwait
        expr: max without(user) (pg:pool:maxwait)
      - record: pg:ins:maxwait
        expr: max without(user, datname) (pg:db:maxwait)
      - record: pg:svc:maxwait
        expr: max by (cls, role) (pg:ins:maxwait)
      - record: pg:cls:maxwait
        expr: max by (cls) (pg:ins:maxwait)
      - record: pg:all:maxwait
        expr: max(pg:cls:maxwait)

...
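
The PG Load and PG Saturation rules above can be read as plain arithmetic. The following is a minimal sketch in Python, not part of Pigsty itself: the function names are hypothetical, and `counter_rate` is a simplified stand-in for PromQL `rate()` (it ignores counter resets and extrapolation). It assumes the Pgbouncer `total_xact_time` counter is reported in microseconds, which is why the rules divide by 1000000.

```python
def counter_rate(first: float, last: float, window_seconds: float) -> float:
    """Per-second increase of a counter over a window (simplified rate())."""
    return (last - first) / window_seconds


def pg_load(xact_time_us_first: float, xact_time_us_last: float,
            window_seconds: float, cpu_count: int) -> float:
    """Seconds spent in transactions per second, divided by CPU count.

    Mirrors pg:ins:load1 = pg:ins:xact_time_rate1m / node:ins:cpu_count.
    """
    busy_seconds_per_second = counter_rate(
        xact_time_us_first, xact_time_us_last, window_seconds
    ) / 1_000_000  # microseconds -> seconds
    return busy_seconds_per_second / cpu_count


def saturation(load: float, cpu_usage: float) -> float:
    """Mirrors `load > cpu_usage or cpu_usage`.

    In PromQL, `load > cpu_usage` keeps the load sample only where it is
    larger; `or cpu_usage` falls back to cpu_usage otherwise. When both
    series exist, the result is the larger of the two values.
    """
    return load if load > cpu_usage else cpu_usage


if __name__ == "__main__":
    # Hypothetical numbers: the counter advanced by 120 s of transaction
    # time (120,000,000 us) within a 60 s window on a 4-core node.
    load1 = pg_load(0, 120_000_000, 60, 4)
    print(load1)                    # 0.5
    print(saturation(load1, 0.35))  # 0.5  (load dominates)
    print(saturation(0.2, 0.35))    # 0.35 (cpu usage dominates)
```

A load of 1.0 under this definition means that, on average, one transaction per CPU core was in progress throughout the window.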

8.5 - Alerting Rules

Pigsty alerting rule definitions
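
Most rules in this section pair an `expr` with a `for` duration (for example `for: 1m`), while a few, such as the read-only filesystem alert, omit `for` and fire on the first matching evaluation. The sketch below illustrates that behaviour in Python. It is a simplified model rather than Prometheus code: the function name is hypothetical, the 15-second evaluation interval is an assumed value, and features such as `keep_firing_for` are ignored.

```python
from typing import List, Optional


def alert_states(matches: List[bool], for_seconds: int,
                 interval_seconds: int) -> List[str]:
    """Return the alert state after each rule evaluation.

    matches[i] tells whether the rule expression returned a sample at
    evaluation i. An alert is 'pending' while the expression has matched
    for less than `for_seconds`, 'firing' once it has matched
    continuously for at least that long, and 'inactive' otherwise.
    """
    states: List[str] = []
    active_since: Optional[int] = None
    for i, matched in enumerate(matches):
        now = i * interval_seconds
        if not matched:
            active_since = None  # any gap resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


if __name__ == "__main__":
    # Hypothetical CPU usage samples, one per 15 s evaluation.
    cpu_usage = [0.50, 0.95, 0.96, 0.97, 0.98, 0.99, 0.50]
    matches = [value > 0.90 for value in cpu_usage]
    print(alert_states(matches, for_seconds=60, interval_seconds=15))
    # A rule without `for` behaves like for_seconds=0.
    print(alert_states([False, True], for_seconds=0, interval_seconds=15))
```

With these samples the alert stays pending for four evaluations and fires on the fifth consecutive match, 60 seconds after the expression first became true.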

Prometheus Alerting Rules

Node Alerting Rules

################################################################
#                          Node Alert                          #
################################################################
- name: node-alert
  rules:

    # node exporter down for 1m triggers a P1 alert
    - alert: NODE_EXPORTER_DOWN
      expr: up{instance=~"^.*:(9100)$"} == 0
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node Exporter Down: {{ $labels.ins }} {{ $value }}"
        description: |
          up[instance={{ $labels.instance }}] = {{ $value }} == 0
          https://dba.p1staff.com/d/node?var-ip={{ $labels.instance }}&from=now-5m&to=now&refresh=10s          



    #==============================================================#
    #                          CPU & Load                          #
    #==============================================================#
    # node avg CPU usage > 90% for 1m
    - alert: NODE_CPU_HIGH
      expr: node:ins:cpu_usage > 0.90
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node CPU High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:ins:cpu_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=28&fullscreen&var-ip={{ $labels.ip }}          

    # node load5 > 100%
    - alert: NODE_LOAD_HIGH
      expr: node:ins:stdload5 > 1
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 Node Load High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:ins:stdload5[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 100%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=37&fullscreen&var-ip={{ $labels.ip }}          



    #==============================================================#
    #                      Disk & Filesystem                       #
    #==============================================================#
    # main fs readonly triggers an immediate P0 alert
    - alert: NODE_FS_READONLY
      expr: node_filesystem_readonly{fstype!~"(n|root|tmp)fs.*"} == 1
      labels:
        severity: P0
      annotations:
        summary: "P0 Node Filesystem Readonly: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node_filesystem_readonly{ins={{ $labels.ins }}, ip={{ $labels.ip }},fstype!~"(n|root|tmp)fs.*"} == 1
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}          

    # main fs usage > 90% for 1m triggers P1 alert
    - alert: NODE_FS_SPACE_FULL
      expr: node:fs:space_usage > 0.90
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node Filesystem Space Full: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:fs:space_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}          

    # main fs inode usage > 90% for 1m triggers P1 alert
    - alert: NODE_FS_INODE_FULL
      expr: node:fs:inode_usage > 0.90
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node Filesystem iNode Full: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:fs:inode_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}          

    # fd usage > 90% for 1m triggers P1 alert
    - alert: NODE_FD_FULL
      expr: node:fs:fd_usage > 0.90
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node File Descriptor Full: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:fs:fd_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=58&fullscreen&var-ip={{ $labels.ip }}          


    # ssd read latency > 32ms for 3m (except long-read)
    - alert: NODE_READ_LATENCY_HIGH
      expr: node:dev:disk_read_rt  < 10000 and node:dev:disk_read_rt  > 0.032
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 Node Read Latency High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:dev:disk_read_rt[ins={{ $labels.ins }}, ip={{ $labels.ip }}, device={{ $labels.device }}] = {{ $value }} > 32ms
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=29&fullscreen&var-ip={{ $labels.ip }}          

    # ssd write latency > 16ms for 3m
    - alert: NODE_WRITE_LATENCY_HIGH
      expr: node:dev:disk_write_rt  < 10000 and node:dev:disk_write_rt  > 0.016
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 Node Write Latency High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:dev:disk_write_rt[ins={{ $labels.ins }}, ip={{ $labels.ip }}, device={{ $labels.device }}] = {{ $value }} > 16ms
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=29&fullscreen&var-ip={{ $labels.ip }}          



    #==============================================================#
    #                           Memory                             #
    #==============================================================#
    # shared memory usage > 80% for 1m triggers a P1 alert
    - alert: NODE_MEM_HIGH
      expr: node:ins:mem_usage > 0.80
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node Mem High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:ins:mem_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 80%
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=40&fullscreen&var-ip={{ $labels.ip }}          



    #==============================================================#
    #                      Network & TCP                           #
    #==============================================================#
    # node tcp listen overflow > 2 for 3m
    - alert: NODE_TCP_LISTEN_OVERFLOW
      expr: node:ins:tcp_overflow_rate > 2
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node TCP Listen Overflow: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:ins:tcp_overflow_rate[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 2
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=55&fullscreen&var-ip={{ $labels.ip }}          

    # node tcp retrans > 32 per sec for 3m
    - alert: NODE_TCP_RETRANS_HIGH
      expr: node:ins:tcp_retranssegs > 32
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 Node TCP Retrans High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node:ins:tcp_retranssegs[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 32
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=52&fullscreen&var-ip={{ $labels.ip }}          

    # node tcp conn > 32768 for 3m
    - alert: NODE_TCP_CONN_HIGH
      expr: node_netstat_Tcp_CurrEstab > 32768
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 Node TCP Connection High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node_netstat_Tcp_CurrEstab[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 32768
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=54&fullscreen&var-ip={{ $labels.ip }}          



    #==============================================================#
    #                          Misc                                #
    #==============================================================#
    # node ntp offset > 1s for 1m
    - alert: NODE_NTP_OFFSET_HIGH
      expr: abs(node_ntp_offset_seconds) > 1
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 Node NTP Offset High: {{ $labels.ins }} {{ $labels.ip }}"
        description: |
          node_ntp_offset_seconds[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 1s
          http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=70&fullscreen&var-ip={{ $labels.ip }}          


Alerting rules for the database and connection pool

---
################################################################
#                         PgSQL Alert                          #
################################################################
- name: pgsql-alert
  rules:

    #==============================================================#
    #                     Error / Aliveness                        #
    #==============================================================#
    # cluster shrink (size decrease within 5m) triggers a P1 alert (warn: auto heal in 5min)
    - alert: PGSQL_CLUSTER_SHRINK
      expr: delta(pg:cls:size{}[5m]) < 0
      for: 15s
      labels:
        severity: P1
      annotations:
        summary: 'delta(pg:cls:size{cls={{ $labels.cls }}}[5m]) = {{ $value | printf "%.0f" }} < 0'
        description: |
          http://g.pigsty/d/pg-cluster?from=now-10m&to=now&var-cls={{ $labels.cls }}


    # postgres down for 15s triggers a P0 alert
    - alert: PGSQL_DOWN
      expr: PGSQL_up{} == 0
      labels:
        severity: P0
      annotations:
        summary: "[P0] PGSQL_DOWN: {{ $labels.ins }} {{ $value }}"
        description: |
          PGSQL_up[ins={{ $labels.ins }}] = {{ $value }} == 0
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&var-ins={{ $labels.ins }}

    # pgbouncer down for 15s triggers a P0 alert
    - alert: PGBOUNCER_DOWN
      expr: pgbouncer_up{} == 0
      labels:
        severity: P0
      annotations:
        summary: "P0 Pgbouncer Down: {{ $labels.ins }} {{ $value }}"
        description: |
          pgbouncer_up[ins={{ $labels.ins }}] = {{ $value }} == 0
          http://g.pigsty/d/pg-pgbouncer?from=now-10m&to=now&var-ins={{ $labels.ins }}

    # pg/pgbouncer exporter down for 1m triggers a P1 alert
    - alert: PGSQL_EXPORTER_DOWN
      expr: up{instance=~"^.*:(9630|9631)$"} == 0
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG/PGB Exporter Down: {{ $labels.ins }} {{ $labels.instance }} {{ $value }}"
        description: |
          up[instance={{ $labels.instance }}] = {{ $value }} == 0
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=262&fullscreen&var-ins={{ $labels.ins }}          



    #==============================================================#
    #                         Latency                              #
    #==============================================================#
    # replication break for 1m triggers a P1 alert (warn: heal in 5m)
    - alert: PGSQL_REPLICATION_BREAK
      expr: delta(PGSQL_downstream_count{state="streaming"}[5m]) < 0
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Replication Break: {{ $labels.ins }} {{ $value }}"
        description: |
          PGSQL_downstream_count_delta[ins={{ $labels.ins }}] = {{ $value }} < 0
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=180&fullscreen&var-ins={{ $labels.ins }}          

    # replication lag greater than 8 seconds for 3m triggers a P1 alert
    - alert: PGSQL_REPLICATION_LAG
      expr: PGSQL_replication_replay_lag{application_name!='PGSQL_receivewal'} > 8
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Replication Lagged: {{ $labels.ins }} {{ $value }}"
        description: |
          PGSQL_replication_replay_lag[ins={{ $labels.ins }}] = {{ $value }} > 8s
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=384&fullscreen&var-ins={{ $labels.ins }}          

    # pg avg response time > 16ms
    - alert: PGSQL_QUERY_RT_HIGH
      expr: pg:ins:query_rt > 0.016
      for: 1m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Query Response Time High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:query_rt[ins={{ $labels.ins }}] = {{ $value }} > 16ms
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=137&fullscreen&var-ins={{ $labels.ins }}          


    #==============================================================#
    #                        Saturation                            #
    #==============================================================#
    # pg load1 higher than 70% for 3m triggers a P1 alert
    - alert: PGSQL_LOAD_HIGH
      expr: pg:ins:load1{} > 0.70
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Load High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:load1[ins={{ $labels.ins }}] = {{ $value }} > 70%
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=210&fullscreen&var-ins={{ $labels.ins }}          

    # pg active backend more than 2 times of available cpu cores for 3m triggers a P1 alert
    - alert: PGSQL_BACKEND_HIGH
      expr: pg:ins:active_backends / on(ins) node:ins:cpu_count > 2
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Backend High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:active_backends/node:ins:cpu_count[ins={{ $labels.ins }}] = {{ $value }} > 2
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=150&fullscreen&var-ins={{ $labels.ins }}          

    # more than 1 idle-in-transaction backend for 3m triggers a P2 alert
    - alert: PGSQL_IDLE_XACT_BACKEND_HIGH
      expr: pg:ins:ixact_backends > 1
      for: 3m
      labels:
        severity: P2
      annotations:
        summary: "P2 PG Idle In Transaction Backend High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:ixact_backends[ins={{ $labels.ins }}] = {{ $value }} > 1
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=161&fullscreen&var-ins={{ $labels.ins }}          


    # more than 2 waiting clients for 3m triggers a P1 alert
    - alert: PGSQL_CLIENT_QUEUING
      expr: pg:ins:waiting_clients > 2
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Client Queuing: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:waiting_clients[ins={{ $labels.ins }}] = {{ $value }} > 2
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=159&fullscreen&var-ins={{ $labels.ins }}          

    # age wrap around (near half) triggers a P1 alert
    - alert: PGSQL_AGE_HIGH
      expr: pg:ins:age > 1000000000
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 PG Age High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:age[ins={{ $labels.ins }}] = {{ $value }} > 1000000000
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=172&fullscreen&var-ins={{ $labels.ins }}          



    #==============================================================#
    #                         Traffic                              #
    #==============================================================#
    # more than 30k TPS lasting for 3m triggers a P1 alert (pgbouncer bottleneck)
    - alert: PGSQL_TPS_HIGH
      expr: pg:ins:xacts > 30000
      for: 3m
      labels:
        severity: P1
      annotations:
        summary: "P1 Postgres TPS High: {{ $labels.ins }} {{ $value }}"
        description: |
          pg:ins:xacts[ins={{ $labels.ins }}] = {{ $value }} > 30000
          http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=125&fullscreen&var-ins={{ $labels.ins }}          

...

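8.6 - Standard Output

The concrete steps executed by the sandbox initialization playbook, together with their output.

The Makefile shortcut commands used to bring up the sandbox locally, along with their output.

Command overview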

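# Download the project code
cd /tmp && git clone git@github.com:Vonng/pigsty.git && cd pigsty

make up         # bring up the vagrant virtual machines
make ssh        # configure ssh access to the VMs           [one-time; not needed on later starts]
sudo make dns   # write the Pigsty static DNS records       [sudo prompts for a password; optional; one-time]
make download   # download the latest offline package       [optional; speeds up initialization considerably]
make upload     # upload the offline package to the meta node
make init       # initialize Pigsty
make mon-view   # open the Pigsty monitoring home page (default user and password: admin:admin)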
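
The sequence above can be chained so that a failing step stops the run. The following is a minimal sketch, not part of Pigsty: the function name `run_sandbox_steps` and the `DRY_RUN` switch are assumptions made for this example, and `make mon-view` is left out because it opens a browser.

```shell
#!/usr/bin/env bash
# Sketch: run the sandbox bootstrap steps in order and stop at the first failure.
# run_sandbox_steps and DRY_RUN are hypothetical names introduced for this example.

run_sandbox_steps() {
    local step
    # Order follows the command overview; the optional dns and download steps are included.
    for step in "make up" "make ssh" "sudo make dns" "make download" "make upload" "make init"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$step"                      # only print what would be executed
        else
            echo "==> $step"
            # Unquoted on purpose: the step string is split into command and arguments.
            $step || { echo "step failed: $step" >&2; return 1; }
        fi
    done
}

# Preview the sequence without touching any virtual machine.
DRY_RUN=1 run_sandbox_steps
```

Calling `run_sandbox_steps` without `DRY_RUN=1` from the project root would execute the steps for real, so the dry run is the safer way to check the order first.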

clone

Clone the project and enter its directory. All subsequent operations run from the project root (/tmp/pigsty in this example).

cd /tmp && git clone git@github.com:Vonng/pigsty.git && cd pigsty

clean

Remove all traces of an existing sandbox, if any.

$ make clean
cd vagrant && vagrant destroy -f --parallel; exit 0
==> vagrant: A new version of Vagrant is available: 2.2.14 (installed version: 2.2.13)!
==> vagrant: To upgrade visit: https://www.vagrantup.com/downloads.html

==> node-3: Forcing shutdown of VM...
==> node-3: Destroying VM and associated drives...
==> node-2: Forcing shutdown of VM...
==> node-2: Destroying VM and associated drives...
==> node-1: Forcing shutdown of VM...
==> node-1: Destroying VM and associated drives...
==> meta: Forcing shutdown of VM...
==> meta: Destroying VM and associated drives...

up

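Running make up invokes the vagrant up command, which creates four virtual machines with Virtualbox according to the definitions in the Vagrantfile.

Note that the first time vagrant up runs, it automatically downloads the CentOS/7 virtual machine image from the official site. If your network connection is poor (for example, you have no proxy), this may take quite a long time. You can also create the virtual machines yourself and deploy Pigsty by following the Deployment chapter (not recommended).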

$ make up
cd vagrant && vagrant up
Bringing machine 'meta' up with 'virtualbox' provider...
Bringing machine 'node-1' up with 'virtualbox' provider...
Bringing machine 'node-2' up with 'virtualbox' provider...
Bringing machine 'node-3' up with 'virtualbox' provider...
==> meta: Cloning VM...
==> meta: Matching MAC address for NAT networking...
==> meta: Setting the name of the VM: vagrant_meta_1614587906789_29514
==> meta: Clearing any previously set network interfaces...
==> meta: Preparing network interfaces based on configuration...
    meta: Adapter 1: nat
    meta: Adapter 2: hostonly
==> meta: Forwarding ports...
    meta: 22 (guest) => 2222 (host) (adapter 1)
==> meta: Running 'pre-boot' VM customizations...
==> meta: Booting VM...
==> meta: Waiting for machine to boot. This may take a few minutes...
    meta: SSH address: 127.0.0.1:2222
    meta: SSH username: vagrant
    meta: SSH auth method: private key
==> meta: Machine booted and ready!
==> meta: Checking for guest additions in VM...
    meta: No guest additions were detected on the base box for this VM! Guest
    meta: additions are required for forwarded ports, shared folders, host only
    meta: networking, and more. If SSH fails on this machine, please install
    meta: the guest additions and repackage the box to continue.
    meta:
    meta: This is not an error message; everything may continue to work properly,
    meta: in which case you may ignore this message.
==> meta: Setting hostname...
==> meta: Configuring and enabling network interfaces...
==> meta: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> meta: Running provisioner: shell...
    meta: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1jv6obp.sh
    meta: [INFO] write ssh config to /home/vagrant/.ssh
==> node-1: Cloning VM...
==> node-1: Matching MAC address for NAT networking...
==> node-1: Setting the name of the VM: vagrant_node-1_1614587930603_84690
==> node-1: Fixed port collision for 22 => 2222. Now on port 2200.
==> node-1: Clearing any previously set network interfaces...
==> node-1: Preparing network interfaces based on configuration...
    node-1: Adapter 1: nat
    node-1: Adapter 2: hostonly
==> node-1: Forwarding ports...
    node-1: 22 (guest) => 2200 (host) (adapter 1)
==> node-1: Running 'pre-boot' VM customizations...
==> node-1: Booting VM...
==> node-1: Waiting for machine to boot. This may take a few minutes...
    node-1: SSH address: 127.0.0.1:2200
    node-1: SSH username: vagrant
    node-1: SSH auth method: private key
==> node-1: Machine booted and ready!
==> node-1: Checking for guest additions in VM...
    node-1: No guest additions were detected on the base box for this VM! Guest
    node-1: additions are required for forwarded ports, shared folders, host only
    node-1: networking, and more. If SSH fails on this machine, please install
    node-1: the guest additions and repackage the box to continue.
    node-1:
    node-1: This is not an error message; everything may continue to work properly,
    node-1: in which case you may ignore this message.
==> node-1: Setting hostname...
==> node-1: Configuring and enabling network interfaces...
==> node-1: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-1: Running provisioner: shell...
    node-1: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-5w83e1.sh
    node-1: [INFO] write ssh config to /home/vagrant/.ssh
==> node-2: Cloning VM...
==> node-2: Matching MAC address for NAT networking...
==> node-2: Setting the name of the VM: vagrant_node-2_1614587953786_32441
==> node-2: Fixed port collision for 22 => 2222. Now on port 2201.
==> node-2: Clearing any previously set network interfaces...
==> node-2: Preparing network interfaces based on configuration...
    node-2: Adapter 1: nat
    node-2: Adapter 2: hostonly
==> node-2: Forwarding ports...
    node-2: 22 (guest) => 2201 (host) (adapter 1)
==> node-2: Running 'pre-boot' VM customizations...
==> node-2: Booting VM...
==> node-2: Waiting for machine to boot. This may take a few minutes...
    node-2: SSH address: 127.0.0.1:2201
    node-2: SSH username: vagrant
    node-2: SSH auth method: private key
==> node-2: Machine booted and ready!
==> node-2: Checking for guest additions in VM...
    node-2: No guest additions were detected on the base box for this VM! Guest
    node-2: additions are required for forwarded ports, shared folders, host only
    node-2: networking, and more. If SSH fails on this machine, please install
    node-2: the guest additions and repackage the box to continue.
    node-2:
    node-2: This is not an error message; everything may continue to work properly,
    node-2: in which case you may ignore this message.
==> node-2: Setting hostname...
==> node-2: Configuring and enabling network interfaces...
==> node-2: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-2: Running provisioner: shell...
    node-2: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1xljcde.sh
    node-2: [INFO] write ssh config to /home/vagrant/.ssh
==> node-3: Cloning VM...
==> node-3: Matching MAC address for NAT networking...
==> node-3: Setting the name of the VM: vagrant_node-3_1614587977533_52921
==> node-3: Fixed port collision for 22 => 2222. Now on port 2202.
==> node-3: Clearing any previously set network interfaces...
==> node-3: Preparing network interfaces based on configuration...
    node-3: Adapter 1: nat
    node-3: Adapter 2: hostonly
==> node-3: Forwarding ports...
    node-3: 22 (guest) => 2202 (host) (adapter 1)
==> node-3: Running 'pre-boot' VM customizations...
==> node-3: Booting VM...
==> node-3: Waiting for machine to boot. This may take a few minutes...
    node-3: SSH address: 127.0.0.1:2202
    node-3: SSH username: vagrant
    node-3: SSH auth method: private key
==> node-3: Machine booted and ready!
==> node-3: Checking for guest additions in VM...
    node-3: No guest additions were detected on the base box for this VM! Guest
    node-3: additions are required for forwarded ports, shared folders, host only
    node-3: networking, and more. If SSH fails on this machine, please install
    node-3: the guest additions and repackage the box to continue.
    node-3:
    node-3: This is not an error message; everything may continue to work properly,
    node-3: in which case you may ignore this message.
==> node-3: Setting hostname...
==> node-3: Configuring and enabling network interfaces...
==> node-3: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-3: Running provisioner: shell...
    node-3: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1cykx8o.sh
    node-3: [INFO] write ssh config to /home/vagrant/.ssh

ssh

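The default user on the newly created virtual machines is vagrant, and passwordless ssh access from the host to the VMs has to be configured. Running make ssh invokes vagrant's ssh-config command and writes the ssh configuration of the pigsty VM nodes to ~/.ssh/pigsty_config

This command usually needs to run only once, when the sandbox is first started. Virtual machines brought up again later generally keep the same SSH configuration.

Only after it completes can you connect to the VM nodes inside the sandbox through SSH aliases, for example ssh node-1.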

$ make ssh
cd vagrant && vagrant ssh-config > ~/.ssh/pigsty_config 2>/dev/null; true
if ! grep --quiet "pigsty_config" ~/.ssh/config ; then (echo 'Include ~/.ssh/pigsty_config' && cat ~/.ssh/config) >  ~/.ssh/config.tmp; mv ~/.ssh/config.tmp ~/.ssh/config && chmod 0600 ~/.ssh/config; fi
if ! grep --quiet "StrictHostKeyChecking=no" ~/.ssh/config ; then (echo 'StrictHostKeyChecking=no' && cat ~/.ssh/config) >  ~/.ssh/config.tmp; mv ~/.ssh/config.tmp ~/.ssh/config && chmod 0600 ~/.ssh/config; fi

dns

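This command writes the static DNS records of the Pigsty sandbox VMs to /etc/hosts. It usually needs to run only once, when the sandbox is first started.

Only after it completes can you reach web UIs such as http://g.pigsty by domain name from a local browser.

Note that the dns command must run with sudo and prompts for a password, because modifying /etc/hosts requires elevated privileges.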

$ sudo make dns
Password: #<enter the user password here>
if ! grep --quiet "pigsty dns records" /etc/hosts ; then cat files/dns >> /etc/hosts; fi
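
The Makefile line above appends the records only when a marker string is missing, which is what makes repeated runs harmless. The sketch below reproduces that guard against throwaway files so it can be tried without root privileges. The function name `append_dns_once`, the temporary files, and the sample record are assumptions for illustration; the real command works on /etc/hosts and files/dns.

```shell
#!/usr/bin/env bash
# Sketch: append a block of DNS records to a hosts file at most once.
# append_dns_once and the temporary files are hypothetical; `make dns` itself
# targets /etc/hosts and reads the records from files/dns.

append_dns_once() {
    local hosts_file="$1" records_file="$2"
    # Assumes the records file itself contains the marker text "pigsty dns records".
    if ! grep --quiet "pigsty dns records" "$hosts_file"; then
        cat "$records_file" >> "$hosts_file"
    fi
}

# Demonstration against throwaway files.
demo_dir="$(mktemp -d)"
printf '127.0.0.1 localhost\n' > "$demo_dir/hosts"
printf '# pigsty dns records\n10.10.10.10 g.pigsty\n' > "$demo_dir/dns"   # sample record only

append_dns_once "$demo_dir/hosts" "$demo_dir/dns"
append_dns_once "$demo_dir/hosts" "$demo_dir/dns"   # second call changes nothing

grep -c "pigsty dns records" "$demo_dir/hosts"      # number of marker lines after two runs
```

The final command prints 1, showing that the second call left the file untouched.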

download

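Download the latest Pigsty offline installation package from the CDN to the local machine. The package is about 1GB and takes roughly one minute to download.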

$ make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1067M  100 1067M    0     0  15.2M      0  0:01:10  0:01:10 --:--:-- 29.0M

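Pigsty is a complex software system. To keep the system stable, Pigsty downloads all required packages from the Internet during initialization and builds a local Yum repository.

The dependencies total about 1GB, and the download speed depends on your network. Although Pigsty uses mirror sources wherever possible to speed up downloads, a few packages may still be hindered by firewalls and download very slowly. You can set a download proxy through the proxy_env configuration entry to complete the first download, or download the pre-built offline installation package directly.

The latest offline installation package is available at:

Github Release: https://github.com/Vonng/pigsty/releases

CDN Download: http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz

You can also download it manually and place it at files/pkg.tgz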
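
Because a manually placed files/pkg.tgz is equivalent to a downloaded one, a wrapper can skip the transfer when a local copy already exists. This is a minimal sketch under that assumption; `fetch_pkg` is a hypothetical helper, not a Pigsty command, and it only checks that the file is non-empty, so a truncated download would not be detected.

```shell
#!/usr/bin/env bash
# Sketch: download the offline package only when no local copy exists.
# fetch_pkg is a hypothetical helper introduced for this example.

fetch_pkg() {
    local url="$1" dest="$2"
    if [ -s "$dest" ]; then                 # -s: file exists and is not empty
        echo "skip: $dest already present"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # -f: fail on HTTP errors, -S: show errors, -L: follow redirects
    curl -fSL "$url" -o "$dest"
}

# Usage, from the project root:
#   fetch_pkg http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz files/pkg.tgz
```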

upload

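Upload the downloaded offline installation package to the meta node and extract it there, which speeds up the subsequent initialization.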

$ make upload
ssh -t meta "sudo rm -rf /tmp/pkg.tgz"
Connection to 127.0.0.1 closed.
scp -r files/pkg.tgz meta:/tmp/pkg.tgz
pkg.tgz                                                                                                                                                                 100% 1068MB  53.4MB/s   00:19
ssh -t meta "sudo mkdir -p /www/pigsty/; sudo rm -rf /www/pigsty/*; sudo tar -xf /tmp/pkg.tgz --strip-component=1 -C /www/pigsty/"
Connection to 127.0.0.1 closed.

init

After the steps above, running make init invokes ansible to initialize the Pigsty system.

$ make init
./sandbox.yml   # fast initialization: provisions the meta node and the ordinary database nodes in parallel

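sandbox.yml is an initialization playbook prepared specifically for the local sandbox. It saves half the time by initializing the meta node and the database nodes simultaneously. For production environments, use infra.yml and pgsql.yml to initialize the meta node and the ordinary nodes one after the other.

If the offline installation package has already been uploaded to the meta node, initialization is fairly fast and takes roughly 5 to 10 minutes in total, depending on machine specifications.

If the offline installation package is absent, Pigsty downloads about 1GB of data from the Internet during initialization, which may take 20 minutes or longer depending on network conditions.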

$ make init
./sandbox.yml                       # interleave sandbox provisioning
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

PLAY [Init local repo] ***********************************************************************************************************************************************************************************

TASK [repo : Create local repo directory] ****************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [repo : Backup & remove existing repos] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [repo : Add required upstream repos] ****************************************************************************************************************************************************************
[WARNING]: Using a variable for a task's 'args' is unsafe in some situations (see https://docs.ansible.com/ansible/devel/reference_appendices/faq.html#argsplat-unsafe)
changed: [10.10.10.10] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
changed: [10.10.10.10] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
changed: [10.10.10.10] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
changed: [10.10.10.10] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
changed: [10.10.10.10] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
changed: [10.10.10.10] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
changed: [10.10.10.10] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
changed: [10.10.10.10] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
changed: [10.10.10.10] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})

TASK [repo : Check repo pkgs cache exists] ***************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [repo : Set fact whether repo_exists] ***************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [repo : Move upstream repo to backup] ***************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [repo : Add local file system repos] ****************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [repo : Remake yum cache if not exists] *************************************************************************************************************************************************************
[WARNING]: Consider using the yum module rather than running 'yum'.  If you need to use command because yum is insufficient you can add 'warn: false' to this command task or set
'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]

TASK [repo : Install repo bootstrap packages] ************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['yum-utils', 'createrepo', 'ansible', 'nginx', 'wget'])

TASK [repo : Render repo nginx server files] *************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'src': 'index.html.j2', 'dest': '/www/index.html'})
changed: [10.10.10.10] => (item={'src': 'default.conf.j2', 'dest': '/etc/nginx/conf.d/default.conf'})
changed: [10.10.10.10] => (item={'src': 'local.repo.j2', 'dest': '/www/pigsty.repo'})
changed: [10.10.10.10] => (item={'src': 'nginx.conf.j2', 'dest': '/etc/nginx/nginx.conf'})

TASK [repo : Disable selinux for repo server] ************************************************************************************************************************************************************
[WARNING]: SELinux state temporarily changed from 'enforcing' to 'permissive'. State change will take effect next reboot.
changed: [10.10.10.10]

TASK [repo : Launch repo nginx server] *******************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [repo : Waits repo server online] *******************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [repo : Download web url packages] ******************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm)
skipping: [10.10.10.10] => (item=https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm)
skipping: [10.10.10.10] => (item=http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm)

TASK [repo : Download repo packages] *********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=epel-release nginx wget yum-utils yum createrepo)
skipping: [10.10.10.10] => (item=ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned)
skipping: [10.10.10.10] => (item=readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates)
skipping: [10.10.10.10] => (item=numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet)
skipping: [10.10.10.10] => (item=grafana prometheus2 pushgateway alertmanager)
skipping: [10.10.10.10] => (item=node_exporter postgres_exporter nginx_exporter blackbox_exporter)
skipping: [10.10.10.10] => (item=consul consul_exporter consul-template etcd)
skipping: [10.10.10.10] => (item=ansible python python-pip python-psycopg2 audit)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.10] => (item=haproxy keepalived dnsmasq)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity)
skipping: [10.10.10.10] => (item=pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel)
skipping: [10.10.10.10] => (item=barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail)
skipping: [10.10.10.10] => (item=postgresql13* postgis31* citus_13 timescaledb_13)
skipping: [10.10.10.10] => (item=pg_repack13 pg_squeeze13)
skipping: [10.10.10.10] => (item=pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13)
skipping: [10.10.10.10] => (item=plr13 plsh13 plpgsql_check_13 plproxy13 plr13 plsh13 plpgsql_check_13 pldebugger13)
skipping: [10.10.10.10] => (item=hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13)
skipping: [10.10.10.10] => (item=wal2json13 count_distinct13 ddlx_13 geoip13 orafce13)
skipping: [10.10.10.10] => (item=rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13)
skipping: [10.10.10.10] => (item=pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13)
skipping: [10.10.10.10] => (item=pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13)
skipping: [10.10.10.10] => (item=pguint13 pguri13 prefix13  safeupdate_13 semver13  table_version13 tdigest13)

TASK [repo : Download repo pkg deps] *********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=epel-release nginx wget yum-utils yum createrepo)
skipping: [10.10.10.10] => (item=ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned)
skipping: [10.10.10.10] => (item=readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates)
skipping: [10.10.10.10] => (item=numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet)
skipping: [10.10.10.10] => (item=grafana prometheus2 pushgateway alertmanager)
skipping: [10.10.10.10] => (item=node_exporter postgres_exporter nginx_exporter blackbox_exporter)
skipping: [10.10.10.10] => (item=consul consul_exporter consul-template etcd)
skipping: [10.10.10.10] => (item=ansible python python-pip python-psycopg2 audit)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.10] => (item=haproxy keepalived dnsmasq)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity)
skipping: [10.10.10.10] => (item=pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel)
skipping: [10.10.10.10] => (item=barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail)
skipping: [10.10.10.10] => (item=postgresql13* postgis31* citus_13 timescaledb_13)
skipping: [10.10.10.10] => (item=pg_repack13 pg_squeeze13)
skipping: [10.10.10.10] => (item=pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13)
skipping: [10.10.10.10] => (item=plr13 plsh13 plpgsql_check_13 plproxy13 plr13 plsh13 plpgsql_check_13 pldebugger13)
skipping: [10.10.10.10] => (item=hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13)
skipping: [10.10.10.10] => (item=wal2json13 count_distinct13 ddlx_13 geoip13 orafce13)
skipping: [10.10.10.10] => (item=rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13)
skipping: [10.10.10.10] => (item=pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13)
skipping: [10.10.10.10] => (item=pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13)
skipping: [10.10.10.10] => (item=pguint13 pguri13 prefix13  safeupdate_13 semver13  table_version13 tdigest13)

TASK [repo : Create local repo index] ********************************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [repo : Copy bootstrap scripts] *********************************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [repo : Mark repo cache as valid] *******************************************************************************************************************************************************************
skipping: [10.10.10.10]

PLAY [Provision Node] ************************************************************************************************************************************************************************************

TASK [node : Update node hostname] ***********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Add new hostname to /etc/hosts] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Write static dns records] *******************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.11] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.13] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.12] => (item=10.10.10.10 yum.pigsty)

TASK [node : Get old nameservers] ************************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Truncate resolv file] ***********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Write resolv options] ***********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.12] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.10] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.13] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.11] => (item=domain service.consul)
changed: [10.10.10.12] => (item=domain service.consul)
changed: [10.10.10.13] => (item=domain service.consul)
changed: [10.10.10.10] => (item=domain service.consul)

TASK [node : Add new nameservers] ************************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=10.10.10.10)
changed: [10.10.10.12] => (item=10.10.10.10)
changed: [10.10.10.10] => (item=10.10.10.10)
changed: [10.10.10.13] => (item=10.10.10.10)

TASK [node : Append old nameservers] *********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=10.0.2.3)
changed: [10.10.10.12] => (item=10.0.2.3)
changed: [10.10.10.10] => (item=10.0.2.3)
changed: [10.10.10.13] => (item=10.0.2.3)

TASK [node : Node configure disable firewall] ************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [node : Node disable selinux by default] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
[WARNING]: SELinux state change will take effect next reboot
ok: [10.10.10.10]

TASK [node : Backup existing repos] **********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Install upstream repo] **********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.12] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.13] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.13] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.10] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.12] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.11] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.12] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.11] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.12] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.11] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.12] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.13] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.11] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.10] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.11] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.12] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})

TASK [node : Install local repo] *************************************************************************************************************************************************************************
changed: [10.10.10.13] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.12] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.11] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.10] => (item=http://yum.pigsty/pigsty.repo)

TASK [node : Install node basic packages] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])

TASK [node : Install node extra packages] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])

TASK [node : Install meta specific packages] *************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])

TASK [node : Install node basic packages] ****************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.13] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.11] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.12] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])

TASK [node : Install node extra packages] ****************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.12] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.13] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.10] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])

TASK [node : Install meta specific packages] *************************************************************************************************************************************************************
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])
changed: [10.10.10.10] => (item=['grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway', 'dnsmasq,nginx,ansible,pgbadger,polysh'])

TASK [node : Node configure disable numa] ****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Node configure disable swap] ****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Node configure unmount swap] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=swap)
skipping: [10.10.10.10] => (item=none)
skipping: [10.10.10.11] => (item=swap)
skipping: [10.10.10.11] => (item=none)
skipping: [10.10.10.12] => (item=swap)
skipping: [10.10.10.12] => (item=none)
skipping: [10.10.10.13] => (item=swap)
skipping: [10.10.10.13] => (item=none)

TASK [node : Node setup static network] ******************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Node configure disable firewall] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [node : Node configure disk prefetch] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Enable linux kernel modules] ****************************************************************************************************************************************************************
changed: [10.10.10.13] => (item=softdog)
changed: [10.10.10.12] => (item=softdog)
changed: [10.10.10.11] => (item=softdog)
changed: [10.10.10.10] => (item=softdog)
changed: [10.10.10.13] => (item=br_netfilter)
changed: [10.10.10.12] => (item=br_netfilter)
changed: [10.10.10.11] => (item=br_netfilter)
changed: [10.10.10.10] => (item=br_netfilter)
changed: [10.10.10.12] => (item=ip_vs)
changed: [10.10.10.13] => (item=ip_vs)
changed: [10.10.10.11] => (item=ip_vs)
changed: [10.10.10.10] => (item=ip_vs)
changed: [10.10.10.13] => (item=ip_vs_rr)
changed: [10.10.10.12] => (item=ip_vs_rr)
changed: [10.10.10.11] => (item=ip_vs_rr)
changed: [10.10.10.10] => (item=ip_vs_rr)
ok: [10.10.10.13] => (item=ip_vs_rr)
ok: [10.10.10.12] => (item=ip_vs_rr)
ok: [10.10.10.11] => (item=ip_vs_rr)
ok: [10.10.10.10] => (item=ip_vs_rr)
changed: [10.10.10.13] => (item=ip_vs_wrr)
changed: [10.10.10.12] => (item=ip_vs_wrr)
changed: [10.10.10.11] => (item=ip_vs_wrr)
changed: [10.10.10.10] => (item=ip_vs_wrr)
changed: [10.10.10.13] => (item=ip_vs_sh)
changed: [10.10.10.12] => (item=ip_vs_sh)
changed: [10.10.10.11] => (item=ip_vs_sh)
changed: [10.10.10.10] => (item=ip_vs_sh)
changed: [10.10.10.13] => (item=nf_conntrack_ipv4)
changed: [10.10.10.12] => (item=nf_conntrack_ipv4)
changed: [10.10.10.11] => (item=nf_conntrack_ipv4)
changed: [10.10.10.10] => (item=nf_conntrack_ipv4)

TASK [node : Enable kernel module on reboot] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]

TASK [node : Get config parameter page count] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Get config parameter page size] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Tune shmmax and shmall via mem] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [node : Create tuned profiles] **********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=oltp)
changed: [10.10.10.12] => (item=oltp)
changed: [10.10.10.10] => (item=oltp)
changed: [10.10.10.13] => (item=oltp)
changed: [10.10.10.11] => (item=olap)
changed: [10.10.10.12] => (item=olap)
changed: [10.10.10.13] => (item=olap)
changed: [10.10.10.10] => (item=olap)
changed: [10.10.10.11] => (item=crit)
changed: [10.10.10.12] => (item=crit)
changed: [10.10.10.13] => (item=crit)
changed: [10.10.10.10] => (item=crit)
changed: [10.10.10.11] => (item=tiny)
changed: [10.10.10.12] => (item=tiny)
changed: [10.10.10.13] => (item=tiny)
changed: [10.10.10.10] => (item=tiny)

TASK [node : Render tuned profiles] **********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=oltp)
changed: [10.10.10.12] => (item=oltp)
changed: [10.10.10.13] => (item=oltp)
changed: [10.10.10.10] => (item=oltp)
changed: [10.10.10.12] => (item=olap)
changed: [10.10.10.11] => (item=olap)
changed: [10.10.10.13] => (item=olap)
changed: [10.10.10.10] => (item=olap)
changed: [10.10.10.12] => (item=crit)
changed: [10.10.10.11] => (item=crit)
changed: [10.10.10.13] => (item=crit)
changed: [10.10.10.10] => (item=crit)
changed: [10.10.10.11] => (item=tiny)
changed: [10.10.10.12] => (item=tiny)
changed: [10.10.10.13] => (item=tiny)
changed: [10.10.10.10] => (item=tiny)

TASK [node : Active tuned profile] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]

TASK [node : Change additional sysctl params] ************************************************************************************************************************************************************
changed: [10.10.10.13] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.12] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.11] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.10] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})

TASK [node : Copy default user bash profile] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Setup node default pam ulimits] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Create os user group admin] *****************************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Create os user admin] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]

TASK [node : Grant admin group nopass sudo] **************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Add no host checking to ssh config] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Add admin ssh no host checking] *************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [node : Fetch all admin public keys] ****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [node : Exchange all admin ssh keys] ****************************************************************************************************************************************************************
changed: [10.10.10.10 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.10'])
changed: [10.10.10.13 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.10'])
changed: [10.10.10.12 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.10'])
changed: [10.10.10.11 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.10'])
changed: [10.10.10.10 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.11'])
changed: [10.10.10.12 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.11'])
changed: [10.10.10.13 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.11'])
changed: [10.10.10.11 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.11'])
changed: [10.10.10.10 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.12'])
changed: [10.10.10.13 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.12'])
changed: [10.10.10.12 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.12'])
changed: [10.10.10.11 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.12'])
changed: [10.10.10.10 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.13'])
changed: [10.10.10.13 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.13'])
changed: [10.10.10.11 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.13'])
changed: [10.10.10.12 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.13'])

TASK [node : Install public keys] ************************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.10] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.12] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.13] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)

TASK [node : Install ntp package] ************************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.10]
ok: [10.10.10.13]

TASK [node : Install chrony package] *********************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
ok: [10.10.10.10]

TASK [node : Setup default node timezone] ****************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]

TASK [node : Copy the ntp.conf file] *********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Copy the chrony.conf template] **************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [node : Launch ntpd service] ************************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [node : Launch chronyd service] *********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

PLAY [Init meta service] *********************************************************************************************************************************************************************************

TASK [ca : Create local ca directory] ********************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [ca : Copy ca cert from local files] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=ca.key)
skipping: [10.10.10.10] => (item=ca.crt)

TASK [ca : Check ca key cert exists] *********************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [ca : Create self-signed CA key-cert] ***************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nameserver : Make sure dnsmasq package installed] **************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nameserver : Copy dnsmasq /etc/dnsmasq.d/config] ***************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nameserver : Add dynamic dns records to meta] ******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=10.10.10.2  pg-meta)
changed: [10.10.10.10] => (item=10.10.10.3  pg-test)
changed: [10.10.10.10] => (item=10.10.10.10 meta-1)
changed: [10.10.10.10] => (item=10.10.10.11 node-1)
changed: [10.10.10.10] => (item=10.10.10.12 node-2)
changed: [10.10.10.10] => (item=10.10.10.13 node-3)
changed: [10.10.10.10] => (item=10.10.10.10 pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 y.pigsty yum.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 c.pigsty consul.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 g.pigsty grafana.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 p.pigsty prometheus.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 a.pigsty alertmanager.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 n.pigsty ntp.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 h.pigsty haproxy.pigsty)

TASK [nameserver : Launch meta dnsmasq service] **********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nameserver : Wait for meta dnsmasq online] *********************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nameserver : Register consul dnsmasq service] ******************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nameserver : Reload consul] ************************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Make sure nginx installed] *****************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Create local html directory] ***************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Create nginx config directory] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Update default nginx index page] ***********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Copy nginx default config] *****************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Copy nginx upstream conf] ******************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'name': 'home', 'host': 'pigsty', 'url': '127.0.0.1:3000'})
changed: [10.10.10.10] => (item={'name': 'consul', 'host': 'c.pigsty', 'url': '127.0.0.1:8500'})
changed: [10.10.10.10] => (item={'name': 'grafana', 'host': 'g.pigsty', 'url': '127.0.0.1:3000'})
changed: [10.10.10.10] => (item={'name': 'prometheus', 'host': 'p.pigsty', 'url': '127.0.0.1:9090'})
changed: [10.10.10.10] => (item={'name': 'alertmanager', 'host': 'a.pigsty', 'url': '127.0.0.1:9093'})
changed: [10.10.10.10] => (item={'name': 'haproxy', 'host': 'h.pigsty', 'url': '127.0.0.1:9091'})

TASK [nginx : Templating /etc/nginx/haproxy.conf] ********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Render haproxy upstream in cluster mode] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)

TASK [nginx : Render haproxy location in cluster mode] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)

TASK [nginx : Templating haproxy cluster index] **********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)

TASK [nginx : Templating haproxy cluster index] **********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
ok: [10.10.10.10] => (item=pg-test)

TASK [nginx : Restart meta nginx service] ****************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Wait for nginx service online] *************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Make sure nginx exporter installed] ********************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Config nginx_exporter options] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Restart nginx_exporter service] ************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Wait for nginx exporter online] ************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [nginx : Register cosnul nginx service] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Register consul nginx-exporter service] ****************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [nginx : Reload consul] *****************************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Install prometheus and alertmanager] **************************************************************************************************************************************************
ok: [10.10.10.10] => (item=prometheus2)
ok: [10.10.10.10] => (item=alertmanager)

TASK [prometheus : Wipe out prometheus config dir] *******************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Wipe out existing prometheus data] ****************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [prometheus : Create postgres directory structure] **************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/prometheus)
changed: [10.10.10.10] => (item=/etc/prometheus/bin)
changed: [10.10.10.10] => (item=/etc/prometheus/rules)
changed: [10.10.10.10] => (item=/etc/prometheus/targets)
changed: [10.10.10.10] => (item=/export/prometheus/data)

TASK [prometheus : Copy prometheus bin scripts] **********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Copy prometheus rules scripts] ********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Copy altermanager config] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Render prometheus config] *************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Config /etc/prometheus opts] **********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Launch prometheus service] ************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Launch alertmanager service] **********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Wait for prometheus online] ***********************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [prometheus : Wait for alertmanager online] *********************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [prometheus : Render prometheus targets in cluster mode] ********************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)

TASK [prometheus : Reload prometheus service] ************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Copy prometheus service definition] ***************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Copy alertmanager service definition] *************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [prometheus : Reload consul to register prometheus] *************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Make sure grafana is installed] **********************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [grafana : Check grafana plugin cache exists] *******************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [grafana : Provision grafana plugins via cache] *****************************************************************************************************************************************************
[WARNING]: Consider using the file module with state=absent rather than running 'rm'.  If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]

TASK [grafana : Download grafana plugins from web] *******************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=redis-datasource)
skipping: [10.10.10.10] => (item=simpod-json-datasource)
skipping: [10.10.10.10] => (item=fifemon-graphql-datasource)
skipping: [10.10.10.10] => (item=sbueringer-consul-datasource)
skipping: [10.10.10.10] => (item=camptocamp-prometheus-alertmanager-datasource)
skipping: [10.10.10.10] => (item=ryantxu-ajax-panel)
skipping: [10.10.10.10] => (item=marcusolsson-hourly-heatmap-panel)
skipping: [10.10.10.10] => (item=michaeldmoore-multistat-panel)
skipping: [10.10.10.10] => (item=marcusolsson-treemap-panel)
skipping: [10.10.10.10] => (item=pr0ps-trackmap-panel)
skipping: [10.10.10.10] => (item=dalvany-image-panel)
skipping: [10.10.10.10] => (item=magnesium-wordcloud-panel)
skipping: [10.10.10.10] => (item=cloudspout-button-panel)
skipping: [10.10.10.10] => (item=speakyourcode-button-panel)
skipping: [10.10.10.10] => (item=jdbranham-diagram-panel)
skipping: [10.10.10.10] => (item=grafana-piechart-panel)
skipping: [10.10.10.10] => (item=snuids-radar-panel)
skipping: [10.10.10.10] => (item=digrich-bubblechart-panel)

TASK [grafana : Download grafana plugins from web] *******************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=https://github.com/Vonng/grafana-echarts)

TASK [grafana : Create grafana plugins cache] ************************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [grafana : Copy /etc/grafana/grafana.ini] ***********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Remove grafana provision dir] ************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Copy provisioning content] ***************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Copy pigsty dashboards] ******************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Copy pigsty icon image] ******************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Replace grafana icon with pigsty] ********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Launch grafana service] ******************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Wait for grafana online] *****************************************************************************************************************************************************************
ok: [10.10.10.10]

TASK [grafana : Update grafana default preferences] ******************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Register consul grafana service] *********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [grafana : Reload consul] ***************************************************************************************************************************************************************************
changed: [10.10.10.10]

PLAY [Init dcs] ******************************************************************************************************************************************************************************************

TASK [consul : Check for existing consul] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [consul : Consul exists flag fact set] **************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [consul : Abort due to consul exists] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Clean existing consul instance] ***********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Stop any running consul instance] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [consul : Remove existing consul dir] ***************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/consul.d)
changed: [10.10.10.11] => (item=/etc/consul.d)
changed: [10.10.10.12] => (item=/etc/consul.d)
changed: [10.10.10.13] => (item=/etc/consul.d)
changed: [10.10.10.10] => (item=/var/lib/consul)
changed: [10.10.10.11] => (item=/var/lib/consul)
changed: [10.10.10.12] => (item=/var/lib/consul)
changed: [10.10.10.13] => (item=/var/lib/consul)

TASK [consul : Recreate consul dir] **********************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/consul.d)
changed: [10.10.10.11] => (item=/etc/consul.d)
changed: [10.10.10.12] => (item=/etc/consul.d)
changed: [10.10.10.13] => (item=/etc/consul.d)
changed: [10.10.10.10] => (item=/var/lib/consul)
changed: [10.10.10.11] => (item=/var/lib/consul)
changed: [10.10.10.13] => (item=/var/lib/consul)
changed: [10.10.10.12] => (item=/var/lib/consul)

TASK [consul : Make sure consul is installed] ************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [consul : Make sure consul dir exists] **************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [consul : Get dcs server node names] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Get dcs node name from var] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Get dcs node name from var] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [consul : Fetch hostname as dcs node name] **********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Get dcs name from hostname] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Copy /etc/consul.d/consul.json] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [consul : Copy consul agent service] ****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [consul : Get dcs bootstrap expect quroum] **********************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Copy consul server service unit] **********************************************************************************************************************************************************
changed: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Launch consul server service] *************************************************************************************************************************************************************
changed: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Wait for consul server online] ************************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [consul : Launch consul agent service] **************************************************************************************************************************************************************
skipping: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [consul : Wait for consul agent online] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

PLAY [Init database cluster] *****************************************************************************************************************************************************************************

TASK [postgres : Create os group postgres] ***************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Make sure dcs group exists] *************************************************************************************************************************************************************
ok: [10.10.10.10] => (item=consul)
ok: [10.10.10.11] => (item=consul)
ok: [10.10.10.12] => (item=consul)
ok: [10.10.10.13] => (item=consul)
ok: [10.10.10.11] => (item=etcd)
ok: [10.10.10.10] => (item=etcd)
ok: [10.10.10.12] => (item=etcd)
ok: [10.10.10.13] => (item=etcd)

TASK [postgres : Create dbsu postgres] *******************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Grant dbsu nopass sudo] *****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Grant dbsu all sudo] ********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Grant dbsu limited sudo] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Add dbsu ssh no host checking] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Fetch dbsu public keys] *****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Exchange dbsu ssh keys] *****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.10'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.11'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.12'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.13'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.10'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.11'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.12'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.13'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.10'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.11'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.12'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.13'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.10'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.11'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.12'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.13'])

TASK [postgres : Install offical pgdg yum repo] **********************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.10] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.11] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.11] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.11] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.12] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.11] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.12] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.11] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.12] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.13] => (item=postgresql${pg_version}*)
skipping: [10.10.10.11] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.12] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.13] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.12] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.13] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.12] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.13] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.13] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.13] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)

TASK [postgres : Install pg packages] ********************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.11] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.13] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.12] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])

TASK [postgres : Install pg extensions] ******************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.10] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.13] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.12] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])

TASK [postgres : Link /usr/pgsql to current version] *****************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Add pg bin dir to profile path] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Fix directory ownership] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Remove default postgres service] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Check necessary variables exists] *******************************************************************************************************************************************************
ok: [10.10.10.10] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [postgres : Fetch variables via pg_cluster] *********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Set cluster basic facts for hosts] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Assert cluster primary singleton] *******************************************************************************************************************************************************
ok: [10.10.10.10] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.11] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.12] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [10.10.10.13] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [postgres : Setup cluster primary ip address] *******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Setup repl upstream for primary] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Setup repl upstream for replicas] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Debug print instance summary] ***********************************************************************************************************************************************************
ok: [10.10.10.10] => {
    "msg": "cluster=pg-meta service=pg-meta-primary instance=pg-meta-1 replication=[primary:itself]->10.10.10.10"
}
ok: [10.10.10.11] => {
    "msg": "cluster=pg-test service=pg-test-primary instance=pg-test-1 replication=[primary:itself]->10.10.10.11"
}
ok: [10.10.10.12] => {
    "msg": "cluster=pg-test service=pg-test-replica instance=pg-test-2 replication=[primary:itself]->10.10.10.12"
}
ok: [10.10.10.13] => {
    "msg": "cluster=pg-test service=pg-test-offline instance=pg-test-3 replication=[primary:itself]->10.10.10.13"
}

TASK [postgres : Check for existing postgres instance] ***************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Set fact whether pg port is open] *******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Abort due to existing postgres instance] ************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Clean existing postgres instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Shutdown existing postgres service] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Remove registerd consul service] ********************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]

TASK [postgres : Remove postgres metadata in consul] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]

TASK [postgres : Remove existing postgres data] **********************************************************************************************************************************************************
ok: [10.10.10.10] => (item=/pg)
ok: [10.10.10.11] => (item=/pg)
ok: [10.10.10.12] => (item=/pg)
ok: [10.10.10.13] => (item=/pg)
ok: [10.10.10.10] => (item=/export/postgres)
ok: [10.10.10.11] => (item=/export/postgres)
ok: [10.10.10.12] => (item=/export/postgres)
ok: [10.10.10.13] => (item=/export/postgres)
ok: [10.10.10.10] => (item=/var/backups/postgres)
ok: [10.10.10.11] => (item=/var/backups/postgres)
ok: [10.10.10.12] => (item=/var/backups/postgres)
ok: [10.10.10.13] => (item=/var/backups/postgres)
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)

TASK [postgres : Make sure main and backup dir exists] ***************************************************************************************************************************************************
changed: [10.10.10.11] => (item=/export)
changed: [10.10.10.12] => (item=/export)
changed: [10.10.10.13] => (item=/export)
changed: [10.10.10.10] => (item=/export)
changed: [10.10.10.11] => (item=/var/backups)
changed: [10.10.10.12] => (item=/var/backups)
changed: [10.10.10.13] => (item=/var/backups)
changed: [10.10.10.10] => (item=/var/backups)

TASK [postgres : Create postgres directory structure] ****************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/export/postgres)
changed: [10.10.10.11] => (item=/export/postgres)
changed: [10.10.10.12] => (item=/export/postgres)
changed: [10.10.10.13] => (item=/export/postgres)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/bin)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/log)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/tmp)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/conf)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/data)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/meta)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/stat)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/change)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/postgres)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/arcwal)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/backup)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/remote)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/remote)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/remote)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/remote)

TASK [postgres : Create pgbouncer directory structure] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)

TASK [postgres : Create links from pgbkup to pgroot] *****************************************************************************************************************************************************
changed: [10.10.10.10] => (item=arcwal)
changed: [10.10.10.11] => (item=arcwal)
changed: [10.10.10.12] => (item=arcwal)
changed: [10.10.10.13] => (item=arcwal)
changed: [10.10.10.10] => (item=backup)
changed: [10.10.10.11] => (item=backup)
changed: [10.10.10.12] => (item=backup)
changed: [10.10.10.13] => (item=backup)
changed: [10.10.10.10] => (item=remote)
changed: [10.10.10.11] => (item=remote)
changed: [10.10.10.12] => (item=remote)
changed: [10.10.10.13] => (item=remote)

TASK [postgres : Create links from current cluster] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Copy pg_cluster to /pg/meta/cluster] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Copy pg_version to /pg/meta/version] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Copy pg_instance to /pg/meta/instance] **************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Copy pg_seq to /pg/meta/sequence] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Copy pg_role to /pg/meta/role] **********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Copy postgres scripts to /pg/bin/] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Copy alias profile to /etc/profile.d] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Copy psqlrc to postgres home] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Setup hostname to pg instance name] *****************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Copy consul node-meta definition] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Restart consul to load new node-meta] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Get config parameter page count] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Get config parameter page size] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [postgres : Tune shared buffer and work mem] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Hanlde small size mem occasion] *********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Calculate postgres mem params] **********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : create patroni config dir] **************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : use predefined patroni template] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Render default /pg/conf/patroni.yml] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Link /pg/conf/patroni to /pg/bin/] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Link /pg/bin/patroni.yml to /etc/patroni/] **********************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Copy patroni systemd service file] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : create patroni systemd drop-in dir] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Copy postgres systemd service file] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Drop-In consul dependency for patroni] **************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Render default initdb scripts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : Launch patroni on primary instance] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Wait for patroni primary online] ********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
ok: [10.10.10.10]
ok: [10.10.10.11]

TASK [postgres : Wait for postgres primary online] *******************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
ok: [10.10.10.10]
ok: [10.10.10.11]

TASK [postgres : Check primary postgres service ready] ***************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
[WARNING]: Module remote_tmp /var/lib/pgsql/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Check replication connectivity to primary] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Render init roles sql] ******************************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Render init template sql] ***************************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Render default pg-init scripts] *********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]

TASK [postgres : Execute initialization scripts] *********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Check primary instance ready] ***********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Add dbsu password to pgpass if exists] **************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Add system user to pgpass] **************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.11] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.12] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.13] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.11] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.10] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.13] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.12] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.13] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.12] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.10] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.11] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})

TASK [postgres : Check replication connectivity to primary] **********************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Launch patroni on replica instances] ****************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Wait for patroni replica online] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Wait for postgres replica online] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Check replica postgres service ready] ***************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Render hba rules] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]

TASK [postgres : Reload hba rules] ***********************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Pause patroni] **************************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]

TASK [postgres : Stop patroni on replica instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Stop patroni on primary instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Launch raw postgres on primary] *********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Launch raw postgres on primary] *********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Wait for postgres online] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Check pgbouncer is installed] ***********************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [postgres : Stop existing pgbouncer service] ********************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Remove existing pgbouncer dirs] *********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)

TASK [postgres : Recreate dirs with owner postgres] ******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)

TASK [postgres : Copy /etc/pgbouncer/pgbouncer.ini] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : Copy /etc/pgbouncer/pgb_hba.conf] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Touch userlist and database list] *******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=database.txt)
changed: [10.10.10.11] => (item=database.txt)
changed: [10.10.10.12] => (item=database.txt)
changed: [10.10.10.13] => (item=database.txt)
changed: [10.10.10.10] => (item=userlist.txt)
changed: [10.10.10.11] => (item=userlist.txt)
changed: [10.10.10.12] => (item=userlist.txt)
changed: [10.10.10.13] => (item=userlist.txt)

TASK [postgres : Add default users to pgbouncer] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [postgres : Copy pgbouncer systemd service] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : Launch pgbouncer pool service] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : Wait for pgbouncer service online] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [postgres : Check pgbouncer service is ready] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : include_tasks] **************************************************************************************************************************************************************************
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.10 => (item={'name': 'dbuser_meta', 'password': 'DBUser.Meta', 'login': True, 'superuser': False, 'createdb': False, 'createrole': False, 'inherit': True, 'replication': False, 'bypassrls': False, 'connlimit': -1, 'expire_at': '2030-12-31', 'expire_in': 365, 'roles': ['dbrole_readwrite'], 'pgbouncer': True, 'parameters': {'search_path': 'public'}, 'comment': 'test user'})
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.10 => (item={'name': 'dbuser_vonng2', 'password': 'DBUser.Vonng', 'roles': ['dbrole_offline'], 'expire_in': 365, 'pgbouncer': False, 'comment': 'example personal user for interactive queries'})
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13 => (item={'name': 'test', 'password': 'test', 'roles': ['dbrole_readwrite'], 'pgbouncer': True, 'comment': 'default test user for production usage'})

TASK [postgres : Render user dbuser_meta creation sql] ***************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Execute user dbuser_meta creation sql on primary] ***************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Render user dbuser_vonng2 creation sql] *************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Execute user dbuser_vonng2 creation sql on primary] *************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [postgres : Render user test creation sql] **********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Execute user test creation sql on primary] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [postgres : include_tasks] **************************************************************************************************************************************************************************
included: /private/tmp/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.10 => (item={'name': 'meta', 'allowconn': True, 'revokeconn': False, 'connlimit': -1, 'extensions': [{'name': 'postgis', 'schema': 'public'}], 'parameters': {'enable_partitionwise_join': True}, 'pgbouncer': True, 'comment': 'pigsty meta database'})
included: /private/tmp/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13 => (item={'name': 'test'})

TASK [postgres : debug] **********************************************************************************************************************************************************************************
ok: [10.10.10.10] => {
    "msg": {
        "allowconn": true,
        "comment": "pigsty meta database",
        "connlimit": -1,
        "extensions": [
            {
                "name": "postgis",
                "schema": "public"
            }
        ],
        "name": "meta",
        "parameters": {
            "enable_partitionwise_join": true
        },
        "pgbouncer": true,
        "revokeconn": false
    }
}

TASK [postgres : Render database meta creation sql] ******************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Render database meta baseline sql] ******************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [postgres : Execute database meta creation command] *************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Execute database meta creation sql] *****************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : Execute database meta creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.10]

TASK [postgres : Add pgbouncer busniess database] ********************************************************************************************************************************************************
changed: [10.10.10.10]

TASK [postgres : debug] **********************************************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "msg": {
        "name": "test"
    }
}
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Render database test creation sql] ******************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Render database test baseline sql] ******************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Execute database test creation command] *************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Execute database test creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]

TASK [postgres : Execute database test creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [postgres : Add pgbouncer busniess database] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [postgres : Reload pgbouncer to add db and users] ***************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]

TASK [postgres : Copy pg service definition to consul] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=postgres)
changed: [10.10.10.11] => (item=postgres)
changed: [10.10.10.12] => (item=postgres)
changed: [10.10.10.13] => (item=postgres)
changed: [10.10.10.10] => (item=pgbouncer)
changed: [10.10.10.11] => (item=pgbouncer)
changed: [10.10.10.12] => (item=pgbouncer)
changed: [10.10.10.13] => (item=pgbouncer)
changed: [10.10.10.10] => (item=patroni)
changed: [10.10.10.11] => (item=patroni)
changed: [10.10.10.12] => (item=patroni)
changed: [10.10.10.13] => (item=patroni)

TASK [postgres : Reload postgres consul service] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [postgres : Render grafana datasource definition] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [postgres : Register datasource to grafana] *********************************************************************************************************************************************************
[WARNING]: Consider using the get_url or uri module rather than running 'curl'.  If you need to use command because get_url or uri is insufficient you can add 'warn: false' to this command task or set
'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [monitor : Install exporter yum repo] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [monitor : Install node_exporter and pg_exporter] ***************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=node_exporter)
skipping: [10.10.10.10] => (item=pg_exporter)
skipping: [10.10.10.11] => (item=node_exporter)
skipping: [10.10.10.11] => (item=pg_exporter)
skipping: [10.10.10.12] => (item=node_exporter)
skipping: [10.10.10.12] => (item=pg_exporter)
skipping: [10.10.10.13] => (item=node_exporter)
skipping: [10.10.10.13] => (item=pg_exporter)

TASK [monitor : Copy node_exporter binary] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [monitor : Copy pg_exporter binary] *****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [monitor : Create /etc/pg_exporter conf dir] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [monitor : Copy default pg_exporter.yaml] ***********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [monitor : Config /etc/default/pg_exporter] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [monitor : Config pg_exporter service unit] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [monitor : Launch pg_exporter systemd service] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [monitor : Wait for pg_exporter service online] *****************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.11]
ok: [10.10.10.13]

TASK [monitor : Register pg-exporter consul service] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [monitor : Reload pg-exporter consul service] *******************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.10]

TASK [monitor : Config pgbouncer_exporter opts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [monitor : Config pgbouncer_exporter service] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [monitor : Launch pgbouncer_exporter service] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]

TASK [monitor : Wait for pgbouncer_exporter online] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [monitor : Register pgb-exporter consul service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]

TASK [monitor : Reload pgb-exporter consul service] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [monitor : Copy node_exporter systemd service] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [monitor : Config default node_exporter options] ****************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [monitor : Launch node_exporter service unit] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]

TASK [monitor : Wait for node_exporter online] ***********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [monitor : Register node-exporter service to consul] ************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [monitor : Reload node-exporter consul service] *****************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [service : Make sure haproxy is installed] **********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [service : Create haproxy directory] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
ok: [10.10.10.11]

TASK [service : Copy haproxy systemd service file] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]

TASK [service : Fetch postgres cluster memberships] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]

TASK [service : Templating /etc/haproxy/haproxy.cfg] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]

TASK [service : Launch haproxy load balancer service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]

TASK [service : Wait for haproxy load balancer online] ***************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.11]
ok: [10.10.10.13]

TASK [service : Reload haproxy load balancer service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]

TASK [service : Copy haproxy exporter definition] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [service : Copy haproxy service definition] *********************************************************************************************************************************************************
changed: [10.10.10.12] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.10] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.11] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.13] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.10] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.12] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.13] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.11] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.10] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.12] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.11] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.13] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.10] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.12] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.11] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.13] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})

TASK [service : Reload haproxy consul service] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [service : Make sure vip-manager is installed] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.13]
ok: [10.10.10.11]
ok: [10.10.10.12]

TASK [service : Copy vip-manager systemd service file] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]

TASK [service : create vip-manager systemd drop-in dir] **************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]

TASK [service : create vip-manager systemd drop-in file] *************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]

TASK [service : Templating /etc/default/vip-manager.yml] *************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]

TASK [service : Launch vip-manager] **********************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]

TASK [service : Fetch postgres cluster memberships] ******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

TASK [service : Render L4 VIP configs] *******************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.10] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.10] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.11] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.10] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.11] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.11] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.12] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.11] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.12] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.13] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.12] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.13] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.12] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.13] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.13] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})

TASK [service : include_tasks] ***************************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]

PLAY RECAP ***********************************************************************************************************************************************************************************************
10.10.10.10                : ok=264  changed=205  unreachable=0    failed=0    skipped=62   rescued=0    ignored=0
10.10.10.11                : ok=182  changed=146  unreachable=0    failed=0    skipped=55   rescued=0    ignored=0
10.10.10.12                : ok=171  changed=135  unreachable=0    failed=0    skipped=66   rescued=0    ignored=0
10.10.10.13                : ok=171  changed=135  unreachable=0    failed=0    skipped=66   rescued=0    ignored=0

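It is strongly recommended to run make cache after the first initialization completes. This command packs the downloaded software into an offline cache package and places it at files/pkg.tgz. The next time you create a new pigsty environment, you can reuse this offline package directly as long as the host operating system is the same, which saves a great deal of download time.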

mon-view

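After initialization completes, you can open http://pigsty in a browser to reach the monitoring system home page. The default username and password are both admin.

If DNS is not configured, or the default IP addresses are not used, you can also visit http://meta_ip_address:3000 directly to reach the monitoring system home page.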

$ make mon-view
open -n 'http://g.pigsty/'

8.7 - PG Exporter

PG Exporter Reference

Exporter

https://github.com/Vonng/pg_exporter

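pg_exporter is developed entirely in-house and collects metrics from postgres and pgbouncer:

  • Supports PostgreSQL 9.4 through 13 and Pgbouncer 1.8+.
  • Almost all metrics are defined as SQL in configuration files, are fully customizable, and support hot reload.
  • Metric collectors can be scheduled in a Kubernetes-like manner (for example, run only on replicas, only on nodes started with a given tag, or only on instances with a specific extension installed).
  • Flexible metric caching policy and automatic timeout cancellation minimize the performance impact of monitoring on the database.
  • Provides health checks, readiness probes, and primary/replica role checks, which can be used for traffic routing.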

PG Exporter

Prometheus exporter for PostgreSQL metrics. Gives you complete insight into your favourite elephant!

Latest binaries & rpms can be found on the release page. Supported versions: PostgreSQL 9.4+ and Pgbouncer 1.8+. The default collector definitions are compatible with PostgreSQL 10, 11, 12, and 13.

Latest pg_exporter version: 0.3.1

Features

  • Support both Postgres & Pgbouncer
  • Flexible: Almost all metrics are defined in customizable configuration files in SQL style.
  • Fine-grained execution control (Tags Filter, Facts Filter, Version Filter, Timeout, Cache, etc…)
  • Dynamic Planning: users can provide multiple branches of a metric query. Only the queries that match the server version, facts, and tags are actually installed.
  • Configurable caching policy & query timeout
  • Rich metrics about pg_exporter itself.
  • Auto-discovery of multiple databases in the same cluster (multi-database scraping TBD)
  • Tested and verified in real-world production environments for years (200+ nodes)
  • Overwhelming metrics! Gives you complete insight into your favourite elephant!
  • (Pgbouncer mode is enabled when target dbname is pgbouncer)

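Performance

In extreme scenarios (hundreds of thousands of tables and tens of thousands of distinct queries), a single scrape may take up to several seconds.

Fortunately, every metric collector can be disabled, and pg_exporter lets you configure an active timeout for each collector, after which its query is cancelled (100 ms by default).

Self-Monitoring

The exporter exposes metrics about the monitoring component itself, including:

  • Whether the exporter is alive, its uptime, and the number of times it is scraped per minute
  • The duration of each monitoring query, the number of metrics it produces, and the number of errors.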

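Prometheus Configuration

A scrape interval of 10 to 15 seconds is recommended for Prometheus, with an appropriate timeout configured.

For demos or special cases, a finer interval can be used (for example 2 seconds or 5 seconds).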
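
The interval and timeout advice maps onto the global section of a Prometheus configuration. The sketch below is a minimal illustration, not Pigsty's actual template: the render_global helper and the 8-second timeout are assumptions, while scrape_interval and scrape_timeout are standard Prometheus settings. Prometheus rejects a configuration whose scrape_timeout is greater than its scrape_interval, so the helper checks that first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pigsty): print a Prometheus "global"
# section for a scrape interval and timeout, both given in whole seconds.
render_global() {
  interval="$1"
  tmo="$2"
  # Prometheus refuses configs where the timeout exceeds the interval.
  if [ "$tmo" -gt "$interval" ]; then
    echo "scrape_timeout must not exceed scrape_interval" >&2
    return 1
  fi
  cat <<EOF
global:
  scrape_interval: ${interval}s
  scrape_timeout: ${tmo}s
EOF
}

# 10 seconds is the low end of the recommended range; 8 is an assumed timeout.
render_global 10 8
```

For a demo environment, the same helper could be called as render_global 2 1 to get the finer interval mentioned above.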

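A single Prometheus node can monitor several hundred instances, roughly several million time series (Dell R740, 64 cores / 400 GB memory / 3 TB PCI-E SSD).

Larger deployments can scale through Prometheus cascading, federation, or sharding. For example, deploy one Prometheus per database cluster and use an upper-level Prometheus to scrape them and compute derived metrics.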
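
As a rough sketch of the federation layout, the upper-level Prometheus can scrape the /federate endpoint of each per-cluster Prometheus. Everything specific here is an assumption made for illustration: the render_federate_job helper, the job name, the match[] selector, and the pg-meta-prom and pg-test-prom addresses. Only metrics_path: /federate, honor_labels, and the match[] parameter are standard Prometheus federation settings.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pigsty): print a federation scrape job
# for an upper-level Prometheus. Arguments are the host:port addresses of
# the per-cluster Prometheus servers.
render_federate_job() {
  cat <<'EOF'
scrape_configs:
  - job_name: federate
    honor_labels: true        # keep the labels set by the lower level
    metrics_path: /federate   # standard Prometheus federation endpoint
    params:
      'match[]':
        - '{job=~".+"}'       # assumption: pull every job's series
    static_configs:
      - targets:
EOF
  for addr in "$@"; do
    printf "          - '%s'\n" "$addr"
  done
}

# Hypothetical addresses, one Prometheus per database cluster.
render_federate_job pg-meta-prom:9090 pg-test-prom:9090
```

In practice the match[] selector would usually be narrowed to the series the upper level actually needs, since pulling everything defeats the purpose of splitting the load.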

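8.8 - Prometheus Service Discovery

How Prometheus performs service discovery through static files

When prometheus_sd_method is set to 'static', Prometheus uses static files for service discovery. By default, the target definitions are all yml files in the /etc/prometheus/targets/ directory.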
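
On the Prometheus side, static file discovery is expressed with file_sd_configs. The following is a minimal sketch rather than the configuration Pigsty actually ships: the render_file_sd_job helper, the job name pg, and the 10-second refresh interval are assumptions. The files and refresh_interval keys are standard Prometheus settings, and the directory is the default named above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pigsty): print a scrape job that
# discovers targets from every yml file in a directory.
render_file_sd_job() {
  dir="${1:-/etc/prometheus/targets}"
  cat <<EOF
scrape_configs:
  - job_name: pg                 # assumed job name
    file_sd_configs:
      - refresh_interval: 10s    # assumed re-read interval
        files:
          - '${dir}/*.yml'
EOF
}

# With no argument, the default target directory is used.
render_file_sd_job
```

Prometheus also watches the matched files for changes, so edits to the target files are picked up without a restart.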

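Centralized Configuration

When prometheus_sd_target is set to batch mode, Pigsty manages Prometheus monitoring targets through a centralized configuration.

All monitoring targets are defined in a single configuration file: /etc/prometheus/targets/all.yml.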

#==============================================================#
# File      :   targets/all.yml
# Ctime     :   2021-02-18
# Mtime     :   2021-02-18
# Atime     :   2021-03-01 16:46
# Note      :   Managed by Ansible
# Desc      :   Prometheus Static Monitoring Targets Definition
# Path      :   /etc/prometheus/targets/all.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# static monitor targets, batch version

#======> pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]

#======> pg-test-1 [primary]
- labels: {cls: pg-test, ins: pg-test-1, ip: 10.10.10.11, role: primary, svc: pg-test-primary}
  targets: [10.10.10.11:9630, 10.10.10.11:9100, 10.10.10.11:9631, 10.10.10.11:9101]

#======> pg-test-2 [replica]
- labels: {cls: pg-test, ins: pg-test-2, ip: 10.10.10.12, role: replica, svc: pg-test-replica}
  targets: [10.10.10.12:9630, 10.10.10.12:9100, 10.10.10.12:9631, 10.10.10.12:9101]

#======> pg-test-3 [replica]
- labels: {cls: pg-test, ins: pg-test-3, ip: 10.10.10.13, role: replica, svc: pg-test-replica}
  targets: [10.10.10.13:9630, 10.10.10.13:9100, 10.10.10.13:9631, 10.10.10.13:9101]
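The structure of these entries is regular enough to generate from an instance list. The following is a minimal sketch, not the Ansible template Pigsty actually uses: the function names are hypothetical, the four ports are copied from the example above, and the `svc` label is assumed to always be `<cls>-<role>`, as it is in every entry shown.

```python
# Ports copied from the all.yml example above
EXPORTER_PORTS = (9630, 9100, 9631, 9101)

def render_target(cls, seq, ip, role):
    """Render one static target entry in the same layout as all.yml."""
    ins = f"{cls}-{seq}"
    svc = f"{cls}-{role}"   # assumption: svc is always <cls>-<role>
    targets = ", ".join(f"{ip}:{port}" for port in EXPORTER_PORTS)
    return (
        f"#======> {ins} [{role}]\n"
        f"- labels: {{cls: {cls}, ins: {ins}, ip: {ip}, role: {role}, svc: {svc}}}\n"
        f"  targets: [{targets}]\n"
    )

def render_all(instances):
    """Join the entries of all instances, separated by blank lines."""
    return "\n".join(render_target(*instance) for instance in instances)

# The sandbox instances listed in the example above
INSTANCES = [
    ("pg-meta", 1, "10.10.10.10", "primary"),
    ("pg-test", 1, "10.10.10.11", "primary"),
    ("pg-test", 2, "10.10.10.12", "replica"),
    ("pg-test", 3, "10.10.10.13", "replica"),
]
```

`render_all(INSTANCES)` produces the four entries shown above (without the file header), which could then be written to a file under /etc/prometheus/targets/.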

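Separate configuration

When prometheus_sd_target is set to single mode, Pigsty manages Prometheus monitoring targets with separate per-instance configuration files.

Each monitored instance has its own dedicated configuration file: /etc/prometheus/targets/{{ pg_instance }}.yml.

Taking the pg-meta-1 instance as an example, its configuration file is located at /etc/prometheus/targets/pg-meta-1.yml, with the following content: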

# pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]
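One practical consequence of the single mode is that adding or removing an instance only touches that instance's file. The sketch below illustrates this with a temporary directory standing in for /etc/prometheus/targets/. The helper names are hypothetical, and this is not how Pigsty's playbooks are implemented.

```python
import tempfile
from pathlib import Path

def target_path(target_dir, instance):
    """Path of the per-instance target file, e.g. <dir>/pg-meta-1.yml."""
    return Path(target_dir) / f"{instance}.yml"

def write_targets(target_dir, entries):
    """Write one file per instance; entries maps instance name -> YAML text."""
    paths = []
    for instance, body in sorted(entries.items()):
        path = target_path(target_dir, instance)
        path.write_text(body, encoding="utf-8")
        paths.append(path)
    return paths

def remove_target(target_dir, instance):
    """Dropping an instance only removes its own file."""
    target_path(target_dir, instance).unlink()

def demo():
    """Write two instances, remove one, and list the remaining files."""
    body = "- labels: {cls: pg-meta, ins: pg-meta-1}\n  targets: [10.10.10.10:9630]\n"
    with tempfile.TemporaryDirectory() as tmp:
        write_targets(tmp, {"pg-meta-1": body, "pg-test-1": body})
        remove_target(tmp, "pg-test-1")
        return sorted(path.name for path in Path(tmp).glob("*.yml"))
```

In the batch mode, by contrast, any change to the instance list means regenerating the whole all.yml file.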

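8.9 - Tuned Templates

Several predefined Tuned templates

8.9.1 - OLTP

Tuned OLTP template

The Tuned OLTP template is optimized primarily for latency. It targets a Dell R740 node with 64 cores, 400GB of memory, and PCI-E SSDs. You can adjust it to match your actual hardware.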

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to oltp mode
# Path      :   /etc/tuned/oltp/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL OLTP System
include=network-latency

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shared memory in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# max shared memory segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80

# vm.dirty_background_bytes=67108864 # 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_ratio=3       # latency-performance default
vm.dirty_ratio=10                 # latency-performance default

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be larger than 1M.  65536->1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
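The shared memory comments in the template above derive two values from half of the physical memory: one as a page count and one as a byte count. A minimal Python sketch of the same arithmetic is shown below. The function names are hypothetical; `os.sysconf` is used in place of `getconf` and is assumed to be running on Linux.

```python
import os

def half_memory(phys_pages, page_size):
    """Return (pages, bytes) for half of physical memory, as in the expr comments."""
    half_pages = phys_pages // 2          # expr $(getconf _PHYS_PAGES) / 2
    return half_pages, half_pages * page_size

def detect_half_memory():
    """Read the page count and page size of the current host (Linux)."""
    return half_memory(os.sysconf("SC_PHYS_PAGES"), os.sysconf("SC_PAGE_SIZE"))
```

For a node with 400GiB of memory and 4KiB pages, `half_memory(104857600, 4096)` yields 52428800 pages, or 214748364800 bytes (200GiB).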

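8.9.2 - TINY

Tuned TINY template

The Tuned TINY template is optimized for virtual machines with very low specifications.

It typically targets a virtual machine node with 1 core and 1GB of memory. You can adjust it to match your actual hardware.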

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to tiny mode
# Path      :   /etc/tuned/tiny/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL TINY System
# include=virtual-guest

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up.  Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 40

# Filesystem I/O is usually much more efficient than swapping, so try to keep
# swapping low.  It's usually safe to go even lower than this on systems with
# server-grade storage.
vm.swappiness = 30

#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
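The three keepalive settings above combine into the time it takes to declare an idle, unresponsive peer dead: the idle time before the first probe, plus one probe interval for each unanswered probe. A small worked example follows; the function name is hypothetical.

```python
def keepalive_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Seconds from the last received packet until a dead peer is dropped."""
    return keepalive_time + keepalive_intvl * keepalive_probes

# Values from the template above: 60s idle, then 3 probes 20s apart
TEMPLATE_SECONDS = keepalive_detection_seconds(60, 20, 3)

# Linux kernel defaults: 7200s idle, then 9 probes 75s apart
DEFAULT_SECONDS = keepalive_detection_seconds(7200, 75, 9)
```

With the template values a dead connection is detected after about 2 minutes (120 seconds), compared with more than 2 hours (7875 seconds) under the kernel defaults.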

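8.9.3 - OLAP

Tuned OLAP template, optimized for instances with high parallelism, long-running queries, and high throughput

The Tuned OLAP template is optimized primarily for throughput and computational parallelism.

It targets a Dell R740 node with 64 cores, 400GB of memory, and PCI-E SSDs. You can adjust it to match your actual hardware.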

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-09-18
# Desc      :   Tune operating system to olap mode
# Path      :   /etc/tuned/olap/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL OLAP System
include=network-throughput

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shared memory in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# max shared memory segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
# vm.swappiness=10

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80

vm.dirty_background_ratio = 10    # throughput-performance default
vm.dirty_ratio=80                 # throughput-performance default 40 -> 80

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be larger than 1M.  65536->1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
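The template above sets vm.overcommit_memory=2 together with vm.overcommit_ratio=80. In this strict mode the kernel caps committed memory at the swap size plus the given percentage of RAM (excluding memory reserved for huge pages). The following worked example computes that limit; the function name is hypothetical, and the 400GiB figure is the reference machine named in this section.

```python
GIB = 1024 ** 3

def commit_limit_bytes(ram_bytes, overcommit_ratio, swap_bytes=0, hugetlb_bytes=0):
    """CommitLimit under vm.overcommit_memory=2, in bytes."""
    return (ram_bytes - hugetlb_bytes) * overcommit_ratio // 100 + swap_bytes

# Reference machine: 400GiB RAM, no swap, ratio 80
REFERENCE_LIMIT = commit_limit_bytes(400 * GIB, 80)
```

On the reference machine this yields 320GiB. Allocations beyond that limit fail instead of being overcommitted, which reduces the risk of the OOM killer terminating the database.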

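8.9.4 - CRIT

Tuned CRIT template, optimized for financial scenarios and other scenarios where data loss or corruption is unacceptable.

The Tuned CRIT template is optimized primarily for RPO, keeping the amount of dirty data in memory as small as possible.

It targets a Dell R740 node with 64 cores, 400GB of memory, and PCI-E SSDs. You can adjust it to match your actual hardware.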

# tuned configuration
#==============================================================#
# File      :   tuned.conf
# Mtime     :   2020-06-29
# Desc      :   Tune operating system to crit mode
# Path      :   /etc/tuned/crit/tuned.conf
# Author    :   Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

[main]
summary=Optimize for PostgreSQL CRIT System
include=network-latency

[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
# disable transparent hugepages
transparent_hugepages=never

[sysctl]
#-------------------------------------------------------------#
#                           KERNEL                            #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0

# total shared memory in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}

# max shared memory segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}

# total shmem segs 4096 -> 8192
kernel.shmmni=8192

# total msg queue number, set to mem size in MB
kernel.msgmni=32768

# max length of message queue
kernel.msgmnb=65536

# max size of message
kernel.msgmax=65536

kernel.pid_max=131072

# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536

# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0

# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000

#-------------------------------------------------------------#
#                             VM                              #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0

# disable when most mem are for file cache
vm.zone_reclaim_mode=0

# overcommit threshold = 100%
vm.overcommit_memory=2
vm.overcommit_ratio=100

# 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_bytes=67108864
# vm.dirty_background_ratio=3       # latency-performance default
vm.dirty_ratio=6                    # lower than the latency-performance default (10)

# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536

#-------------------------------------------------------------#
#                        Filesystem                           #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160

# max concurrent unfinished async io, should be larger than 1M.  65536->1M
fs.aio-max-nr=1048576


#-------------------------------------------------------------#
#                          Network                            #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304

# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000

# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1

# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"

# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60

net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000

net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1

# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
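To see how much the CRIT template tightens the dirty-page limits compared with the OLTP and OLAP templates, the sketch below converts the vm.dirty_* settings of the three templates into byte values. It is a rough upper bound only: the kernel applies the ratios to dirtyable (free plus reclaimable) memory rather than total RAM, so treating the full 400GiB of the reference machine as dirtyable is an assumption, and the function names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def ratio_to_bytes(dirtyable_bytes, ratio_percent):
    """Convert a vm.dirty_*ratio percentage into bytes."""
    return dirtyable_bytes * ratio_percent // 100

def dirty_limits(dirtyable_bytes):
    """Background and hard dirty limits per template, using the values in this section."""
    return {
        "oltp": {"background": ratio_to_bytes(dirtyable_bytes, 3),
                 "hard": ratio_to_bytes(dirtyable_bytes, 10)},
        "olap": {"background": ratio_to_bytes(dirtyable_bytes, 10),
                 "hard": ratio_to_bytes(dirtyable_bytes, 80)},
        # CRIT uses an absolute background limit (vm.dirty_background_bytes)
        "crit": {"background": 64 * MIB,
                 "hard": ratio_to_bytes(dirtyable_bytes, 6)},
    }

LIMITS = dirty_limits(400 * GIB)
```

Under this assumption, background writeback starts at 64MiB of dirty data with CRIT, versus roughly 12GiB with OLTP and 40GiB with OLAP, which is what limits the amount of unwritten data that can be lost when a node fails.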

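8.10 - Patroni Templates

The four Patroni templates predefined by Pigsty

Pigsty uses Patroni to manage and initialize Postgres database clusters.

Pigsty uses Patroni to do the main work of provisioning. Even if the user chooses the Patroni-less mode, the database cluster is still brought up by Patroni, and the Patroni component is removed once creation is complete.

Users can accomplish most PostgreSQL cluster customization through the Patroni configuration file. For details on the Patroni configuration file format, refer to the official Patroni documentation.

Predefined templates

Pigsty provides four predefined initialization templates. An initialization template is a definition file used to initialize a database cluster, located in roles/postgres/templates/ by default. They include:

  • oltp.yml: OLTP template, the default configuration, optimized for latency and performance on production hardware.
  • olap.yml: OLAP template, which increases parallelism and is optimized for throughput and long-running queries.
  • crit.yml: core business template, based on the OLTP template and optimized for RPO, security, and data integrity, with synchronous replication and data checksums enabled.
  • tiny.yml: micro database template, optimized for low-resource scenarios such as demo database clusters running in virtual machines.

Use the pg_conf parameter to specify the path of the template to use. For a predefined template, the template file name alone is sufficient.

If you use a customized Patroni configuration template, you should usually also use a matching node tuning template for the machine node.

For more detailed configuration information, refer to PG Provisioning.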
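When customizing a template, the Patroni timing parameters are worth checking together. The OLTP template in the next subsection sets loop_wait: 10, ttl: 30, retry_timeout: 10 and master_start_timeout: 10. Its comments state the constraint that ttl must be at least loop_wait + 2 * retry_timeout, and give the maximum RTO as two loop waits plus master_start_timeout. A minimal sketch of both checks follows, with hypothetical function names.

```python
def timing_is_valid(loop_wait, ttl, retry_timeout):
    """Check the constraint ttl >= loop_wait + 2 * retry_timeout."""
    return ttl >= loop_wait + 2 * retry_timeout

def max_rto_seconds(loop_wait, master_start_timeout):
    """Upper bound on recovery time: 2 loop waits + master_start_timeout."""
    return 2 * loop_wait + master_start_timeout

# Values from the OLTP template
OLTP_VALID = timing_is_valid(loop_wait=10, ttl=30, retry_timeout=10)
OLTP_MAX_RTO = max_rto_seconds(loop_wait=10, master_start_timeout=10)
```

The OLTP values sit exactly on the boundary of the constraint (30 >= 10 + 2 * 10) and give a maximum RTO of 30 seconds. Lowering ttl alone, for example to 20, would violate the constraint.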

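8.10.1 - OLTP

Patroni OLTP template

The Patroni OLTP template is optimized primarily for latency. It targets a Dell R740 node with 64 cores, 400GB of memory, and PCI-E SSDs. You can adjust it to match your actual hardware.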

#!/usr/bin/env patroni
#==============================================================#
# File      :   patroni.yml
# Ctime     :   2020-04-08
# Mtime     :   2020-12-22
# Desc      :   patroni cluster definition for {{ pg_cluster }} (oltp)
# Path      :   /pg/bin/patroni.yml
# Real Path :   /pg/conf/{{ pg_instance }}.yml
# Link      :   /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note      :   Transactional Database Cluster Template
# Doc       :   https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# OLTP databases are optimized for performance and response-time latency
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB

---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/          # namespace
scope: {{ pg_cluster }}                 # cluster name
name: {{ pg_instance }}                 # instance name

#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
  level: INFO                           #  NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
  dir: /pg/log/                         #  default log file: /pg/log/patroni.log
  file_size: 100000000                  #  100MB log triggers a log rotate
  # format: '%(asctime)s %(levelname)s: %(message)s'

#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
  host: 127.0.0.1:8500
  consistency: default         # default|consistent|stale
  register_service: true
  service_check_interval: 15s
  service_tags:
    - {{ pg_cluster }}

#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
  listen: 0.0.0.0:{{ patroni_port }}
  connect_address: {{ inventory_hostname }}:{{ patroni_port }}
  authentication:
    verify_client: none                 # none|optional|required
    username: {{ pg_monitor_username }}
    password: '{{ pg_monitor_password }}'

#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
  optional:
    insecure: true
    # cacert: '/path/to/ca/cert'
    # certfile: '/path/to/cert/file'
    # keyfile: '/path/to/key/file'

#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
  nofailover: false
  clonefrom: true
  noloadbalance: false
  nosync: false
{% if pg_upstream is defined %}
  replicatefrom: {{ pg_upstream }}    # clone from another replica rather than primary
{% endif %}

#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
  mode: {{ patroni_watchdog_mode }}
  device: /dev/watchdog
  # safety_margin: 10s

#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:

  #----------------------------------------------------------------------------
  # bootstrap method
  #----------------------------------------------------------------------------
  method: initdb
  # add custom bootstrap method here

  # default bootstrap method: initdb
  initdb:
    - locale: C
    - encoding: UTF8
    # - data-checksums    # enable data-checksum


  #----------------------------------------------------------------------------
  # bootstrap users
  #---------------------------------------------------------------------------
  # additional users which need to be created after initializing new cluster
  # replication user and monitor user are required
  users:
    {{ pg_replication_username }}:
      password: '{{ pg_replication_password }}'
    {{ pg_monitor_username }}:
      password: '{{ pg_monitor_password }}'
    {{ pg_admin_username }}:
      password: '{{ pg_admin_password }}'

  # bootstrap hba, allow local and intranet password access & replication
  # will be overwritten later
  pg_hba:
    - local   all             postgres                                ident
    - local   all             all                                     md5
    - host    all             all            0.0.0.0/0                md5
    - local   replication     postgres                                ident
    - local   replication     all                                     md5
    - host    replication     all            0.0.0.0/0                md5


  #----------------------------------------------------------------------------
  # template
  #---------------------------------------------------------------------------
  # post_init: /pg/bin/pg-init

  #----------------------------------------------------------------------------
  # bootstrap config
  #---------------------------------------------------------------------------
  # this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
  # it will NOT take any effect after cluster bootstrap
  dcs:

{% if pg_role == 'primary' and pg_upstream is defined %}
    #----------------------------------------------------------------------------
    # standby cluster definition
    #---------------------------------------------------------------------------
    standby_cluster:
      host: {{ pg_upstream }}
      port: {{ pg_port }}
      # primary_slot_name: patroni     # must be created manually on upstream server, if specified
      create_replica_methods:
        - basebackup
{% endif %}

    #----------------------------------------------------------------------------
    # important parameters
    #---------------------------------------------------------------------------
    # constraint: ttl >= loop_wait + retry_timeout * 2

    # the number of seconds the loop will sleep. Default value: 10
    # this is patroni check loop interval
    loop_wait: 10

    # the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
    # config this according to your network condition to avoid false-positive failover
    ttl: 30

    # timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
    retry_timeout: 10

    # the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
    # Max RTO: 2 loop wait + master_start_timeout
    master_start_timeout: 10

    # important: candidate will not be promoted if replication lag is higher than this
    # maximum RPO: 1MB
    maximum_lag_on_failover: 1048576

    # The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
    master_stop_timeout: 30

    # turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
    # set to true for RPO mode
    synchronous_mode: false

    # prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
    synchronous_mode_strict: false


    #----------------------------------------------------------------------------
    # postgres parameters
    #---------------------------------------------------------------------------
    postgresql:
      use_slots: true
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true


      parameters:
        #----------------------------------------------------------------------
        # IMPORTANT PARAMETERS
        #----------------------------------------------------------------------
        max_connections: 400                    # 100 -> 400
        superuser_reserved_connections: 10      # reserve 10 connection for su
        max_locks_per_transaction: 128          # 64 -> 128
        max_prepared_transactions: 0            # 0 disable 2PC
        track_commit_timestamp: on              # enabled xact timestamp
        max_worker_processes: 8                 # default 8, set to cpu core
        wal_level: logical                      # logical
        wal_log_hints: on                       # wal log hints to support rewind
        max_wal_senders: 16                     # 10 -> 16
        max_replication_slots: 16               # 10 -> 16
        wal_keep_size: 100GB                    # keep at least 100GB WAL
        password_encryption: md5                # use traditional md5 auth

        #----------------------------------------------------------------------
        # RESOURCE USAGE (except WAL)
        #----------------------------------------------------------------------
        # memory: shared_buffers and maintenance_work_mem will be dynamically set
        shared_buffers: {{ pg_shared_buffers }}
        maintenance_work_mem: {{ pg_maintenance_work_mem }}
        work_mem: 32MB                          # 4MB -> 32MB
        huge_pages: try                         # try huge pages
        temp_file_limit: 100GB                  # 0 -> 100GB
        vacuum_cost_delay: 2ms                  # wait 2ms per 10000 cost
        vacuum_cost_limit: 10000                # 10000 cost each round
        bgwriter_delay: 10ms                    # check dirty page every 10ms
        bgwriter_lru_maxpages: 800              # 100 -> 800
        bgwriter_lru_multiplier: 5.0            # 2.0 -> 5.0  more cushion buffer

        #----------------------------------------------------------------------
        # WAL
        #----------------------------------------------------------------------
        wal_buffers: 16MB                       # max to 16MB
        wal_writer_delay: 20ms                  # wait period
        wal_writer_flush_after: 1MB             # max allowed data loss
        min_wal_size: 100GB                     # at least 100GB WAL
        max_wal_size: 400GB                     # at most 400GB WAL
        commit_delay: 20                        # 0 -> 20us, group commit delay
        commit_siblings: 10                     # 5 -> 10
        checkpoint_timeout: 60min               # checkpoint 5min -> 1h
        checkpoint_completion_target: 0.95      # 0.5 -> 0.95
        archive_mode: on
        archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'

        #----------------------------------------------------------------------
        # REPLICATION
        #----------------------------------------------------------------------
        # synchronous_standby_names: ''
        vacuum_defer_cleanup_age: 50000         # 0->50000 last 50000 xact changes will not be vacuumed
        promote_trigger_file: promote.signal    # default promote trigger file path
        max_standby_archive_delay: 10min        # max delay before canceling queries when reading WAL from archive;
        max_standby_streaming_delay: 3min       # max delay before canceling queries when reading streaming WAL;
        wal_receiver_status_interval: 1s        # send replies at least this often
        hot_standby_feedback: on                # send info from standby to prevent query conflicts
        wal_receiver_timeout: 60s               # time that receiver waits for
        max_logical_replication_workers: 8      # 4 -> 8
        max_sync_workers_per_subscription: 8    # 4 -> 8

        #----------------------------------------------------------------------
        # QUERY TUNING
        #----------------------------------------------------------------------
        # planner
        # enable_partitionwise_join: on
        random_page_cost: 1.1                   # 4 for HDD, 1.1 for SSD
        effective_cache_size: 320GB             # max mem - shared buffer
        default_statistics_target: 1000         # stat bucket 100 -> 1000

        #----------------------------------------------------------------------
        # REPORTING AND LOGGING
        #----------------------------------------------------------------------
        log_destination: csvlog                 # use standard csv log
        logging_collector: on                   # enable csvlog
        log_directory: log                      # default log dir: /pg/data/log
        # log_filename: 'postgresql-%a.log'     # weekly auto-recycle
        log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
        log_checkpoints: on                     # log checkpoint info
        log_lock_waits: on                      # log lock wait info
        log_replication_commands: on            # log replication info
        log_statement: ddl                      # log ddl change
        log_min_duration_statement: 100         # log slow query (>100ms)

        #----------------------------------------------------------------------
        # STATISTICS
        #----------------------------------------------------------------------
        track_io_timing: on                     # collect io statistics
        track_functions: all                    # track all functions (none|pl|all)
        track_activity_query_size: 8192         # max query length in pg_stat_activity

        #----------------------------------------------------------------------
        # AUTOVACUUM
        #----------------------------------------------------------------------
        log_autovacuum_min_duration: 1s         # log autovacuum activity take more than 1s
        autovacuum_max_workers: 3               # default autovacuum worker 3
        autovacuum_naptime: 1min                # default autovacuum naptime 1min
        autovacuum_vacuum_scale_factor: 0.08    # fraction of table size before vacuum   20% -> 8%
        autovacuum_analyze_scale_factor: 0.04   # fraction of table size before analyze  10% -> 4%
        autovacuum_vacuum_cost_delay: -1        # default vacuum cost delay: same as vacuum_cost_delay
        autovacuum_vacuum_cost_limit: -1        # default vacuum cost limit: same as vacuum_cost_limit
        autovacuum_freeze_max_age: 100000000    # age > 100 million triggers force vacuum

        #----------------------------------------------------------------------
        # CLIENT
        #----------------------------------------------------------------------
        deadlock_timeout: 50ms                  # 50ms for deadlock
        idle_in_transaction_session_timeout: 10min  # 10min timeout for idle in transaction

        #----------------------------------------------------------------------
        # CUSTOMIZED OPTIONS
        #----------------------------------------------------------------------
        # extensions
        shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'

        # auto_explain
        auto_explain.log_min_duration: 1s       # auto explain query slower than 1s
        auto_explain.log_analyze: true          # explain analyze
        auto_explain.log_verbose: true          # explain verbose
        auto_explain.log_timing: true           # explain timing
        auto_explain.log_nested_statements: true

        # pg_stat_statements
        pg_stat_statements.max: 10000           # 5000 -> 10000 queries
        pg_stat_statements.track: all           # track all statements (all|top|none)
        pg_stat_statements.track_utility: off   # do not track query other than CRUD
        pg_stat_statements.track_planning: off  # do not track planning metrics


#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:

  #----------------------------------------------------------------------------
  # how to connect to postgres
  #----------------------------------------------------------------------------
  bin_dir: {{ pg_bin_dir }}
  data_dir: {{ pg_data }}
  config_dir: {{ pg_data }}
  pgpass: {{ pg_dbsu_home }}/.pgpass
  listen: {{ pg_listen }}:{{ pg_port }}
  connect_address: {{ inventory_hostname }}:{{ pg_port }}
  use_unix_socket: true # default: /var/run/postgresql, /tmp

  #----------------------------------------------------------------------------
  # credentials used to connect to postgres
  #----------------------------------------------------------------------------
  authentication:
    superuser:
      username: {{ pg_dbsu }}
    replication:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'
    rewind:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'

  #----------------------------------------------------------------------------
  # how to react to database operations
  #----------------------------------------------------------------------------
  # event callback script log: /pg/log/callback.log
  callbacks:
    on_start: /pg/bin/pg-failover-callback
    on_stop: /pg/bin/pg-failover-callback
    on_reload: /pg/bin/pg-failover-callback
    on_restart: /pg/bin/pg-failover-callback
    on_role_change: /pg/bin/pg-failover-callback

  # rewind policy: data checksum should be enabled before using rewind
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: false

  #----------------------------------------------------------------------------
  # how to create replica
  #----------------------------------------------------------------------------
  # create replica method: default pg_basebackup
  create_replica_methods:
    - basebackup
  basebackup:
    - max-rate: '1000M'
    - checkpoint: fast
    - status-interval: 1s
    - verbose
    - progress

  #----------------------------------------------------------------------------
  # ad hoc parameters (overwrite with default)
  #----------------------------------------------------------------------------
  # parameters:

  #----------------------------------------------------------------------------
  # host based authentication, overwrite default pg_hba.conf
  #----------------------------------------------------------------------------
  # pg_hba:
  #   - local   all             postgres                                ident
  #   - local   all             all                                     md5
  #   - host    all             all            0.0.0.0/0                md5
  #   - local   replication     postgres                                ident
  #   - local   replication     all                                     md5
  #   - host    replication     all            0.0.0.0/0                md5

...
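
The autovacuum scale factors in the template above decide how many dead or changed rows a table accumulates before autovacuum acts on it. As a rough sketch in Python: PostgreSQL triggers at a base threshold plus the scale factor times the table's row count. The base threshold of 50 rows is PostgreSQL's stock default for `autovacuum_vacuum_threshold` and `autovacuum_analyze_threshold`; the template does not set either, so that value is an assumption here, and the function names are purely illustrative.

```python
# Sketch: when autovacuum triggers for a table of a given size.
# PostgreSQL triggers on: base_threshold + scale_factor * reltuples.
# The base threshold (50) is the PostgreSQL default, assumed here because
# the template does not override it.

VACUUM_SCALE_FACTOR = 0.08   # autovacuum_vacuum_scale_factor from the template
ANALYZE_SCALE_FACTOR = 0.04  # autovacuum_analyze_scale_factor from the template
BASE_THRESHOLD = 50          # assumed default for both base thresholds


def vacuum_trigger(reltuples: int) -> float:
    """Dead tuples needed before autovacuum vacuums the table."""
    return BASE_THRESHOLD + VACUUM_SCALE_FACTOR * reltuples


def analyze_trigger(reltuples: int) -> float:
    """Changed tuples needed before autovacuum analyzes the table."""
    return BASE_THRESHOLD + ANALYZE_SCALE_FACTOR * reltuples


if __name__ == "__main__":
    for rows in (10_000, 1_000_000, 100_000_000):
        print(rows, round(vacuum_trigger(rows)), round(analyze_trigger(rows)))
```

With these factors a one-million-row table is vacuumed after roughly 80 thousand dead rows, compared with roughly 200 thousand under the stock 20% factor, which is why the template lowers both values for busy tables.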

8.10.2 - TINY

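Patroni TINY template

The Patroni TINY template is optimized mainly for very low-spec virtual machines.

The typical target for this template is a 1-core / 1 GB virtual machine node. You can adjust it to match your actual hardware.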

#!/usr/bin/env patroni
#==============================================================#
# File      :   patroni.yml
# Ctime     :   2020-04-08
# Mtime     :   2020-12-22
# Desc      :   patroni cluster definition for {{ pg_cluster }} (tiny)
# Path      :   /pg/bin/patroni.yml
# Real Path :   /pg/conf/{{ pg_instance }}.yml
# Link      :   /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note      :   Tiny Database Cluster Template
# Doc       :   https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# TINY databases are optimized for low-resource situations (e.g. 1 Core, 1 GB)
# typical spec: 1 Core | 1-4 GB RAM | Normal SSD  10x GB

---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/          # namespace
scope: {{ pg_cluster }}                 # cluster name
name: {{ pg_instance }}                 # instance name

#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
  level: INFO                           #  NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
  dir: /pg/log/                         #  default log file: /pg/log/patroni.log
  file_size: 100000000                  #  100MB log triggers a log rotate
  # format: '%(asctime)s %(levelname)s: %(message)s'

#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
  host: 127.0.0.1:8500
  consistency: default         # default|consistent|stale
  register_service: true
  service_check_interval: 15s
  service_tags:
    - {{ pg_cluster }}


#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
  listen: 0.0.0.0:{{ patroni_port }}
  connect_address: {{ inventory_hostname }}:{{ patroni_port }}
  authentication:
    verify_client: none                 # none|optional|required
    username: {{ pg_monitor_username }}
    password: '{{ pg_monitor_password }}'


#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
  optional:
    insecure: true
    # cacert: '/path/to/ca/cert'
    # certfile: '/path/to/cert/file'
    # keyfile: '/path/to/key/file'

#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
  nofailover: false
  clonefrom: true
  noloadbalance: false
  nosync: false
{% if pg_upstream is defined %}
  replicatefrom: {{ pg_upstream }}    # clone from another replica rather than primary
{% endif %}

#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
  mode: {{ patroni_watchdog_mode }}
  device: /dev/watchdog
  # safety_margin: 10s

#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:

  #----------------------------------------------------------------------------
  # bootstrap method
  #----------------------------------------------------------------------------
  method: initdb
  # add custom bootstrap method here

  # default bootstrap method: initdb
  initdb:
    - locale: C
    - encoding: UTF8
    - data-checksums    # enable data-checksum


  #----------------------------------------------------------------------------
  # bootstrap users
  #---------------------------------------------------------------------------
  # additional users which need to be created after initializing new cluster
  # replication user and monitor user are required
  users:
    {{ pg_replication_username }}:
      password: '{{ pg_replication_password }}'
    {{ pg_monitor_username }}:
      password: '{{ pg_monitor_password }}'

  # bootstrap hba, allow local and intranet password access & replication
  # will be overwritten later
  pg_hba:
    - local   all             postgres                                ident
    - local   all             all                                     md5
    - host    all             all            0.0.0.0/0                md5
    - local   replication     postgres                                ident
    - local   replication     all                                     md5
    - host    replication     all            0.0.0.0/0                md5


  #----------------------------------------------------------------------------
  # customization
  #---------------------------------------------------------------------------
  # post_init: /pg/bin/pg-init

  #----------------------------------------------------------------------------
  # bootstrap config
  #---------------------------------------------------------------------------
  # this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
  # it will NOT take any effect after cluster bootstrap
  dcs:

{% if pg_role == 'primary' and pg_upstream is defined %}
    #----------------------------------------------------------------------------
    # standby cluster definition
    #---------------------------------------------------------------------------
    standby_cluster:
      host: {{ pg_upstream }}
      port: {{ pg_port }}
      # primary_slot_name: patroni     # must be created manually on upstream server, if specified
      create_replica_methods:
        - basebackup
{% endif %}

    #----------------------------------------------------------------------------
    # important parameters
    #---------------------------------------------------------------------------
    # constraint: ttl >= loop_wait + retry_timeout * 2

    # the number of seconds the loop will sleep. Default value: 10
    # this is patroni check loop interval
    loop_wait: 10

    # the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
    # config this according to your network condition to avoid false-positive failover
    ttl: 30

    # timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
    retry_timeout: 10

    # the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
    # Max RTO: 2 loop wait + master_start_timeout
    master_start_timeout: 10

    # important: candidate will not be promoted if replication lag is higher than this
    # maximum RPO: 1MB
    maximum_lag_on_failover: 1048576

    # The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
    master_stop_timeout: 30

    # turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
    # set to true for RPO mode
    synchronous_mode: false

    # prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
    synchronous_mode_strict: false


    #----------------------------------------------------------------------------
    # postgres parameters
    #---------------------------------------------------------------------------
    postgresql:
      use_slots: true
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true


      parameters:
        #----------------------------------------------------------------------
        # IMPORTANT PARAMETERS
        #----------------------------------------------------------------------
        max_connections: 50                     # default 100 -> 50
        superuser_reserved_connections: 10      # reserve 10 connection for su
        max_locks_per_transaction: 64           # default 64
        max_prepared_transactions: 0            # 0 disable 2PC
        track_commit_timestamp: on              # enabled xact timestamp
        max_worker_processes: 1                 # default 8 -> 1 (set to cpu core)
        wal_level: logical                      # logical
        wal_log_hints: on                       # wal log hints to support rewind
        max_wal_senders: 10                     # default 10
        max_replication_slots: 10               # default 10
        wal_keep_size: 1GB                      # keep at least 1GB WAL
        password_encryption: md5                # use traditional md5 auth

        #----------------------------------------------------------------------
        # RESOURCE USAGE (except WAL)
        #----------------------------------------------------------------------
        # memory: shared_buffers and maintenance_work_mem will be dynamically set
        shared_buffers: {{ pg_shared_buffers }}
        maintenance_work_mem: {{ pg_maintenance_work_mem }}
        work_mem: 4MB                           # default 4MB
        huge_pages: try                         # try huge pages
        temp_file_limit: 40GB                   # 0 -> 40GB (according to your disk)
        vacuum_cost_delay: 5ms                  # wait 5ms per 10000 cost
        vacuum_cost_limit: 10000                # 10000 cost each round
        bgwriter_delay: 10ms                    # check dirty page every 10ms
        bgwriter_lru_maxpages: 800              # 100 -> 800
        bgwriter_lru_multiplier: 5.0            # 2.0 -> 5.0  more cushion buffer

        #----------------------------------------------------------------------
        # WAL
        #----------------------------------------------------------------------
        wal_buffers: 16MB                       # max to 16MB
        wal_writer_delay: 20ms                  # wait period
        wal_writer_flush_after: 1MB             # max allowed data loss
        min_wal_size: 100GB                     # at least 100GB WAL
        max_wal_size: 400GB                     # at most 400GB WAL
        commit_delay: 20                        # 0 -> 20 microseconds, allows group commit
        commit_siblings: 10                     # 5 -> 10
        checkpoint_timeout: 15min               # checkpoint 5min -> 15min
        checkpoint_completion_target: 0.80      # 0.5 -> 0.8
        archive_mode: on
        archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'

        #----------------------------------------------------------------------
        # REPLICATION
        #----------------------------------------------------------------------
        # synchronous_standby_names: ''
        vacuum_defer_cleanup_age: 50000         # 0->50000 last 50000 xact changes will not be vacuumed
        promote_trigger_file: promote.signal    # default promote trigger file path
        max_standby_archive_delay: 10min        # max delay before canceling queries when reading WAL from archive;
        max_standby_streaming_delay: 3min       # max delay before canceling queries when reading streaming WAL;
        wal_receiver_status_interval: 1s        # send replies at least this often
        hot_standby_feedback: on                # send info from standby to prevent query conflicts
        wal_receiver_timeout: 60s               # time that receiver waits for
        max_logical_replication_workers: 8      # 4 -> 8 (set according to your cpu core)
        max_sync_workers_per_subscription: 8    # 4 -> 8

        #----------------------------------------------------------------------
        # QUERY TUNING
        #----------------------------------------------------------------------
        # planner
        # enable_partitionwise_join: on
        random_page_cost: 1.1                   # 4 for HDD, 1.1 for SSD
        effective_cache_size: 2GB               # max mem - shared buffer
        default_statistics_target: 200          # stat bucket 100 -> 200

        #----------------------------------------------------------------------
        # REPORTING AND LOGGING
        #----------------------------------------------------------------------
        log_destination: csvlog                 # use standard csv log
        logging_collector: on                   # enable csvlog
        log_directory: log                      # default log dir: /pg/data/log
        # log_filename: 'postgresql-%a.log'     # weekly auto-recycle
        log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
        log_checkpoints: on                     # log checkpoint info
        log_lock_waits: on                      # log lock wait info
        log_replication_commands: on            # log replication info
        log_statement: ddl                      # log ddl change
        log_min_duration_statement: 100         # log slow query (>100ms)

        #----------------------------------------------------------------------
        # STATISTICS
        #----------------------------------------------------------------------
        track_io_timing: on                     # collect io statistics
        track_functions: all                    # track all functions (none|pl|all)
        track_activity_query_size: 8192         # max query length in pg_stat_activity

        #----------------------------------------------------------------------
        # AUTOVACUUM
        #----------------------------------------------------------------------
        log_autovacuum_min_duration: 1s         # log autovacuum activity take more than 1s
        autovacuum_max_workers: 1               # default autovacuum worker 3 -> 1
        autovacuum_naptime: 1min                # default autovacuum naptime 1min
        autovacuum_vacuum_scale_factor: 0.08    # fraction of table size before vacuum   20% -> 8%
        autovacuum_analyze_scale_factor: 0.04   # fraction of table size before analyze  10% -> 4%
        autovacuum_vacuum_cost_delay: -1        # default vacuum cost delay: same as vacuum_cost_delay
        autovacuum_vacuum_cost_limit: -1        # default vacuum cost limit: same as vacuum_cost_limit
        autovacuum_freeze_max_age: 100000000    # age > 100 million triggers force vacuum

        #----------------------------------------------------------------------
        # CLIENT
        #----------------------------------------------------------------------
        deadlock_timeout: 50ms                  # 50ms for deadlock
        idle_in_transaction_session_timeout: 10min  # 10min timeout for idle in transaction

        #----------------------------------------------------------------------
        # CUSTOMIZED OPTIONS
        #----------------------------------------------------------------------
        # extensions
        shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'

        # auto_explain
        auto_explain.log_min_duration: 1s       # auto explain query slower than 1s
        auto_explain.log_analyze: true          # explain analyze
        auto_explain.log_verbose: true          # explain verbose
        auto_explain.log_timing: true           # explain timing
        auto_explain.log_nested_statements: true

        # pg_stat_statements
        pg_stat_statements.max: 3000            # 5000 -> 3000 queries
        pg_stat_statements.track: all           # track all statements (all|top|none)
        pg_stat_statements.track_utility: off   # do not track query other than CRUD
        pg_stat_statements.track_planning: off  # do not track planning metrics


#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:

  #----------------------------------------------------------------------------
  # how to connect to postgres
  #----------------------------------------------------------------------------
  bin_dir: {{ pg_bin_dir }}
  data_dir: {{ pg_data }}
  config_dir: {{ pg_data }}
  pgpass: {{ pg_dbsu_home }}/.pgpass
  listen: {{ pg_listen }}:{{ pg_port }}
  connect_address: {{ inventory_hostname }}:{{ pg_port }}
  use_unix_socket: true # default: /var/run/postgresql, /tmp

  #----------------------------------------------------------------------------
  # credentials used to connect to postgres
  #----------------------------------------------------------------------------
  authentication:
    superuser:
      username: {{ pg_dbsu }}
    replication:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'
    rewind:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'

  #----------------------------------------------------------------------------
  # how to react to database operations
  #----------------------------------------------------------------------------
  # event callback script log: /pg/log/callback.log
  callbacks:
    on_start: /pg/bin/pg-failover-callback
    on_stop: /pg/bin/pg-failover-callback
    on_reload: /pg/bin/pg-failover-callback
    on_restart: /pg/bin/pg-failover-callback
    on_role_change: /pg/bin/pg-failover-callback

  # rewind policy: data checksum should be enabled before using rewind
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: false

  #----------------------------------------------------------------------------
  # how to create replica
  #----------------------------------------------------------------------------
  # create replica method: default pg_basebackup
  create_replica_methods:
    - basebackup
  basebackup:
    - max-rate: '1000M'
    - checkpoint: fast
    - status-interval: 1s
    - verbose
    - progress

  #----------------------------------------------------------------------------
  # ad hoc parameters (overwrite with default)
  #----------------------------------------------------------------------------
  # parameters:

  #----------------------------------------------------------------------------
  # host based authentication, overwrite default pg_hba.conf
  #----------------------------------------------------------------------------
  # pg_hba:
  #   - local   all             postgres                                ident
  #   - local   all             all                                     md5
  #   - host    all             all            0.0.0.0/0                md5
  #   - local   replication     postgres                                ident
  #   - local   replication     all                                     md5
  #   - host    replication     all            0.0.0.0/0                md5

...
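
The timing comments in the bootstrap `dcs` section of the template state a constraint (`ttl` must be at least `loop_wait + retry_timeout * 2`) and an upper bound on recovery time (`2 * loop_wait + master_start_timeout`). A minimal Python sketch that checks both for the values used in the template follows. The function names are hypothetical and are not part of Patroni or Pigsty; they only restate the arithmetic from the comments.

```python
# Sketch: sanity-check Patroni timing values taken from the template.
# Function names are illustrative only; Patroni exposes no such helpers.


def ttl_is_valid(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Constraint from the template comment: ttl >= loop_wait + 2 * retry_timeout."""
    return ttl >= loop_wait + 2 * retry_timeout


def max_rto_seconds(loop_wait: int, master_start_timeout: int) -> int:
    """Upper bound from the template comment: 2 * loop_wait + master_start_timeout."""
    return 2 * loop_wait + master_start_timeout


def lag_limit_mib(maximum_lag_on_failover: int) -> float:
    """Convert maximum_lag_on_failover (bytes) to MiB."""
    return maximum_lag_on_failover / (1024 * 1024)


if __name__ == "__main__":
    # Values as they appear in the template: ttl=30, loop_wait=10,
    # retry_timeout=10, master_start_timeout=10, maximum_lag_on_failover=1048576
    print(ttl_is_valid(ttl=30, loop_wait=10, retry_timeout=10))
    print(max_rto_seconds(loop_wait=10, master_start_timeout=10))
    print(lag_limit_mib(1048576))
```

The template values sit exactly on the boundary of the constraint (30 = 10 + 2 * 10), so raising `loop_wait` or `retry_timeout` without also raising `ttl` would violate it.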

8.10.3 - OLAP

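Patroni OLAP template, optimized for instances with high parallelism, long-running queries, and high throughput

The Patroni OLAP template is optimized mainly for throughput and compute parallelism.

This template targets a Dell R740 node with 64 cores, 400 GB of memory, and PCI-E SSD storage. You can adjust it to match your actual hardware.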

#!/usr/bin/env patroni
#==============================================================#
# File      :   patroni.yml
# Ctime     :   2020-04-08
# Mtime     :   2020-12-22
# Desc      :   patroni cluster definition for {{ pg_cluster }} (olap)
# Path      :   /pg/bin/patroni.yml
# Real Path :   /pg/conf/{{ pg_instance }}.yml
# Link      :   /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note      :   Analysis Database Cluster Template
# Doc       :   https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# OLAP databases are optimized for throughput
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB

---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/          # namespace
scope: {{ pg_cluster }}                 # cluster name
name: {{ pg_instance }}                 # instance name

#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
  level: INFO                           #  NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
  dir: /pg/log/                         #  default log file: /pg/log/patroni.log
  file_size: 100000000                  #  100MB log triggers a log rotate
  # format: '%(asctime)s %(levelname)s: %(message)s'

#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
  host: 127.0.0.1:8500
  consistency: default         # default|consistent|stale
  register_service: true
  service_check_interval: 15s
  service_tags:
    - {{ pg_cluster }}

#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
  listen: 0.0.0.0:{{ patroni_port }}
  connect_address: {{ inventory_hostname }}:{{ patroni_port }}
  authentication:
    verify_client: none                 # none|optional|required
    username: {{ pg_monitor_username }}
    password: '{{ pg_monitor_password }}'

#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
  optional:
    insecure: true
    # cacert: '/path/to/ca/cert'
    # certfile: '/path/to/cert/file'
    # keyfile: '/path/to/key/file'

#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
  nofailover: false
  clonefrom: true
  noloadbalance: false
  nosync: false
{% if pg_upstream is defined %}
  replicatefrom: {{ pg_upstream }}    # clone from another replica rather than primary
{% endif %}

#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
  mode: {{ patroni_watchdog_mode }}
  device: /dev/watchdog
  # safety_margin: 10s

#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:

  #----------------------------------------------------------------------------
  # bootstrap method
  #----------------------------------------------------------------------------
  method: initdb
  # add custom bootstrap method here

  # default bootstrap method: initdb
  initdb:
    - locale: C
    - encoding: UTF8
    # - data-checksums    # enable data-checksum


  #----------------------------------------------------------------------------
  # bootstrap users
  #---------------------------------------------------------------------------
  # additional users which need to be created after initializing new cluster
  # replication user and monitor user are required
  users:
    {{ pg_replication_username }}:
      password: '{{ pg_replication_password }}'
    {{ pg_monitor_username }}:
      password: '{{ pg_monitor_password }}'
    {{ pg_admin_username }}:
      password: '{{ pg_admin_password }}'

  # bootstrap hba, allow local and intranet password access & replication
  # will be overwritten later
  pg_hba:
    - local   all             postgres                                ident
    - local   all             all                                     md5
    - host    all             all            0.0.0.0/0                md5
    - local   replication     postgres                                ident
    - local   replication     all                                     md5
    - host    replication     all            0.0.0.0/0                md5


  #----------------------------------------------------------------------------
  # template
  #---------------------------------------------------------------------------
  # post_init: /pg/bin/pg-init

  #----------------------------------------------------------------------------
  # bootstrap config
  #---------------------------------------------------------------------------
  # this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
  # it will NOT take any effect after cluster bootstrap
  dcs:

{% if pg_role == 'primary' and pg_upstream is defined %}
    #----------------------------------------------------------------------------
    # standby cluster definition
    #---------------------------------------------------------------------------
    standby_cluster:
      host: {{ pg_upstream }}
      port: {{ pg_port }}
      # primary_slot_name: patroni     # must be created manually on upstream server, if specified
      create_replica_methods:
        - basebackup
{% endif %}

    #----------------------------------------------------------------------------
    # important parameters
    #---------------------------------------------------------------------------
    # constraint: ttl >= loop_wait + retry_timeout * 2

    # the number of seconds the loop will sleep. Default value: 10
    # this is patroni check loop interval
    loop_wait: 10

    # the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
    # config this according to your network condition to avoid false-positive failover
    ttl: 30

    # timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
    retry_timeout: 10

    # the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
    # Max RTO: 2 loop wait + master_start_timeout
    master_start_timeout: 10

    # important: candidate will not be promoted if replication lag is higher than this
    # maximum RPO: 16MB (analysis tolerate more data loss)
    maximum_lag_on_failover: 16777216

    # The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
    master_stop_timeout: 30

    # turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
    # set to true for RPO mode
    synchronous_mode: false

    # prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
    synchronous_mode_strict: false


    #----------------------------------------------------------------------------
    # postgres parameters
    #---------------------------------------------------------------------------
    postgresql:
      use_slots: true
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true

      parameters:
        #----------------------------------------------------------------------
        # IMPORTANT PARAMETERS
        #----------------------------------------------------------------------
        max_connections: 400                    # 100 -> 400
        superuser_reserved_connections: 10      # reserve 10 connection for su
        max_locks_per_transaction: 256          # 64 -> 256 (analysis)
        max_prepared_transactions: 0            # 0 disable 2PC
        track_commit_timestamp: on              # enabled xact timestamp
        max_worker_processes: 64                # default 8 -> 64, SET THIS ACCORDING TO YOUR CPU CORES
        wal_level: logical                      # logical
        wal_log_hints: on                       # wal log hints to support rewind
        max_wal_senders: 16                     # 10 -> 16
        max_replication_slots: 16               # 10 -> 16
        wal_keep_size: 100GB                    # keep at least 100GB WAL
        password_encryption: md5                # use traditional md5 auth

        #----------------------------------------------------------------------
        # RESOURCE USAGE (except WAL)
        #----------------------------------------------------------------------
        # memory: shared_buffers and maintenance_work_mem will be dynamically set
        shared_buffers: {{ pg_shared_buffers }}
        maintenance_work_mem: {{ pg_maintenance_work_mem }}
        work_mem: 128MB                         # 4MB -> 128MB (analysis)
        huge_pages: try                         # try huge pages
        temp_file_limit: 500GB                  # 0 -> 500GB (analysis)
        vacuum_cost_delay: 2ms                  # wait 2ms per 10000 cost
        vacuum_cost_limit: 10000                # 10000 cost each round
        bgwriter_delay: 10ms                    # check dirty page every 10ms
        bgwriter_lru_maxpages: 1600             # 100 -> 1600 (analysis)
        bgwriter_lru_multiplier: 5.0            # 2.0 -> 5.0  more cushion buffer
        max_parallel_workers: 64                # SET THIS ACCORDING TO YOUR CPU CORES
        max_parallel_workers_per_gather: 64     # SET THIS ACCORDING TO YOUR CPU CORES
        max_parallel_maintenance_workers: 4     # 2 -> 4

        #----------------------------------------------------------------------
        # WAL
        #----------------------------------------------------------------------
        wal_buffers: 16MB                       # max to 16MB
        wal_writer_delay: 20ms                  # wait period
        wal_writer_flush_after: 16MB            # max allowed data loss (analysis)
        min_wal_size: 100GB                     # at least 100GB WAL
        max_wal_size: 400GB                     # at most 400GB WAL
        commit_delay: 20                        # 0 -> 20 (microseconds), allow group commit
        commit_siblings: 10                     # 5 -> 10
        checkpoint_timeout: 60min               # checkpoint 5min -> 1h
        checkpoint_completion_target: 0.95      # 0.5 -> 0.95
        archive_mode: on
        archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'

        #----------------------------------------------------------------------
        # REPLICATION
        #----------------------------------------------------------------------
        # synchronous_standby_names: ''
        vacuum_defer_cleanup_age: 0             # 0 (default)
        promote_trigger_file: promote.signal    # default promote trigger file path
        max_standby_archive_delay: 10min        # max delay before canceling queries when reading WAL from archive;
        max_standby_streaming_delay: 3min       # max delay before canceling queries when reading streaming WAL;
        wal_receiver_status_interval: 1s        # send replies at least this often
        hot_standby_feedback: on                # send info from standby to prevent query conflicts
        wal_receiver_timeout: 60s               # time that receiver waits for
        max_logical_replication_workers: 8      # 4 -> 8
        max_sync_workers_per_subscription: 8    # 4 -> 8

        #----------------------------------------------------------------------
        # QUERY TUNING
        #----------------------------------------------------------------------
        # planner
        enable_partitionwise_join: on           # enable on analysis
        random_page_cost: 1.1                   # 4 for HDD, 1.1 for SSD
        effective_cache_size: 320GB             # max mem - shared buffer
        default_statistics_target: 1000         # stat bucket 100 -> 1000
        jit: on                                 # default on
        jit_above_cost: 100000                  # default jit threshold

        #----------------------------------------------------------------------
        # REPORTING AND LOGGING
        #----------------------------------------------------------------------
        log_destination: csvlog                 # use standard csv log
        logging_collector: on                   # enable csvlog
        log_directory: log                      # default log dir: /pg/data/log
        # log_filename: 'postgresql-%a.log'     # weekly auto-recycle
        log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
        log_checkpoints: on                     # log checkpoint info
        log_lock_waits: on                      # log lock wait info
        log_replication_commands: on            # log replication info
        log_statement: ddl                      # log ddl change
        log_min_duration_statement: 1000         # log slow query (>1s)

        #----------------------------------------------------------------------
        # STATISTICS
        #----------------------------------------------------------------------
        track_io_timing: on                     # collect io statistics
        track_functions: all                    # track all functions (none|pl|all)
        track_activity_query_size: 8192         # max query length in pg_stat_activity

        #----------------------------------------------------------------------
        # AUTOVACUUM
        #----------------------------------------------------------------------
        log_autovacuum_min_duration: 1s         # log autovacuum activity take more than 1s
        autovacuum_max_workers: 3               # default autovacuum worker 3
        autovacuum_naptime: 1min                # default autovacuum naptime 1min
        autovacuum_vacuum_scale_factor: 0.08    # fraction of table size before vacuum   20% -> 8%
        autovacuum_analyze_scale_factor: 0.04   # fraction of table size before analyze  10% -> 4%
        autovacuum_vacuum_cost_delay: -1        # default vacuum cost delay: same as vacuum_cost_delay
        autovacuum_vacuum_cost_limit: -1        # default vacuum cost limit: same as vacuum_cost_limit
        autovacuum_freeze_max_age: 100000000    # age > 100 million triggers forced vacuum

        #----------------------------------------------------------------------
        # CLIENT
        #----------------------------------------------------------------------
        deadlock_timeout: 50ms                  # 50ms for deadlock
        idle_in_transaction_session_timeout: 0  # Disable idle in xact timeout in analysis database

        #----------------------------------------------------------------------
        # CUSTOMIZED OPTIONS
        #----------------------------------------------------------------------
        # extensions
        shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'

        # auto_explain
        auto_explain.log_min_duration: 1s       # auto explain query slower than 1s
        auto_explain.log_analyze: true          # explain analyze
        auto_explain.log_verbose: true          # explain verbose
        auto_explain.log_timing: true           # explain timing
        auto_explain.log_nested_statements: true

        # pg_stat_statements
        pg_stat_statements.max: 10000           # 5000 -> 10000 queries
        pg_stat_statements.track: all           # track all statements (all|top|none)
        pg_stat_statements.track_utility: off   # do not track query other than CRUD
        pg_stat_statements.track_planning: off  # do not track planning metrics


#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:

  #----------------------------------------------------------------------------
  # how to connect to postgres
  #----------------------------------------------------------------------------
  bin_dir: {{ pg_bin_dir }}
  data_dir: {{ pg_data }}
  config_dir: {{ pg_data }}
  pgpass: {{ pg_dbsu_home }}/.pgpass
  listen: {{ pg_listen }}:{{ pg_port }}
  connect_address: {{ inventory_hostname }}:{{ pg_port }}
  use_unix_socket: true # default: /var/run/postgresql, /tmp

  #----------------------------------------------------------------------------
  # who to connect to postgres
  #----------------------------------------------------------------------------
  authentication:
    superuser:
      username: {{ pg_dbsu }}
    replication:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'
    rewind:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'

  #----------------------------------------------------------------------------
  # how to react to database operations
  #----------------------------------------------------------------------------
  # event callback script log: /pg/log/callback.log
  callbacks:
    on_start: /pg/bin/pg-failover-callback
    on_stop: /pg/bin/pg-failover-callback
    on_reload: /pg/bin/pg-failover-callback
    on_restart: /pg/bin/pg-failover-callback
    on_role_change: /pg/bin/pg-failover-callback

  # rewind policy: data checksum should be enabled before using rewind
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: false

  #----------------------------------------------------------------------------
  # how to create replica
  #----------------------------------------------------------------------------
  # create replica method: default pg_basebackup
  create_replica_methods:
    - basebackup
  basebackup:
    - max-rate: '1000M'
    - checkpoint: fast
    - status-interval: 1s
    - verbose
    - progress

  #----------------------------------------------------------------------------
  # ad hoc parameters (overwrite with default)
  #----------------------------------------------------------------------------
  # parameters:

  #----------------------------------------------------------------------------
  # host based authentication, overwrite default pg_hba.conf
  #----------------------------------------------------------------------------
  # pg_hba:
  #   - local   all             postgres                                ident
  #   - local   all             all                                     md5
  #   - host    all             all            0.0.0.0/0                md5
  #   - local   replication     postgres                                ident
  #   - local   replication     all                                     md5
  #   - host    replication     all            0.0.0.0/0                md5

...
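
The `archive_command` in the template above packs several steps into one shell line. The following Python sketch restates that logic as a pure function so the resulting paths are easier to reason about. It only computes paths and touches no files; `plan_archive` and `ArchivePlan` are hypothetical names introduced for illustration and are not part of Pigsty or Patroni.

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional


class ArchivePlan(NamedTuple):
    purge_dir: Optional[str]  # directory the command would `rm -rf`, if any
    target_dir: str           # directory the command would `mkdir -p`
    target_file: str          # where the lz4-compressed segment is written


def plan_archive(now: datetime, wal_name: str, wal_dir: str = "/pg/arcwal") -> ArchivePlan:
    """Mirror the shell logic of the template's archive_command (assumed reading)."""
    today = now.strftime("%Y%m%d")
    yesterday = (now - timedelta(days=1)).strftime("%Y%m%d")
    # `[[ $(date +%H%M) == 1200 ]] && rm -rf ...` only fires during the 12:00 minute
    purge_dir = f"{wal_dir}/{yesterday}" if now.strftime("%H%M") == "1200" else None
    target_dir = f"{wal_dir}/{today}"
    # `%f` is the WAL file name; the command appends `.lz4`
    return ArchivePlan(purge_dir, target_dir, f"{target_dir}/{wal_name}.lz4")


if __name__ == "__main__":
    print(plan_archive(datetime(2021, 1, 2, 12, 0), "000000010000000000000001"))
```

Read this way, yesterday's directory is removed at noon, so archived WAL is retained for roughly 12 to 36 hours, and the cleanup only runs if a segment happens to be archived during the 12:00 minute.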

8.10.4 - CRIT

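The Patroni CRIT template is optimized for financial workloads and other scenarios where data loss or corruption is unacceptable.

The Patroni CRIT template is optimized primarily for RPO: it uses synchronous replication so that no data is lost when a failure occurs.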
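
The comments in this template describe two switches, `synchronous_mode` and `synchronous_mode_strict`. The sketch below is a simplified model of how they combine, based only on those comments; it is not Patroni's implementation, and `replication_state` is a hypothetical helper.

```python
def replication_state(synchronous_mode: bool,
                      synchronous_mode_strict: bool,
                      sync_replicas_available: int) -> str:
    """Simplified model of the behaviour described in the template comments."""
    if not synchronous_mode:
        return "async"                # plain asynchronous replication
    if sync_replicas_available > 0:
        return "sync"                 # commits wait for a synchronous replica
    if synchronous_mode_strict:
        return "writes-blocked"       # strict mode refuses to disable sync replication
    return "degraded-async"           # sync replication is turned off so writes continue


if __name__ == "__main__":
    # Values used by the CRIT template: synchronous_mode=true, synchronous_mode_strict=false
    for replicas in (1, 0):
        print(replicas, replication_state(True, False, replicas))
```

With the CRIT values, this model suggests that writes continue asynchronously when no synchronous replica is available; setting `synchronous_mode_strict: true` would trade that availability for blocked writes.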

This template targets Dell R740 nodes with 64 cores, 400 GB of memory, and PCI-E SSDs. Adjust it to match your actual hardware.

#!/usr/bin/env patroni
#==============================================================#
# File      :   patroni.yml
# Ctime     :   2020-04-08
# Mtime     :   2020-12-22
# Desc      :   patroni cluster definition for {{ pg_cluster }} (crit)
# Path      :   /pg/bin/patroni.yml
# Real Path :   /pg/conf/{{ pg_instance }}.yml
# Link      :   /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note      :   Critical Database Cluster Template
# Doc       :   https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#

# CRIT database are optimized for security, integrity, RPO
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB

---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/          # namespace
scope: {{ pg_cluster }}                 # cluster name
name: {{ pg_instance }}                 # instance name

#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
  level: INFO                           #  NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
  dir: /pg/log/                         #  default log file: /pg/log/patroni.log
  file_size: 100000000                  #  100MB log triggers a log rotate
  # format: '%(asctime)s %(levelname)s: %(message)s'

#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
  host: 127.0.0.1:8500
  consistency: default         # default|consistent|stale
  register_service: true
  service_check_interval: 15s
  service_tags:
    - {{ pg_cluster }}

#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
  listen: 0.0.0.0:{{ patroni_port }}
  connect_address: {{ inventory_hostname }}:{{ patroni_port }}
  authentication:
    verify_client: none                 # none|optional|required
    username: {{ pg_monitor_username }}
    password: '{{ pg_monitor_password }}'

#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
  optional:
    insecure: true
    # cacert: '/path/to/ca/cert'
    # certfile: '/path/to/cert/file'
    # keyfile: '/path/to/key/file'

#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
  nofailover: false
  clonefrom: true
  noloadbalance: false
  nosync: false
{% if pg_upstream is defined %}
  replicatefrom: {{ pg_upstream }}    # clone from another replica rather than primary
{% endif %}

#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
  mode: {{ patroni_watchdog_mode }}
  device: /dev/watchdog
  # safety_margin: 10s

#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:

  #----------------------------------------------------------------------------
  # bootstrap method
  #----------------------------------------------------------------------------
  method: initdb
  # add custom bootstrap method here

  # default bootstrap method: initdb
  initdb:
    - locale: C
    - encoding: UTF8
    # - data-checksums    # enable data-checksum


  #----------------------------------------------------------------------------
  # bootstrap users
  #---------------------------------------------------------------------------
  # additional users which need to be created after initializing new cluster
  # replication user and monitor user are required
  users:
    {{ pg_replication_username }}:
      password: '{{ pg_replication_password }}'
    {{ pg_monitor_username }}:
      password: '{{ pg_monitor_password }}'
    {{ pg_admin_username }}:
      password: '{{ pg_admin_password }}'

  # bootstrap hba, allow local and intranet password access & replication
  # will be overwritten later
  pg_hba:
    - local   all             postgres                                ident
    - local   all             all                                     md5
    - host    all             all            0.0.0.0/0                md5
    - local   replication     postgres                                ident
    - local   replication     all                                     md5
    - host    replication     all            0.0.0.0/0                md5


  #----------------------------------------------------------------------------
  # template
  #---------------------------------------------------------------------------
  # post_init: /pg/bin/pg-init

  #----------------------------------------------------------------------------
  # bootstrap config
  #---------------------------------------------------------------------------
  # this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
  # it will NOT take any effect after cluster bootstrap
  dcs:

{% if pg_role == 'primary' and pg_upstream is defined %}
    #----------------------------------------------------------------------------
    # standby cluster definition
    #---------------------------------------------------------------------------
    standby_cluster:
      host: {{ pg_upstream }}
      port: {{ pg_port }}
      # primary_slot_name: patroni     # must be created manually on upstream server, if specified
      create_replica_methods:
        - basebackup
{% endif %}

    #----------------------------------------------------------------------------
    # important parameters
    #---------------------------------------------------------------------------
    # constraint: ttl >= loop_wait + retry_timeout * 2

    # the number of seconds the loop will sleep. Default value: 10
    # this is patroni check loop interval
    loop_wait: 10

    # the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
    # config this according to your network condition to avoid false-positive failover
    ttl: 30

    # timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
    retry_timeout: 10

    # the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
    # Max RTO: 2 loop wait + master_start_timeout
    master_start_timeout: 120   # more patient on critical database

    # important: candidate will not be promoted if replication lag is higher than this
    # maximum RPO: 0 for critical database
    maximum_lag_on_failover: 1

    # The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
    master_stop_timeout: 10   # wait at most 10s for Postgres to stop

    # turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
    # set to true for RPO mode
    synchronous_mode: true  # use sync replication on critical database

    # prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
    synchronous_mode_strict: false


    #----------------------------------------------------------------------------
    # postgres parameters
    #---------------------------------------------------------------------------
    postgresql:
      use_slots: true
      use_pg_rewind: true
      remove_data_directory_on_rewind_failure: true


      parameters:
        #----------------------------------------------------------------------
        # IMPORTANT PARAMETERS
        #----------------------------------------------------------------------
        max_connections: 400                    # 100 -> 400
        superuser_reserved_connections: 10      # reserve 10 connection for su
        max_locks_per_transaction: 128          # 64 -> 128
        max_prepared_transactions: 0            # 0 disable 2PC
        track_commit_timestamp: on              # enabled xact timestamp
        max_worker_processes: 8                 # default 8, set to cpu core
        wal_level: logical                      # logical
        wal_log_hints: on                       # wal log hints to support rewind
        max_wal_senders: 16                     # 10 -> 16
        max_replication_slots: 16               # 10 -> 16
        wal_keep_size: 100GB                    # keep at least 100GB WAL
        password_encryption: md5                # use traditional md5 auth

        #----------------------------------------------------------------------
        # RESOURCE USAGE (except WAL)
        #----------------------------------------------------------------------
        # memory: shared_buffers and maintenance_work_mem will be dynamically set
        shared_buffers: {{ pg_shared_buffers }}
        maintenance_work_mem: {{ pg_maintenance_work_mem }}
        work_mem: 32MB                          # 4MB -> 32MB
        huge_pages: try                         # try huge pages
        temp_file_limit: 100GB                  # 0 -> 100GB
        vacuum_cost_delay: 2ms                  # wait 2ms per 10000 cost
        vacuum_cost_limit: 10000                # 10000 cost each round
        bgwriter_delay: 10ms                    # check dirty page every 10ms
        bgwriter_lru_maxpages: 800              # 100 -> 800
        bgwriter_lru_multiplier: 5.0            # 2.0 -> 5.0  more cushion buffer

        #----------------------------------------------------------------------
        # WAL
        #----------------------------------------------------------------------
        wal_buffers: 16MB                       # max to 16MB
        wal_writer_delay: 20ms                  # wait period
        wal_writer_flush_after: 1MB             # max allowed data loss
        min_wal_size: 100GB                     # at least 100GB WAL
        max_wal_size: 400GB                     # at most 400GB WAL
        commit_delay: 20                        # 0 -> 20 (microseconds), allow group commit
        commit_siblings: 10                     # 5 -> 10
        checkpoint_timeout: 60min               # checkpoint 5min -> 1h
        checkpoint_completion_target: 0.95      # 0.5 -> 0.95
        archive_mode: on
        archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'

        #----------------------------------------------------------------------
        # REPLICATION
        #----------------------------------------------------------------------
        # synchronous_standby_names: ''
        vacuum_defer_cleanup_age: 50000         # 0->50000 last 50000 xact changes will not be vacuumed
        promote_trigger_file: promote.signal    # default promote trigger file path
        max_standby_archive_delay: 10min        # max delay before canceling queries when reading WAL from archive;
        max_standby_streaming_delay: 3min       # max delay before canceling queries when reading streaming WAL;
        wal_receiver_status_interval: 1s        # send replies at least this often
        hot_standby_feedback: on                # send info from standby to prevent query conflicts
        wal_receiver_timeout: 60s               # time that receiver waits for
        max_logical_replication_workers: 8      # 4 -> 8
        max_sync_workers_per_subscription: 8    # 4 -> 8

        #----------------------------------------------------------------------
        # QUERY TUNING
        #----------------------------------------------------------------------
        # planner
        # enable_partitionwise_join: on
        random_page_cost: 1.1                   # 4 for HDD, 1.1 for SSD
        effective_cache_size: 320GB             # max mem - shared buffer
        default_statistics_target: 1000         # stat bucket 100 -> 1000

        #----------------------------------------------------------------------
        # REPORTING AND LOGGING
        #----------------------------------------------------------------------
        log_destination: csvlog                 # use standard csv log
        logging_collector: on                   # enable csvlog
        log_directory: log                      # default log dir: /pg/data/log
        # log_filename: 'postgresql-%a.log'     # weekly auto-recycle
        log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
        log_checkpoints: on                     # log checkpoint info
        log_lock_waits: on                      # log lock wait info
        log_replication_commands: on            # log replication info
        log_statement: ddl                      # log ddl change
        log_min_duration_statement: 100         # log slow query (>100ms)

        #----------------------------------------------------------------------
        # STATISTICS
        #----------------------------------------------------------------------
        track_io_timing: on                     # collect io statistics
        track_functions: all                    # track all functions (none|pl|all)
        track_activity_query_size: 32768        # show full query on critical database

        #----------------------------------------------------------------------
        # AUTOVACUUM
        #----------------------------------------------------------------------
        log_autovacuum_min_duration: 1s         # log autovacuum activity take more than 1s
        autovacuum_max_workers: 3               # default autovacuum worker 3
        autovacuum_naptime: 1min                # default autovacuum naptime 1min
        autovacuum_vacuum_scale_factor: 0.08    # fraction of table size before vacuum   20% -> 8%
        autovacuum_analyze_scale_factor: 0.04   # fraction of table size before analyze  10% -> 4%
        autovacuum_vacuum_cost_delay: -1        # default vacuum cost delay: same as vacuum_cost_delay
        autovacuum_vacuum_cost_limit: -1        # default vacuum cost limit: same as vacuum_cost_limit
        autovacuum_freeze_max_age: 100000000    # age > 100 million triggers forced vacuum

        #----------------------------------------------------------------------
        # CLIENT
        #----------------------------------------------------------------------
        deadlock_timeout: 50ms                  # 50ms for deadlock
        idle_in_transaction_session_timeout: 1min  # 1min timeout for idle in transaction in critical database

        #----------------------------------------------------------------------
        # CUSTOMIZED OPTIONS
        #----------------------------------------------------------------------
        # extensions
        shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'

        # auto_explain
        auto_explain.log_min_duration: 1s       # auto explain query slower than 1s
        auto_explain.log_analyze: true          # explain analyze
        auto_explain.log_verbose: true          # explain verbose
        auto_explain.log_timing: true           # explain timing
        auto_explain.log_nested_statements: true

        # pg_stat_statements
        pg_stat_statements.max: 10000           # 5000 -> 10000 queries
        pg_stat_statements.track: all           # track all statements (all|top|none)
        pg_stat_statements.track_utility: on    # TRACK all queries on critical database
        pg_stat_statements.track_planning: off  # do not track planning metrics


#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:

  #----------------------------------------------------------------------------
  # how to connect to postgres
  #----------------------------------------------------------------------------
  bin_dir: {{ pg_bin_dir }}
  data_dir: {{ pg_data }}
  config_dir: {{ pg_data }}
  pgpass: {{ pg_dbsu_home }}/.pgpass
  listen: {{ pg_listen }}:{{ pg_port }}
  connect_address: {{ inventory_hostname }}:{{ pg_port }}
  use_unix_socket: true # default: /var/run/postgresql, /tmp

  #----------------------------------------------------------------------------
  # who to connect to postgres
  #----------------------------------------------------------------------------
  authentication:
    superuser:
      username: {{ pg_dbsu }}
    replication:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'
    rewind:
      username: {{ pg_replication_username }}
      password: '{{ pg_replication_password }}'

  #----------------------------------------------------------------------------
  # how to react to database operations
  #----------------------------------------------------------------------------
  # event callback script log: /pg/log/callback.log
  callbacks:
    on_start: /pg/bin/pg-failover-callback
    on_stop: /pg/bin/pg-failover-callback
    on_reload: /pg/bin/pg-failover-callback
    on_restart: /pg/bin/pg-failover-callback
    on_role_change: /pg/bin/pg-failover-callback

  # rewind policy: data checksum should be enabled before using rewind
  use_pg_rewind: true
  remove_data_directory_on_rewind_failure: true
  remove_data_directory_on_diverged_timelines: false

  #----------------------------------------------------------------------------
  # how to create replica
  #----------------------------------------------------------------------------
  # create replica method: default pg_basebackup
  create_replica_methods:
    - basebackup
  basebackup:
    - max-rate: '1000M'
    - checkpoint: fast
    - status-interval: 1s
    - verbose
    - progress

  #----------------------------------------------------------------------------
  # ad hoc parameters (overwrite with default)
  #----------------------------------------------------------------------------
  # parameters:

  #----------------------------------------------------------------------------
  # host based authentication, overwrite default pg_hba.conf
  #----------------------------------------------------------------------------
  # pg_hba:
  #   - local   all             postgres                                ident
  #   - local   all             all                                     md5
  #   - host    all             all            0.0.0.0/0                md5
  #   - local   replication     postgres                                ident
  #   - local   replication     all                                     md5
  #   - host    replication     all            0.0.0.0/0                md5
...
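
As a rough illustration of how the `basebackup` list in the template above reaches `pg_basebackup`, the sketch below mimics the conversion described in the Patroni documentation: a key/value entry becomes `--key=value` and a bare string becomes `--key`. The function name `basebackup_args` and the variable `template_options` are hypothetical, chosen only for this example; this is not Patroni's actual code, and the option values are copied from the template without being validated.

```python
# Minimal sketch (assumption: mirrors the documented Patroni behaviour,
# not its real implementation). `basebackup_args` is a hypothetical helper.
def basebackup_args(options):
    """Turn a Patroni-style `basebackup` option list into CLI flags."""
    args = []
    for opt in options:
        if isinstance(opt, dict):
            # A mapping entry such as {"checkpoint": "fast"} -> --checkpoint=fast
            for key, value in opt.items():
                args.append(f"--{key}={value}")
        else:
            # A bare string entry such as "verbose" -> --verbose
            args.append(f"--{opt}")
    return args


# The option list from the template, written as Python data.
template_options = [
    {"max-rate": "1000M"},
    {"checkpoint": "fast"},
    {"status-interval": "1s"},
    "verbose",
    "progress",
]

print(" ".join(["pg_basebackup"] + basebackup_args(template_options)))
# prints: pg_basebackup --max-rate=1000M --checkpoint=fast --status-interval=1s --verbose --progress
```

Patroni adds its own arguments (such as the target data directory and connection settings) on top of these, so the printed line is only the user-configurable part of the command.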

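9 - Professional Support

Need professional support? Look here!

Pigsty is an open-source system, and PRs and issues are welcome.

That said, time is precious, comrades. If I spent every day handling all sorts of tricky problems, I would have no time left to write bugs!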

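Professional Support

Pigsty also offers optional professional support, which includes the following extras and services:

  • Management console
  • A complete monitoring system with roughly three thousand metrics or more
  • Security hardening
  • Additional dashboards that provide richer cluster monitoring information
  • Production-grade deployment, operations, and management solutions
  • Meta-database construction and a global data dictionary
  • Log collection system with aggregated log digests
  • Backup and recovery: an end-to-end solution covering concurrent backup, delayed backup, and backup verification
  • Deployment assistance, system integration, hooking into existing monitoring and alerting infrastructure, or onboarding existing databases
  • Failure diagnosis service
  • Q&A, consulting, and training
  • Other custom requirements

For details, contact @Vonng (rh@vonng.com)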

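9.1 - Comparison with Similar Products

A side-by-side comparison with other PostgreSQL monitoring systems

Overview

Below are some PostgreSQL monitoring systems that are candidate competitors. None of them is really up to the job, so I had to build my own.

Side-by-Side Comparison

This is a side-by-side comparison of metric counts. Only database-related metrics are counted; machine-level metrics such as CPU and disk are left out.

The comparison covers open-source, commercial, and cloud-vendor PG monitoring systems, counted from their publicly available code or documentation. A single author's tally risks looking like self-promotion, so corrections are welcome. At the order-of-magnitude level, at least, this chart should be broadly accurate; see the links at the end of this page for details.

Some may ask: having many metrics looks impressive, but what practical use is it? Admittedly, failure alerting needs only a few key metrics. But thorough metric coverage further improves our insight into, and control over, the database, and of that one can never have too much.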

Competitors

PGWatch

PG Analyze

PGDash

PGMonitor

AWS RDS

Azure RDS

Aliyun RDS

Reference Links

pgwatch

pgmonitor

datadog

pgDash

ClusterControl

pganalyze

Aliyun RDS

AWS RDS

Azure RDS

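9.2 - Why Open Source

Why did Pigsty choose to be open source?

Pigsty was developed to fill a gap in the PostgreSQL open-source ecosystem.

Pigsty is built on open-source components, so it gives back to the community by being open source as well.

Can this thing make money? Of course it can, so optional professional support is also available for deep-pocketed customers!

The Professional Edition will have more dashboards and metrics, a more polished UI, and richer features. Still, the open-source edition is more than sufficient for production use.

So why open source? Apart from the advertising value, the main reason really is passion.

That is how open source works: it runs on love, enthusiasm, and dedication.

Take PostgreSQL, the world's most advanced open-source relational database: it is given to everyone for free. How idealistic is that.

I make my living from PG too. I cannot write PG itself, but I can write a monitoring system worthy of PostgreSQL, the best open-source relational database in the world.

Pigsty builds on the open-source ecosystem and gives back to the open-source community. I hope it helps everyone who uses PG and makes the experience better and more enjoyable!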

9.3 - Community Groups

Questions and discussion

Overview

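9.4 - Roadmap

The next steps planned for the Pigsty project

Release Plan

Pigsty is currently at v0.8.0 and still in Beta. The provisioning solution is feature-frozen, however, and its API will no longer change.

The next release, v0.9.0, will make a final overall revision of the monitoring system's metrics, rules, and visualizations, and will enter RC status (May 2021).

v1.0 is scheduled to reach GA status in mid-2021 (June 2021).

After v1.0, no new features will be added to the provisioning solution; the focus will be on developing and optimizing monitoring metrics and dashboards.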

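Long-Term Plan

Build Pigsty into a complete PostgreSQL private cloud platform, including a complete:

  • Monitoring system
  • Provisioning solution
  • Management and control system

At the same time, development will focus on Pigsty Professional Edition features, including:

  • Graphical management interface

  • A brand-new monitoring system built on new Grafana 8 features

  • Dynamic inventory support and status feedback

  • CMDB integration and daily inspection reports

  • pgup, a component for fast installation and deployment

  • ppp, a proxy component combining the functionality of pgbouncer, pg_exporter, and patroni

  • Cloud/container-based PostgreSQL provisioning

  • Integrated Citus provisioning

  • Integrated GreenPlum provisioning

  • Further enrichment of the management playbook collection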