欢迎来到Pigsty中文文档(v0.9)
Pigsty是 PostgreSQL In Graphic STYle 的缩写,即 “图形化Postgres”。
The word pigsty literally means a pig pen and is pronounced like "Pig Style" (/ˈpɪɡˌstaɪ/).
Pigsty提供业界顶尖的开源PostgreSQL监控系统,与开箱即用的高可用数据库供给方案。既可以用于监控、部署、管理大规模生产级高可用数据库集群,也可用于快速搭建单机测试&演示数据库环境。
Pigsty基于开源生态构建,针对大规模数据库集群监控与管理而设计;经过长期迭代演进,久经实际生产环境考验。Pigsty旨在为用户带来极致的可观测性与丝滑的数据库使用体验,降低PostgreSQL使用管理的门槛,让所有人都能轻松享受到数据库的乐趣。
Pigsty基于Apache 2.0协议开源,可免费用于商业目的。但不得改装为自有产品,须遵守显著声明义务。
1 - 概览
快速了解Pigsty所解决的问题,采用的技术,适用的场景。
Pigsty是什么?
- Pigsty是最顶尖的开源PostgreSQL监控系统
- Pigsty是最易用的开源PostgreSQL供给方案
- Pigsty是最开放的PostgreSQL解决方案,是开源软件
Pigsty是监控系统
You can’t manage what you don’t measure.
监控系统提供了对系统状态的度量,是运维管理工作的基石。
PostgreSQL是世界上最好的开源关系型数据库,但其生态中却缺少一个足够好的监控系统。
Pigsty旨在解决这一问题:交付最好的PostgreSQL监控系统。
与同类产品相比,Pigsty在指标覆盖率与监控面板丰富程度上一骑绝尘,无出其右,详见同类对比。
Pigsty是供给方案
授人以鱼,不如授人以渔。
Pigsty还是门槛最低的高可用数据库集群 供给方案。
供给方案不是数据库,而是数据库工厂。用户向工厂提交订单,供给系统会自动根据表单的内容,创建出对应的数据库集群。
Pigsty通过声明式的配置定义数据库集群,通过幂等的预置剧本自动创建所需的数据库集群,提供近似私有云般的使用体验。
Pigsty创建的数据库集群是分布式、高可用的数据库集群。只要集群中有任意实例存活,集群就可以对外提供完整的读写服务与只读服务。数据库集群中的每个数据库实例在使用上都是幂等的,任意实例都可以通过内建负载均衡组件提供完整的读写服务,提供分布式数据库的使用体验。数据库集群可以自动进行故障检测与主从切换,普通故障能在几秒到几十秒内自愈,且期间只读流量不受影响。
Pigsty采用简单成熟稳定的物理机/虚拟机部署方式,一行命令完成安装,真正做到傻瓜式部署。本地开发,公用测试,生产环境均可使用同一套方案,既可用于学习、开发、测试,又能用于大规模生产实践。
此外,Pigsty的监控系统可以脱离Pigsty供给方案独立部署,详见 仅监控部署。
Pigsty是开源软件
Pigsty基于Apache 2.0协议开源,可以免费使用,也提供可选的商业支持。
Pigsty的监控系统与供给方案大多基于开源组件,而PostgreSQL本身也是世界上最先进的开源关系型数据库。基于开源生态,回馈开源社区。Pigsty可以极大地降低PostgreSQL的使用与管理门槛,让更多人享受到PostgreSQL的便利,体验数据库的乐趣。
开发Pigsty的初衷是:作者需要对一个大规模PostgreSQL集群进行管理,但找遍所有市面上的开源与商业监控系统方案后,发现没有一个是“足够好用”的。本着“我行我上”的精神,开发设计了Pigsty监控系统。而监控系统要想发行与演示,必须要先有被监控的对象,所以顺便开发了Pigsty供给方案。
Pigsty将主从复制,故障切换,流量代理,连接池,服务发现,基本权限系统等生产级成熟部署方案打包至本项目中,并提供了沙箱环境用于演示与测试。沙箱配置文件只微量修改即可应用于生产环境部署,用户在自己的笔记本电脑上就可以充分探索与体验Pigsty提供的功能,真正做到开箱即用。
接下来做什么?
上手
- 快速开始:在本机上快速拉起Pigsty沙箱
- 探索实验:利用Pigsty体验数据库的乐趣
浏览
- 监控界面:查阅监控系统提供的功能与界面
- 基本概念:关于Pigsty的基本概念与重要信息
- 公开示例:访问公开的Pigsty演示环境。
实战
2 - 上手
如何快速拉起Pigsty
准备
安装Pigsty需要一个机器节点:规格至少为1核2GB,采用Linux内核,安装CentOS 7发行版,处理器为x86_64架构。该节点在生产环境中被用作元节点(管理节点),发出控制命令,采集监控数据,运行定时任务。
安装
安装需要root权限。使用带有sudo权限的用户(或root)执行以下命令即可完成安装:
curl -fsSL https://pigsty.cc/pigsty.tgz | gzip -d | tar -xC ~; cd ~/pigsty # 下载源码
make config # 配置环境
make install # 安装软件
在使用离线安装包的情况下,整个安装过程耗时约10~15分钟。
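./configure detects the environment automatically. If the node has multiple IP addresses, specify one as the primary IP address; in the sandbox the IP address is fixed at 10.10.10.10. In addition, if the offline package /tmp/pkg.tgz does not exist, the program asks whether to download it from the network.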
沙箱
如果希望在本机运行Pigsty,可以使用虚拟机软件,或使用Pigsty沙箱。沙箱是本地演示/测试/开发环境,运行于由 Vagrant 托管的本地 Virtualbox 虚拟机上。这两者都是跨平台软件,可以在MacOS|Windows|Linux下运行。
以MacOS为例,在本机终端中依次执行以下命令,即可拉起沙箱。
make deps # 安装homebrew,并通过homebrew安装vagrant与virtualbox(需重启)
make dns # 向本机/etc/hosts写入静态域名 (需sudo输入密码)
make start # 使用Vagrant拉起单个meta节点 (start4则为4个节点)
make demo # 使用单节点Demo配置并安装 (demo4则为4节点demo)
使用
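Once installation finishes, users can use the Pigsty monitoring system by visiting the ports on this node directly.
For example, the monitoring system listens on port 3000 by default, and the default admin username and password are both admin.
When using the sandbox, users can reach the services Pigsty provides through the default local domain names written by make dns, for example http://g.pigsty. The services Pigsty exposes are listed in the table below:
When deploying on ordinary machines, replace the IP address here (10.10.10.10) with the IP of your own node.
Accessing services by IP address is convenient, but a better practice is to assign a domain name to each service through nginx_upstream and access the services by domain name. The Nginx bundled with Pigsty proxies all web access on port 80 by default.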
部署
Pigsty安装完成后,这台机器将作为Pigsty的元节点。用户可以从元节点发起控制,部署新PG集群。部署新数据库集群分为三步:
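- Bring the machine nodes used for deployment under management: the current user must be able to ssh from the current node to the target nodes without a password and must have passwordless sudo there.
- Define the database cluster (in the configuration file or the GUI).
- Run the database cluster deployment playbook.
If the sandbox was started with make start4 and make demo4, no configuration is needed and the following command can be run directly.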
./pgsql.yml -l pg-test # 初始化pg-test数据库集群
更多信息请参考部署一章
FAQ
安装与使用过程中的常见问题,请参考 FAQ
接下来做什么?
2.1 - FAQ
Pigsty快速上手常见问题
下载问题
源码包从哪里下载?
Pigsty源码包:pigsty.tgz
可以从多个地方下载:Pigsty官网,Pigsty CDN,以及Github。
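- The Pigsty website is the fastest source, always carries the newest release, and is the default download address. It only offers the latest version, not historical ones.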
- Github Release 是最权威最全面的下载地址,包含所有历史版本。
- Pigsty CDN则主要用于下载历史版本,以及离线软件包。
https://pigsty.cc/pigsty.tgz # 官网最新
https://github.com/Vonng/pigsty/releases/download/v0.9/pigsty.tgz # Github
http://pigsty-1304147732.cos.accelerate.myqcloud.com/v0.9/pigsty.tgz # CDN
离线安装包从哪里下载?
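By default, users need not worry about this. If configure finds that the offline package is missing, it prompts the user to download it. To install in an environment without Internet access, however, users must download the package themselves and upload it to the target server.
The offline package pkg.tgz can be downloaded from Github Release or from the CDN (provided for mainland China).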
https://github.com/Vonng/pigsty/releases/download/v0.9/pkg.tgz # Github
http://pigsty-1304147732.cos.accelerate.myqcloud.com/v0.9/pkg.tgz # CDN (China)
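Place it at /tmp/pkg.tgz on the installation machine and it is used automatically during installation. The offline packages are extracted to /www/pigsty by default.
Installing without the offline package?
The offline package bundles software collected from various Yum repositories and Github Releases. Users can also skip the prebuilt offline package and download directly from the original upstream sources. On operating systems other than CentOS 7.8, this usually resolves most missing or mismatched dependencies. Skipping the offline package is simple: answer n when make config prompts.
Errors when installing yum packages
The default offline package is built on CentOS 7.8. If problems occur, delete the offending rpm packages under /www/pigsty together with the marker file /www/pigsty/repo_complete, then run make repo-download to download dependency packages that match the current operating system version.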
有些软件包下载速度太慢
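Pigsty already downloads from yum mirrors inside China wherever possible, but a few packages are still affected by the GFW and download slowly, for example software fetched directly from Github. The following workarounds are available:
- Pigsty provides an offline package that bundles all software and its dependencies; make config prompts to download it automatically.
- Specify a proxy server through proxy_env and download through it, or use a server outside the firewall directly.
- For software downloaded directly by URL, the Pigsty CDN provides mirrors (same file name, different prefix), for example:
http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/pg_exporter-0.3.2-1.el7.x86_64.rpm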
Vagrant沙箱第一次启动太慢
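The Pigsty sandbox uses CentOS 7 virtual machines by default. The first time Vagrant starts a virtual machine it downloads the CentOS/7 Box image, which is fairly large. (You can also download a CentOS 7 ISO yourself and install it in a virtual machine.) A proxy may speed up the download, and the download is only needed on first start.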
版本问题
Pigsty源码有哪几种分支?
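Besides regular semantic version numbers, Pigsty has three main branches: Default, Pro, and Beta. pigsty.tgz is the standard open-source edition, pigsty-beta.tgz is the beta edition, and pigsty-pro.tgz is the professional edition. Ordinary users should use the default pigsty.tgz; the professional edition is not currently available for public download.
Do I need to wait for 1.0 GA?
Pigsty has been used in real production environments since 0.3, so it does not become generally available only at 1.0. However, several changes are planned for 1.0 (for example a redefinition of the monitoring metrics and PG14 support), and Pigsty will not provide upgrade support for versions before v1.0. Whether to use it in production now depends on your own situation.
What is the GUI tool for editing the Pigsty configuration file?
It is a separate command-line tool, pigsty-cli, currently in beta. It will be released officially together with Pigsty v1.0.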
环境问题
Pigsty的安装环境
安装Pigsty需要至少一个机器节点:规格至少为1核2GB,采用Linux内核,安装CentOS 7发行版,处理器为x86_64架构。
在生产环境中,建议使用更高规格的机器,并部署多个元节点作为容灾冗余。生产环境中元节点将作为管理节点发出控制命令,管理部署数据库集群,采集监控数据,运行定时任务等。
Pigsty的操作系统要求
Pigsty强烈建议使用CentOS 7.8操作系统安装元节点与数据库节点,以免将精力消耗在无谓的问题上。
Pigsty的默认开发、测试、部署环境都基于CentOS 7.8,CentOS 7.6也经过充分的验证。其他CentOS 7.x及其等效版本RHEL7 , Oracle Linux 7在理论上都没有问题,但并未进行测试与验证。
在使用仅监控模式监控已有PostgreSQL数据库集群时,可以使用不同的Linux发行版。因为监控系统相关组件均为Go编写的二进制,可以兼容各种Linux发行版。 但这并不是官方支持的行为。
后续其他操作系统支持可能以容器镜像的形式提供。
为什么不使用Docker与Kubernetes?
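Docker works very well for environment compatibility problems, but databases are not among the best use cases for containers. Docker and Kubernetes also have a learning curve of their own. To stay true to the goal of lowering the barrier to entry, Pigsty deploys directly on bare machines.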
但Pigsty在设计之初就考虑到容器化云化的需求,这体现在其配置定义的声明式实现中。并不需要太多修改就可以迁移改造为云原生解决方案。当时机成熟时,会考虑使用Kubernetes Operator的方式进行重构。
集成问题
是否可以监控已有的PG实例?
对于非Pigsty供给方案创建的外部数据库,可以使用仅监控模式部署,详情请参考文档。注意Pigsty部署需要目标机器ssh sudo权限。因此通常无法支持云厂商RDS,但例如MyBase for PostgreSQL的ECS托管云数据库是可以纳入监控的。
云厂商RDS监控不了有什么办法?
目前Pigsty官方不支持对纯RDS的监控,因为缺少机器指标的监控系统只能说是半成品。但用户可以通过本地部署PG Exporter远程连接监控RDS,以及Prometheus本地静态服务发现抓取本地Exporter,并通过手工配置Label的方式实现曲线救国。
监控系统问题
监控系统中的Dashboard与文档不一致?
为什么监控系统里只有10个Dashboard?因为开源版本的Pigsty只提供这些监控面板,当然也绝对够用了。
为什么PG Instance Log面板没有数据?
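Log collection is currently a beta feature and requires an extra installation step. Running make logging installs loki and promtail, after which the panel becomes usable. Loki is still a relatively new log collection solution, and not everyone is ready to adopt it.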
监控系统的数据量有多大?
这取决于您数据库的复杂程度(workload),作为参考:200个生产数据库实例1天产生的监控数据量约为16GB。Pigsty默认保留30天监控数据,可以通过参数调整。
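As a rough sizing aid, the reference point above (200 instances producing about 16GB per day, retained for 30 days by default) can be turned into a back-of-the-envelope estimate. The sketch below assumes that data volume scales linearly with instance count, which real workloads may not follow; the function name is hypothetical.

```python
def estimate_storage_gb(instances, retention_days=30, gb_per_instance_day=16 / 200):
    """Estimate monitoring data volume in GB.

    Assumes linear scaling from the reference point of 200 instances
    producing about 16 GB per day; actual volume depends on the workload.
    """
    return instances * retention_days * gb_per_instance_day


# Reference environment with the default 30-day retention.
print(estimate_storage_gb(200))
# A small deployment of 10 instances kept for 30 days.
print(estimate_storage_gb(10))
```

Treat the result as an order-of-magnitude figure when provisioning disk for the meta node.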
架构问题
Pigsty都装了什么东西?
详情请参考系统架构。
Pigsty是一套带有完整运行时的数据库解决方案。在本机上,Pigsty可以作为开发、测试、数据分析的环境。在生产环境中,Pigsty可以用于部署,管理,监控大规模PostgreSQL集群。
Pigsty数据库如何保证高可用
Patroni 2.0作为HA Agent,Consul作为DCS,Haproxy作为默认流量分发器。Pigsty的数据库集群成员在使用上幂等:只要集群还有任意一个实例存活,读写与只读流量都可以继续工作。
DCS自身的可用性通过多节点共识保证,故生产环境中建议部署3~5个meta节点,或使用外部的DCS集群。
Pigsty问题交流群
3 - 概念
在使用Pigsty时需要了解的一些信息
Pigsty在逻辑上由两部分组成:监控系统 与 供给方案 。
监控系统负责监控PostgreSQL数据库集群,供给方案负责创建PostgreSQL数据库集群。了解Pigsty的监控系统与供给方案前,阅读 命名原则 与 整体架构 有助于对整体设计形成直观印象。
Pigsty的监控系统与供给方案可以独立使用,用户可以在不使用Pigsty供给方案的情况下,使用Pigsty监控系统监控现有PostgreSQL集群与实例,详见 仅监控部署。
监控系统
You can’t manage what you don’t measure.
监控系统提供了对系统状态的度量,是运维管理工作的基石。Pigsty提供最好的开源PostgreSQL监控系统。
Pigsty的监控系统在物理上分为两个部分:
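- Server side: deployed on the meta node, including the time-series database Prometheus, the dashboard Grafana, the alert manager Alertmanager, the service discovery component Consul, and other services.
- Client side: deployed on database nodes, including NodeExporter, PgExporter, and Haproxy, which passively wait for Prometheus to scrape them.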
Pigsty监控系统的核心概念如下:
供给方案
授人以鱼,不如授人以渔
供给方案(Provisioning Solution) ,指的是向用户交付数据库服务与监控系统的系统。供给方案不是数据库,而是数据库工厂,用户向供给系统提交一份配置,供给系统便会按照用户所需的规格在环境中创建出所需的数据库集群来,这类似于通过向Kubernetes提交YAML文件来创建系统所需的各类资源。
Pigsty的供给方案在部署上分为两个部分:
- 基础设施(Infra) :部署于元节点上,监控基础设施,DNS,NTP,DCS,本地源等关键服务。
- 数据库集群(PgSQL):部署于数据库节点上,以集群为单位对外提供数据库服务。
Pigsty的供给方案的部署对象分为两种:
- 元节点(Meta):部署基础设施,执行控制逻辑,每个Pigsty部署至少需要一个元节点,可复用为普通节点。
- 数据库节点(Node):用于部署数据库集群/实例,Pigsty采用节点与数据库实例一一对应的独占式部署。
Pigsty供给方案的相关概念如下:
3.1 - 命名原则
介绍Pigsty默认采用的实体命名原则
名之必可言也,言之必可行也。
概念及其命名是非常重要的东西,命名风格体现了工程师对系统架构的认知。定义不清的概念将导致沟通困惑,随意设定的名称将产生意想不到的额外负担。因此需要审慎地设计。本文介绍 Pigsty 中的相关实体,以及其命名所遵循的原则。
结论
Pigsty中,核心的四类实体为:集群(Cluster),服务(Service),实例(Instance),节点(Node)
- 集群(Cluster) 是基本自治单元,由用户指定唯一标识,表达业务含义,作为顶层命名空间。
- 集群在硬件层面上包含一系列的节点(Node),即物理机,虚机(或Pod),可以通过IP唯一标识。
- 集群在软件层面上包含一系列的实例(Instance),即软件服务器,可以通过IP:Port唯一标识。
- 集群在服务层面上包含一系列的服务(Service),即可访问的域名与端点,可以通过域名唯一标识。
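- A cluster name may be any name that satisfies DNS naming rules and contains no dot ([a-zA-Z0-9-]+).
- A node name uses the cluster name as a prefix, followed by - and an integer sequence number (allocating from 0 is recommended, consistent with k8s).
- Because Pigsty uses exclusive deployment, nodes and instances map one to one, so an instance can share its node's name, in the form ${cluster}-${seq}.
- A service name also uses the cluster name as a prefix, followed by - and the service role, such as primary, replica, offline, or delayed.
Taking the figure above as an example, the test database cluster is named "pg-test". It consists of three database server instances, one primary and two replicas, deployed on the cluster's three nodes. The pg-test cluster exposes two services: the read-write service pg-test-primary and the read-only replica service pg-test-replica.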
实体
在Postgres集群管理中,有如下实体概念:
集群(Cluster)
集群是基本的自治业务单元,这意味着集群能够作为一个整体组织对外提供服务。类似于k8s中Deployment的概念。注意这里的集群是软件层面的概念,不要与PG Cluster(数据库集簇,即包含多个PG Database的单个PG实例的数据目录)或Node Cluster(机器集群)混淆。
集群是管理的基本单位之一,是用于统合各类资源的组织单位。例如一个PG集群可能包括:
- 三个物理机器节点
- 一个主库实例,对外提供数据库读写服务。
- 两个从库实例,对外提供数据库只读副本服务。
- 两个对外暴露的服务:读写服务,只读副本服务。
Every cluster has a unique identifier defined by the user according to business needs; this example defines a database cluster named pg-test.
节点(Node)
节点是对硬件资源的一种抽象,通常指代一台工作机器,无论是物理机(bare metal)还是虚拟机(vm),或者是k8s中的Pod。这里注意k8s中Node是硬件资源的抽象,但在实际管理使用上,是k8s中的Pod而不是Node更类似于这里Node概念。总之,节点的关键要素是:
- 节点是硬件资源的抽象,可以运行一系列的软件服务
- 节点可以使用IP地址作为唯一标识符
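Although the lan_ip address can serve as the unique node identifier, for ease of management a node should also have a human-readable, meaningful name as its hostname, which serves as another commonly used unique identifier.
Service
A service is a named abstraction over a software service (for example Postgres or Redis). A service can be implemented in many ways, but its key elements are:
- An addressable service name used for external access, for example:
  - A DNS domain name (pg-test-primary)
  - An Nginx/Haproxy endpoint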
- 服务流量路由解析与负载均衡机制,用于决定哪个实例负责处理请求,例如:
- DNS L7:DNS解析记录
- HTTP Proxy:Nginx/Ingress L7:Nginx Upstream配置
- TCP Proxy:Haproxy L4:Haproxy Backend配置
- Kubernetes:Ingress:Pod Selector 选择器。
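A database cluster usually contains a primary and replicas, which provide the read-write service (primary) and the read-only replica service (replica) respectively.
Instance
An instance refers to a specific database server. It can be a single process, a group of processes that share fate, or several closely related containers in one Pod. The key elements of an instance are: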
- 可以通过IP:Port唯一标识
- 具有处理请求的能力
例如,我们可以把一个Postgres进程,为之服务的独占Pgbouncer连接池,PgExporter监控组件,高可用组件,管理Agent看作一个提供服务的整体,视为一个数据库实例。
实例隶属于集群,每个实例在集群范围内都有着自己的唯一标识用于区分。
实例由服务负责解析,实例提供被寻址的能力,而Service将请求流量解析到具体的实例组上。
命名规则
一个对象可以有很多组标签(Tag)与元数据(Metadata/Annotation),但通常只能有一个名字(Name)。
管理数据库和软件与管理宠物类似,都需要花心思照顾。而起名字就是其中非常重要的一项工作。肆意的名字(例如 XÆA-12,NULL,史珍香)很可能会引入不必要的麻烦(额外复杂度),而设计得当的名字则可能会有意想不到的惊喜效果。
总体而言,对象起名应当遵循一些原则:
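- Concise and human-readable: names are for people, so they should be easy to remember and use.
- Functional and descriptive: a name should reflect the key characteristics of the object.
- Unique: within its namespace and category, a name should be unique and usable as an identifier for addressing.
- Do not cram unrelated things into a name: embedding lots of important metadata in a name is tempting but painful to maintain. A counterexample: pg:user:profile:10.11.12.13:5432:replica:13.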
集群命名
集群名称,其实类似于命名空间的作用。所有隶属本集群的资源,都会使用该命名空间。
集群命名的形式,建议采用符合DNS标准 RFC1034 的命名规则,以免给后续改造埋坑。例如哪一天想要搬到云上去,发现以前用的名字不支持,那就要再改一遍名,成本巨大。
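A stricter restriction is preferable: a cluster name should not contain a dot and should use only lowercase letters, digits, and the hyphen -. Every object in the cluster can then use the name as a prefix in all sorts of places without fear of breaking some constraint. The cluster naming rule is therefore:
cluster_name := [a-z][a-z0-9-]*
Dots are discouraged because a once-popular scheme, such as com.foo.bar, uses dot-separated hierarchical names. It is concise and clear, but the user-supplied name may contain an arbitrary, uncontrolled number of levels. If the cluster has to interact with external systems that constrain naming, such names cause trouble. The most direct example is the Pod in K8s, whose naming rules do not allow a . character.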
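The cluster naming rule can be checked mechanically. The following is a minimal sketch that applies the pattern [a-z][a-z0-9-]* to the whole name; the helper name is hypothetical and not part of Pigsty.

```python
import re

# Pattern taken from the rule above: a lowercase letter, then lowercase
# letters, digits, or hyphens. Dots and uppercase letters are rejected.
CLUSTER_NAME = re.compile(r"[a-z][a-z0-9-]*")


def is_valid_cluster_name(name):
    """Return True if the whole name matches the cluster naming rule."""
    return CLUSTER_NAME.fullmatch(name) is not None


for candidate in ("pg-test", "pg-user-fin", "com.foo.bar", "PG-Test"):
    print(candidate, is_valid_cluster_name(candidate))
```

Using fullmatch rather than match matters here: match would accept com.foo.bar because its prefix com satisfies the pattern.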
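For the content of a cluster name, a two- or three-part name separated by - is recommended, for example:
<cluster type>-<business>-<business line>
For example, pg-test-tt denotes the test cluster of type pg under the tt business line, and pg-user-fin denotes the user service under the fin business line.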
节点命名
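Node names should follow the same rule as k8s Pods:
<cluster_name>-<seq>
A node's name is fixed during cluster resource allocation. Every node is assigned a sequence number ${seq}, an auto-incrementing integer starting from 0. This matches the naming rule of StatefulSet in k8s, so on-cloud and off-cloud deployments can be managed consistently.
For example, if the cluster pg-test has three nodes, they can be named pg-test-1, pg-test-2, and pg-test-3.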
节点的命名,在整个集群的生命周期中保持不变,便于监控与管理。
实例命名
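Databases are usually deployed exclusively, with one instance occupying a whole machine node. PG instances and nodes map one to one, so the node identifier can simply double as the instance identifier. For example, the PG instance on node pg-test-1 is named pg-test-1, and so on.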
采用独占部署的方式有很大优势,一个节点即一个实例,这样能最小化管理复杂度。混部的需求通常来自资源利用率的压力,但虚拟机或者云平台可以有效解决这种问题。通过vm或pod的抽象,即使是每个redis(1核1G)实例也可以有一个独占的节点环境。
作为一种约定,每个集群中的0号节点(Pod),会作为默认主库。因为它是初始化时第一个分配的节点。
服务命名
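Generally, a database exposes two basic services: the primary read-write service and the replica read-only service.
Services can therefore follow a simple naming rule:
<cluster_name>-<service_name>
For example, the pg-test cluster here contains two services: the read-write service pg-test-primary and the read-only replica service pg-test-replica.
A popular instance/node naming rule is <cluster_name>-<service_role>-<sequence>, which embeds the primary or replica role of the database into the instance name. This has pros and cons. The advantage is that you can tell at a glance which instance/node is the primary and which are replicas. The drawback is that once a failover happens, instance and node names must be adjusted to stay consistent, which creates extra maintenance work. In addition, services and node instances are relatively independent concepts; embedding the role distorts this relationship by tying each instance to exactly one service. In complex scenarios that assumption may not hold. For example, a cluster may be partitioned into services in several different ways, and the partitions may well overlap: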
- 可读从库(解析至包含主库在内的所有实例)
- 同步从库(解析至采用同步提交的备库)
- 延迟从库,备份实例(解析至特定具体实例)
因此不要把服务角色嵌入实例名称,而是在服务中维护目标实例列表。毕竟名字并非全能,不要把太多非必要的信息嵌入到对象名称中。
3.2 - 系统架构
介绍Pigsty的系统架构
一套Pigsty部署在架构上分为两个部分:
- 基础设施(Infra) :部署于元节点上,监控,DNS,NTP,DCS,Yum源等基础服务。
- 数据库集群(PgSQL):部署于数据库节点上,以集群为单位对外提供数据库服务。
同时,用于部署的 节点(物理机,虚拟机,Pod)也分为两种:
- 元节点(Meta):部署基础设施,执行控制逻辑,每个Pigsty部署至少需要一个元节点。
- 数据库节点(Node):用于部署数据库集群/实例,节点与数据库实例一一对应。
沙箱样例
以Pigsty附带的四节点沙箱环境为例,组件在节点上的分布如下图所示:
图:Pigsty沙箱中包含的节点与组件
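The sandbox consists of one meta node and four database nodes (the meta node is also reused as a database node) and runs one set of infrastructure and two database clusters. meta is the meta node: it hosts the infrastructure components and is also reused as an ordinary database node running the single-primary database cluster pg-meta. node-1, node-2, and node-3 are ordinary database nodes running the database cluster pg-test.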
基础设施
每一套 Pigsty 部署(Deployment) 中,都需要有一些基础设施,才能使整个系统正常工作。
基础设施通常由专业的运维团队或云厂商负责,但Pigsty作为一个开箱即用的产品解决方案,将基本的基础设施集成至供给方案中。
- 域名基础设施:Dnsmasq(部分请求转发至Consul DNS处理)
- 时间基础设施:NTP
- 监控基础设施:Prometheus
- Alerting infrastructure: Alertmanager
- 可视化基础设施:Grafana
- 本地源基础设施:Yum/Nginx
- 分布式配置存储:etcd/consul
- Pigsty基础设施:元数据库MetaDB,管理组件Ansible,定时任务,与其他高级特性组件。
基础设施部署于 元节点 上。一套环境中包含一个或多个元节点,用于基础设施部署。
除了 分布式配置存储(DCS) 之外,所有基础设施组件都采用副本式部署;如果有多个元节点,元节点上的DCS(etcd/consul)会共同作为DCS Server。
元节点
在每套环境中,Pigsty最少需要一个元节点,该节点将作为整个环境的控制中心。元节点负责各种管理工作:保存状态,管理配置,发起任务,收集指标,等等。整个环境的基础设施组件,Nginx,Grafana,Prometheus,Alertmanager,NTP,DNS Nameserver,DCS都将部署在元节点上。
同时,元节点也将用于部署元数据库 (Consul 或 Etcd),用户也可以使用已有的外部DCS集群。如果将DCS部署至元节点上,建议在生产环境使用3个元节点,以充分保证DCS服务的可用性。DCS外的基础设施组件都将以对等副本的方式部署在所有元节点上。元节点的数量要求最少1个,推荐3个,建议不超过5个。
元节点上运行的服务如下所示:
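Component | Port | Default domain | Description
Grafana | 3000 | g.pigsty | Pigsty monitoring system GUI
Prometheus | 9090 | p.pigsty | Monitoring time-series database
AlertManager | 9093 | a.pigsty | Alert aggregation and management
Consul | 8500 | c.pigsty | Distributed configuration management and service discovery
Consul DNS | 8600 | - | DNS service provided by Consul
Nginx | 80 | pigsty | Entry proxy for all services
Yum Repo | 80 | yum.pigsty | Local Yum repository
Haproxy Index | 80 | h.pigsty | Access proxy for all Haproxy admin pages
NTP | 123 | n.pigsty | NTP time server used across the environment
Dnsmasq | 53 | - | DNS resolver used across the environment
The architecture of the infrastructure deployed on the meta node is shown in the figure below: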
其主要交互关系如下:
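- Dnsmasq provides DNS resolution inside the environment (optional; an existing nameserver can be used). Some DNS queries are forwarded to Consul DNS.
- Nginx exposes all web services and routes requests by domain name.
- Yum Repo is the default Nginx server and lets every node in the environment install software from the offline packages.
- Grafana hosts the Pigsty monitoring system and visualizes data from Prometheus and the CMDB.
- Prometheus is the time-series database for monitoring.
  - By default, Prometheus gets the list of exporters to scrape from Consul and attaches identity information to them.
  - Prometheus pulls metrics from the exporters, precomputes derived data, and stores the results in its own TSDB.
  - Prometheus evaluates alerting rules and sends alert events to Alertmanager.
- Consul Server stores DCS state, reaches consensus, and serves service metadata queries.
- The NTP service synchronizes time across all nodes in the environment (an external NTP service may be used instead).
- Pigsty components:
  - Ansible, which runs playbooks and issues control commands
  - MetaDB, which supports various advanced features (and is itself a standard database cluster)
  - The scheduled task controller (backup, cleanup, statistics, inspection; advanced features not yet included)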
数据库集群
生产环境的数据库以集群为单位进行组织,集群是一个由主从复制所关联的一组数据库实例所构成的逻辑实体。每个数据库集群是一个自组织的业务服务单元,由至少一个数据库实例组成。
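A cluster is the basic business service unit. The figure below shows the replication topology in the sandbox: pg-meta-1 alone forms the database cluster pg-meta, while pg-test-1, pg-test-2, and pg-test-3 together form another logical cluster, pg-test.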
pg-meta-1
(primary)
pg-test-1 -------------> pg-test-2
(primary) | (replica)
|
^-------> pg-test-3
(replica)
The figure below rearranges the components of the pg-test cluster from the database cluster's point of view.
图:从数据库集群的逻辑视角审视架构(标准接入方案)
Pigsty是数据库供给方案,可以按需创建高可用数据库集群。只要集群中有任意实例存活,集群就可以对外提供完整的读写服务与只读服务。Pigsty可以自动进行故障切换,业务方只读流量不受影响;读写流量的影响视具体配置与负载,通常在几秒到几十秒的范围。
在Pigsty中,每个“数据库实例”在使用上是幂等的,采用类似NodePort的方式对外暴露 数据库服务。默认情况下,访问任意实例的5433端口即可访问主库,访问任意实例的5434端口即可访问从库。用户也可以灵活地同时使用不同的方式访问数据库,详情请参考:数据库接入。
数据库节点
数据库节点负责运行数据库实例, 在Pigsty中数据库实例固定采用独占式部署,一个节点上有且仅有一个数据库实例,因此节点与数据库实例可以互用唯一标识(IP地址与实例名)。
一个典型的数据库节点上运行的服务如下所示:
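Component | Port | Description
Postgres | 5432 | Postgres database service
Pgbouncer | 6432 | Pgbouncer connection pool service
Patroni | 8008 | Patroni high-availability component
Consul | 8500 | Local agent of Consul, the distributed configuration management and service discovery component
Haproxy Primary | 5433 | Proxy for the cluster read-write service (primary connection pool)
Haproxy Replica | 5434 | Proxy for the cluster read-only service (replica connection pool)
Haproxy Default | 5436 | Direct connection to the cluster primary (for administration and DDL/DML changes)
Haproxy Offline | 5438 | Cluster offline read service (direct connection to offline instances, for ETL and interactive queries)
Haproxy <Service> | 543x | Additional custom services provided by the cluster are assigned ports in sequence
Haproxy Admin | 9101 | Haproxy metrics and admin page
PG Exporter | 9630 | Postgres metrics exporter
PGBouncer Exporter | 9631 | Pgbouncer metrics exporter
Node Exporter | 9100 | Machine node metrics exporter
Consul DNS | 8600 | DNS service provided by Consul
vip-manager | x | Binds the VIP to the cluster primary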
主要交互关系如下:
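- vip-manager queries Consul for the cluster primary and binds the cluster's dedicated L2 VIP to the primary node (the default access scheme).
- Haproxy is the entry point for database traffic. It exposes services externally and distinguishes them by port (543x).
  - Haproxy port 9101 exposes Haproxy's internal metrics and provides an admin page for controlling traffic.
  - Haproxy port 5433 points to port 6432 of the connection pool on the cluster primary by default.
  - Haproxy port 5434 points to port 6432 of the connection pools on the cluster replicas by default.
  - Haproxy port 5436 points directly to port 5432 on the cluster primary by default.
  - Haproxy port 5438 points directly to port 5432 on the cluster's offline instances by default.
- Pgbouncer pools database connections, buffers the impact of failures, and exposes extra metrics.
- Postgres provides the actual database service and forms a primary-replica cluster through streaming replication.
- Patroni supervises the Postgres service and handles leader election and switchover, health checks, and configuration management.
  - Patroni uses Consul to reach consensus as the basis for cluster leader election.
- Consul Agent distributes configuration, accepts service registration, performs service discovery, and answers DNS queries.
- PG Exporter, PGB Exporter, and Node Exporter expose the monitoring metrics of the database, the connection pool, and the node respectively.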
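The default port routing described above can be summarized as a lookup table. The sketch below is only an illustration of that mapping: the resolve helper and the role dictionary are hypothetical stand-ins for the health checks Haproxy performs in a real deployment.

```python
# Haproxy service port -> (service name, role of target instances, target port),
# following the default mapping listed above.
HAPROXY_SERVICES = {
    5433: ("primary", "primary", 6432),  # read-write, through Pgbouncer on the primary
    5434: ("replica", "replica", 6432),  # read-only, through Pgbouncer on replicas
    5436: ("default", "primary", 5432),  # direct connection to Postgres on the primary
    5438: ("offline", "offline", 5432),  # direct connection to offline instances
}


def resolve(port, instances):
    """Return the backends ("instance:port") a service port forwards to.

    `instances` maps instance name -> role and is an assumption made for
    this sketch; Haproxy decides membership through health checks.
    """
    _service, target_role, target_port = HAPROXY_SERVICES[port]
    return [
        f"{name}:{target_port}"
        for name, role in instances.items()
        if role == target_role
    ]


cluster = {"pg-test-1": "primary", "pg-test-2": "replica", "pg-test-3": "replica"}
print(resolve(5433, cluster))
print(resolve(5434, cluster))
```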
节点与元节点交互
以单个 元节点 和 单个 数据库节点 构成的环境为例,架构如下图所示:
图:单个元节点与单个数据库节点(点击查看大图)
元节点与数据库节点之间的交互主要包括:
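- Domain names of database clusters and nodes are resolved by the nameserver on the meta node.
- Software installation on database nodes uses the Yum Repo on the meta node.
- Metrics of database clusters and nodes are collected by Prometheus on the meta node.
- Pigsty manages database nodes from the meta node: cluster creation, scaling out and in, changes to users, services, and HBA rules, as well as log collection, garbage cleanup, backup, and inspection.
- Consul on the database node syncs locally registered services to the DCS on the meta node and proxies state reads and writes.
- Database nodes synchronize time from the meta node (or another NTP server).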
3.3 - 监控系统
Pigsty监控系统相关概念
3.3.1 - 可观测性
从原始信息到全局洞察
For system management, one of the most important issues is observability. The figure below illustrates the observability of Postgres.
原图地址:https://pgstats.dev/
PostgreSQL 提供了丰富的观测接口,包括系统目录,统计视图,辅助函数。 这些都是用户可以观测的信息。这里列出的信息全部为Pigsty所收录。Pigsty通过精心的设计,将晦涩的指标数据,转换成了人类可以轻松理解的洞察。
可观测性
经典的监控模型中,有三类重要信息:
- 指标(Metrics):可累加的,原子性的逻辑计量单元,可在时间段上进行更新与统计汇总。
- 日志(Log):离散事件的记录与描述
- 追踪(Trace):与单次请求绑定的相关元数据
Pigsty重点关注 指标 信息,也会在后续加入对 日志 的采集、处理与展示,但Pigsty不会收集数据库的 追踪 信息。
指标
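Let us use a concrete example to show how metrics are obtained and what is produced from them.
pg_stat_statements is the official Postgres statistics extension. It exposes detailed statistics for every class of query executed in the database.
Figure: the raw pg_stat_statements data view
Here the raw metrics from pg_stat_statements are presented as a table. Each class of query is assigned a query ID, followed by the number of calls, total time, maximum, minimum, and mean time per call, the standard deviation of response time, the average number of rows returned per call, and the time spent on block I/O. (PG13 adds finer-grained metrics such as planning time, execution time, and the number of WAL records generated.)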
这些系统视图与系统信息函数,就是Pigsty中指标数据的原始来源。直接查阅这种数据表很容易让人眼花缭乱,失去焦点。需要将这种指标转换为洞察,也就是以直观图表的方式呈现。
图:加工后的相关监控面板,PG Cluster Query看板部分截图
这里的表格数据经过一系列的加工处理,最终呈现为若干监控面板。最基本的数据加工是对表格中的原始数据进行标红上色,但也足以提供相当实用的改进:慢查询一览无余,但这不过是雕虫小技。重要的是,原始数据视图只能呈现当前时刻的快照;而通过Pigsty,用户可以回溯任意时刻或任意时间段。获取更深刻的性能洞察。
上图是集群视角下的查询看板 (PG Cluster Query),用户可以看到整个集群中所有查询的概览,包括每一类查询的QPS与RT,平均响应时间排名,以及耗费的总时间占比。
当用户对某一类具体查询感兴趣时,就可以点击查询ID,跳转到查询详情页(PG Query Detail)中。如下图所示。这里会显示查询的语句,以及一些核心指标。
图:呈现单类查询的详细信息,PG Query Detail 看板截图
上图是实际生产环境中的一次慢查询优化记录,用户可以从右侧中间的Realtime Response Time 面板中发现一个突变。该查询的平均响应时间从七八秒突降到了七八毫秒。我们定位到了这个慢查询并添加了适当的索引,那么优化的效果就立刻在图表上以直观的形式展现出来,给出实时的反馈。
这就是Pigsty需要解决的核心问题:From observability to insight。
日志
除了指标外,还有一类重要的观测数据:日志(Log),日志是对离散事件的记录与描述。
如果说指标是对数据库系统的被动观测,那么日志就是数据库系统及其周边组件主动上报的信息。
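Pigsty does not yet mine database logs, but later versions will integrate pgbadger and mtail, introduce infrastructure for unified log collection, analysis, and processing, and add monitoring metrics derived from database logs.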
用户可以自行使用开源组件对PostgreSQL日志进行分析。
追踪
PostgreSQL提供了对DTrace的支持,用户也可以使用采样探针分析PostgreSQL查询执行时的性能瓶颈。但此类数据仅在某些特定场景会用到,实用性一般,因此Pigsty不会针对数据库收集Trace数据。
接下来?
只有指标并不够,我们还需要将这些信息组织起来,才能构建出体系来。阅读 监控层级 了解更多信息
3.3.2 - 监控层级
介绍Pigsty监控系统中的层次关系
正如 命名原则 中所介绍,Pigsty中的对象分为多个层次:集群,服务,实例,节点。
监控系统层次
Pigsty的监控系统中有着更多的层次,除了实例与集群这两个最为普遍层次,整个系统中还有着其他层次的组织。自顶向下可以分为7个层级:概览,分片,集群,服务,实例,数据库,对象。
图:Pigsty的监控面板被划分为7个逻辑层级与5个实现层级
逻辑层次
生产环境的数据库往往是以集群为单位组织的,集群是基本的业务服务单元,也是最为重要的监控层次。
集群是一个由主从复制所关联的一组数据库实例所构成的,实例是最基本的监控层次。
而多套数据库集群共同组成一个现实世界中的生产环境,概览(Overview) 层次的监控提供了对整个环境的整体描述。
按照水平拆分的模式服务于同一业务的多个数据库集群称为分片(Shard),分片层次的监控对于定位数据分布、倾斜等问题很有帮助。
服务 是夹在集群与实例中间的层次,服务通常与DNS,域名,VIP,NodePort等资源紧密关联。
数据库(Database) 是亚实例级对象,一个数据库集群/实例可能会同时有多个数据库存在,数据库层面的监控关注单个数据库内的活动。
对象(Object) 是数据库内的实体,包括表,索引,序列号,函数,查询,连接池等,对象层面的监控关注这些对象的统计指标,与业务紧密相关。
层次精简
作为一种精简,正如网络的OSI 7层模型在实际中被简化为TCP/IP五层模型一样,这七个层次也以 集群 和 实例 为界,简化为五个层次: 概览(Overview) ,集群(Cluster) , 服务(Service),实例(Instance) ,数据库(Database) 。
这样,最终的层次划分也变得十分简洁:所有集群层次以上的信息,都是 概览 层次,所有实例以下的监控都算作 数据库 层次,夹在 集群 与 实例 中间的,就是 服务 层次。
命名规则
分完层次后,最重要的问题就是命名问题:
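- A way is needed to identify and reference the components at each level of the system.
- The naming scheme should reasonably reflect the hierarchy among the entities in the system.
- The naming scheme should be generated automatically by rule; only then can the system run maintenance-free through cluster scale-out, scale-in, and failover.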
当我们理清了系统中存在的层次后,就可以着手为系统中的每个实体起名。
Pigsty所遵循的基本命名规则,请参考 命名原则 一节。
Pigsty使用独立的名称管理机制,实体的命名自成体系。
如果需要与外部系统对接,用户可以直接使用这套命名体系,或通过转接适配的方式采用自己的命名体系。
集群命名
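Pigsty cluster names are specified by the user and match the regular expression [a-z0-9][a-z0-9-]*, for example pg-test and pg-meta.
Node naming
Pigsty nodes belong to clusters. A node name consists of two parts, the cluster name and the node number, joined by -.
The form is ${pg_cluster}-${pg_seq}, for example pg-meta-1 and pg-test-2.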
在形式上,节点编号是长度合理的自然数(包括0),在集群范围内唯一,每个节点都有自己的编号。
实例的编号可以由用户显式指定并分配,通常采用从0或1开始分配,一旦分配,在集群生命周期内不再变更。
实例命名
Pigsty的实例从属于集群,采用独占节点式部署。
Because instances and nodes map one to one, an instance name is the same as its node name.
服务命名
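Pigsty services belong to clusters. A service name consists of two parts, the cluster name and the role, joined by -.
The form is ${pg_cluster}-${pg_role}, for example pg-meta-primary and pg-test-replica.
The available values of pg_role include primary|replica|offline|delayed.
primary is a special role: every cluster must define exactly one instance with pg_role = primary as its primary.
Other roles are largely user-defined; replica|offline|delayed are the roles predefined by Pigsty.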
接下来?
With the monitoring levels in place, monitored objects must be given identities before they can be managed.
3.3.3 - 身份管理
Pigsty如何管理监控对象的身份
所有的实例都具有身份(Identity),身份信息是与实例关联的元数据,用于标识实例。
图:使用Consul服务发现时,Postgres服务带有的身份信息
身份参数
身份参数是任何集群与实例都必须定义的唯一标识符。
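Name | Variable | Type | Description
Cluster | pg_cluster | Core identity parameter | Cluster name, the top-level namespace for resources in the cluster
Role | pg_role | Core identity parameter | Instance role: primary, replica, offline, …
Sequence | pg_seq | Core identity parameter | Instance sequence number, a positive integer unique within the cluster
Instance | pg_instance | Derived identity parameter | ${pg_cluster}-${pg_seq}
Service | pg_service | Derived identity parameter | ${pg_cluster}-${pg_role}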
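The relationship between core and derived identity parameters can be sketched as a small data class. The class name is hypothetical; the field names and the two derivation rules follow the identity parameters described in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PgIdentity:
    """Core identity parameters of one instance (illustrative only)."""

    pg_cluster: str  # cluster name, top-level namespace
    pg_role: str     # instance role, e.g. primary or replica
    pg_seq: int      # instance sequence number, unique within the cluster

    @property
    def pg_instance(self):
        # Derived: ${pg_cluster}-${pg_seq}
        return f"{self.pg_cluster}-{self.pg_seq}"

    @property
    def pg_service(self):
        # Derived: ${pg_cluster}-${pg_role}
        return f"{self.pg_cluster}-{self.pg_role}"


ident = PgIdentity(pg_cluster="pg-test", pg_role="replica", pg_seq=2)
print(ident.pg_instance, ident.pg_service)
```

Because the derived names are computed rather than stored, a role change after failover only touches pg_role, and the service name follows automatically while the instance name stays stable.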
身份关联
为系统中的对象命名后,还需要将 身份信息 关联至具体的实例上。
身份信息属于业务赋予的元数据,数据库实例本身不会意识到这些身份信息,它不知道自己为谁而服务,从属于哪个业务,或者自己是集群中的几号实例。
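Identity can be assigned in many ways. The most primitive is the operator's memory: the DBA remembers that the database instance at IP address 10.2.3.4 serves payments, while the instance on another machine serves user management. A better approach is to manage the identity of cluster members through configuration files or service discovery.
Pigsty provides both identity management methods: service discovery based on Consul, and service discovery based on configuration files.
The parameter prometheus_sd_method (consul|static) controls this behavior:
- consul: service discovery based on Consul, the default
- static: service discovery based on local configuration files
Pigsty recommends consul service discovery: when a failover occurs, the monitoring system automatically corrects the identity registered for the target instance.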
Consul服务发现
Pigsty默认采用 Consul服务发现的方式管理环境中的服务。
Pigsty内置了基于DCS的配置管理与自动服务发现,用户可以直观地察看系统中的所有节点与服务信息,以及健康状态。Pigsty中的所有服务都会自动注册至DCS中,因此创建、销毁、修改数据库集群时,元数据会自动修正,监控系统能够自动发现监控目标,无需手动维护配置。
用户亦可通过Consul提供的DNS与服务发现机制,实现基于DNS的自动流量切换。
Consul采用了Client/Server架构,整个环境中存在1~5个不等的Consul Server,用于实际的元数据存储。所有节点上都部署有Consul Agent,代理本机服务与Consul Server的通信。Pigsty默认通过本地Consul配置文件的方式注册服务。
服务注册
在每个节点上,都运行有 consul agent。服务通过JSON配置文件的方式,由consul agent注册至DCS中。
The JSON configuration files live in /etc/consul.d/ by default and are named svc-<service>.json. Taking postgres as an example:
{
  "service": {
    "name": "postgres",
    "port": {{ pg_port }},
    "tags": [
      "{{ pg_role }}",
      "{{ pg_cluster }}"
    ],
    "meta": {
      "type": "postgres",
      "role": "{{ pg_role }}",
      "seq": "{{ pg_seq }}",
      "instance": "{{ pg_instance }}",
      "service": "{{ pg_service }}",
      "cluster": "{{ pg_cluster }}",
      "version": "{{ pg_version }}"
    },
    "check": {
      "tcp": "127.0.0.1:{{ pg_port }}",
      "interval": "15s",
      "timeout": "1s"
    }
  }
}
The meta and tags sections are the service metadata and store the identity information of the instance.
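To make the template concrete, the sketch below fills in the same structure for one instance and prints the resulting JSON. It is an illustration rather than the way Pigsty renders the file; the function name and the default values for pg_port and pg_version are assumptions.

```python
import json


def consul_service(pg_cluster, pg_role, pg_seq, pg_port=5432, pg_version="13"):
    """Build a service definition shaped like the template above.

    pg_port and pg_version defaults are placeholders chosen for this sketch.
    """
    pg_instance = f"{pg_cluster}-{pg_seq}"
    pg_service = f"{pg_cluster}-{pg_role}"
    return {
        "service": {
            "name": "postgres",
            "port": pg_port,
            "tags": [pg_role, pg_cluster],
            "meta": {
                "type": "postgres",
                "role": pg_role,
                "seq": str(pg_seq),
                "instance": pg_instance,
                "service": pg_service,
                "cluster": pg_cluster,
                "version": pg_version,
            },
            "check": {
                "tcp": f"127.0.0.1:{pg_port}",
                "interval": "15s",
                "timeout": "1s",
            },
        }
    }


definition = consul_service("pg-test", "primary", 1)
print(json.dumps(definition, indent=2))
```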
服务查询
用户可以通过Consul提供的DNS服务,或者直接调用Consul API发现注册到Consul中的服务
使用DNS API查阅consul服务的方式,请参阅Consul文档。
Figure: querying the pg_exporter service on pg-bench-1.
服务发现
Prometheus automatically discovers monitoring targets in the environment through consul_sd_configs. Services tagged with both pg and exporter are recognized as scrape targets:
- job_name: pg
  # https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config
  consul_sd_configs:
    - server: localhost:8500
      refresh_interval: 5s
      tags:
        - pg
        - exporter
图:被Prometheus发现的服务,身份信息已关联至实例的指标维度上。
服务维护
有时候,因为数据库主从发生切换,导致注册的角色与数据库实例的实际角色出现偏差。这时候需要通过反熵过程处理这种异常。
基于Patroni的故障切换可以正常地通过回调逻辑修正注册的角色,但人工完成的角色切换则需要人工介入处理。
使用以下脚本可以自动检测并修复数据库的服务注册。建议在数据库实例上配置Crontab,或在元节点上设置定期巡检任务。
/pg/bin/pg-register $(pg-role)
静态文件服务发现
static service discovery relies on the configuration in /etc/prometheus/targets/*.yml. Its advantage is that it does not depend on Consul.
当Pigsty监控系统与外部管控方案集成时,这种模式对原系统的侵入性较小。但是缺点是,当集群内发生主从切换时,用户需要自行维护实例角色信息。手动维护时,可以根据以下命令从配置文件生成Prometheus所需的监控对象配置文件并载入生效。
详见 Prometheus服务发现。
./infra.yml --tags=prometheus_targtes,prometheus_reload
Pigsty默认生成的静态监控对象文件示例如下:
#==============================================================#
# File : targets/all.yml
# Ctime : 2021-02-18
# Mtime : 2021-02-18
# Desc : Prometheus Static Monitoring Targets Definition
# Path : /etc/prometheus/targets/all.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
#======> pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]
#======> pg-test-1 [primary]
- labels: {cls: pg-test, ins: pg-test-1, ip: 10.10.10.11, role: primary, svc: pg-test-primary}
  targets: [10.10.10.11:9630, 10.10.10.11:9100, 10.10.10.11:9631, 10.10.10.11:9101]
#======> pg-test-2 [replica]
- labels: {cls: pg-test, ins: pg-test-2, ip: 10.10.10.12, role: replica, svc: pg-test-replica}
  targets: [10.10.10.12:9630, 10.10.10.12:9100, 10.10.10.12:9631, 10.10.10.12:9101]
#======> pg-test-3 [replica]
- labels: {cls: pg-test, ins: pg-test-3, ip: 10.10.10.13, role: replica, svc: pg-test-replica}
  targets: [10.10.10.13:9630, 10.10.10.13:9100, 10.10.10.13:9631, 10.10.10.13:9101]
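When maintaining such a file by hand, the entries can be generated from a list of cluster members. The sketch below mirrors the label set and port order of the sample file; it is not the Pigsty playbook logic, and the function name and member tuples are illustrative assumptions. The output is JSON, which Prometheus file-based service discovery also accepts.

```python
import json

# Port order follows the sample file: pg_exporter, node_exporter,
# pgbouncer_exporter, haproxy admin.
EXPORTER_PORTS = (9630, 9100, 9631, 9101)


def static_target(cluster, seq, role, ip):
    """Build one target group carrying the identity labels of an instance."""
    return {
        "labels": {
            "cls": cluster,
            "ins": f"{cluster}-{seq}",
            "ip": ip,
            "role": role,
            "svc": f"{cluster}-{role}",
        },
        "targets": [f"{ip}:{port}" for port in EXPORTER_PORTS],
    }


members = [
    ("pg-meta", 1, "primary", "10.10.10.10"),
    ("pg-test", 1, "primary", "10.10.10.11"),
    ("pg-test", 2, "replica", "10.10.10.12"),
    ("pg-test", 3, "replica", "10.10.10.13"),
]
targets = [static_target(*member) for member in members]
print(json.dumps(targets, indent=2))
```

After a manual switchover, only the role field of the affected members needs to change before the file is regenerated and reloaded.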
身份关联
无论是通过Consul服务发现,还是静态文件服务发现。最终的效果是实现身份信息与实例监控指标相互关联。
这一关联,是通过 监控指标 的维度标签实现的。
Identity parameter | Label | Example value
pg_cluster | cls | pg-test
pg_instance | ins | pg-test-1
pg_service | svc | pg-test-primary
pg_role | role | primary
node_ip | ip | 10.10.10.11
阅读下一节 监控指标 ,了解这些指标是如何通过标签组织起来的。
3.3.4 - 监控指标
监控指标的形式,模型,数量,层次,衍生规则,
指标(Metric) 是Pigsty监控系统的核心概念。
指标形式
指标在形式上是可累加的,原子性的逻辑计量单元,可在时间段上进行更新与统计汇总。
Metrics usually exist as time series with dimension labels. For example, the pg:ins:qps_realtime metric in the Pigsty sandbox shows the real-time QPS of every instance.
pg:ins:qps_realtime{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", role="primary"} 0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", role="primary"} 327.6
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", role="replica"} 517.0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", role="replica"} 0
用户可以对指标进行运算:求和、求导,聚合,等等。例如:
$ sum(pg:ins:qps_realtime) by (cls) -- 查询按集群聚合的 实时实例QPS
{cls="pg-meta"} 0
{cls="pg-test"} 844.6
$ avg(pg:ins:qps_realtime) by (cls) -- 查询每个集群中 所有实例的平均 实时实例QPS
{cls="pg-meta"} 0
{cls="pg-test"} 280
$ avg_over_time(pg:ins:qps_realtime[30m]) -- 过去30分钟内实例的平均QPS
pg:ins:qps_realtime{cls="pg-meta", ins="pg-meta-1", ip="10.10.10.10", role="primary"} 0
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-1", ip="10.10.10.11", role="primary"} 130
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-2", ip="10.10.10.12", role="replica"} 100
pg:ins:qps_realtime{cls="pg-test", ins="pg-test-3", ip="10.10.10.13", role="replica"} 0
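The label-based aggregation performed by sum(...) by (cls) and avg(...) by (cls) can be mimicked in a few lines. This sketch only illustrates the grouping idea on the sample values of pg:ins:qps_realtime shown earlier; it is not how Prometheus evaluates queries, and the helper name is hypothetical.

```python
from collections import defaultdict

# One (labels, value) pair per time series, taken from the sample output above.
samples = [
    ({"cls": "pg-meta", "ins": "pg-meta-1", "role": "primary"}, 0.0),
    ({"cls": "pg-test", "ins": "pg-test-1", "role": "primary"}, 327.6),
    ({"cls": "pg-test", "ins": "pg-test-2", "role": "replica"}, 517.0),
    ({"cls": "pg-test", "ins": "pg-test-3", "role": "replica"}, 0.0),
]


def aggregate_by(series, label, func):
    """Group series by one label and reduce each group's values with func."""
    groups = defaultdict(list)
    for labels, value in series:
        groups[labels[label]].append(value)
    return {key: func(values) for key, values in groups.items()}


qps_sum = aggregate_by(samples, "cls", sum)
qps_avg = aggregate_by(samples, "cls", lambda values: sum(values) / len(values))
print(qps_sum)
print(qps_avg)
```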
指标模型
每一个指标(Metric),都是一类数据,通常会对应多个时间序列(time series)。同一个指标对应的不同时间序列通过维度进行区分。
指标 + 维度,可以具体定位一个时间序列。每一个时间序列都是由 (时间戳,取值)二元组构成的数组。
Pigsty采用Prometheus的指标模型,其逻辑概念可以用以下的SQL DDL表示。
-- 指标表,指标与时间序列构成1:n关系
CREATE TABLE metrics (
id INT PRIMARY KEY, -- 指标标识
name TEXT UNIQUE -- 指标名称,[...其他指标元数据,例如类型]
);
-- 时间序列表,每个时间序列都对应一个指标。
CREATE TABLE series (
id BIGINT PRIMARY KEY, -- 时间序列标识
metric_id INTEGER REFERENCES metrics (id), -- 时间序列所属的指标
dimension JSONB DEFAULT '{}' -- 时间序列带有的维度信息,采用键值对的形式表示
);
-- Time-series data table: stores the sampled data points. Each sample belongs to one time series
CREATE TABLE series_data (
series_id BIGINT REFERENCES series(id), -- 时间序列标识
ts TIMESTAMP, -- 采样点时间戳
value FLOAT, -- 采样点指标值
PRIMARY KEY (series_id, ts) -- 每个采样点可以通过 所属时间序列 与 时间戳 唯一标识
);
这里我们以pg:ins:qps
指标为例:
-- 样例指标数据
INSERT INTO metrics VALUES(1, 'pg:ins:qps'); -- 该指标名为 pg:ins:qps ,是一个 GAUGE。
INSERT INTO series VALUES -- 该指标包含有四个时间序列,通过维度标签区分
(1001, 1, '{"cls": "pg-meta", "ins": "pg-meta-1", "role": "primary", "other": "..."}'),
(1002, 1, '{"cls": "pg-test", "ins": "pg-test-1", "role": "primary", "other": "..."}'),
(1003, 1, '{"cls": "pg-test", "ins": "pg-test-2", "role": "replica", "other": "..."}'),
(1004, 1, '{"cls": "pg-test", "ins": "pg-test-3", "role": "replica", "other": "..."}');
INSERT INTO series_data VALUES -- 每个时间序列底层的采样点
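(1001, now(), 1000), -- instance pg-meta-1 has a QPS of 1000 at the current moment
(1002, now(), 1000), -- instance pg-test-1 has a QPS of 1000 at the current moment
(1003, now(), 5000), -- instance pg-test-2 has a QPS of 5000 at the current moment
(1004, now(), 5001); -- instance pg-test-3 has a QPS of 5001 at the current moment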
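pg_up is a metric that contains 4 time series and records the liveness of all instances in the environment.
pg_up{ins="pg-test-1", ...} is one time series and records the liveness of the specific instance pg-test-1.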
指标来源
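Pigsty's monitoring data comes from four main sources: the database, the connection pool, the operating system, and the load balancer. Each is exposed through a corresponding exporter.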
完整来源包括:
- PostgreSQL本身的监控指标
- PostgreSQL日志中的统计指标
- PostgreSQL系统目录信息
- Metrics of the Pgbouncer connection pool middleware
- PgExporter指标
- 数据库工作节点Node的指标
- 负载均衡器Haproxy指标
- DCS(Consul)工作指标
- 监控系统自身工作指标:Grafana,Prometheus,Nginx
- Blackbox探活指标
关于全部可用的指标清单,请查阅 参考-指标清单 一节
指标数量
那么,Pigsty总共包含了多少指标呢? 这里是一副各个指标来源占比的饼图。我们可以看到,右侧蓝绿黄对应的部分是数据库及数据库相关组件所暴露的指标,而左下方红橙色部分则对应着机器节点相关指标。左上方紫色部分则是负载均衡器的相关指标。
数据库指标中,与postgres本身有关的原始指标约230个,与中间件有关的原始指标约50个,基于这些原始指标,Pigsty又通过层次聚合与预计算,精心设计出约350个与DB相关的衍生指标。
因此,对于每个数据库集群来说,单纯针对数据库及其附件的监控指标就有621个。而机器原始指标281个,衍生指标83个一共364个。加上负载均衡器的170个指标,我们总共有接近1200类指标。
注意,这里我们必须辨析一下指标(metric)与时间序列( Time-series)的区别。
这里我们使用的量词是 类 而不是个 。 因为一个指标可能对应多个时间序列。例如一个数据库中有20张表,那么 pg_table_index_scan
这样的指标就会对应有20个对应的时间序列。
截止至2021年,Pigsty的指标覆盖率在所有作者已知的开源/商业监控系统中一骑绝尘,详情请参考横向对比。
指标层次
Pigsty还会基于现有指标进行加工处理,产出 衍生指标(Derived Metrics) 。
例如指标可以按照不同的层次进行聚合
从原始监控时间序列数据,到最终的成品图表,中间还有着若干道加工工序。
这里以TPS指标的衍生流程为例。
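The raw data is the transaction counter scraped from Pgbouncer. The cluster has four instances and each instance hosts two databases, so the cluster has 8 DB-level TPS metrics in total.
The chart below compares the TPS of each instance in the cluster side by side. Here predefined rules first take the rate of the raw transaction counters to obtain the 8 DB-level TPS metrics, then aggregate those 8 DB-level time series into 4 instance-level TPS metrics, and finally aggregate the four instance-level TPS metrics into a cluster-level TPS metric.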
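The derivation chain (counter, then DB-level rate, then instance level, then cluster level) can be sketched with plain arithmetic. All counter values, instance names, and database names below are made up for illustration, and the rate is a simple difference over one assumed 15-second interval that ignores counter resets, unlike the Prometheus rate function.

```python
from collections import defaultdict

INTERVAL = 15  # seconds between the two samples (assumed scrape interval)

# (instance, database) -> (counter at t0, counter at t0 + INTERVAL); hypothetical values
counters = {
    ("pg-test-1", "db1"): (0, 150), ("pg-test-1", "db2"): (0, 300),
    ("pg-test-2", "db1"): (0, 450), ("pg-test-2", "db2"): (0, 600),
    ("pg-test-3", "db1"): (0, 0),   ("pg-test-3", "db2"): (0, 150),
    ("pg-test-4", "db1"): (0, 750), ("pg-test-4", "db2"): (0, 300),
}

# Step 1: differentiate each counter to get 8 DB-level TPS series.
db_tps = {key: (curr - prev) / INTERVAL for key, (prev, curr) in counters.items()}

# Step 2: sum DB-level TPS per instance to get 4 instance-level series.
ins_tps = defaultdict(float)
for (instance, _database), tps in db_tps.items():
    ins_tps[instance] += tps

# Step 3: sum instance-level TPS to get the single cluster-level value.
cls_tps = sum(ins_tps.values())

print(dict(ins_tps))
print(cls_tps)
```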
Pigsty共定义了360类衍生聚合指标,后续还会不断增加。衍生指标定义规则详见 参考-衍生指标
特殊指标
目录(Catalog) 是一种特殊的指标
Catalog与Metrics比较相似但又不完全相同,边界比较模糊。最简单的例子,一个表的页面数量和元组数量,应该算Catalog还是算Metrics?
跳过这种概念游戏,实践上Catalog和Metrics主要的区别是,Catalog里的信息通常是不怎么变化的,比如表的定义之类的,如果也像Metrics这样比如几秒抓一次,显然是一种浪费。所以我们会将这一类偏静态的信息划归Catalog。
Catalog主要由定时任务(例如巡检)负责抓取,而不由Prometheus采集。一些特别重要的Catalog信息,例如pg_class
中的一些信息,也会转换为指标被Prometheus所采集。
小结
了解了Pigsty指标后,不妨了解一下Pigsty的 报警系统 是如何将这些指标数据用于实际生产用途的。
3.3.5 - 报警规则
介绍Pigsty附带的数据库报警规则,以及如何定制报警规则
报警对于日常故障响应,提高系统可用性至关重要。
漏报会导致可用性降低,误报会导致敏感性下降,有必要对报警规则进行审慎的设计。
- 合理定义报警级别,以及相应的处理流程
- 合理定义报警指标,去除重复报警项,补充缺失报警项
- 根据历史监控数据科学配置报警阈值,减少误报率。
- Curate exception rules sensibly to eliminate false alarms caused by maintenance work, ETL, and offline queries.
报警分类学
按紧急程度分类
-
P0:FATAL:产生重大场外影响的事故,需要紧急介入处理。例如主库宕机,复制中断。(严重事故)
-
P1:ERROR:场外影响轻微,或有冗余处理的事故,需要在分钟级别内进行响应处理。(事故)
-
P2:WARNING:即将产生影响,放任可能在小时级别内恶化,需在小时级别进行响应。(关注事件)
-
P3:NOTICE:需要关注,不会有即时的影响,但需要在天级别内进行响应。(偏差现象)
按报警层次分类
- 系统级:操作系统,硬件资源的报警。DBA只会特别关注CPU与磁盘报警,其他由运维负责。
- 数据库级:数据库本身的报警,DBA重点关注。由PG,PGB,Exporter本身的监控指标产生。
- 应用级:应用报警由业务方自己负责,但DBA会为QPS,TPS,Rollback,Seasonality等业务指标设置报警
按指标类型分类
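- Errors: PG Down, PGB Down, Exporter Down, streaming replication break, multiple primaries in one cluster
- Traffic: QPS, TPS, Rollback, Seasonality
- Latency: average response time, replication lag
- Saturation: connection pile-up, idle transactions, CPU, disk, age (transaction ID), buffers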
报警可视化
Pigsty使用条状图呈现报警信息。横轴代表时间段,一段色条代表报警事件。只有处于 激发(Firing) 状态的报警才会显示在报警图表中。
报警规则详解
报警规则按类型可粗略分为四类:错误,延迟,饱和度,流量。其中:
- 错误:主要关注各个组件的存活性(Aliveness),以及网络中断,脑裂等异常情况,级别通常较高(P0|P1)。
- 延迟:主要关注查询响应时间,复制延迟,慢查询,长事务。
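- Saturation: mainly CPU and disk (both belong to system monitoring but matter so much to the DB that they are included), connection pool queuing, database backend connections, age (in essence the saturation of available transaction IDs), SSD lifespan, and so on.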
- 流量:QPS,TPS,Rollback(流量通常与业务指标有关属于业务监控范畴,但因为对于DB很重要所以纳入),QPS的季节性,TPS的突增。
错误报警
Postgres实例宕机区分主从,主库宕机触发P0报警,从库宕机触发P1报警。两者都需要立即介入,但从库通常有多个实例,且可以降级到主库上查询,有着更高的处理余量,所以从库宕机定为P1。
# primary|master instance down for 1m triggers a P0 alert
- alert: PG_PRIMARY_DOWN
  expr: pg_up{instance=~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P0
  annotations:
    summary: "P0 Postgres Primary Instance Down: {{$labels.instance}}"
    description: "pg_up = {{ $value }} {{$labels.instance}}"
# standby|slave instance down for 1m triggers a P1 alert
- alert: PG_STANDBY_DOWN
  expr: pg_up{instance!~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Postgres Standby Instance Down: {{$labels.instance}}"
    description: "pg_up = {{ $value }} {{$labels.instance}}"
Pgbouncer实例因为与Postgres实例一一对应,其存活性报警规则与Postgres统一。
# primary pgbouncer down for 1m triggers a P0 alert
- alert: PGB_PRIMARY_DOWN
  expr: pgbouncer_up{instance=~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P0
  annotations:
    summary: "P0 Pgbouncer Primary Instance Down: {{$labels.instance}}"
    description: "pgbouncer_up = {{ $value }} {{$labels.instance}}"
# standby pgbouncer down for 1m triggers a P1 alert
- alert: PGB_STANDBY_DOWN
  expr: pgbouncer_up{instance!~'.*master.*'} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Pgbouncer Standby Instance Down: {{$labels.instance}}"
    description: "pgbouncer_up = {{ $value }} {{$labels.instance}}"
Prometheus Exporter的存活性定级为P1,虽然Exporter宕机本身并不影响数据库服务,但这通常预示着一些不好的情况,而且监控数据的缺失也会产生某些相应的报警。Exporter的存活性是通过Prometheus自己的up
指标检测的,需要注意某些单实例多DB的特例。
# exporter down for 1m triggers a P1 alert
- alert: PG_EXPORTER_DOWN
  expr: up{port=~"(9185|9127)"} == 0
  for: 1m
  labels:
    team: DBA
    urgency: P1
  annotations:
    summary: "P1 Exporter Down: {{$labels.instance}} {{$labels.port}}"
    description: "port = {{$labels.port}}, {{$labels.instance}}"
所有存活性检测的持续时间阈值设定为1分钟,对15s的默认采集周期而言是四个样本点。常规的重启操作通常不会触发存活性报警。
延迟报警
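Three alerts relate to replication lag: replication break, high replication lag, and abnormal replication lag, rated P1, P2, and P3 respectively.
- Replication break is an error, judged with the metric pg_repl_state_count{state="streaming"}: if the number of replicas currently in the streaming state changes negatively, the break alert fires. walsender determines the replication state. A replica disconnecting directly produces this, and a backlog in the buffer that moves a replica from streaming to catchup also triggers the alert. Finishing a manual backup made with -Xs produces it as well. The alert resolves automatically after 10 minutes. A replication break lets clients read stale data and has some external impact, so it is rated P1.
- Replication lag can be judged by lag time or by lag bytes, with lag bytes as the authoritative metric. Under normal conditions, lag time on the order of a hundred milliseconds and lag bytes on the order of a hundred KB are both normal. The time thresholds currently in use are 5s and 15s. Based on historical data, thresholds of 8 seconds and 32MB are adopted here, which yields roughly a single-digit number of alerts per day. Lag time is more intuitive, so 8s triggers the P2 alert; but not every replica reports that metric reliably, so the 32MB byte threshold triggers a P3 alert as a backstop.
- Exceptions: antispam, stats, and coredb all show replication lag frequently.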
# replication break for 1m triggers a P0 alert. auto-resolved after 10 minutes.
- alert: PG_REPLICATION_BREAK
expr: pg_repl_state_count{state="streaming"} - (pg_repl_state_count{state="streaming"} OFFSET 10m) < 0
for: 1m
labels:
team: DBA
urgency: P0
annotations:
summary: "P0 Postgres Streaming Replication Break: {{$labels.instance}}"
description: "delta = {{ $value }} {{$labels.instance}}"
# replication lag greater than 8 second for 3m triggers a P1 alert
- alert: PG_REPLICATION_LAG
expr: pg_repl_replay_lag{application_name="walreceiver"} > 8
for: 3m
labels:
team: DBA
urgency: P1
annotations:
summary: "P1 Postgres Replication Lagged: {{$labels.instance}}"
description: "lag = {{ $value }} seconds, {{$labels.instance}}"
# replication diff greater than 32MB for 5m triggers a P3 alert
- alert: PG_REPLICATION_DIFF
expr: pg_repl_lsn{application_name="walreceiver"} - pg_repl_replay_lsn{application_name="walreceiver"} > 33554432
for: 5m
labels:
team: DBA
urgency: P3
annotations:
summary: "P3 Postgres Replication Diff Deviant: {{$labels.instance}}"
description: "delta = {{ $value }} {{$labels.instance}}"
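The byte threshold in the last rule, 33554432, is 32 MiB. As a minimal sketch of the same comparison outside Prometheus, assuming LSNs are available in PostgreSQL's textual form (for example from pg_current_wal_lsn()), the helper names below are hypothetical and not part of Pigsty:

```python
# 32 MiB expressed in bytes; matches the literal used in the rule above.
REPL_DIFF_THRESHOLD = 32 * 1024 * 1024


def lsn_to_bytes(lsn: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to an absolute byte position.

    A PostgreSQL LSN is printed as two hexadecimal numbers: the high and low
    32 bits of a 64-bit position in the WAL stream.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_diff_exceeded(current_lsn: str, replay_lsn: str) -> bool:
    """True when the replica's replay position trails by more than 32 MiB."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(replay_lsn) > REPL_DIFF_THRESHOLD
```

The alert itself additionally requires the condition to hold for 5 minutes, which this sketch does not model.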
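Saturation Alerts
Saturation metrics mainly cover resources and include many system-level monitoring metrics. The main ones are: CPU and disk (both belong to system monitoring but matter so much to the database that they are included), connection pool queuing, database backend connections, age (in essence the saturation of available transaction IDs), SSD lifetime, and so on.
Backlog Detection
Backlog covers two kinds of metrics: the backend connection count and active connection count of PG itself, and the queuing state of the connection pool.
PGB queuing is the decisive metric: it means that blocking perceptible to clients has already occurred. The rule therefore raises a P0 alert when more than 8 clients stay queued for 1 minute.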
# more than 8 client waiting in queue for 1 min triggers a P0 alert
- alert: PGB_QUEUING
expr: sum(pgbouncer_pool_waiting_clients{datname!="pgbouncer"}) by (instance,datname) > 8
for: 1m
labels:
team: DBA
urgency: P0
annotations:
summary: "P0 Pgbouncer {{ $value }} Clients Wait in Queue: {{$labels.instance}}"
description: "waiting clients = {{ $value }} {{$labels.instance}}"
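The backend connection count is an important alert metric: if backend connections stay at the maximum, that often means an avalanche as well. The pool's queued connections also reflect this, but do not cover applications that connect to the database directly. The main problem with the backend connection count is its close coupling to the connection pool: after a brief jam the pool quickly saturates the backend connections, yet once the jam clears those connections are released only after a timeout of roughly 10 minutes by default. The metric is therefore heavily affected by short backlogs. The same thing happens during the nightly 1 a.m. backup, which easily produces false alerts.
Note that the backend connection count differs from the active backend connection count; the alert currently uses the active count. Active backend connections are usually 0 to 1, around a dozen on some slow databases, and possibly 20 to 30 on offline databases. The number of backend connections/processes (active or not), however, commonly averages around 50. The backend connection count is more intuitive and accurate.
Two alert levels are used for the backend connection count: above 90 for 3 minutes is P1, and above 80 for 10 minutes is P2, given that the maximum connection count of a database is usually 100. This detects avalanche backlogs with as low a false-alert rate as possible.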
# num of backend exceed 90 for 3m
- alert: PG_BACKEND_HIGH
expr: sum(pg_db_numbackends) by (node) > 90
for: 3m
labels:
team: DBA
urgency: P1
annotations:
summary: "P1 Postgres Backend Number High: {{$labels.node}}"
description: "numbackend = {{ $value }} {{$labels.node}}"
# num of backend exceed 80 for 10m (avoid pgbouncer jam false alert)
- alert: PG_BACKEND_WARN
expr: sum(pg_db_numbackends) by (node) > 80
for: 10m
labels:
team: DBA
urgency: P2
annotations:
summary: "P2 Postgres Backend Number Warn: {{$labels.node}}"
description: "numbackend = {{ $value }} {{$labels.node}}"
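The effect of pairing a threshold with a for duration can be illustrated with a simplified model. This is only a sketch: it assumes the rule is evaluated every 15 seconds and that the alert fires once the condition has held for the full for duration; the function name is hypothetical.

```python
def fires(samples, threshold, for_seconds, interval=15):
    """Return True if the value stays above `threshold` for `for_seconds`.

    `samples` is a sequence of readings taken every `interval` seconds.
    The condition must hold from a first evaluation through the evaluation
    `for_seconds` later, i.e. for_seconds // interval + 1 consecutive readings.
    """
    needed = for_seconds // interval
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak > needed:
            return True
    return False


# A two-minute jam that saturates the backends does not reach the 3m window.
short_jam = [95] * 8 + [40] * 20
# A sustained plateau just above 80 eventually satisfies the 10m rule.
long_plateau = [85] * 41
```

Under this model, fires(short_jam, 90, 180) is false while fires(long_plateau, 80, 600) is true, which is the behavior the two-level design aims for.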
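Idle Transactions
Monitoring currently uses the absolute number of Idle in Xact sessions as the alert condition, but the longest duration of Idle in Xact is probably more meaningful, because the count is already covered by the backend connection count. Long idle periods are what really matter, so the rule here uses the longest idle duration among all idle transactions as the alert metric, with 3 minutes as the P2 threshold. Non-offline databases that frequently show idle transactions include: moderation, location, stats, sms, device, moderationdevice.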
# max idle xact duration exceed 3m
- alert: PG_IDLE_XACT
expr: pg_activity_max_duration{instance!~".*offline.*", state=~"^idle in transaction.*"} > 180
for: 3m
labels:
team: DBA
urgency: P2
annotations:
summary: "P2 Postgres Long Idle Transaction: {{$labels.instance}}"
description: "duration = {{ $value }} {{$labels.instance}}"
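Resource Alerts
CPU, disk, and age.
The default freeze age is 200 million; an age above 1 billion raises a P1 alert, which leaves ample headroom without letting the problem go unnoticed.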
# age wrap around (progress in half 10Y) triggers a P1 alert
- alert: PG_XID_WRAP
expr: pg_database_age{} > 1000000000
for: 3m
labels:
team: DBA
urgency: P1
annotations:
summary: "P1 Postgres XID Wrap Around: {{$labels.instance}}"
description: "age = {{ $value }} {{$labels.instance}}"
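To put the 1 billion threshold in context: PostgreSQL transaction IDs are 32 bits wide, and a database must be frozen before its age approaches 2^31 (about 2.1 billion). The following sketch, with hypothetical names, computes how much of that budget a given age has consumed.

```python
XID_WRAPAROUND_LIMIT = 2 ** 31   # about 2.1 billion transactions
ALERT_AGE = 1_000_000_000        # threshold used by PG_XID_WRAP above


def xid_usage(age: int) -> float:
    """Fraction of the wraparound budget consumed at the given database age."""
    return age / XID_WRAPAROUND_LIMIT


def xid_remaining(age: int) -> int:
    """Transactions left before the wraparound limit is reached."""
    return XID_WRAPAROUND_LIMIT - age
```

At the alert threshold a little under half of the budget is used, so more than a billion transactions remain before wraparound becomes imminent.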
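Disk and CPU alerts are configured by the operations team and remain unchanged.
Traffic
Because workloads differ from one business to another, setting absolute thresholds for traffic metrics is difficult. Absolute thresholds are set only for TPS and rollbacks, and they are fairly loose.
More than 4 rollbacks per second raises a P2 warning; TPS above 24000 raises P2, and above 30000 raises P1.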
# more than 30k TPS lasts for 1m triggers a P1 (pgbouncer bottleneck)
- alert: PG_TPS_HIGH
expr: rate(pg_db_xact_total{}[1m]) > 30000
for: 1m
labels:
team: DBA
urgency: P1
annotations:
summary: "P1 Postgres TPS High: {{$labels.instance}} {{$labels.datname}}"
description: "TPS = {{ $value }} {{$labels.instance}}"
# more than 24k TPS lasts for 3m triggers a P2
- alert: PG_TPS_WARN
expr: rate(pg_db_xact_total{}[1m]) > 24000
for: 3m
labels:
team: DBA
urgency: P2
annotations:
summary: "P2 Postgres TPS Warning: {{$labels.instance}} {{$labels.datname}}"
description: "TPS = {{ $value }} {{$labels.instance}}"
# more than 4 rollback per seconds lasts for 5m
- alert: PG_ROLLBACK_WARN
expr: rate(pg_db_xact_rollback{}[1m]) > 4
for: 5m
labels:
team: DBA
urgency: P2
annotations:
summary: "P2 Postgres Rollback Warning: {{$labels.instance}}"
description: "rollback per sec = {{ $value }} {{$labels.instance}}"
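QPS is highly business-specific, so absolute thresholds are not suitable; instead, an alert can be configured for sudden QPS surges.
A 30% surge over a short period (compared with 10 minutes earlier) triggers a P2 alert. To avoid firing on bursts at low QPS, an absolute floor of 10k is also required.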
# QPS > 10000 and have a 30% inc for 3m triggers P2 alert
- alert: PG_QPS_BURST
expr: sum by(datname,instance)(rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m]))/sum by(datname,instance) (rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m] offset 10m)) > 1.3 and sum by(datname,instance) (rate(pgbouncer_stat_total_query_count{datname!="pgbouncer"}[1m])) > 10000
for: 3m
labels:
team: DBA
urgency: P2
annotations:
summary: "P2 Pgbouncer QPS Burst 30% and exceed 10000: {{$labels.instance}}"
description: "qps = {{ $value }} {{$labels.instance}}"
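The burst condition combines a relative test with an absolute floor. A minimal sketch of the same logic, assuming the two per-second rates have already been computed (the function and parameter names are hypothetical):

```python
def qps_burst(current_qps, qps_10m_ago, ratio=1.3, floor=10000):
    """Mirror the PG_QPS_BURST expression for a single (datname, instance) pair.

    The alert condition holds only when QPS grew by more than 30% compared
    with 10 minutes earlier AND the current QPS exceeds the absolute floor.
    """
    # A zero baseline is treated as infinite growth, as division does in PromQL.
    growth = current_qps / qps_10m_ago if qps_10m_ago > 0 else float("inf")
    return growth > ratio and current_qps > floor
```

For example, a jump from 10000 to 14000 QPS satisfies both parts, while a jump from 300 to 900 QPS triples the load but stays below the floor and is ignored.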
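Prometheus Alert Rules
For the complete alert rules, see: Reference - Alert Rules
3.4 - Provisioning Solution
Concepts behind the Pigsty provisioning solution
A provisioning solution is a system that delivers database services and monitoring systems to users.
A provisioning solution is not a database but a database factory:
the user submits a configuration to the provisioning system, and the provisioning system creates the required database cluster in the environment according to the specification the user asked for.
This is much like submitting a YAML file to Kubernetes to create the resources you need.
Defining a Database Cluster
For example, the following configuration declares a PostgreSQL database cluster named pg-test.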
#-----------------------------
# cluster: pg-test
#-----------------------------
pg-test: # define cluster named 'pg-test'
# - cluster members - #
hosts:
10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
# - cluster configs - #
vars:
# basic settings
pg_cluster: pg-test # define actual cluster name
pg_version: 13 # define installed pgsql version
node_tune: tiny # tune node into oltp|olap|crit|tiny mode
pg_conf: tiny.yml # tune pgsql into oltp/olap/crit/tiny mode
# business users, adjust on your own needs
pg_users:
- name: test # example production user have read-write access
password: test # example user's password
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
pgbouncer: true # production user that access via pgbouncer
comment: default test user for production usage
pg_databases: # create a business database 'test'
- name: test # use the simplest form
pg_default_database: test # default database will be used as primary monitor target
# proxy settings
vip_mode: l2 # enable/disable vip (require members in same LAN)
vip_address: 10.10.10.3 # virtual ip address
vip_cidrmask: 8 # cidr network mask length
vip_interface: eth1 # interface to add virtual ip
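When the database provisioning playbook ./pgsql.yml is executed, the provisioning system follows the definition in the inventory and creates a one-primary, two-replica PostgreSQL cluster pg-test on the three machines 10.10.10.11, 10.10.10.12, and 10.10.10.13, along with a user and a database both named test. As requested, Pigsty also declares the VIP 10.10.10.3 and binds it to the cluster primary. The structure is shown in the figure below.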
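As a rough illustration of how such a declaration can be read programmatically, the sketch below groups the members of pg-test by role and checks that exactly one primary is declared. The host data is copied from the inventory; the helper functions are hypothetical and are not part of Pigsty.

```python
from collections import defaultdict

# Members of the pg-test cluster, copied from the inventory definition.
PG_TEST_HOSTS = {
    "10.10.10.11": {"pg_seq": 1, "pg_role": "primary"},
    "10.10.10.12": {"pg_seq": 2, "pg_role": "replica"},
    "10.10.10.13": {"pg_seq": 3, "pg_role": "offline"},
}


def members_by_role(hosts):
    """Group member IPs by pg_role, ordered by pg_seq."""
    groups = defaultdict(list)
    for ip, attrs in sorted(hosts.items(), key=lambda item: item[1]["pg_seq"]):
        groups[attrs["pg_role"]].append(ip)
    return dict(groups)


def single_primary(hosts):
    """Return the primary's IP, or raise if the cluster does not declare exactly one."""
    primaries = members_by_role(hosts).get("primary", [])
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one primary, found {len(primaries)}")
    return primaries[0]
```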
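Defining Infrastructure
Users can define not only database clusters but the entire infrastructure as well.
Pigsty describes the database runtime environment completely through 154 variables.
For the detailed configuration options, see the Configuration Guide.
Responsibilities of the Provisioning Solution
A provisioning solution is normally responsible only for creating clusters. Once a cluster is created, day-to-day management should be handled by a control platform.
Pigsty does not currently include a control platform, however, so it also ships simple scripts for reclaiming and destroying resources, which can be used to update and manage resources too. Keep in mind that this is not the core job of a provisioning solution.
3.4.1 - Database Access
How to access databases created by Pigsty?
Pigsty offers a rich set of access methods; users can choose an access mode according to their own infrastructure and preferences.
Database Access Methods
Users can access the database service in several ways.
At the cluster level, users can access the four default services provided by the cluster through the cluster domain name plus a service port; Pigsty strongly recommends this method. Users can also bypass the domain name and access the database cluster directly through the cluster VIP (L2 or L4).
At the instance level, users can connect directly to the Postgres database through the node IP/domain name plus port 5432, or go through Pgbouncer on port 6432. They can also reach the services of the cluster that the instance belongs to through Haproxy on ports 5433~543x.
How the database is accessed ultimately depends on the traffic access scheme the database uses.
Typical Access Schemes
Pigsty recommends the Haproxy-based access schemes (1/2). In production, where the infrastructure supports it, the scheme based on an L4 VIP (or an equivalent load balancing service) (3) can also be used.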
DNS + Haproxy
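Overview
The standard high-availability access scheme, with no single point of failure. It offers the best balance of flexibility, applicability, and performance.
The Haproxy instances in a cluster expose services uniformly in Node Port fashion. Every Haproxy is an idempotent instance that provides complete load balancing and service routing. Haproxy is deployed on every database node, so every member of the cluster behaves identically from the user's point of view. (For example, port 5433 on any member connects to the primary's connection pool, and port 5434 on any member connects to the connection pool of some replica.)
Haproxy's own availability is achieved through idempotent replicas: every Haproxy can serve as an access entry point. Users can use one, two, several, or all Haproxy instances; each provides exactly the same functionality.
Users must ensure for themselves that the application can reach at least one healthy Haproxy instance. In the simplest implementation, users resolve the cluster's DNS name to several Haproxy instances and enable round-robin DNS responses. Clients can then either not cache DNS at all, or use long-lived connections and retry when establishing a connection fails. Alternatively, as in scheme 2, an additional L2/L4 VIP on the architecture side can ensure the high availability of Haproxy itself.
Advantages
- No single point of failure, highly available
- The VIP is bound to the primary, allowing flexible access
Limitations
- One extra hop
- The client IP address is lost, so some HBA policies cannot take effect properly
- High availability of Haproxy itself relies on idempotent replicas, DNS round-robin, and client reconnection
DNS should use round-robin, and clients should use long-lived connections with a retry mechanism for failed connection attempts, so that when a single Haproxy fails, traffic drifts automatically to the other Haproxy instances in the cluster. If that cannot be done, consider access scheme 2, which uses an L2/L4 VIP to keep Haproxy highly available.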
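The client-side retry described above can be sketched as follows. This is an illustration only, assuming the client already holds the list of Haproxy addresses (for instance from a round-robin DNS answer); connect_any is a hypothetical helper, and a real application would hand the chosen address to its database driver rather than keep the raw socket.

```python
import socket


def connect_any(hosts, port, attempts=3, timeout=2.0,
                connect=socket.create_connection):
    """Try each Haproxy address in turn and return the first connection made.

    Every pass walks the whole host list, so a single failed Haproxy is
    skipped and traffic moves to the next healthy instance.
    """
    last_error = None
    for _ in range(attempts):
        for host in hosts:
            try:
                return connect((host, port), timeout)
            except OSError as exc:  # refused, timed out, unreachable, ...
                last_error = exc
    raise ConnectionError(f"no Haproxy reachable on port {port}") from last_error
```

The connect parameter exists only so the function can be exercised without a network; by default it uses the standard library's socket.create_connection.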
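Diagram
L2 VIP + Haproxy
Overview
The standard access scheme used by the Pigsty sandbox: a single domain name is bound to a single L2 VIP, and the VIP points to the HAProxy instances in the cluster.
The Haproxy instances in the cluster expose services uniformly in Node Port fashion. Every Haproxy is an idempotent instance that provides complete load balancing and service routing. The availability of Haproxy itself is guaranteed by the L2 VIP.
Each cluster is assigned one L2 VIP, bound to the cluster primary. When the primary changes, the L2 VIP moves to the new primary. This is implemented by vip-manager: vip-manager queries Consul for the cluster's current primary and then listens on the VIP address on the primary.
The cluster's L2 VIP has a corresponding domain name. The domain name always resolves to that L2 VIP and does not change during the cluster's lifetime.
Advantages
- No single point of failure, highly available
- The VIP is bound to the primary, allowing flexible access
Limitations
Diagram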
L4 VIP + Haproxy
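Overview
A variant of access schemes 1/2 that uses an L4 VIP to keep Haproxy highly available.
Advantages
- No single point of failure, highly available
- All Haproxy instances can be used at once, spreading traffic evenly.
- The candidate primaries do not need to be on the same layer-2 network.
- Traffic can be switched by operating a single VIP (when several Haproxy instances are in use, there is no need to adjust them one by one).
Limitations
- Two extra hops, which is somewhat wasteful; where possible, use scheme 4 directly: direct L4 VIP access.
- The client IP address is lost, so some HBA policies cannot take effect properly.
Diagram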
L4 VIP
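Overview
For large-scale, high-performance production environments, L4 VIP access (FullNAT, DPVS) is recommended.
Advantages
- Good performance and high throughput
- The correct client IP address can be obtained through the toa module, so HBA takes full effect.
Limitations
- Still one extra hop.
- Depends on external infrastructure and is complex to deploy.
- Without the toa kernel module enabled, the client IP address is still lost.
- There is no Haproxy to mask the difference between primary and replicas, so the nodes in the cluster are no longer "idempotent".
Diagram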
Consul DNS
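Overview
An L2 VIP is not always available; in particular, the requirement that all candidate primaries sit on the same layer-2 network cannot always be met.
In that case, DNS resolution can be used in place of the L2 VIP.
Advantages
Limitations
- Depends on Consul DNS
- Users need to configure a sensible DNS caching policy
Diagram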
Static DNS
Overview
The traditional static DNS access method.
Advantages
Limitations
Diagram
IP
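Overview
Smart clients connect directly to the database IP addresses.
Advantages
- Direct connection to the database/connection pool, one fewer hop
- No extra component is needed to distinguish primary from replicas, which reduces system complexity.
Limitations
Diagram
3.4.2 - Database Services
How to define new services in Pigsty
A service is the form in which a database cluster exposes its functionality. In general, a database cluster should provide at least two services:
- Read-write service (primary): users can write to the database
- Read-only service (replica): users can access read-only replicas
Depending on the business scenario, there may be other services as well:
- Offline replica service (offline): a dedicated replica that takes no online read-only traffic, used for ETL and individual users' queries.
- Synchronous replica service (standby): a read-only service that uses synchronous commit and has no replication lag.
- Delayed replica service (delayed): lets the business access older data from a fixed interval in the past.
- Default direct-connection service (default): lets (administrative) users bypass the connection pool and manage the database directly.
Default Services
Pigsty exposes four services by default: primary, replica, default, offline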
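| Service | Port | Purpose | Description |
|---|---|---|---|
| primary | 5433 | Production read-write | Connects to the cluster primary through the connection pool |
| replica | 5434 | Production read-only | Connects to a cluster replica through the connection pool |
| default | 5436 | Administration | Connects directly to the cluster primary |
| offline | 5438 | ETL / individual users | Connects directly to an available offline instance of the cluster |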
| Service | Port | Description | Example |
|---|---|---|---|
| primary | 5433 | Only production users can connect | postgres://test@pg-test:5433/test |
| replica | 5434 | Only production users can connect | postgres://test@pg-test:5434/test |
| default | 5436 | Administrators and DML executors can connect | postgres://dbuser_admin@pg-test:5436/test |
| offline | 5438 | ETL/STATS and individual users can connect | postgres://dbuser_stats@pg-test-tt:5438/test postgres://dbp_vonng@pg-test:5438/test |
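Because services are distinguished only by port, building a connection string comes down to a lookup. A small sketch under that assumption follows; the mapping is taken from the table of default services, while service_url is a hypothetical helper rather than a Pigsty API.

```python
# Default service ports, as listed in the table of default services.
SERVICE_PORTS = {"primary": 5433, "replica": 5434, "default": 5436, "offline": 5438}


def service_url(cluster: str, service: str, user: str, database: str) -> str:
    """Build a connection URL for one of the default services of a cluster."""
    port = SERVICE_PORTS[service]  # raises KeyError for an unknown service name
    return f"postgres://{user}@{cluster}:{port}/{database}"
```

For instance, service_url("pg-test", "replica", "test", "test") yields the read-only URL shown in the table.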
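Primary Service
The Primary service serves online production read-write access. It maps the cluster's port 5433 to the primary's connection pool port (6432 by default).
The Primary service selects all instances in the cluster as its members, but only instances whose /primary health check passes actually receive traffic.
Exactly one instance in the cluster is the primary, and only its health check passes.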
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
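Replica Service
The Replica service serves online production read-only access. It maps the cluster's port 5434 to the replicas' connection pool port (6432 by default).
The Replica service selects all instances in the cluster as its members, but only instances whose /read-only health check passes actually receive traffic. That health check succeeds on every instance that can take read-only traffic, including the primary, so any member of the cluster is able to carry read-only traffic.
By default, however, only replicas carry read-only requests. The Replica service defines selector_backup, a selector that adds the cluster primary to the Replica service as a backup instance. Only when all other instances in the Replica service, that is, all replicas, are down does the primary start taking read-only traffic.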
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
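The interplay between regular members and backup members can be sketched in a few lines. This models the behavior described above (backup servers receive traffic only when no regular server is healthy) and is not Haproxy's actual implementation; the member list reuses the pg-test addresses, and the function name is hypothetical.

```python
def active_servers(members, healthy):
    """Return the members that should receive traffic right now.

    `members` is a list of (address, is_backup) pairs; `healthy` is the set of
    addresses whose health check currently passes. Backup members are used
    only when no regular member is healthy.
    """
    regular = [addr for addr, is_backup in members if not is_backup and addr in healthy]
    if regular:
        return regular
    return [addr for addr, is_backup in members if is_backup and addr in healthy]


# Replica service of pg-test: the primary (10.10.10.11) is marked as backup.
REPLICA_SERVICE = [
    ("10.10.10.11", True),
    ("10.10.10.12", False),
    ("10.10.10.13", False),
]
```

With all three instances healthy, only the two non-backup members take read-only traffic; if both of them fail, the primary takes over.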
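Default Service
The Default service serves direct connections to the online primary. It maps the cluster's port 5436 to the primary's Postgres port (5432 by default).
The Default service targets interactive read-write access, including running administrative commands, applying DDL changes, connecting to the primary to run DML, and running CDC. Interactive operations should not go through the connection pool, so the Default service forwards traffic straight to Postgres, bypassing Pgbouncer.
The Default service is similar to the Primary service and uses the same configuration options. The default parameters are filled in explicitly for demonstration purposes.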
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
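Offline Service
The Offline service is used for offline access and individual queries. It maps the cluster's port 5438 to the offline instance's Postgres port (5432 by default).
The Offline service targets interactive read-only access, including ETL, large offline analytical queries, and individual users' queries. Interactive operations should not go through the connection pool, so the Offline service forwards traffic straight to Postgres on the offline instance, bypassing Pgbouncer.
An offline instance is an instance with pg_role == offline or one marked with pg_offline_query. The other replicas besides the offline instances act as backup instances for Offline, so when the offline instance goes down, the Offline service can still be served by the other replicas.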
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
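Service Definition
An array of service definition objects defines the services that each database cluster exposes. Each cluster can define multiple services, each service contains any number of cluster members, and services are distinguished by port.
Services are defined through pg_services and pg_services_extra. The former defines services common to the whole environment; the latter defines extra services specific to a cluster. Both are arrays of service definitions. The definitions of Pigsty's default services are shown below: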
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
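Required Fields
- Name (service.name): the service name. The full name of a service takes the database cluster name as prefix and service.name as suffix, joined by -. For example, the service with name=primary in the pg-test cluster has the full service name pg-test-primary.
- Port (service.src_port): in Pigsty, services are exposed as NodePort by default, so the exposed port is required. If you use an access scheme based on an external load balancing service, you can distinguish services in other ways.
- Selector (service.selector): the selector specifies the instance members of the service. It is written as a JMESPath expression that filters over all the cluster instance members. The default [] selector picks every cluster member.
Optional Fields
- Backup selector (service.selector_backup): the optional backup selector selects or marks the list of instances used as service backups; backup instances take over the service only when all other members in the cluster have failed. For example, the primary instance can be added to the backup set of the replica service, so that the primary can still carry the cluster's read-only traffic after all replicas fail.
- Source IP (service.src_ip): the IP address the service listens on. The default is *, meaning all IP addresses on the host. vip uses the value of the vip_address variable; a specific IP address supported by the network interface can also be given.
- Destination port (service.dst_port): which port on the target instances the service traffic is sent to. postgres points to the port the database listens on, pgbouncer points to the port the connection pool listens on, and a fixed port number can also be given.
- Health check method (service.check_method): how the service checks instance health. Only HTTP is supported at present.
- Health check port (service.check_port): which port the service checks to obtain instance health. patroni reads it from Patroni (8008 by default), pg_exporter reads it from PG Exporter (9630 by default), and a custom port number can also be given.
- Health check path (service.check_url): the URL path used for the HTTP check. / is used by default. PG Exporter and Patroni provide a variety of health checks that can be used to tell primary and replica traffic apart. For example, /primary succeeds only on the primary, /replica succeeds only on replicas, and /read-only succeeds on any instance that supports read-only access (including the primary).
- Health check code (service.check_code): the HTTP status code the health check expects, 200 by default.
- Haproxy-specific configuration (service.haproxy): configuration items specific to the service provider software (HAproxy).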
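The health check fields combine into a single HTTP probe: request check_url on check_port and compare the status code with check_code. The sketch below shows that combination using only the Python standard library. The defaults mirror the values given above (Patroni on port 8008, expected code 200); is_healthy and http_status are hypothetical helpers, not part of Pigsty or Haproxy.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=3.0):
    """Return the HTTP status code for `url`, or None if it is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:   # non-2xx answers still carry a code
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def is_healthy(host, check_port=8008, check_url="/primary", check_code=200,
               fetch=http_status):
    """Apply one service health check to one instance."""
    return fetch(f"http://{host}:{check_port}{check_url}") == check_code
```

Calling is_healthy("10.10.10.11") would probe http://10.10.10.11:8008/primary, which is expected to succeed only on the current primary. The fetch parameter is there so the logic can be exercised without a running cluster.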
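3.4.3 - High Availability
Introduces the concept of availability and Pigsty's high-availability practice
Database clusters created by Pigsty are distributed, highly available database clusters.
In effect, as long as any instance in the cluster survives, the cluster can provide complete read-write and read-only services.
Every database instance in a cluster is idempotent in use: any instance can provide complete read-write service through the built-in load balancing component.
The cluster detects failures and performs primary-replica switchover automatically. Ordinary failures heal themselves within seconds to tens of seconds, and read-only traffic is unaffected in the meantime.
High Availability
Two core scenarios: Switchover, Failover
Four core problems: failure detection, fencing, leader election, traffic switching
For drills of the core high-availability scenarios, see the High Availability Drill section.
Patroni-Based High Availability
The Patroni-based high-availability solution is simple to deploy, needs no special hardware, and is backed by a large number of real production deployments.
Pigsty's high-availability solution is built on Patroni, vip-manager, and haproxy.
Patroni reaches leader-election consensus through a DCS (etcd/consul/zookeeper).
Patroni implements failure detection with heartbeats and the DCS lease mechanism. The primary holds the lease; once the lease is lost, the other members race to acquire it.
Patroni's fencing is based on the Linux kernel module watchdog.
Patroni provides primary and replica health checks, which makes integration with external load balancers easy.
Access Layer Based on Haproxy and VIP
The Pigsty sandbox uses an access layer based on an L2 VIP and Haproxy by default. Pigsty offers several optional database access methods.
Haproxy is deployed idempotently on every instance of the cluster, and any one or more Haproxy instances can act as the cluster's load balancer.
Haproxy exposes services in a Node Port-like fashion. By default, port 5433 provides the cluster's read-write service and port 5434 provides its read-only service.
High availability of Haproxy itself can be achieved in the following ways:
- Use a smart client that connects to the database through the DNS or service discovery mechanism provided by Consul.
- Use a smart client that lists all instances of the cluster through the multi-host feature.
- Use a VIP (layer 2 or layer 4) bound in front of Haproxy.
- Use an external load balancer.
- Use round-robin DNS that resolves to multiple Haproxy instances; clients re-resolve DNS and retry after a failed connection attempt.
Patroni Behavior During Failures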
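| Scenario | Location | Patroni's action |
|---|---|---|
| PG Down | replica | Tries to restart PG |
| Patroni Down | replica | PG shuts down with it (unchanged in maintenance mode) |
| Patroni Crash | replica | PG does not shut down along with Patroni |
| DCS Network Partition | replica | Nothing happens |
| Promote | replica | Demotes PG to a replica and reattaches it to the primary. |
| PG Down | primary | Tries to restart PG; performs Failover after master_start_timeout is exceeded |
| Patroni Down | primary | Shuts down PG and triggers Failover |
| Patroni Crash | primary | Triggers Failover; may cause split-brain, which watchdog fencing can prevent. |
| DCS Network Partition | primary | The primary is demoted to a replica and Failover is triggered |
| DCS Down | DCS | The primary is demoted to a replica; the cluster has no primary and is not writable. |
| No candidate available in synchronous mode | | Temporarily switches to asynchronous replication. No Failover happens until synchronous replication is restored |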
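A properly configured Patroni handles the vast majority of failures. The DCS Down scenario (Consul/Etcd down or unreachable over the network), however, leaves every production database cluster unwritable and deserves special attention. The availability of the DCS must be higher than that of the databases.
Known Issue
Make sure, as far as possible, that the server's time synchronization service starts before Patroni.
3.4.4 - Directory Structure
Introduces the default directory structure set up by Pigsty
The following parameters relate to the Pigsty directory structure
Overview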
#------------------------------------------------------------------------------
# Create Directory
#------------------------------------------------------------------------------
# this assumes that
# /pg is shortcut for postgres home
# {{ pg_fs_main }} contains the main data (MUST ALREADY MOUNTED)
# {{ pg_fs_bkup }} contains archive and backup data (MUST ALREADY MOUNTED)
# cluster-version is the default parent folder for pgdata (e.g pg-test-12)
#------------------------------------------------------------------------------
# default variable:
# pg_fs_main = /export fast ssd
# pg_fs_bkup = /var/backups cheap hdd
#
# /pg -> /export/postgres/pg-test-12
# /pg/data -> /export/postgres/pg-test-12/data
#------------------------------------------------------------------------------
- name: Create postgresql directories
tags: pg_dir
become: yes
block:
- name: Make sure main and backup dir exists
file: path={{ item }} state=directory owner=root mode=0777
with_items:
- "{{ pg_fs_main }}"
- "{{ pg_fs_bkup }}"
# pg_cluster_dir: "{{ pg_fs_main }}/postgres/{{ pg_cluster }}-{{ pg_version }}"
- name: Create postgres directory structure
file: path={{ item }} state=directory owner={{ pg_dbsu }} group=postgres mode=0700
with_items:
- "{{ pg_fs_main }}/postgres"
- "{{ pg_cluster_dir }}"
- "{{ pg_cluster_dir }}/bin"
- "{{ pg_cluster_dir }}/log"
- "{{ pg_cluster_dir }}/tmp"
- "{{ pg_cluster_dir }}/conf"
- "{{ pg_cluster_dir }}/data"
- "{{ pg_cluster_dir }}/meta"
- "{{ pg_cluster_dir }}/stat"
- "{{ pg_cluster_dir }}/change"
- "{{ pg_backup_dir }}/postgres"
- "{{ pg_backup_dir }}/arcwal"
- "{{ pg_backup_dir }}/backup"
- "{{ pg_backup_dir }}/remote"
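PG Binary Directory Structure
On RedHat/CentOS, the default installation location of the Postgres distribution is
/usr/pgsql-${pg_version}/
The installation playbook automatically creates a symlink pointing to the currently installed version. For example, if Postgres 13 is installed:
/usr/pgsql -> /usr/pgsql-13
The default pg_bin_dir is therefore /usr/pgsql/bin/, and /etc/profile.d/pgsql.sh adds this path to the PATH environment variable of all users.
PG Data Directory Structure
Pigsty assumes that each node used to deploy a database instance has at least one main data disk (pg_fs_main) and an optional backup data disk (pg_fs_bkup). The main data disk is usually a high-performance SSD, and the backup disk a large, inexpensive HDD.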
#------------------------------------------------------------------------------
# Create Directory
#------------------------------------------------------------------------------
# this assumes that
# /pg is shortcut for postgres home
# {{ pg_fs_main }} contains the main data (MUST ALREADY MOUNTED)
# {{ pg_fs_bkup }} contains archive and backup data (MAYBE ALREADY MOUNTED)
# {{ pg_cluster }}-{{ pg_version }} is the default parent folder
# for pgdata (e.g pg-test-12)
#------------------------------------------------------------------------------
# default variable:
# pg_fs_main = /export fast ssd
# pg_fs_bkup = /var/backups cheap hdd
#
# /pg -> /export/postgres/pg-test-12
# /pg/data -> /export/postgres/pg-test-12/data
PG Database Cluster Directory Structure
# basic
{{ pg_fs_main }} /export # contains all business data (pg,consul,etc..)
{{ pg_dir_main }} /export/postgres # contains postgres main data
{{ pg_cluster_dir }} /export/postgres/pg-test-13 # contains cluster `pg-test` data (of version 13)
/export/postgres/pg-test-13/bin # binary scripts
/export/postgres/pg-test-13/log # misc logs
/export/postgres/pg-test-13/tmp # tmp, sql files, records
/export/postgres/pg-test-13/conf # configurations
/export/postgres/pg-test-13/data # main data directory
/export/postgres/pg-test-13/meta # identity information
/export/postgres/pg-test-13/stat # stats information
/export/postgres/pg-test-13/change # changing records
{{ pg_fs_bkup }} /var/backups # contains all backup data (pg,consul,etc..)
{{ pg_dir_bkup }} /var/backups/postgres # contains postgres backup data
{{ pg_backup_dir }} /var/backups/postgres/pg-test-13 # contains cluster `pg-test` backup (of version 13)
/var/backups/postgres/pg-test-13/backup # base backup
/var/backups/postgres/pg-test-13/arcwal # WAL archive
/var/backups/postgres/pg-test-13/remote # mount NFS/S3 remote resources here
# links
/pg -> /export/postgres/pg-test-13 # pg root link
/pg/data -> /export/postgres/pg-test-13/data # real data dir
/pg/backup -> /var/backups/postgres/pg-test-13/backup # base backup
/pg/arcwal -> /var/backups/postgres/pg-test-13/arcwal # WAL archive
/pg/remote -> /var/backups/postgres/pg-test-13/remote # mount NFS/S3 remote resources here
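The layout follows mechanically from four inputs: the two mount points, the cluster name, and the major version. A sketch of that derivation, assuming the defaults /export and /var/backups shown in the listing (pg_paths is a hypothetical helper):

```python
import posixpath


def pg_paths(pg_cluster, pg_version, pg_fs_main="/export", pg_fs_bkup="/var/backups"):
    """Derive the cluster directory, the backup directory, and the /pg links."""
    name = f"{pg_cluster}-{pg_version}"
    cluster_dir = posixpath.join(pg_fs_main, "postgres", name)
    backup_dir = posixpath.join(pg_fs_bkup, "postgres", name)
    return {
        "pg_cluster_dir": cluster_dir,
        "pg_backup_dir": backup_dir,
        # symlink -> target, as listed under "links"
        "links": {
            "/pg": cluster_dir,
            "/pg/data": posixpath.join(cluster_dir, "data"),
            "/pg/backup": posixpath.join(backup_dir, "backup"),
            "/pg/arcwal": posixpath.join(backup_dir, "arcwal"),
            "/pg/remote": posixpath.join(backup_dir, "remote"),
        },
    }
```

For the pg-test cluster on version 13, pg_paths("pg-test", 13) reproduces the paths in the listing.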
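Pgbouncer Configuration File Structure
Pgbouncer runs as the Postgres user, and its configuration files live in /etc/pgbouncer. The configuration files are:
- pgbouncer.ini: the main configuration file
- userlist.txt: lists the users in the connection pool
- pgb_hba.conf: lists the access permissions of connection pool users
- database.txt: lists the databases in the connection pool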
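3.4.5 - Access Control
Introduces the access control model in Pigsty
PostgreSQL provides two kinds of access control mechanism: authentication and privileges.
Pigsty ships with a basic access control model that covers the vast majority of use cases.
User System
Pigsty's default permission system contains four default users and four default roles.
Users can change the names of the default users by modifying pg_default_roles, but new users are advised not to change the names of the default roles.
Default Roles
Pigsty ships with four default roles:
- Read-only role (dbrole_readonly): read-only
- Read-write role (dbrole_readwrite): read-write, inherits dbrole_readonly
- Admin role (dbrole_admin): executes DDL changes, inherits dbrole_readwrite
- Offline role (dbrole_offline): read-only, used for slow queries/ETL/interactive queries, and allowed access only on specific instances.
Default Users
Pigsty ships with four default users:
- Superuser (postgres): owner and creator of the database, same as the operating system user
- Replication user (replicator): the user for primary-replica replication.
- Monitor user (dbuser_monitor): the user for monitoring database metrics.
- Administrator (dbuser_admin): performs day-to-day management operations and database changes (normally used by DBAs).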
| name | attr | roles | desc |
|---|---|---|---|
| dbrole_readonly | Cannot login | | role for global readonly access |
| dbrole_readwrite | Cannot login | dbrole_readonly | role for global read-write access |
| dbrole_offline | Cannot login | | role for restricted read-only access (offline instance) |
| dbrole_admin | Cannot login, Bypass RLS | pg_monitor, pg_signal_backend, dbrole_readwrite | role for object creation |
| postgres | Superuser, Create role, Create DB, Replication, Bypass RLS | | system superuser |
| replicator | Replication, Bypass RLS | pg_monitor, dbrole_readonly | system replicator |
| dbuser_monitor | 16 connections | pg_monitor, dbrole_readonly | system monitor user |
| dbuser_admin | Bypass RLS, Superuser | dbrole_admin | system admin user |
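Related Configuration
The following variables relate to the 8 default users/roles.
The default users have dedicated username and password options that override the entries in pg_default_roles, so there is no need to set passwords for default users there.
For security reasons, setting a password for the DBSU is not recommended, so pg_dbsu has no dedicated password option. If needed, a password for the superuser can be specified in pg_default_roles.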
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_admin # system admin user
pg_admin_password: DBUser.Admin # system admin password
# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
pg_default_roles:
# common production readonly user
- name: dbrole_readonly # production read-only roles
login: false
comment: role for global readonly access
# common production read-write user
- name: dbrole_readwrite # production read-write roles
login: false
roles: [dbrole_readonly] # read-write includes read-only access
comment: role for global read-write access
# offline have same privileges as readonly, but with limited hba access on offline instance only
# for the purpose of running slow queries, interactive queries and perform ETL tasks
- name: dbrole_offline
login: false
comment: role for restricted read-only access (offline instance)
# admin have the privileges to issue DDL changes
- name: dbrole_admin
login: false
bypassrls: true
comment: role for object creation
roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# dbsu, name is designated by `pg_dbsu`. It's not recommend to set password for dbsu
- name: postgres
superuser: true
comment: system superuser
# default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
- name: replicator
replication: true
roles: [pg_monitor, dbrole_readonly]
comment: system replicator
# default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
- name: dbuser_monitor
connlimit: 16
comment: system monitor user
roles: [pg_monitor, dbrole_readonly]
# default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
- name: dbuser_admin
bypassrls: true
comment: system admin user
roles: [dbrole_admin]
# default stats user, for ETL and slow queries
- name: dbuser_stats
password: DBUser.Stats
comment: business offline user for offline queries and ETL
roles: [dbrole_offline]
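Since roles inherit from one another, the effective membership of a user is the transitive closure of its roles entries. The sketch below computes that closure over the memberships declared in pg_default_roles; the dictionary is transcribed from the configuration, while effective_roles is a hypothetical helper and not something Pigsty provides.

```python
# Direct memberships, transcribed from the `roles` fields of pg_default_roles.
DEFAULT_ROLES = {
    "dbrole_readonly": [],
    "dbrole_readwrite": ["dbrole_readonly"],
    "dbrole_offline": [],
    "dbrole_admin": ["dbrole_readwrite", "pg_monitor", "pg_signal_backend"],
    "postgres": [],
    "replicator": ["pg_monitor", "dbrole_readonly"],
    "dbuser_monitor": ["pg_monitor", "dbrole_readonly"],
    "dbuser_admin": ["dbrole_admin"],
    "dbuser_stats": ["dbrole_offline"],
}


def effective_roles(name, roles=DEFAULT_ROLES):
    """Return every role `name` belongs to, directly or through inheritance."""
    seen = set()
    pending = list(roles.get(name, []))
    while pending:
        role = pending.pop()
        if role not in seen:
            seen.add(role)
            pending.extend(roles.get(role, []))
    return seen
```

For example, dbuser_admin is declared with a single role but ends up with read-write and read-only access as well, because dbrole_admin includes dbrole_readwrite, which in turn includes dbrole_readonly.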
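Pgbouncer Users
Pgbouncer's operating system user is kept the same as the database superuser; both default to postgres.
By default, Pigsty uses the Postgres admin user as the Pgbouncer admin user and the Postgres monitor user as the Pgbouncer monitor user as well.
Pgbouncer user permissions are controlled through /etc/pgbouncer/pgb_hba.conf.
The Pgbouncer user list is controlled through the /etc/pgbouncer/userlist.txt file.
When defining users, only users that explicitly set pgbouncer: true are added to the Pgbouncer user list.
Defining Users
Users in Pigsty can be declared through the following two parameters, which share the same form:
Creating Users
Pigsty users can be created with the pgsql-createuser.yml playbook
Privilege Model
By default, the roles have the following privileges: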
GRANT USAGE ON SCHEMAS TO dbrole_readonly
GRANT SELECT ON TABLES TO dbrole_readonly
GRANT SELECT ON SEQUENCES TO dbrole_readonly
GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
GRANT USAGE ON SCHEMAS TO dbrole_offline
GRANT SELECT ON TABLES TO dbrole_offline
GRANT SELECT ON SEQUENCES TO dbrole_offline
GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
GRANT CREATE ON SCHEMAS TO dbrole_admin
GRANT USAGE ON TYPES TO dbrole_admin
Every other business user should by default belong to one of the four default roles: read-only, read-write, admin, or offline access.
| Owner | Schema | Type | Access privileges |
|---|---|---|---|
| username | | function | =X/postgres |
| | | | postgres=X/postgres |
| | | | dbrole_readonly=X/postgres |
| | | | dbrole_offline=X/postgres |
| username | | schema | postgres=UC/postgres |
| | | | dbrole_readonly=U/postgres |
| | | | dbrole_offline=U/postgres |
| | | | dbrole_admin=C/postgres |
| username | | sequence | postgres=rwU/postgres |
| | | | dbrole_readonly=r/postgres |
| | | | dbrole_readwrite=wU/postgres |
| | | | dbrole_offline=r/postgres |
| username | | table | postgres=arwdDxt/postgres |
| | | | dbrole_readonly=r/postgres |
| | | | dbrole_readwrite=awd/postgres |
| | | | dbrole_offline=r/postgres |
| | | | dbrole_admin=Dxt/postgres |
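All users can access all schemas, read-only users can read all tables, read-write users can run DML on all tables, and administrators can apply DDL changes. Offline users are like read-only users but are allowed access only on instances with pg_role == 'offline' or pg_offline_query = true.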
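The access privilege strings in the table use PostgreSQL's ACL abbreviations in the form grantee=privileges/grantor, where an empty grantee means PUBLIC. The following sketch expands them into privilege names; the letter mapping follows the PostgreSQL documentation, and parse_acl is a hypothetical helper for illustration.

```python
# PostgreSQL ACL privilege abbreviations.
ACL_LETTERS = {
    "r": "SELECT", "w": "UPDATE", "a": "INSERT", "d": "DELETE",
    "D": "TRUNCATE", "x": "REFERENCES", "t": "TRIGGER",
    "X": "EXECUTE", "U": "USAGE", "C": "CREATE",
    "c": "CONNECT", "T": "TEMPORARY",
}


def parse_acl(item: str):
    """Split 'grantee=privs/grantor' into (grantee, [privilege names], grantor)."""
    grantee_privs, grantor = item.split("/")
    grantee, privs = grantee_privs.split("=")
    # '*' after a letter marks a grant option; it is ignored in this sketch.
    names = [ACL_LETTERS[letter] for letter in privs if letter != "*"]
    return (grantee or "PUBLIC", names, grantor)
```

Applied to the table row dbrole_readwrite=awd/postgres, this gives INSERT, UPDATE, and DELETE granted by postgres, which matches the GRANT statements listed earlier.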
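Database Privileges
A database has three privileges, CONNECT, CREATE, and TEMP, plus the special owner attribute OWNERSHIP. Database definitions are controlled by the pg_databases parameter. A complete database definition looks like this: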
pg_databases:
- name: meta # name is the only required field for a database
owner: postgres # optional, database owner
template: template1 # optional, template1 by default
encoding: UTF8 # optional, UTF8 by default
locale: C # optional, C by default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
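By default, if no owner is configured for a database, the database superuser dbsu becomes its default OWNER; otherwise the specified user is the owner.
By default, all users have the CONNECT privilege on a newly created database. To take that privilege back, set revokeconn == true. In that case only the default users (dbsu|admin|monitor|replicator) and the database owner are explicitly granted the CONNECT privilege. In addition, admin|owner hold the GRANT OPTION for CONNECT and can grant CONNECT to others.
To isolate access between databases, create a dedicated business user as the owner of each database and set the revokeconn option on all of them. This configuration is especially practical for multi-tenant instances.
Creating New Objects
By default, for security reasons, Pigsty revokes the PUBLIC privilege to CREATE new schemas in a database, and also revokes the PUBLIC privilege to create new relations in the public schema. The database superuser and administrators are exempt from this restriction; they can always apply DDL changes anywhere.
Pigsty strongly discourages applying DDL changes as a business user, because PostgreSQL's ALTER DEFAULT PRIVILEGES takes effect only for "objects created by a specific user". By default, objects created by the superuser postgres and by dbuser_admin carry the default privilege configuration. If you grant dbrole_admin to a business user, run the following first whenever that business administrator applies DDL changes: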
SET ROLE dbrole_admin; -- objects created by dbrole_admin get the correct default privileges
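The privilege to create objects in a database has nothing to do with whether the user owns the database; it depends only on whether the user was granted the admin role when it was created.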
pg_users:
- {name: test1, password: xxx , roles: [dbrole_readwrite]} # cannot create schemas or objects
- {name: test2, password: xxx , roles: [dbrole_admin]} # can create schemas and objects
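Authentication Model
HBA stands for Host Based Authentication; it can be thought of as an IP allowlist/denylist.
Configuring HBA
In Pigsty, the HBA of every instance is generated from the configuration file, and the resulting HBA rules depend on the role of the instance (pg_role).
Pigsty's HBA is controlled by the following variables:
- pg_hba_rules: HBA rules shared across the environment
- pg_hba_rules_extra: HBA rules specific to an instance or cluster
- pgbouncer_hba_rules: HBA rules used by the connection pool
- pgbouncer_hba_rules_extra: HBA rules for the connection pool specific to an instance or cluster
Each variable is an array of rules in the following form: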
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
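Role-Based HBA
HBA rule groups with role = common are installed on all instances, while other values, for example role: primary, are installed only on instances with pg_role = primary. Users can therefore define flexible HBA rules through the role system.
As a special case, HBA rules with role: offline are installed not only on instances with pg_role == 'offline' but also on instances with pg_offline_query == true.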
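The selection logic described in this section can be summarized in a few lines. This is a sketch of the rule as stated, not the template Pigsty actually uses; rules_for_instance is a hypothetical function, and the second rule group in the sample data is invented purely for illustration.

```python
def rules_for_instance(rule_groups, pg_role, pg_offline_query=False):
    """Collect the HBA lines that apply to an instance with the given role.

    A group applies when its role is 'common', when it equals the instance's
    pg_role, or when it is 'offline' and the instance is marked pg_offline_query.
    """
    lines = []
    for group in rule_groups:
        role = group["role"]
        if role == "common" or role == pg_role or (role == "offline" and pg_offline_query):
            lines.append(f"# {group['title']}")
            lines.extend(group["rules"])
    return lines


SAMPLE_RULES = [
    {"title": "allow intranet admin password access", "role": "common",
     "rules": ["host all +dbrole_admin 10.0.0.0/8 md5"]},
    # Hypothetical offline-only group, shown only to exercise the special case.
    {"title": "allow offline query access", "role": "offline",
     "rules": ["host all +dbrole_offline 10.0.0.0/8 md5"]},
]
```

A plain replica receives only the common group, whereas a replica marked with pg_offline_query receives both groups.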
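Default Configuration
Under the default configuration, the primary and replicas use the following HBA rules:
- The superuser accesses locally through operating system authentication
- Other users can access locally with a password
- The replication user can access from the LAN segments with a password
- The monitor user can access locally
- Everyone can access from the meta node with a password
- Administrators can access from the LAN with a password
- Everyone can access from the intranet with a password
- Read-write users (production business accounts) can access locally (through the connection pool)
(some access control is delegated to the connection pool)
- On replicas: read-only users (individuals) can access locally (through the connection pool).
(which means read-only user connections are rejected on the primary)
On instances with pg_role == 'offline' or pg_offline_query == true, HBA rules that allow users in the dbrole_offline group are added.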
#==============================================================#
# Default HBA
#==============================================================#
# allow local su with ident
local all postgres ident
local replication postgres ident
# allow local user password access
local all all md5
# allow local/intranet replication with password
local replication replicator md5
host replication replicator 127.0.0.1/32 md5
host all replicator 10.0.0.0/8 md5
host all replicator 172.16.0.0/12 md5
host all replicator 192.168.0.0/16 md5
host replication replicator 10.0.0.0/8 md5
host replication replicator 172.16.0.0/12 md5
host replication replicator 192.168.0.0/16 md5
# allow local role monitor with password
local all dbuser_monitor md5
host all dbuser_monitor 127.0.0.1/32 md5
#==============================================================#
# Extra HBA
#==============================================================#
# add extra hba rules here
#==============================================================#
# primary HBA
#==============================================================#
#==============================================================#
# special HBA for instance marked with 'pg_offline_query = true'
#==============================================================#
#==============================================================#
# Common HBA
#==============================================================#
# allow meta node password access
host all all 10.10.10.10/32 md5
# allow intranet admin password access
host all +dbrole_admin 10.0.0.0/8 md5
host all +dbrole_admin 172.16.0.0/12 md5
host all +dbrole_admin 192.168.0.0/16 md5
# allow intranet password access
host all all 10.0.0.0/8 md5
host all all 172.16.0.0/12 md5
host all all 192.168.0.0/16 md5
# allow local read/write (local production user via pgbouncer)
local all +dbrole_readonly md5
host all +dbrole_readonly 127.0.0.1/32 md5
#==============================================================#
# Ad Hoc HBA
#==============================================================#
4 - 界面
了解Pigsty提供的图形化用户界面
Pigsty提供了专业且易用的PostgreSQL监控系统,浓缩了业界监控的最佳实践。
用户可以方便地进行修改与定制;复用监控基础设施,或与其他监控系统相集成。
注:加粗的面板是Pigsty默认提供的监控面板,其他则是专业版提供的额外特性。
默认监控已经足以覆盖绝大多数场景,如果您需要更加深入的掌控与洞察,请联系 专业支持
4.1.1 - Home
Home面板简介
Home Dashboard是Pigsty的默认主页,包含了到其他系统的导航连接。
您可以在这里发布公告,添加业务系统的导航,集成其他的监控面板等。
4.1.2 - PG Overview
PG Overview面板简介
PG Overview是总揽整个环境中所有数据库集群的地方。
这里提供了到所有数据库集群与数据库实例的快捷导航,并直观地呈现出整个环境的资源状态,异常事件,系统饱和度等等。
PG Overview的图表主要以集群为基本单位进行呈现,主要用于从全局视角快速定位异常集群。
4.1.3 - PG Shard
PG Shard针对水平分片的并行集群而专门设计。
水平分片是Pigsty专业版本提供的高级特性,可以将较大(TB到PB)的业务数据拆分为多个水平的业务集群对外提供服务。
PG Shard提供的指标与PG Overview类似,但会通过预定义的正则表达式筛选出所有同属于一个Shard的所有Cluster。
因此用户可以直观的比较不同分片之间的活动与负载,对于定位数据倾斜问题特别有帮助。
4.1.4 - PG Alert
PG Alert面板简介
PG Alert是总揽整个环境中所有报警信息的地方。包括所有与报警相关指标的快速面板。
4.1.5 - PG KPI
PG KPI 展示了环境中关键指标的概览
PG KPI 展示了环境中关键指标的概览,您可以在这里快速定位整个环境中的异常指标与异常实例。
4.1.6 - PG Capacity
PG Capacity 展示了数据库的水位状态
PG Capacity 展示了数据库的水位状态,这是Pigsty专业版提供的面板。
4.1.7 - PG Change
PG Change包含了整个环境中所发布的历史DDL变更。
该面板必须与 Pigsty专业版特性: DDL发布系统 共同使用,在此不列出
4.1.8 - PG Monitor
PG Monitor面板简介
PG Monitor是监控系统的自我监控,包括Grafana,Prometheus,Consul,Nginx的监控。
自我监控属于Pigsty企业版特性。
4.2 - 集群监控
集群级别的监控面板
DB监控:PG集群
PG集群监控是最常用的Dashboard,因为PG以集群为单位提供服务,因此集群层面集合了最完整全面的信息。
大多数监控图都是实例级监控的泛化与上卷,即从展示单个实例内的细节,变为展现集群内每个实例的信息,以及集群和服务层次聚合后的指标。
集群概览
Cluster级别的集群概览相比实例级别多了一些东西:
- 时间线与领导权,当数据库发生Failover或Switchover时,时间线会步进,领导权会发生变化。
- 集群拓扑,集群拓扑展现了集群中的复制拓扑,以及采用的复制方式(同步/异步)。
- 集群负载,包括整个集群实时、1分钟、5分钟、15分钟的负载情况。以及集群中每个节点的Load1
- 集群报警与事件。
4.2.1 - PG Cluster
PG Cluster面板简介
PG Cluster 关注单个集群的整体情况,并提供到其他集群信息的导航。
DB监控:PG集群
PG集群监控是最常用的Dashboard,因为PG以集群为单位提供服务,因此Cluster集合了最完整全面的信息。
大多数监控图都是实例级监控的泛化与上卷,即从展示单个实例内的细节,变为展现集群内每个实例的信息,以及集群和服务层次聚合后的指标。
集群概览
Cluster级别的集群概览相比实例级别多了一些东西:
- 时间线与领导权,当数据库发生Failover或Switchover时,时间线会步进,领导权会发生变化。
- 集群拓扑,集群拓扑展现了集群中的复制拓扑,以及采用的复制方式(同步/异步)。
- 集群负载,包括整个集群实时、1分钟、5分钟、15分钟的负载情况。以及集群中每个节点的Load1
- 集群报警与事件。
集群复制
Cluster级别的Dashboard与Instance级别Dashboard最重要的区别之一就是提供了整个集群的复制全景。包括:
-
集群中的主库与级联桥接库。集群是否启用同步提交,同步从库名称。桥接库与级联库数量,最大从库配置
-
成对出现的Walsender与Walreceiver列表,体现一对主从关系的复制状态
-
以秒和字节衡量的复制延迟(通常1秒的复制延迟对应10M~100M不等的字节延迟),复制槽堆积量。
-
从库视角的复制延迟
-
集群中从库的数量,备份或拉取从库时可以从这里看到异常。
-
集群的LSN进度,用于整体展示集群的复制状态与持久化状态。
节点指标
PG机器的相关指标,按照集群进行聚合。
事务与查询
与实例级别的类似,但添加了Service层次的聚合(一个集群通常提供primary
与standby
两种Service)。
其他指标与实例级别差别不大。
4.2.2 - PG Cluster Replication
PG Cluster Replication 关注单个集群内的复制活动。
总览
4.2.3 - PG Cluster Activity
PG Cluster Activity 关注特定集群的活动状态,包括事务,查询,锁,等等。
PG Cluster Activity 关注单个集群的活动,包括事务,查询,锁,等等。
4.2.4 - PG Cluster Session
PG Cluster Session 关注特定集群中连接、连接池的工作状态。
4.2.5 - PG Cluster Node
PG Cluster Node关注整个集群的机器资源使用情况
4.2.6 - PG Cluster Persist
PG Cluster Persist 关注集群的持久化,检查点与IO状态。
4.2.7 - PG Cluster Database
PG Cluster Database 关注特定集群中与数据库有关的指标:TPS,增删改查,年龄等。
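PG Cluster Database focuses on the database-related metrics of a specific cluster: TPS, rows inserted, updated, deleted, and fetched, database age, and so on.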
4.2.8 - PG Cluster Stat
PG Cluster Stat 用于展示集群在过去一段统计周期内的用量信息
4.2.9 - PG Cluster Table
PG Cluster Table 关注单个集群中所有表的增删改查情况
4.2.10 - PG Cluster Table Detail
PG Cluster Table Detail关注单个集群中某张特定表的增删改查情况
您可以从该面板跳转到
- PG Cluster Table: 上卷至集群中的所有表
- PG Instance Table Detail:查看这张表在集群中的单个特定实例上的详细状态。
4.2.11 - PG Cluster Query
PG Cluster Query 关注特定集群内所有的查询状况
DB监控:PG慢查询平台
显示慢查询相关的指标,上方是本实例的查询总览。鼠标悬停查询ID可以看到查询语句,点击查询ID会跳转到对应的查询细分指标页(Query Detail)。
- 左侧是格式化后的查询语句,右侧是查询的主要指标,包括
- 每秒查询数量:QPS
- 实时的平均响应时间(RT Realtime)
- 每次查询平均返回的行数
- 每次查询平均用于BlockIO的时长
- 响应时间的均值,标准差,最小值,最大值(自从上一次统计周期以来)
- 查询最近一天的调用次数,返回行数,总耗时。以及自重置以来的总调用次数。
- 下方是指定时间段的查询指标图表,是概览指标的细化。
4.2.12 - PG Cluster Health
PG Cluster Health基于规则对集群进行健康度评分
PG Cluster Health基于规则对集群进行健康度评分。
4.2.13 - PG Cluster Log
PG Cluster Log面板简介
PG Cluster Log 关注单个集群内的所有日志事件。
该面板提供了到外部的基于Pgbadger的日志摘要平台的连接,这是一个专业版特性(也就是还没弄到开源版里)。
4.2.14 - PG Cluster All
PG Cluster All 包含了集群中所有的监控信息,用于细节对比与分析。
4.3 - 服务监控
服务级别的监控面板
服务级监控
一个典型的数据库集群提供两种服务
读写服务:主库
只读服务:从库
而服务往往与域名、解析、负载均衡,路由,流量分发紧密相关
服务级监控主要关注以下内容
-
主从流量分发与权重
-
后端服务器健康检测
-
负载均衡器统计信息
4.3.1 - PG Service
PG Service关注数据库角色层次的聚合信息,DNS解析,域名,代理流量权重等。
PG Service 关注数据库对外暴露的服务
Note that these metrics are only available when Haproxy is enabled as the service provider.
旧PG Service Dashboard
旧PG Service Dashboard按照角色层次进行信息聚合,呈现DNS解析,域名,代理流量权重等。现在已经弃用。
4.3.2 - PG DNS
PG DNS 关注服务域名的解析情况
PG DNS 关注服务域名的解析情况。以及与之绑定的VIP
但是鉴于各个用户定义与管理服务的方式不一,Pigsty不在公开发行版本提供更多关于服务级别的监控面板
4.4 - 实例监控
实例级监控关注单个组件的实例
实例级监控
实例级监控关注于单个实例,无论是一台机器,一个数据库实例,一个连接池实例,还是负载均衡器,指标导出器,都可以在实例级监控中找到最详细的信息。
4.4.1 - PG Instance
PG Instance 详细展示了单个数据库实例的完整指标信息
DB监控:PG实例
实例概览
- 实例身份信息:集群名,ID,所属节点,软件版本,所属集群其他成员等
- 实例配置信息:一些关键配置,目录,端口,配置路径等
- 实例健康信息,实例角色(Primary,Standby)等。
- 黄金指标:PG Load,复制延迟,活跃后端,排队连接,查询延迟,TPS,数据库年龄
- 数据库负载:实时(Load0),1分钟,5分钟,15分钟
- 数据库警报与提醒事件
节点概览
- 四大基本资源:CPU,内存,磁盘,网卡的配置规格,关键功能,与核心指标
- 右侧是网卡详情与磁盘详情
单日统计
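Statistics over the most recent day (the 24 hours before the current moment), such as the total number of queries and the total number of rows returned during the last day. The top two rows are node-level statistics; the bottom two rows are mainly PostgreSQL-related statistics.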
对于计量计费,水位评估特别有用。
复制
- 当前节点的Replication配置
- 复制延迟:以秒计,以字节计的复制延迟,复制槽堆积量
- 下游节点对应的Walsender统计
- 各种LSN进度,综合展示集群的复制状况与持久化状态。
- 下游节点数量统计,可以看出复制中断的问题
事务
事务部分用于洞悉实例中的活动情况,包括TPS,响应时间,锁等。
-
TPS概览信息:TPS,TPS与过去两天的DoD环比。DB事务数与回滚数
-
回滚事务数量与回滚率
-
TPS详情:绿色条带为±1σ,黄色条带为±3σ,以过去30分钟作为计算标准,通常超出黄色条带可认为TPS波动过大
-
Xact RT,事务平均响应时间,从连接池抓取。绿色条带为±1σ,黄色条带为±3σ。
-
The deviation of TPS and RT: a dimensionless value that can be compared across instances. The larger it is, the more the metric jitters. $(σ/μ)^2$
-
按照DB细分的TPS与事务响应时间,通常一个实例只有一个DB,但少量实例有多个DB。
-
事务数,回滚数(TPS来自连接池,而这两个指标直接来自DB本身)
-
锁的数量,按模式聚合(8种表锁),按大类聚合(读锁,写锁,排他锁)
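The ±1σ and ±3σ bands and the dimensionless deviation value from the list above can be sketched as follows. This is a minimal illustration over a window of samples (for example the last 30 minutes), assuming the deviation is the squared coefficient of variation (σ/μ)², which grows as jitter grows; the function names are made up and this is not the panel's actual query.

```python
from statistics import mean, pstdev

def tps_bands(samples):
    """Compute the green (±1σ) and yellow (±3σ) bands for a window of samples."""
    mu = mean(samples)
    sigma = pstdev(samples)
    return {
        "mean": mu,
        "green": (mu - sigma, mu + sigma),
        "yellow": (mu - 3 * sigma, mu + 3 * sigma),
        # Assumed definition: squared coefficient of variation.
        "deviation": (sigma / mu) ** 2 if mu else float("inf"),
    }

def out_of_band(value, band):
    """Return True when value lies outside the (low, high) band."""
    low, high = band
    return value < low or value > high

window = [90, 110]           # two sample TPS readings
bands = tps_bands(window)
print(bands["green"], bands["yellow"])
print(out_of_band(140, bands["yellow"]))  # a reading well above the yellow band
```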
查询
大多数指标与事务中的指标类似,不过统计单位从事务变成了查询语句。查询部分可用于分析实例上的慢查询,定位性能瓶颈。
- QPS 每秒查询数,与Query RT查询平均响应时间,以及这两者的波动程度,QPS的周期环比等
- 生产环境对查询平均响应时间有要求:1ms为黄线,100ms为红线
语句
语句展示了查询中按语句细分的指标。每条语句(查询语法树抽离常量变量后如果一致,则算同一条查询)都会有一个查询ID,可以在慢查询平台中获取到具体的语句与详细指标与统计。
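- The slow query list on the left is sorted by mean response time in pg_stat_statements, from highest to lowest. Clicking a query ID jumps to the slow query platform.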
- 这里列出的查询,是累计查询耗时最长的32个查询,但排除只有零星调用的长耗时单次查询与监控查询。
- 右侧包括了每个查询的实时QPS,平均响应时间。按照RT与总耗时的排名。
后端进程
后端进程用于显示与PG本身的连接,后端进程相关的统计指标。特别是按照各种维度进行聚合的结果,特别适合定位雪崩,慢查询,其他疑难杂症。
- 后端进程数按种类聚合,后端进程按状态聚合,后端进程按DB聚合,后端进程按等待事件类型聚合。
- 活跃状态的进程/连接,在事务中空闲的连接,长事务。
连接池
连接池部分与后端进程部分类似,但全都是从Pgbouncer中间件上获取的监控指标
- 连接池后端连接的状态:活跃,刚用过,空闲,测试过,登录状态。
- 分别按照User,按照DB,按照Pool(User:DB)聚合的前端连接,用于排查异常连接问题。
- 等待客户端数(重要),以及队首客户端等待的时长,用于定位连接堆积问题。
- 连接池可用连接使用比例。
数据库概览
Database部分主要来自pg_stat_database
与pg_database
,包含数据库相关的指标:
- WAL Rate,标识数据库的写入负载,每秒产生的WAL字节数量。
- Buffer Hit Rate,数据库 ShareBuffer 命中率,未命中的页面将从操作系统PageCache和磁盘获取。
- 每秒增删改查的记录条数
- 临时文件数量与临时文件大小,可以定位大型查询问题。
持久化
持久化主要包含数据落盘,Checkpoint,块访问相关的指标
- 重要的持久化参数,比如是否出现数据校验和验证失败(如果启用可以检测到数据腐坏)
- 数据库文件(DB,WAL,Log)的大小与增速。
- 检查点的数量与检查点耗时。
- 每秒分配的块,与每秒刷盘的块。每秒访问的块,以及每秒从磁盘中读取的块。(以字节计,注意一个Buffer Page是8192,一个Disk Block是4096)
监控Exporter
Exporter展示了监控系统组件本身的监控指标,包括:
- Exporter是否存活,Uptime,Exporter每分钟被抓取的次数
- 每个监控查询的耗时,产生的指标数量与错误数量。
4.4.2 - PG Instance Log
PG Instance Log展示单个数据库实例的日志信息
PG Instance Log shows the log information of a single database instance.
Pigsty日志基于Loki 与 Promtail,是可选的额外模组。
您必须先在元节点上执行 infra-loki.yml
并在普通数据节点上执行 pgsql-promtail.yml
方能启用本功能。
用户可以从这里查阅 每个实例上 Postgres, Pgbouncer, Patroni的相关日志。
上方的三个图表显示的是当前时间段中的Log Rate,单位时间内的日志数量。
Search框中可以填入关键字搜索,右上角的Log Rate显示的是包含该关键字的Log Rate。
4.4.3 - Node
Node详细展示了单个机器节点的指标,该面板可用于任何安装有Node Exporter的节点
4.4.4 - PG Pgbouncer
PG Pgbouncer shows the complete metrics of a single database connection pool instance in detail.
4.4.5 - PG Proxy
PG Proxy 详细展示了单个数据库代理 Haproxy 的状态信息
4.4.6 - PG Exporter
PG Exporter 详细展示了单个数据库实例的监控指标导出器本身的健康状态
4.4.7 - PG Setting
PG Setting shows the configuration of a single database instance in detail.
4.4.8 - PG Stat Activity
PG Stat Activity 详细展示了单个数据库实例内的实时活动
PG Stat Activity 详细展示了单个数据库实例内的实时活动,注意这里的数据是从Catalog中实时获取,而非监控系统采集。
4.4.9 - PG Stat Statements
PG Stat Statements 详细展示了单个数据库实例内实时的查询状态统计
4.5 - 数据库监控
数据库级别的监控面板
数据库级监控
数据库级监控更像是“业务级”监控,它会展现出系统中每一张表,每一个索引,每一个函数的详细使用情况。
对于业务优化与故障分析而言有着巨大的作用。
但是当心监控信息也可能透露出关键的业务数据,例如对用户表的更新QPS可能反映出业务的日活数。请在生产环境中对Grafana做好权限控制,避免不必要的风险。
4.5.1 - PG Database
PG Database 关注单个数据库内发生的细节
PG Database 关注单个数据库内发生的详细情况,对于单实例多DB的情况尤其实用。
4.5.2 - PG Pool
PG Pool关注连接池中的单个连接池,即用户与数据库构成的二元组
PG Pool关注连接池中的单个User-DB对,当您使用多租户特性时,这个面板对于连接池问题的排查会很有帮助。
4.5.3 - PG Query
PG Query 关注单个数据库内发生的查询细节
PG Query 关注单个数据库内发生的整体查询细节
您可以用本面板定位出实例内的具体异常查询,然后跳转到PG Query Detail面板查看具体查询的详细信息
Query Overview
Database Statements
Statement RT
Statement Time Spend per Second
Statement RT Ranking
4.5.4 - PG Table Catalog
PG Table Catalog fetches the metadata of a specific table directly from the database catalog and displays it.
请注意,Catalog类型的信息是直接连接至数据库目录进行查询的,可能导致不必要的安全风险。
身份信息
基本指标
标识符
表特性
关键数值描述
持久化
访问权限
表选项
统计指标
垃圾清理
分析诊断
IO统计
字段详情
索引详情
关系大小
4.5.5 - PG Table
PG Table关注单个数据库中的所有表的增删改查等。
PG Table关注单个数据库中的所有表,增删改查,访问等。
您可以点击具体的表,跳转至PG Table Detail查阅这张表的详细指标。
4.5.6 - PG Table Detail
PG Table Detail关注单个数据库中的单张表
您可以在本面板中跳转至 PG Cluster Table Detail,来了解这张表在整个集群的不同实例上的工作状态。
4.5.7 - PG Query Detail
PG Query Detail关注单个数据库内发生的单个查询的细节
PG Query Detail关注单个数据库内发生的单个查询的细节。
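Note that queries here are identified by QueryID.
You can use the real-time query interface provided by the PG Stat Statements panel to look up the statement behind a QueryID.
Showing SQL statements directly in the panel may introduce unnecessary security risks, but this feature is available in Pigsty Professional.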
5 - 部署
如何将Pigsty部署至生产环境
无论是沙箱环境还是实际生产环境,Pigsty都采用同样的三步走部署流程:准备资源,修改配置,执行剧本
Pigsty在部署前需要进行一些准备工作:配置带有正确权限配置的节点,下载安装相关软件。置备完成后,用户应当按照自己的需求修改配置。并执行剧本将系统调整至配置描述的状态。
如果用户希望使用Pigsty监控现有数据库集群,或只希望部署Pigsty监控系统部分,请参考 仅监控部署 。
5.1 - 准备资源
如何完成Pigsty资源准备工作
节点置备
在部署Pigsty前,用户需要准备机器节点资源,包括至少一个元节点,与任意数量的数据库节点。
数据库节点可以使用任意SSH可达节点:物理机、虚拟机、容器等,但目前Pigsty仅支持CentOS 7操作系统。
Pigsty推荐使用物理机与虚拟机进行部署。使用本地沙箱环境时,Pigsty基于Vagrant与Virtualbox快速拉起本地虚拟机资源,详情请参考 Vagrant教程。
元节点置备
Pigsty需要元节点作为整个环境的控制中心,并提供 基础设施 服务。元节点的数量要求最少1个,推荐3个,建议不超过5个。如果将DCS部署至元节点上,建议在生产环境使用3个元节点,以充分保证DCS服务的可用性。
用户应当确保自己可以登录元节点,并能从元节点上 免密码SSH登录 其他节点,并 免密码 执行sudo
命令。
用户应当确保自己可以直接或间接访问元节点的80端口,以访问Pigsty提供的用户界面。
软件置备
用户应当在元节点上 下载本项目,以及 离线软件包(可选)。
使用本地沙箱拉起Pigsty时,用户还需要在宿主机上额外安装:
5.1.1 - Vagrant
如何安装使用Vagrant
通常为了测试“数据库集群”这样的系统,用户需要事先准备若干台虚拟机。尽管云服务已经非常方便,但本地虚拟机访问通常比云虚拟机访问方便,响应迅速,成本低廉。本地虚拟机配置相对繁琐,Vagrant 可解决这一问题。
Pigsty用户无需了解vagrant的原理,只需要知道vagrant可以简单、快捷地按照用户的需求,在笔记本、PC或Mac上拉起若干台虚拟机。用户需要完成的工作,就是将自己的虚拟机需求,以vagrant配置文件的形式表达出来。
Vagrant安装
访问Vagrant官网
https://www.vagrantup.com/downloads
下载Vagrant
最新版本为2.2.14
安装Vagrant
Double-click vagrant.pkg to run the installer. The installation requires your password.
Vagrant配置文件
https://github.com/Vonng/pigsty/blob/master/vagrant/Vagrantfile 提供了一个Vagrantfile样例。
This is the Vagrantfile used by the Pigsty sandbox. It defines four virtual machines: one 2-core/4GB control/meta node and three 1-core/2GB database nodes.
vagrant
二进制程序根据 Vagrantfile 中的定义,默认调用 Virtualbox 完成本地虚拟机的创建工作。
进入Pigsty根目录下的vagrant
目录,执行vagrant up
,即可拉起所有的四台虚拟机。
IMAGE_NAME = "centos/7"
N=3 # 数据库机器节点数量,可修改为0
Vagrant.configure("2") do |config|
config.vm.box = IMAGE_NAME
config.vm.box_check_update = false
config.ssh.insert_key = false
# 元节点
config.vm.define "meta", primary: true do |meta| # 元节点默认的ssh别名为`meta`
meta.vm.hostname = "meta"
meta.vm.network "private_network", ip: "10.10.10.10"
meta.vm.provider "virtualbox" do |v|
v.linked_clone = true
v.customize [
"modifyvm", :id,
"--memory", 4096, "--cpus", "2", # 元节点的内存与CPU核数:默认为2核/4GB
"--nictype1", "virtio", "--nictype2", "virtio",
"--hwvirtex", "on", "--ioapic", "on", "--rtcuseutc", "on", "--vtxvpid", "on", "--largepages", "on"
]
end
meta.vm.provision "shell", path: "provision.sh"
end
# 初始化N个数据库节点
(1..N).each do |i|
config.vm.define "node-#{i}" do |node| # 数据库节点默认的ssh别名分别为`node-{1,2,3}`
node.vm.box = IMAGE_NAME
node.vm.network "private_network", ip: "10.10.10.#{i + 10}"
node.vm.hostname = "node-#{i}"
node.vm.provider "virtualbox" do |v|
v.linked_clone = true
v.customize [
"modifyvm", :id,
"--memory", 2048, "--cpus", "1", # 数据库节点的内存与CPU核数:默认为1核/2GB
"--nictype1", "virtio", "--nictype2", "virtio",
"--hwvirtex", "on", "--ioapic", "on", "--rtcuseutc", "on", "--vtxvpid", "on", "--largepages", "on"
]
end
node.vm.provision "shell", path: "provision.sh"
end
end
end
定制Vagrantfile
如果用户的机器配置不足,则可以考虑使用更小的N
值,减少数据库节点的数量。如果只希望运行单个元节点,将其修改为0即可。
用户还可以修改每台机器的CPU核数和内存资源等,如配置文件中的注释所述,详情参阅Vagrant与Pigsty文档。
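The naming and addressing scheme in the Vagrantfile above (meta at 10.10.10.10, node-i at 10.10.10.(i + 10)) can be previewed with a small Python sketch before changing N. The function name sandbox_nodes is hypothetical and only mirrors the Ruby loop; it does not talk to Vagrant.

```python
def sandbox_nodes(n: int = 3):
    """List (hostname, ip) pairs the sandbox Vagrantfile would define for N = n."""
    nodes = [("meta", "10.10.10.10")]  # the meta node is always defined
    nodes += [(f"node-{i}", f"10.10.10.{i + 10}") for i in range(1, n + 1)]
    return nodes

for name, ip in sandbox_nodes(3):
    print(name, ip)
```

With n set to 0 only the meta node remains, which matches the single-node setup described above.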
沙箱环境默认使用IMAGE_NAME = "centos/7"
,首次执行时会从vagrant官方下载centos 7.8
virtualbox 镜像,确保宿主机拥有合适的网络访问权限(科学上网)!
快捷方式
Pigsty已经提供了对常用vagrant命令的包装,用户可以在项目的Makefile中看到虚拟机管理的相关命令:
make # 启动集群
make new # 销毁并创建新集群
make dns # 将Pigsty域名记录写入本机/etc/hosts (需要sudo权限)
make ssh # 将虚拟机SSH配置信息写入 ~/.ssh/config
make clean # 销毁现有本地集群
make cache # 制作离线安装包,并拷贝至宿主机本地,加速后续集群创建
make upload # 将离线安装缓存包 pkg.tgz 上传并解压至默认目录 /www/pigsty
更多信息,请参考Makefile
###############################################################
# vm management
###############################################################
clean:
cd vagrant && vagrant destroy -f --parallel; exit 0
up:
cd vagrant && vagrant up
halt:
cd vagrant && vagrant halt
down: halt
status:
cd vagrant && vagrant status
suspend:
cd vagrant && vagrant suspend
resume:
cd vagrant && vagrant resume
provision:
cd vagrant && vagrant provision
# sync ntp time
sync:
echo meta node-1 node-2 node-3 | xargs -n1 -P4 -I{} ssh {} 'sudo ntpdate pool.ntp.org'; true
# echo meta node-1 node-2 node-3 | xargs -n1 -P4 -I{} ssh {} 'sudo chronyc -a makestep'; true
# show vagrant cluster status
st: status
start: up ssh sync
stop: halt
# only init partial of cluster
meta-up:
cd vagrant && vagrant up meta
node-up:
cd vagrant && vagrant up node-1 node-2 node-3
node-new:
cd vagrant && vagrant destroy -f node-1 node-2 node-3
cd vagrant && vagrant up node-1 node-2 node-3
5.1.2 - Virtualbox
如何在MacOS上安装Virtualbox
在MacOS上安装Virtualbox非常简单,其他操作系统上与之类似。
前往Virtualbox官网
https://www.virtualbox.org/
下载Virtualbox
最新版本为6.1.18
安装Virtualbox
点击 VirtualBox.pkg 执行安装,安装过程需要输入密码并重启。
如果安装失败,请检查您的 系统偏好设置 - 安全性与隐私 - 通用 - 允许以下位置的App中点击“允许”按钮。
就这?
没错,您已经成功安装完Oracle Virtualbox了!
5.1.3 - Ansible
How to install and use Ansible
Ansible是一个流行的简单的自动化IT工具,广泛用于运维管理与软件部署。
Ansible是Pigsty剧本的执行载体,如果不需要定制本项目,用户并不需要了解太多Ansible的细节,将其看作一个高级的Shell或Python解释器即可。
如何安装
Ansible可以通过包管理器安装
brew install ansible # macos
yum install ansible # linux
检查安装的软件版本:
$ echo $(ansible --version)
ansible 2.10.3
建议使用2.9以上版本的Ansible,更低版本的Ansible可能遭遇兼容性问题。
如何使用
Pigsty项目根目录下提供了一系列Ansible剧本,在其开头的Hashbang中调用ansible-playbook
来执行自己。
#!/usr/bin/env ansible-playbook
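So you usually do not need to care how Ansible is invoked: once it is installed, run a playbook directly as an executable file, for example ./pgsql-createuser.yml -l <pg_cluster> -e pg_user=dbuser_test.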
离线安装Ansible
Pigsty依赖Ansible进行环境初始化。但如果元节点本身没有安装Ansible,也没有互联网访问怎么办?
离线安装包中本身带有 Ansible,可以直接通过本地文件Yum源的方式使用,假设用户已经将离线安装包解压至默认位置:/www/pigsty
。
那么将以下Repo文件写入/etc/yum.repos.d/pigsty-local.repo
中,就可以直接使用该源。
[pigsty-local]
name=Local Yum Repo pigsty
baseurl=file:///www/pigsty
skip_if_unavailable = 1
enabled = 1
priority = 1
gpgcheck = 0
执行以下命令,在元节点上离线安装Ansible :
yum clean all
yum makecache
yum install ansible
5.1.4 - 管理用户
如何配置SSH免密码登陆,以及免密码sudo
Pigsty需要一个管理用户,该用户能够从元节点上免密码SSH登陆其他节点,并免密码执行sudo
命令。
管理用户
Pigsty推荐将管理用户的创建,权限配置与密钥分发放在虚拟机的Provisioning阶段完成,作为交付内容的一部分。
沙箱环境的默认用户vagrant
默认已经配置有免密登陆和免密sudo,您可以从宿主机或沙箱元节点使用vagrant登陆所有的数据库节点。对于生产环境来说,即机器交付时,应当已经配置有这样一个具有免密远程SSH登陆并执行免密sudo的用户。
如果没有,则需要用户自行创建。如果用户拥有root权限,也可以用root身份直接执行初始化,Pigsty可以在初始化过程中完成管理用户的创建。相关配置参数包括:
是否在每个节点上创建管理员用户(免密sudo与ssh),默认会创建。
Pigsty默认会创建名为admin (uid=88)
的管理用户,可以从元节点上通过SSH免密访问环境中的其他节点并执行免密sudo。
管理员用户的uid
,默认为88
管理员用户的名称,默认为admin
是否在当前执行命令的机器之间相互交换管理员用户的SSH密钥?
默认会执行交换,这样管理员可以在机器间快速跳转。
写入到管理员~/.ssh/authorized_keys
中的密钥
持有对应私钥的用户可以以管理员身份登陆。
Pigsty默认会创建uid=88
的管理员用户admin
,并将该用户的密钥在集群范围内进行交换。
node_admin_pks 中给出的公钥会被安装至管理员账户的authorized_keys
中,持有对应私钥的用户可以直接远程免密登陆。
配置SSH免密访问
在元节点上,假设执行命令的用户名为vagrant
。
生成密钥
Running ssh-keygen as the vagrant user generates a public/private key pair for vagrant, which is used for login.
- 默认公钥:
~/.ssh/id_rsa.pub
- 默认私钥:
~/.ssh/id_rsa
安装密钥
将公钥添加至需要登陆机器的对应用户上:/home/vagrant/.ssh/authorized_keys
如果您已经可以直接通过密码访问远程机器,可以直接通过ssh-copy-id
的方式拷贝公钥。
# 输入密码以完成公钥拷贝
ssh-copy-id <ip>
# 直接将密码嵌入命令中,避免交互式密码输入
sshpass -p <password> ssh-copy-id <ip>
然后便可以通过该用户免密码SSH登陆远程机器。
配置免密SUDO
假设用户名为vagrant
,则通过visudo
命令,或创建/etc/sudoers.d/vagrant
文件添加以下记录:
%vagrant ALL=(ALL) NOPASSWD: ALL
则 vagrant 用户即可免密sudo
执行所有命令
5.1.5 - 软件置备
How to download the Pigsty source code and the offline installation package
用户需要将Pigsty项目下载至元节点(在沙箱环境中,也可以使用宿主机发起控制)
下载Pigsty源码
用户可以使用 git 直接从 Github 克隆项目,或从 Github Release 页面下载最新版本的Pigsty源码包:
git clone https://github.com/Vonng/pigsty
git clone git@github.com:Vonng/pigsty.git
也可以从 Pigsty CDN 下载最新版本的Pigsty: pigsty.tar.gz
http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pigsty.tar.gz
下载离线安装包
Pigsty自带了一个沙箱环境,沙箱环境的离线安装包默认放置于files
目录中,可以从Github Release页面下载。
cd <pigsty>/files/
wget https://github.com/Vonng/pigsty/releases/download/v0.6.0/pkg.tgz
Pigsty的官方CDN也提供最新版本的 pkg.tgz
下载,只需要执行以下命令即可。
make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz
离线安装包的具体使用方法,请参考 离线安装 一节。
仅监控模式资源
如果用户希望采用仅监控部署,通常建议使用拷贝监控组件二进制的方式部署监控Agent。因此需要预先将Linux Binary下载并放置于files
目录中。
files
^---- pg_exporter (linux amd64 binary)
^---- node_exporter (linux amd64 binary)
The bundled script files/download-exporter.sh automatically downloads the latest versions of node_exporter and pg_exporter from the Internet.
5.1.6 - 离线安装
如何离线安装Pigsty
Pigsty是一个复杂的软件系统,为了确保系统的稳定,Pigsty会在初始化过程中从互联网下载所有依赖的软件包并建立本地仓库 (本地Yum源)。
所有依赖的软件总大小约1GB左右,下载速度取决于用户的网络情况。尽管Pigsty已经尽量使用镜像源以加速下载,但少量包的下载仍可能受到防火墙的阻挠,可能出现非常慢的情况。用户可以通过 proxy_env
配置项设置下载代理,以完成首次下载。
如果您使用了不同于CentOS 7.8的操作系统,通常建议用户采用完整的在线下载安装流程。并在首次初始化完成后缓存下载的软件,参见制作离线安装包。
如果您希望跳过漫长的下载过程,或者执行控制的元节点没有互联网访问,则可以考虑下载预先打包好的离线安装包。
离线安装包的内容
为了快速拉起Pigsty,建议使用离线下载软件包并上传的方式完成安装。
离线安装包收纳了本地Yum源的所有软件包。默认情况下,Pigsty会在基础设施初始化时创建本地Yum源,
{{ repo_home }}
|---- {{ repo_name }}.repo
^---- {{ repo_name}}/repo_complete
^---- {{ repo_name}}/**************.rpm
默认情况下,{{ repo_home }}
是Nginx静态文件服务器的根目录,默认为/www
,repo_name
是自定义的本地源名称,默认为pigsty
以默认情况为例,/www/pigsty
目录包含了所有 RPM 软件包,离线安装包实际上就是 /www/pigsty
目录的压缩包 。
离线安装包的原理是,Pigsty在执行基础设施初始化的过程中,会检查本地Yum源相关文件是否已经存在。如果已经存在,则会跳过下载软件包及其依赖的过程。
检测所用的标记文件为{{ repo_home }}/{{ repo_name }}/repo_complete
,默认情况下为/www/pigsty/repo_complete
,如果该标记文件存在,(通常是由Pigsty在创建本地源之后设置),则表示本地源已经建立完成,可以直接使用。否则,Pigsty会执行常规的下载逻辑。下载完毕后,您可以将该目录压缩复制归档,用于加速其他环境的初始化。
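The marker-file check described above can be sketched in Python. This is only an illustration of the decision, assuming the default repo_home of /www and repo_name of pigsty; the function name repo_is_ready is hypothetical and Pigsty itself performs this check inside its playbooks.

```python
import os

def repo_is_ready(repo_home: str = "/www", repo_name: str = "pigsty") -> bool:
    """Return True when the repo_complete marker exists, meaning downloads can be skipped."""
    marker = os.path.join(repo_home, repo_name, "repo_complete")
    return os.path.isfile(marker)

if repo_is_ready():
    print("local repo found, skip download")
else:
    print("marker missing, packages would be downloaded from upstream")
```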
沙箱环境
下载离线安装包
Pigsty自带了一个沙箱环境,沙箱环境的离线安装包默认放置于files
目录中,可以从Github Release页面下载。
cd <pigsty>/files/
wget https://github.com/Vonng/pigsty/releases/download/v0.6.0/pkg.tgz
Pigsty的官方CDN也提供最新版本的pkg.tgz
下载,只需要执行以下命令即可。
make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz
上传离线安装包
使用Pigsty沙箱时,下载离线安装至本地files
目录后,则可以直接使用 Makefile 提供的快捷指令make upload
上传离线安装包至元节点上。
使用 make upload
,也会将本地的离线安装包(Yum缓存)拷贝至元节点上。
# upload rpm cache to meta controller
upload:
ssh -t meta "sudo rm -rf /tmp/pkg.tgz"
scp -r files/pkg.tgz meta:/tmp/pkg.tgz
ssh -t meta "sudo mkdir -p /www/pigsty/; sudo rm -rf /www/pigsty/*; sudo tar -xf /tmp/pkg.tgz --strip-component=1 -C /www/pigsty/"
制作离线安装包
使用 Pigsty 沙箱时,可以通过 make cache
将沙箱中元节点的缓存制为离线安装包,并拷贝到本地。
# cache rpm packages from meta controller
cache:
rm -rf pkg/* && mkdir -p pkg;
ssh -t meta "sudo tar -zcf /tmp/pkg.tgz -C /www pigsty; sudo chmod a+r /tmp/pkg.tgz"
scp -r meta:/tmp/pkg.tgz files/pkg.tgz
ssh -t meta "sudo rm -rf /tmp/pkg.tgz"
Using the offline installation package in production
在生产环境使用离线安装包前,您必须确保生产环境的操作系统与制作该离线安装包的机器操作系统一致。Pigsty提供的离线安装包默认使用CentOS 7.8。
使用不同操作系统版本的离线安装包可能会出错,也可能不会,我们强烈建议不要这么做。
如果需要在其他版本的操作系统(例如CentOS7.3,7.7等)上运行Pigsty,建议用户在安装有同版本操作系统的沙箱中完整执行一遍初始化流程,不使用离线安装包,而是直接从上游源下载的方式进行初始化。对于没有网络访问的生产环境元节点而言,制作离线软件包是至关重要的。
常规初始化完成后,用户可以通过make cache
或手工执行相关命令,将特定操作系统的软件缓存打为离线安装包。供生产环境使用。
从初始化完成的本地元节点构建离线安装包:
tar -zcf /tmp/pkg.tgz -C /www pigsty # 制作离线软件包
在生产环境使用离线安装包与沙箱环境类似,用户需要将pkg.tgz
复制到元节点上,然后将离线安装包解压至目标地址。
这里以默认的 /www/pigsty
为例,将压缩包中的所有内容(RPM包,repo_complete标记文件,repodata 源的元数据库等)解压至目标目录/www/pigsty
中,可以使用以下命令。
sudo mkdir -p /www/pigsty/
sudo rm -rf /www/pigsty/*
sudo tar -xf /tmp/pkg.tgz --strip-components=1 -C /www/pigsty/
5.2 - 修改配置
如何根据环境修改Pigsty配置
用户可以通过下列 配置项,对基础设施与数据库集群进行配置。
通常而言,大多数参数可以直接使用默认值。
基础设施部分需要修改的内容很少,通常涉及到的唯一修改只是对元节点的IP地址进行文本替换。
相比之下,用户需要关注 数据库集群 的定义与配置。数据库集群会部署在数据库节点上,用户必须提供数据库集群的 身份信息与数据库节点的连接信息。身份信息 (如集群名,实例号)用于描述数据库集群中的实体,而连接信息 (如IP地址)则用于访问数据库节点。同时,用户应当在创建集群时,一并定义默认业务用户与业务数据库。
此外,用户也可以通过修改参数,定制默认的访问控制模型,模板数据库,对外暴露的服务。
数据库定制
在Pigsty中,数据库初始化分为五个部分:
安装什么版本,安装哪些插件,用什么用户
通常这一部分的参数不需要修改任何内容即可直接使用(当PG版本升级时需要进行调整)。
在哪创建目录,创建什么用途的集群,监听哪些IP端口,采用何种连接池模式
在这一部分中,身份信息 是必选参数,除此之外需要修改默认参数的地方很少。
通过 pg_conf
可以使用默认的数据库集群模板(普通事务型 OLTP/普通分析型 OLAP/核心金融型 CRIT/微型虚机 TINY)。如果希望创建自定义的模板,可以在roles/postgres/templates
中克隆默认配置并自行修改后采用,详见 Patroni模板定制。
创建哪些角色、用户、数据库、模式,启用哪些扩展,如何设置权限与白名单
需重点关注,因为这里是业务声明自己所需数据库的地方。用户可以通过数据库模板定制:
- 业务用户:(使用哪些用户访问数据库?属性,限制,角色,权限……)
- 业务数据库:(需要什么样的数据库?扩展,模式,参数,权限……)
- 默认模板数据库 (template1) (模式、扩展、默认权限)
- 访问控制系统(角色,用户,HBA)
- 暴露的服务 (使用哪些端口,将流量导向哪些实例,健康检测,权重……)
部署Pigsty监控系统组件
通常情况下不需要调整,但在 仅监控部署 模式下需要重点关注,进行调整。
通过HAproxy/VIP对外提供数据库服务
除非用户希望定义额外的服务,否则不需要调整这里的配置。
配置项参考
大多数参数都提供了合理的默认值,请参考配置项手册按需修改。
5.2.1 - 配置身份信息
如何配置数据库集群与节点的身份信息
Pigsty基于 身份标识(Identity) 管理数据库对象。
身份参数
身份参数是定义数据库集群时必须提供的信息,包括:
身份参数的内容遵循 Pigsty命名原则 。其中 pg_cluster
,pg_role
,pg_seq
属于核心身份参数,是定义数据库集群所需的最小必须参数集。核心身份参数必须显式指定,手工分配。
pg_cluster
标识了集群的名称,在集群层面进行配置,作为集群资源的顶层命名空间。
pg_role
在实例层面进行配置,标识了实例在集群中扮演的角色。可选值包括:
primary
:集群中的唯一主库,集群领导者,提供写入服务。
replica
:集群中的普通从库,承接常规生产只读流量。
offline
:集群中的离线从库,承接ETL/SAGA/个人用户/交互式/分析型查询。
standby
:集群中的同步从库,采用同步复制,没有复制延迟。
delayed
:集群中的延迟从库,显式指定复制延迟,用于执行回溯查询与数据抢救。
pg_seq
用于在集群内标识实例,通常采用从0或1开始递增的整数,一旦分配不再更改。
pg_shard
用于标识集群所属的上层 分片集簇,只有当集群是水平分片集簇的一员时需要设置。
pg_sindex
用于标识集群的分片集簇编号,只有当集群是水平分片集簇的一员时需要设置。
定义数据库集群
以下配置文件定义了一个名为pg-test
的集群。集群中包含三个实例:pg-test-1
, pg-test-2
,pg-test-3
,分别为主库,从库,离线库。该配置是一个集群定义所需的最小配置。
pg-test:
vars: { pg_cluster: pg-test }
hosts:
10.10.10.11: {pg_seq: 1, pg_role: primary}
10.10.10.12: {pg_seq: 2, pg_role: replica}
10.10.10.13: {pg_seq: 3, pg_role: offline}
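pg_cluster, pg_role, and pg_seq are identity parameters.
Apart from the IP address, these three parameters are the minimal set required to define a new database cluster, as the configuration above shows.
All other parameters can be inherited from the global or default configuration, but identity parameters must be specified explicitly and assigned by hand.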
pg_cluster
标识了集群的名称,在集群层面进行配置。
pg_role
在实例层面进行配置,标识了实例的角色,只有primary
角色会进行特殊处理,如果不填,默认为replica
角色,此外,还有特殊的delayed
与offline
角色。
pg_seq
用于在集群内标识实例,通常采用从0或1开始递增的整数,一旦分配不再更改。
{{ pg_cluster }}-{{ pg_seq }}
被用于唯一标识实例,即pg_instance
{{ pg_cluster }}-{{ pg_role }}
用于标识集群内的服务,即pg_service
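The two derived identifiers above are plain string concatenations of the identity parameters. A minimal Python sketch, with hypothetical helper names that mirror the template expressions:

```python
def pg_instance(pg_cluster: str, pg_seq: int) -> str:
    """Instance identifier: {{ pg_cluster }}-{{ pg_seq }}."""
    return f"{pg_cluster}-{pg_seq}"

def pg_service(pg_cluster: str, pg_role: str) -> str:
    """Service identifier: {{ pg_cluster }}-{{ pg_role }}."""
    return f"{pg_cluster}-{pg_role}"

print(pg_instance("pg-test", 1))        # pg-test-1
print(pg_service("pg-test", "primary")) # pg-test-primary
```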
定义水平分片数据库集簇
pg_shard
与 pg_sindex
用于定义特殊的分片数据库集簇,是可选的身份参数。
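Suppose you have a horizontally sharded database cluster group (Shard) named test, made up of four independent clusters: pg-test1, pg-test2, pg-test3, and pg-test4. You can bind the identity pg_shard: test to every cluster, and bind pg_sindex: 1|2|3|4 to the clusters respectively, as shown below: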
pg-test1:
vars: {pg_cluster: pg-test1, pg_shard: test, pg_sindex: 1}
hosts: {10.10.10.10: {pg_seq: 1, pg_role: primary}}
pg-test2:
vars: {pg_cluster: pg-test2, pg_shard: test, pg_sindex: 2}
hosts: {10.10.10.11: {pg_seq: 1, pg_role: primary}}
pg-test3:
vars: {pg_cluster: pg-test3, pg_shard: test, pg_sindex: 3}
hosts: {10.10.10.12: {pg_seq: 1, pg_role: primary}}
pg-test4:
vars: {pg_cluster: pg-test4, pg_shard: test, pg_sindex: 4}
hosts: {10.10.10.13: {pg_seq: 1, pg_role: primary}}
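Writing many near-identical shard definitions by hand is error-prone, so a definition like the one above can also be generated. The sketch below is an illustration only, assuming each cluster is named pg-<shard><sindex>, carries a pg_cluster equal to its own group name, and has a single primary; the function name shard_clusters is hypothetical.

```python
def shard_clusters(shard: str, ips):
    """Build one single-primary cluster definition per IP address (illustrative only)."""
    inventory = {}
    for sindex, ip in enumerate(ips, start=1):
        cluster = f"pg-{shard}{sindex}"
        inventory[cluster] = {
            # pg_cluster must be unique per cluster; pg_shard is shared by the whole shard.
            "vars": {"pg_cluster": cluster, "pg_shard": shard, "pg_sindex": sindex},
            "hosts": {ip: {"pg_seq": 1, "pg_role": "primary"}},
        }
    return inventory

inventory = shard_clusters("test", ["10.10.10.10", "10.10.10.11", "10.10.10.12", "10.10.10.13"])
for name, definition in inventory.items():
    print(name, definition["vars"])
```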
数据库节点与数据库实例
数据库集群需要部署在数据库节点上,Pigsty使用数据库节点与数据库实例一一对应的部署模式。
数据库节点使用IP地址作为标识符,数据库实例使用形如pg-test-1
的标识符。 数据库节点(Node) 与 数据库实例(Instance) 的标识符可以相互对应,相互转换。
连接信息
如果说身份参数是数据库集群的标识,那么连接信息就是数据库节点的标识。
例如在 定义数据库集群 的例子中,数据库集群pg_cluster = pg-test
中 pg_seq = 1
的数据库实例(pg-test-1
)部署在IP地址为10.10.10.11
的数据库节点上。这里的IP地址10.10.10.11
就是连接信息。
Pigsty使用IP地址作为数据库节点的唯一标识,该IP地址必须是数据库实例监听并对外提供服务的IP地址。
这一点非常重要,即使您是通过跳板机或SSH代理访问该数据库节点,也应当在配置时保证这一点。
其他连接方式
If the target machine sits behind an SSH jump host, or cannot be reached directly with ssh ip, consider using the connection parameters provided by Ansible.
例如下面的例子中,ansible_host
通过SSH别名的方式告知Pigsty通过ssh node-1
的方式而不是ssh 10.10.10.11
的方式访问目标数据库节点。
pg-test:
vars: { pg_cluster: pg-test }
hosts:
10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
通过这种方式,用户可以自由指定数据库节点的连接方式,并将连接配置保存在管理用户的~/.ssh/config
中。
接下来
完成身份参数配置后,用户可以对数据库集群进行进一步定制。
5.2.2 - 定制业务用户
配置Pigsty中的业务用户
可以通过 pg_users
定制集群特定的业务用户。该配置项通常用于在数据库集群层面定义业务用户,与 pg_default_roles
采用相同的形式。
样例
一个完整的用户定义由一个JSON/YAML对象构成,如下所示:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
说明
一个用户对象由以下键值构成,只有用户名是必选项,其他参数均为可选,不添加相应键则会使用默认值。
-
name(string)
: 用户名称,必选项
-
password(string)
: 用户的密码,可以是以md5
, sha
开头的密文密码。
-
login(bool)
:用户是否可以登录,默认为真;如果这里是业务角色,应当将其设置为假。
-
superuser(bool)
: 用户是否具有超级用户权限,默认为假
-
createdb(bool)
: 用户是否具有创建数据库的权限,默认没有
-
createrole(bool)
: 用户是否具有创建新角色的权限,默认没有。
-
inherit(bool)
: 用户是否继承其角色的权限?默认继承
-
replication(bool)
: 用户是否具有复制权限?默认没有
-
bypassrls(bool)
: 用户是否可以绕过行级安全策略?默认不行
-
connlimit(number)
: 是否限制用户的连接数量?留空或-1不限,默认不限
-
expire_at(date)
: 用户过期时间,默认不过期
-
expire_in(number)
: 自创建n天后用户将过期,如果设置将覆盖expire_at
-
roles(string[])
: 用户所属的角色/用户组
-
pgbouncer(bool)
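: whether to add the user to the connection pool user list. Not added by default. Production users that connect through the connection pool should explicitly set this to true; interactive personal users and ETL users should set it to false or leave it blank.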
-
parameters(dict)
: 针对用户修改配置参数,k-v结构
-
comment(string)
: 用户备注说明信息
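The interaction between expire_at and expire_in from the list above can be sketched in Python: when expire_in is set it wins, and the expiry date is counted from the day the user is created. This is an illustration of the stated rule, not Pigsty's template code; the function name resolve_expiry is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

def resolve_expiry(user: dict, today: date) -> Optional[date]:
    """Return the effective expiry date for a user definition, or None if it never expires."""
    if user.get("expire_in") is not None:
        # expire_in overrides expire_at: now + n days
        return today + timedelta(days=user["expire_in"])
    if user.get("expire_at"):
        return date.fromisoformat(user["expire_at"])
    return None

user = {"name": "dbuser_meta", "expire_at": "2030-12-31", "expire_in": 365}
print(resolve_expiry(user, date(2021, 3, 22)))  # 2022-03-22
```

The printed date matches the VALID UNTIL value in the rendered SQL example in this section, which was generated on 2021-03-22 with expire_in set to 365.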
Pigsty建议采用dbuser_
与 dbrole_
的前缀区分用户与角色,用户的login
选项应当设置为true
以允许登录,角色的login
选项应当设置为false
以拒绝登录。
pg_users
与 pg_default_roles
都是 user
对象构成的数组,两者会依照定义顺序依次创建,因此后创建的用户可以属于先前创建的角色。
实现
pg_default_roles
中的用户会渲染为集群主库上的单个SQL文件:
/pg/tmp/pg-init-roles.sql
Users in pg_users are rendered as SQL files on the cluster primary, one file per user:
/pg/tmp/pg-user-<username>.sql
and executed in order. An actual rendered example looks like this:
----------------------------------------------------------------------
-- File : pg-user-dbuser_meta.sql
-- Path : /pg/tmp/pg-user-dbuser_meta.sql
-- Time : 2021-03-22 22:52
-- Note : managed by ansible, DO NOT CHANGE
-- Desc : creation sql script for user dbuser_meta
----------------------------------------------------------------------
--==================================================================--
-- EXECUTION --
--==================================================================--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_meta';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_meta.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbuser_meta" ;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbuser_meta" ;
-- password
ALTER USER "dbuser_meta" PASSWORD 'DBUser.Meta';
-- expire
-- expire at 2022-03-22 in 365 days since 2021-03-22
ALTER USER "dbuser_meta" VALID UNTIL '2022-03-22';
-- conn limit
-- remove conn limit
-- ALTER USER "dbuser_meta" CONNECTION LIMIT -1;
-- parameters
ALTER USER "dbuser_meta" SET search_path = public;
-- comment
COMMENT ON ROLE "dbuser_meta" IS 'test user';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "dbrole_readwrite" TO "dbuser_meta";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbuser_meta' will be added to /etc/pgbouncer/userlist.txt via
-- /pg/bin/pgbouncer-create-user 'dbuser_meta' 'DBUser.Meta'
--==================================================================--
连接池
Pgbouncer有自己的用户定义文件,通常是PG用户的一个子集。
在Pigsty中,Pgbouncer的用户定义文件位于:/etc/pgbouncer/userlist.txt
$ cat userlist.txt
"postgres" ""
"dbuser_monitor" "md57bbcca538453edba8be026725c530b05"
只有在该文件中出现的用户,才可以通过PGbouncer访问数据库。
只有pgbouncer
选项显式配置为true
的用户,会被添加至连接池用户列表中。
修改该配置文件需要reload
Pgbouncer方可生效。
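The md5 entries in userlist.txt follow PostgreSQL's md5 password format: the string md5 followed by the hex MD5 digest of the password concatenated with the user name. The Python sketch below builds such a line; the helper name userlist_entry is hypothetical, and Pigsty itself maintains this file through its own script (/pg/bin/pgbouncer-create-user, as the rendered SQL comments above mention).

```python
import hashlib

def userlist_entry(user: str, password: str) -> str:
    """Build one userlist.txt line using the md5 format: 'md5' + md5(password + user)."""
    digest = hashlib.md5((password + user).encode("utf-8")).hexdigest()
    return f'"{user}" "md5{digest}"'

print(userlist_entry("dbuser_meta", "DBUser.Meta"))
```

Because the user name is part of the hashed input, the same password produces different entries for different users.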
导出
以下SQL查询可以使用JSON格式导出数据库中的用户(但需要少量修正)
SELECT row_to_json(u) FROM
(SELECT r.rolname AS name,
a.rolpassword AS password,
r.rolcanlogin AS login,
r.rolsuper AS superuser,
r.rolcreatedb AS createdb,
r.rolcreaterole AS createrole,
r.rolinherit AS inherit,
r.rolreplication AS replication,
r.rolbypassrls AS bypassrls,
r.rolconnlimit AS connlimit,
r.rolvaliduntil AS expire_at,
setconfig AS parameters,
ARRAY(SELECT b.rolname FROM pg_catalog.pg_auth_members m JOIN pg_catalog.pg_roles b ON (m.roleid = b.oid) WHERE m.member = r.oid) as roles,
pg_catalog.shobj_description(r.oid, 'pg_authid') AS comment
FROM pg_catalog.pg_roles r
LEFT JOIN pg_db_role_setting rs ON r.oid = rs.setrole
LEFT JOIN pg_authid a ON r.oid = a.oid
WHERE r.rolname !~ '^pg_'
ORDER BY 1) u;
创建
请尽可能通过声明的方式创建业务用户与业务数据库,而不是在数据库中手工创建。因为业务用户与业务数据库需要同时在数据库与连接池中进行变更。详情请参考:创建业务用户
在运行中的数据库集群中创建新的业务用户,首先应在集群级配置中添加新用户的定义,例如在pg-test.vars.pg_users
加入新的用户对象。然后可以使用pgsql-createuser
剧本创建用户:
例如,在pg-test
集群中创建或修改名为dbuser_test
的用户,可以执行以下命令。
./pgsql-createuser.yml -l <pg_cluster> -e pg_user=dbuser_test
如果dbuser_test
的定义不存在,则会在检查阶段报错。
5.2.3 - 定制业务数据库
配置Pigsty中的业务数据库
可以通过 pg_databases
定制集群特定的业务数据库。
样例
一个完整的数据库定义由一个JSON/YAML对象构成,如下所示:
- name: meta # name is the only required field for a database
owner: postgres # optional, database owner
template: template1 # optional, template1 by default
encoding: UTF8 # optional, UTF8 by default, must be the same as the template database, leave blank to use the db default
locale: C # optional, C by default, must be the same as the template database, leave blank to use the db default
lc_collate: C # optional, C by default, must be the same as the template database, leave blank to use the db default
lc_ctype: C # optional, C by default, must be the same as the template database, leave blank to use the db default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
schemas: [public,monitor] # create additional schema
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
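Description
A database object consists of the following keys. Only the database name is required; all other keys are optional and fall back to their default values when omitted.
- name (string): database name, required.
- owner (string): database owner; must be an existing user (users are created before databases).
- template (string): template used when creating the database, template1 by default.
- encoding (enum): character set encoding of the database, UTF8 by default; must match the instance and the template database.
- locale (enum): locale of the database; by default the same as the instance and the template database. Changing it is not recommended.
- lc_collate (enum): string collation (sort order) rule of the database; by default the same as the instance and the template database. Changing it is not recommended.
- lc_ctype (enum): character classification rule of the database; by default the same as the instance and the template database. Changing it is not recommended.
- allowconn (bool): whether connections to the database are allowed; allowed by default.
- revokeconn (bool): whether to revoke the default CONNECT privilege from PUBLIC; not revoked by default. Enabling it is recommended on instances that host multiple databases.
- tablespace (string): default tablespace of the database, pg_default by default.
- connlimit (number): connection limit for the database; leave blank or set to -1 for no limit (the default).
- schemas (string[]): additional schemas to create in the database (the monitor schema is created by default).
- extensions (extension[]): additional extensions to install in the database. Each entry has two fields, name and schema. For example, {name: postgis, schema: public} tells Pigsty to install the PostGIS extension into the public schema of that database.
- pgbouncer (bool): whether to add the database to the connection pool's database list; added by default.
- parameters (dict): extra configuration parameters to set on the database, as key-value pairs.
- comment (string): comment describing the database.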
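As a rough illustration of how these defaults combine with a user-supplied definition, here is a minimal Python sketch. `DATABASE_DEFAULTS` and `normalize_database` are hypothetical names, not part of Pigsty, and only a subset of the documented fields is covered (the locale-related fields are left out).

```python
# Defaults copied from the field descriptions; the names below are illustrative only.
DATABASE_DEFAULTS = {
    "template": "template1",
    "encoding": "UTF8",
    "allowconn": True,
    "revokeconn": False,
    "tablespace": "pg_default",
    "connlimit": -1,
    "pgbouncer": True,
}


def normalize_database(definition):
    """Return a copy of `definition` with the documented defaults filled in."""
    if not definition.get("name"):
        raise ValueError("a database definition requires a name")
    normalized = dict(DATABASE_DEFAULTS)
    normalized.update(definition)  # explicit keys override the defaults
    return normalized


db = normalize_database({"name": "test", "connlimit": 100})
print(db["template"], db["connlimit"], db["pgbouncer"])
```

Only `name` has to be supplied; every key present in the definition overrides the corresponding default.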
Implementation
pg_databases is an array of database definition objects. Each object is rendered, in order, into a SQL file on the primary:
/pg/tmp/pg-db-{{ database.name }}.sql
The files are then executed in the same order. An actual rendered example is shown below:
----------------------------------------------------------------------
-- File : pg-db-meta.sql
-- Path : /pg/tmp/pg-db-meta.sql
-- Time : 2021-03-22 22:52
-- Note : managed by ansible, DO NOT CHANGE
-- Desc : creation sql script for database meta
----------------------------------------------------------------------
--==================================================================--
-- EXECUTION --
--==================================================================--
-- run as dbsu (postgres by default)
-- createdb -w -p 5432 'meta';
-- psql meta -p 5432 -AXtwqf /pg/tmp/pg-db-meta.sql
--==================================================================--
-- CREATE DATABASE --
--==================================================================--
-- create database with following commands
-- CREATE DATABASE "meta" ;
-- following commands are executed within database "meta"
--==================================================================--
-- ALTER DATABASE --
--==================================================================--
-- owner
-- tablespace
-- allow connection
ALTER DATABASE "meta" ALLOW_CONNECTIONS True;
-- connection limit
ALTER DATABASE "meta" CONNECTION LIMIT -1;
-- parameters
ALTER DATABASE "meta" SET enable_partitionwise_join = True;
-- comment
COMMENT ON DATABASE "meta" IS 'pigsty meta database';
--==================================================================--
-- REVOKE/GRANT CONNECT --
--==================================================================--
--==================================================================--
-- REVOKE/GRANT CREATE --
--==================================================================--
-- revoke create (schema) privilege from public
REVOKE CREATE ON DATABASE "meta" FROM PUBLIC;
-- only admin role have create privilege
GRANT CREATE ON DATABASE "meta" TO "dbrole_admin";
-- revoke public schema creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
-- admin can create objects in public schema
GRANT CREATE ON SCHEMA public TO "dbrole_admin";
--==================================================================--
-- CREATE SCHEMAS --
--==================================================================--
-- create schemas
--==================================================================--
-- CREATE EXTENSIONS --
--==================================================================--
-- create extensions
CREATE EXTENSION IF NOT EXISTS "postgis" WITH SCHEMA "public";
--==================================================================--
-- PGBOUNCER DATABASE --
--==================================================================--
-- database will be added to pgbouncer database list by default,
-- unless pgbouncer is explicitly set to 'false', means hidden database
-- Database 'meta' will be added to /etc/pgbouncer/database.txt via
-- /pg/bin/pgbouncer-create-db 'meta'
--==================================================================--
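To make the mapping from a definition to the ALTER DATABASE section of the rendered file more concrete, here is a small Python sketch. It is not Pigsty's template (which is rendered by Ansible); `render_alter_database` is a hypothetical helper that imitates the output format of the sample, and it does no identifier escaping beyond doubling single quotes in the comment.

```python
def render_alter_database(db):
    """Render ALTER/COMMENT statements in the style of the rendered sample."""
    name = db["name"]
    statements = []
    if "allowconn" in db:
        statements.append('ALTER DATABASE "{}" ALLOW_CONNECTIONS {};'.format(name, db["allowconn"]))
    if "connlimit" in db:
        statements.append('ALTER DATABASE "{}" CONNECTION LIMIT {};'.format(name, db["connlimit"]))
    for key, value in db.get("parameters", {}).items():
        statements.append('ALTER DATABASE "{}" SET {} = {};'.format(name, key, value))
    if "comment" in db:
        comment = db["comment"].replace("'", "''")  # escape single quotes for SQL
        statements.append("COMMENT ON DATABASE \"{}\" IS '{}';".format(name, comment))
    return statements


meta = {
    "name": "meta",
    "allowconn": True,
    "connlimit": -1,
    "parameters": {"enable_partitionwise_join": True},
    "comment": "pigsty meta database",
}
for statement in render_alter_database(meta):
    print(statement)
```

Each statement is safe to run again with the same values, which is why the playbook can be used both to create and to modify a database.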
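Connection Pool
Pgbouncer has its own database definition file, which usually lists a subset of the PostgreSQL databases.
In Pigsty, the Pgbouncer database definition file is located at /etc/pgbouncer/database.txt: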
$ cat database.txt
meta = host=/var/run/postgresql
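Only databases listed in this file can be accessed through Pgbouncer. Databases whose pgbouncer option is explicitly set to false are not added to the connection pool's database list. Changes to this file take effect only after Pgbouncer is reloaded.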
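The sketch below illustrates, in Python, what an idempotent "add a database to the pool list" step could look like. It is an assumption-laden illustration, not the actual /pg/bin/pgbouncer-create-db script: `add_pool_database` is a hypothetical function that works on the file content as a string and uses the same `name = host=...` line format as the listing above.

```python
def add_pool_database(content, name, host="/var/run/postgresql"):
    """Return database.txt content with an entry for `name`; unchanged if one exists."""
    lines = content.splitlines()
    for line in lines:
        # The database name is everything before the first '='.
        if line.split("=", 1)[0].strip() == name:
            return content
    lines.append("{} = host={}".format(name, host))
    return "\n".join(lines) + "\n"


current = "meta = host=/var/run/postgresql\n"
updated = add_pool_database(current, "test")
print(updated, end="")
```

After a change like this, Pgbouncer still has to be reloaded before the new entry is usable.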
Export
The following SQL query exports the definition of the current database in JSON format (minor corrections are needed afterwards):
psql -AXtw <<-EOF
SELECT jsonb_pretty(row_to_json(final)::JSONB)
FROM (SELECT datname AS name,
datdba::RegRole::Text AS owner,
encoding,
datcollate AS lc_collate,
datctype AS lc_ctype,
datallowconn AS allowconn,
datconnlimit AS connlimit,
(SELECT json_agg(nspname) AS schemas FROM pg_namespace WHERE nspname !~ '^pg_' AND nspname NOT IN ('information_schema', 'monitor', 'repack')),
(SELECT json_agg(row_to_json(ex)) AS extensions FROM (SELECT extname, extnamespace::RegNamespace AS schema FROM pg_extension WHERE extnamespace::RegNamespace::TEXT NOT IN ('information_schema', 'monitor', 'repack', 'pg_catalog')) ex),
(SELECT json_object_agg(substring(cfg, 0 , strpos(cfg, '=')), substring(cfg, strpos(cfg, '=')+1)) AS value FROM
(SELECT unnest(setconfig) AS cfg FROM pg_db_role_setting s JOIN pg_database d ON d.oid = s.setdatabase WHERE d.datname = current_database()) cf
)
FROM pg_database WHERE datname = current_database()
) final;
EOF
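The parameters part of this query splits each `setconfig` entry of the form `key=value` at the first `=`. A minimal Python equivalent of that splitting step is sketched here; `parse_setconfig` and the sample entries are hypothetical and only illustrate the string handling.

```python
def parse_setconfig(entries):
    """Split 'key=value' strings at the first '=' into a dictionary."""
    params = {}
    for entry in entries:
        key, _, value = entry.partition("=")
        params[key] = value
    return params


# Sample entries in the 'key=value' form stored in pg_db_role_setting.setconfig.
print(parse_setconfig(["enable_partitionwise_join=on", "search_path=public, monitor"]))
```

Splitting only at the first `=` keeps values that themselves contain `=` intact.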
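Create
Whenever possible, create business databases declaratively rather than by hand inside the database, because business users and business databases must be changed in both the database and the connection pool at the same time.
To create a new business database in a running database cluster, first add the database definition to the cluster-level configuration, for example by adding a new database object to pg-test.vars.pg_databases. Then run the pgsql-createdb playbook to create the database.
For example, to create or modify a database named test in the pg-test cluster, run the following command: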
./pgsql-createdb.yml -l <pg_cluster> -e pg_database=test
If no definition for the database test exists, the playbook fails during its check phase.
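5.2.4 - Customize Template Database
Customizing the template database in Pigsty
Related Parameters
Use the PG template configuration entries to customize the cluster's template database, template1.
This ensures that every database newly created in the cluster carries the same default configuration: schemas, extensions, and default privileges.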
^---/pg/bin/pg-init
|
^---(1)--- /pg/tmp/pg-init-roles.sql
^---(2)--- /pg/tmp/pg-init-template.sql
^---(3)--- <other customize logic in pg-init>
# business users and databases are NOT created during template customization
^-------------(4)--- /pg/tmp/pg-user-{{ user.name }}.sql
^-------------(5)--- /pg/tmp/pg-db-{{ db.name }}.sql
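pg-init is the path to the shell script used to customize template initialization. The script runs as the postgres user, on the primary only, after the cluster's primary has been started. It can run arbitrary shell commands, or arbitrary SQL commands through psql.
If this entry is not specified, Pigsty uses the default pg-init shell script shown below.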
#!/usr/bin/env bash
set -uo pipefail
#==================================================================#
# Default Roles #
#==================================================================#
psql postgres -qAXwtf /pg/tmp/pg-init-roles.sql
#==================================================================#
# System Template #
#==================================================================#
# system default template
psql template1 -qAXwtf /pg/tmp/pg-init-template.sql
# make postgres same as templated database (optional)
psql postgres -qAXwtf /pg/tmp/pg-init-template.sql
#==================================================================#
# Customize Logic #
#==================================================================#
# add your template logic here
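If you need complex customization logic, append it to this script. Note that pg-init customizes the database cluster, which is usually done by modifying the template database. When the script runs, the cluster is already up, but business users and business databases have not been created yet, so changes to the template database are reflected in the business databases defined by default.
pg-init-roles.sql
pg_default_roles defines a globally unified role system. Its definitions are rendered to /pg/tmp/pg-init-roles.sql. A rendered sample from the pg-test cluster is shown below:
<details>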
```sql
----------------------------------------------------------------------
-- File : pg-init-roles.sql
-- Path : /pg/tmp/pg-init-roles
-- Time : 2021-03-16 21:24
-- Note : managed by ansible, DO NOT CHANGE
-- Desc : creation sql script for default roles
----------------------------------------------------------------------
--###################################################################--
-- dbrole_readonly --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_readonly';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_readonly.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbrole_readonly" NOLOGIN;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbrole_readonly" NOLOGIN;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbrole_readonly" IS 'role for global readonly access';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbrole_readonly' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbrole_readwrite --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_readwrite';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_readwrite.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbrole_readwrite" NOLOGIN;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbrole_readwrite" NOLOGIN;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbrole_readwrite" IS 'role for global read-write access';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "dbrole_readonly" TO "dbrole_readwrite";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbrole_readwrite' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbrole_offline --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_offline';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_offline.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbrole_offline" NOLOGIN;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbrole_offline" NOLOGIN;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbrole_offline" IS 'role for restricted read-only access (offline instance)';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbrole_offline' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbrole_admin --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --no-login 'dbrole_admin';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbrole_admin.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbrole_admin" NOLOGIN BYPASSRLS;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbrole_admin" NOLOGIN BYPASSRLS;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbrole_admin" IS 'role for object creation';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "dbrole_readwrite" TO "dbrole_admin";
GRANT "pg_monitor" TO "dbrole_admin";
GRANT "pg_signal_backend" TO "dbrole_admin";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbrole_admin' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- postgres --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --superuser 'postgres';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-postgres.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "postgres" SUPERUSER;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "postgres" SUPERUSER;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "postgres" IS 'system superuser';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'postgres' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- replicator --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --replication 'replicator';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-replicator.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "replicator" REPLICATION BYPASSRLS;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "replicator" REPLICATION BYPASSRLS;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "replicator" IS 'system replicator';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "pg_monitor" TO "replicator";
GRANT "dbrole_readonly" TO "replicator";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'replicator' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbuser_monitor --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_monitor';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_monitor.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbuser_monitor" ;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbuser_monitor" ;
-- password
-- expire
-- conn limit
ALTER USER "dbuser_monitor" CONNECTION LIMIT 16;
-- parameters
-- comment
COMMENT ON ROLE "dbuser_monitor" IS 'system monitor user';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "pg_monitor" TO "dbuser_monitor";
GRANT "dbrole_readonly" TO "dbuser_monitor";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbuser_monitor' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbuser_admin --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 --superuser 'dbuser_admin';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_admin.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbuser_admin" SUPERUSER BYPASSRLS;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbuser_admin" SUPERUSER BYPASSRLS;
-- password
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbuser_admin" IS 'system admin user';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "dbrole_admin" TO "dbuser_admin";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbuser_admin' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--###################################################################--
-- dbuser_stats --
--###################################################################--
-- run as dbsu (postgres by default)
-- createuser -w -p 5432 'dbuser_stats';
-- psql -p 5432 -AXtwqf /pg/tmp/pg-user-dbuser_stats.sql
--==================================================================--
-- CREATE USER --
--==================================================================--
CREATE USER "dbuser_stats" ;
--==================================================================--
-- ALTER USER --
--==================================================================--
-- options
ALTER USER "dbuser_stats" ;
-- password
ALTER USER "dbuser_stats" PASSWORD 'DBUser.Stats';
-- expire
-- conn limit
-- parameters
-- comment
COMMENT ON ROLE "dbuser_stats" IS 'business offline user for offline queries and ETL';
--==================================================================--
-- GRANT ROLE --
--==================================================================--
GRANT "dbrole_offline" TO "dbuser_stats";
--==================================================================--
-- PGBOUNCER USER --
--==================================================================--
-- user will not be added to pgbouncer user list by default,
-- unless pgbouncer is explicitly set to 'true', which means production user
-- User 'dbuser_stats' will NOT be added to /etc/pgbouncer/userlist.txt
--==================================================================--
--==================================================================--
-- PASSWORD OVERWRITE --
--==================================================================--
ALTER ROLE "replicator" PASSWORD 'DBUser.Replicator';
ALTER ROLE "dbuser_monitor" PASSWORD 'DBUser.Monitor';
ALTER ROLE "dbuser_admin" PASSWORD 'DBUser.Admin';
--==================================================================--
```
</details>
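The rendered sample follows a regular pattern: one CREATE USER statement with option keywords, a comment, and one GRANT per parent role. The Python sketch below imitates that pattern for illustration only. It is not Pigsty's template; `render_role` is a hypothetical helper, and the field names (`login`, `superuser`, `replication`, `bypassrls`, `roles`, `comment`) are assumed to match the aliases used in the user export query.

```python
def render_role(role):
    """Render CREATE USER, COMMENT and GRANT statements for one role definition."""
    name = role["name"]
    options = []
    if not role.get("login", True):
        options.append("NOLOGIN")
    for flag in ("superuser", "replication", "bypassrls"):
        if role.get(flag):
            options.append(flag.upper())
    statements = ['CREATE USER "{}" {};'.format(name, " ".join(options))]
    if "comment" in role:
        statements.append("COMMENT ON ROLE \"{}\" IS '{}';".format(name, role["comment"]))
    for parent in role.get("roles", []):
        statements.append('GRANT "{}" TO "{}";'.format(parent, name))
    return statements


admin = {
    "name": "dbrole_admin",
    "login": False,
    "bypassrls": True,
    "roles": ["dbrole_readwrite", "pg_monitor", "pg_signal_backend"],
    "comment": "role for object creation",
}
for statement in render_role(admin):
    print(statement)
```

Roles that inherit from one another, such as dbrole_readwrite from dbrole_readonly, only need the parent listed in `roles`.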
## pg-init-template.sql
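[`pg-init-template.sql`](https://github.com/Vonng/pigsty/blob/master/roles/postgres/templates/pg-init-template.sql) is the script template used to initialize the `template1` database. Most variables in the PG template are rendered into the final SQL commands through this SQL template. The template is rendered to `/pg/tmp/pg-init-template.sql` on the cluster primary and executed there.
Pigsty strongly recommends implementing complex customization through a custom `pg-init` script. Unless necessary, avoid changing the existing logic in `pg-init-template.sql`.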
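As an illustration of how a template variable turns into SQL, the Default Privileges section of the template loops over `pg_default_privileges` once for each owner role. The Python sketch below mimics that expansion. It is not the Jinja2 template itself; `render_default_privileges` is a hypothetical function, and the owner names and privilege strings are assumed example values.

```python
def render_default_privileges(owners, privileges):
    """Expand every privilege clause for every owner role, owner by owner."""
    return [
        "ALTER DEFAULT PRIVILEGES FOR ROLE {} {};".format(owner, privilege)
        for owner in owners
        for privilege in privileges
    ]


# Assumed example values: dbsu, admin user, and the quoted admin role.
owners = ["postgres", "dbuser_admin", '"dbrole_admin"']
privileges = [
    "GRANT USAGE ON SCHEMAS TO dbrole_readonly",
    "GRANT SELECT ON TABLES TO dbrole_readonly",
]
for statement in render_default_privileges(owners, privileges):
    print(statement)
```

Because default privileges are attached to the role that creates an object, the same list is repeated for each role that is expected to create objects.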
```sql
--==================================================================--
-- Executions --
--==================================================================--
-- psql template1 -AXtwqf /pg/tmp/pg-init-template.sql
-- this sql scripts is responsible for post-init procedure
-- it will
-- * create system users such as replicator, monitor user, admin user
-- * create system default roles
-- * create schema, extensions in template1 & postgres
-- * create monitor views in template1 & postgres
--==================================================================--
-- Default Privileges --
--==================================================================--
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_dbsu }} {{ priv }};
{% endfor %}
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE {{ pg_admin_username }} {{ priv }};
{% endfor %}
-- for additional business admin, they can SET ROLE to dbrole_admin
{% for priv in pg_default_privileges %}
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" {{ priv }};
{% endfor %}
--==================================================================--
-- Schemas --
--==================================================================--
{% for schema_name in pg_default_schemas %}
CREATE SCHEMA IF NOT EXISTS "{{ schema_name }}";
{% endfor %}
-- revoke public creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
--==================================================================--
-- Extensions --
--==================================================================--
{% for extension in pg_default_extensions %}
CREATE EXTENSION IF NOT EXISTS "{{ extension.name }}"{% if 'schema' in extension %} WITH SCHEMA "{{ extension.schema }}"{% endif %};
{% endfor %}
```
The default template initialization logic also creates the monitor schema, extensions, and related views.
<details>
```sql
--==================================================================--
-- Monitor Views --
--==================================================================--
-- cleanse
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "{{ pg_monitor_username }}";
GRANT USAGE ON SCHEMA monitor TO "{{ pg_admin_username }}";
GRANT USAGE ON SCHEMA monitor TO "{{ pg_replication_username }}";
DROP VIEW IF EXISTS monitor.pg_table_bloat_human;
DROP VIEW IF EXISTS monitor.pg_index_bloat_human;
DROP VIEW IF EXISTS monitor.pg_table_bloat;
DROP VIEW IF EXISTS monitor.pg_index_bloat;
DROP VIEW IF EXISTS monitor.pg_session;
DROP VIEW IF EXISTS monitor.pg_kill;
DROP VIEW IF EXISTS monitor.pg_cancel;
DROP VIEW IF EXISTS monitor.pg_seq_scan;
-- Table bloat estimate
CREATE OR REPLACE VIEW monitor.pg_table_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, relname , bs * tblpages AS size,
CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff)/tblpages::FLOAT ELSE 0 END AS ratio
FROM (
SELECT ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
tblpages, fillfactor, bs, tblid, nspname, relname, is_na
FROM (
SELECT
( 4 + tpl_hdr_size + tpl_data_size + (2 * ma)
- CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
- CASE WHEN ceil(tpl_data_size)::INT % ma = 0 THEN ma ELSE ceil(tpl_data_size)::INT % ma END
) AS tpl_size, (heappages + toastpages) AS tblpages, heappages,
toastpages, reltuples, toasttuples, bs, page_hdr, tblid, nspname, relname, fillfactor, is_na
FROM (
SELECT
tbl.oid AS tblid, ns.nspname , tbl.relname, tbl.reltuples,
tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
coalesce(toast.reltuples, 0) AS toasttuples,
coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
current_setting('block_size')::numeric AS bs,
CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
24 AS page_hdr,
23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN ( 7 + count(s.attname) ) / 8 ELSE 0::int END
+ CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0) ) AS tpl_data_size,
bool_or(att.atttypid = 'pg_catalog.name'::regtype)
OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
FROM pg_attribute AS att
JOIN pg_class AS tbl ON att.attrelid = tbl.oid
JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
WHERE NOT att.attisdropped AND tbl.relkind = 'r' AND nspname NOT IN ('pg_catalog','information_schema')
GROUP BY 1,2,3,4,5,6,7,8,9,10
) AS s
) AS s2
) AS s3
WHERE NOT is_na;
COMMENT ON VIEW monitor.pg_table_bloat IS 'postgres table bloat estimate';
-- Index bloat estimate
CREATE OR REPLACE VIEW monitor.pg_index_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, idxname AS relname, relpages::BIGINT * bs AS size,
COALESCE((relpages - ( reltuples * (6 + ma - (CASE WHEN index_tuple_hdr % ma = 0 THEN ma ELSE index_tuple_hdr % ma END)
+ nulldatawidth + ma - (CASE WHEN nulldatawidth % ma = 0 THEN ma ELSE nulldatawidth % ma END))
/ (bs - pagehdr)::FLOAT + 1 )), 0) / relpages::FLOAT AS ratio
FROM (
SELECT nspname,
idxname,
reltuples,
relpages,
current_setting('block_size')::INTEGER AS bs,
(CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END) AS ma,
24 AS pagehdr,
(CASE WHEN max(COALESCE(pg_stats.null_frac, 0)) = 0 THEN 2 ELSE 6 END) AS index_tuple_hdr,
sum((1.0 - COALESCE(pg_stats.null_frac, 0.0)) *
COALESCE(pg_stats.avg_width, 1024))::INTEGER AS nulldatawidth
FROM pg_attribute
JOIN (
SELECT pg_namespace.nspname,
ic.relname AS idxname,
ic.reltuples,
ic.relpages,
pg_index.indrelid,
pg_index.indexrelid,
tc.relname AS tablename,
regexp_split_to_table(pg_index.indkey::TEXT, ' ') :: INTEGER AS attnum,
pg_index.indexrelid AS index_oid
FROM pg_index
JOIN pg_class ic ON pg_index.indexrelid = ic.oid
JOIN pg_class tc ON pg_index.indrelid = tc.oid
JOIN pg_namespace ON pg_namespace.oid = ic.relnamespace
JOIN pg_am ON ic.relam = pg_am.oid
WHERE pg_am.amname = 'btree' AND ic.relpages > 0 AND nspname NOT IN ('pg_catalog', 'information_schema')
) ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
AND ((pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE))
OR (pg_stats.tablename = ind_atts.idxname AND pg_stats.attname = pg_attribute.attname))
WHERE pg_attribute.attnum > 0
GROUP BY 1, 2, 3, 4, 5, 6
) est
LIMIT 512;
COMMENT ON VIEW monitor.pg_index_bloat IS 'postgres index bloat estimate (btree-only)';
-- table bloat pretty
CREATE OR REPLACE VIEW monitor.pg_table_bloat_human AS
SELECT nspname || '.' || relname AS name,
pg_size_pretty(size) AS size,
pg_size_pretty((size * ratio)::BIGINT) AS wasted,
round(100 * ratio::NUMERIC, 2) as ratio
FROM monitor.pg_table_bloat ORDER BY wasted DESC NULLS LAST;
COMMENT ON VIEW monitor.pg_table_bloat_human IS 'postgres table bloat pretty';
-- index bloat pretty
CREATE OR REPLACE VIEW monitor.pg_index_bloat_human AS
SELECT nspname || '.' || relname AS name,
pg_size_pretty(size) AS size,
pg_size_pretty((size * ratio)::BIGINT) AS wasted,
round(100 * ratio::NUMERIC, 2) as ratio
FROM monitor.pg_index_bloat;
COMMENT ON VIEW monitor.pg_index_bloat_human IS 'postgres index bloat pretty';
-- pg session
CREATE OR REPLACE VIEW monitor.pg_session AS
SELECT coalesce(datname, 'all') AS datname,
numbackends,
active,
idle,
ixact,
max_duration,
max_tx_duration,
max_conn_duration
FROM (
SELECT datname,
count(*) AS numbackends,
count(*) FILTER ( WHERE state = 'active' ) AS active,
count(*) FILTER ( WHERE state = 'idle' ) AS idle,
count(*) FILTER ( WHERE state = 'idle in transaction'
OR state = 'idle in transaction (aborted)' ) AS ixact,
max(extract(epoch from now() - state_change))
FILTER ( WHERE state = 'active' ) AS max_duration,
max(extract(epoch from now() - xact_start)) AS max_tx_duration,
max(extract(epoch from now() - backend_start)) AS max_conn_duration
FROM pg_stat_activity
WHERE backend_type = 'client backend'
AND pid <> pg_backend_pid()
GROUP BY ROLLUP (1)
ORDER BY 1 NULLS FIRST
) t;
COMMENT ON VIEW monitor.pg_session IS 'postgres session stats';
-- pg kill
CREATE OR REPLACE VIEW monitor.pg_kill AS
SELECT pid,
pg_terminate_backend(pid) AS killed,
datname AS dat,
usename AS usr,
application_name AS app,
client_addr AS addr,
state,
extract(epoch from now() - state_change) AS query_time,
extract(epoch from now() - xact_start) AS xact_time,
extract(epoch from now() - backend_start) AS conn_time,
substring(query, 1, 40) AS query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
AND pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_kill IS 'kill all backend session';
-- quick cancel view
DROP VIEW IF EXISTS monitor.pg_cancel;
CREATE OR REPLACE VIEW monitor.pg_cancel AS
SELECT pid,
pg_cancel_backend(pid) AS cancel,
datname AS dat,
usename AS usr,
application_name AS app,
client_addr AS addr,
state,
extract(epoch from now() - state_change) AS query_time,
extract(epoch from now() - xact_start) AS xact_time,
extract(epoch from now() - backend_start) AS conn_time,
substring(query, 1, 40)
FROM pg_stat_activity
WHERE state = 'active'
AND backend_type = 'client backend'
and pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_cancel IS 'cancel backend queries';
-- seq scan
DROP VIEW IF EXISTS monitor.pg_seq_scan;
CREATE OR REPLACE VIEW monitor.pg_seq_scan AS
SELECT schemaname AS nspname,
relname,
seq_scan,
seq_tup_read,
seq_tup_read / seq_scan AS seq_tup_avg,
idx_scan,
n_live_tup + n_dead_tup AS tuples,
n_dead_tup::FLOAT / (n_live_tup + n_dead_tup) AS dead_ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
and (n_live_tup + n_dead_tup) > 0
ORDER BY seq_tup_read DESC
LIMIT 50;
COMMENT ON VIEW monitor.pg_seq_scan IS 'table that have seq scan';
{% if pg_version >= 13 %}
-- pg_shmem auxiliary function
-- PG 13 ONLY!
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';
{% endif %}
–==================================================================–
– Customize Logic –
–==================================================================–
-- This script will be executed on the primary instance of a newly created
-- postgres cluster. It will be executed as dbsu on the template1 database
-- put your own customize logic here
-- make sure they are idempotent
</details>
An actual rendered sample (for `pg-test`) is shown below:
<details>
```sql
----------------------------------------------------------------------
-- File : pg-init-template.sql
-- Ctime : 2018-10-30
-- Mtime : 2021-02-27
-- Desc : init postgres cluster template
-- Path : /pg/tmp/pg-init-template.sql
-- Author : Vonng(fengruohang@outlook.com)
-- Copyright (C) 2018-2021 Ruohang Feng
----------------------------------------------------------------------
--==================================================================--
-- Executions --
--==================================================================--
-- psql template1 -AXtwqf /pg/tmp/pg-init-template.sql
-- this SQL script is responsible for the post-init procedure
-- it will
-- * create system users such as replicator, monitor user, admin user
-- * create system default roles
-- * create schema, extensions in template1 & postgres
-- * create monitor views in template1 & postgres
--==================================================================--
-- Default Privileges --
--==================================================================--
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE ON SCHEMAS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT ON TABLES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE ON SCHEMAS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT ON TABLES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT SELECT ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT EXECUTE ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE postgres GRANT CREATE ON SCHEMAS TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE ON SCHEMAS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT ON TABLES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE ON SCHEMAS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT ON TABLES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT SELECT ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT EXECUTE ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE dbuser_admin GRANT CREATE ON SCHEMAS TO dbrole_admin;
-- for additional business admin, they can SET ROLE to dbrole_admin
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE ON SCHEMAS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT ON TABLES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT ON SEQUENCES TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE ON SCHEMAS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT ON TABLES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT SELECT ON SEQUENCES TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT EXECUTE ON FUNCTIONS TO dbrole_offline;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin;
ALTER DEFAULT PRIVILEGES FOR ROLE "dbrole_admin" GRANT CREATE ON SCHEMAS TO dbrole_admin;
--==================================================================--
-- Schemas --
--==================================================================--
CREATE SCHEMA IF NOT EXISTS "monitor";
-- revoke public creation
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
--==================================================================--
-- Extensions --
--==================================================================--
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pgstattuple" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_qualstats" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_buffercache" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pageinspect" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_prewarm" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_visibility" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_freespacemap" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "pg_repack" WITH SCHEMA "monitor";
CREATE EXTENSION IF NOT EXISTS "postgres_fdw";
CREATE EXTENSION IF NOT EXISTS "file_fdw";
CREATE EXTENSION IF NOT EXISTS "btree_gist";
CREATE EXTENSION IF NOT EXISTS "btree_gin";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
CREATE EXTENSION IF NOT EXISTS "intagg";
CREATE EXTENSION IF NOT EXISTS "intarray";
--==================================================================--
-- Monitor Views --
--==================================================================--
----------------------------------------------------------------------
-- cleanse
----------------------------------------------------------------------
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "dbuser_monitor";
GRANT USAGE ON SCHEMA monitor TO "dbuser_admin";
GRANT USAGE ON SCHEMA monitor TO "replicator";
DROP VIEW IF EXISTS monitor.pg_table_bloat_human;
DROP VIEW IF EXISTS monitor.pg_index_bloat_human;
DROP VIEW IF EXISTS monitor.pg_table_bloat;
DROP VIEW IF EXISTS monitor.pg_index_bloat;
DROP VIEW IF EXISTS monitor.pg_session;
DROP VIEW IF EXISTS monitor.pg_kill;
DROP VIEW IF EXISTS monitor.pg_cancel;
DROP VIEW IF EXISTS monitor.pg_seq_scan;
----------------------------------------------------------------------
-- Table bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, relname , bs * tblpages AS size,
CASE WHEN tblpages - est_tblpages_ff > 0 THEN (tblpages - est_tblpages_ff)/tblpages::FLOAT ELSE 0 END AS ratio
FROM (
SELECT ceil( reltuples / ( (bs-page_hdr)*fillfactor/(tpl_size*100) ) ) + ceil( toasttuples / 4 ) AS est_tblpages_ff,
tblpages, fillfactor, bs, tblid, nspname, relname, is_na
FROM (
SELECT
( 4 + tpl_hdr_size + tpl_data_size + (2 * ma)
- CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
- CASE WHEN ceil(tpl_data_size)::INT % ma = 0 THEN ma ELSE ceil(tpl_data_size)::INT % ma END
) AS tpl_size, (heappages + toastpages) AS tblpages, heappages,
toastpages, reltuples, toasttuples, bs, page_hdr, tblid, nspname, relname, fillfactor, is_na
FROM (
SELECT
tbl.oid AS tblid, ns.nspname , tbl.relname, tbl.reltuples,
tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
coalesce(toast.reltuples, 0) AS toasttuples,
coalesce(substring(array_to_string(tbl.reloptions, ' ') FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
current_setting('block_size')::numeric AS bs,
CASE WHEN version()~'mingw32' OR version()~'64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
24 AS page_hdr,
23 + CASE WHEN MAX(coalesce(s.null_frac,0)) > 0 THEN ( 7 + count(s.attname) ) / 8 ELSE 0::int END
+ CASE WHEN bool_or(att.attname = 'oid' and att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0) ) AS tpl_data_size,
bool_or(att.atttypid = 'pg_catalog.name'::regtype)
OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
FROM pg_attribute AS att
JOIN pg_class AS tbl ON att.attrelid = tbl.oid
JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats AS s ON s.schemaname=ns.nspname AND s.tablename = tbl.relname AND s.inherited=false AND s.attname=att.attname
LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
WHERE NOT att.attisdropped AND tbl.relkind = 'r' AND nspname NOT IN ('pg_catalog','information_schema')
GROUP BY 1,2,3,4,5,6,7,8,9,10
) AS s
) AS s2
) AS s3
WHERE NOT is_na;
COMMENT ON VIEW monitor.pg_table_bloat IS 'postgres table bloat estimate';
----------------------------------------------------------------------
-- Index bloat estimate
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat AS
SELECT CURRENT_CATALOG AS datname, nspname, idxname AS relname, relpages::BIGINT * bs AS size,
COALESCE((relpages - ( reltuples * (6 + ma - (CASE WHEN index_tuple_hdr % ma = 0 THEN ma ELSE index_tuple_hdr % ma END)
+ nulldatawidth + ma - (CASE WHEN nulldatawidth % ma = 0 THEN ma ELSE nulldatawidth % ma END))
/ (bs - pagehdr)::FLOAT + 1 )), 0) / relpages::FLOAT AS ratio
FROM (
SELECT nspname,
idxname,
reltuples,
relpages,
current_setting('block_size')::INTEGER AS bs,
(CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END) AS ma,
24 AS pagehdr,
(CASE WHEN max(COALESCE(pg_stats.null_frac, 0)) = 0 THEN 2 ELSE 6 END) AS index_tuple_hdr,
sum((1.0 - COALESCE(pg_stats.null_frac, 0.0)) *
COALESCE(pg_stats.avg_width, 1024))::INTEGER AS nulldatawidth
FROM pg_attribute
JOIN (
SELECT pg_namespace.nspname,
ic.relname AS idxname,
ic.reltuples,
ic.relpages,
pg_index.indrelid,
pg_index.indexrelid,
tc.relname AS tablename,
regexp_split_to_table(pg_index.indkey::TEXT, ' ') :: INTEGER AS attnum,
pg_index.indexrelid AS index_oid
FROM pg_index
JOIN pg_class ic ON pg_index.indexrelid = ic.oid
JOIN pg_class tc ON pg_index.indrelid = tc.oid
JOIN pg_namespace ON pg_namespace.oid = ic.relnamespace
JOIN pg_am ON ic.relam = pg_am.oid
WHERE pg_am.amname = 'btree' AND ic.relpages > 0 AND nspname NOT IN ('pg_catalog', 'information_schema')
) ind_atts ON pg_attribute.attrelid = ind_atts.indexrelid AND pg_attribute.attnum = ind_atts.attnum
JOIN pg_stats ON pg_stats.schemaname = ind_atts.nspname
AND ((pg_stats.tablename = ind_atts.tablename AND pg_stats.attname = pg_get_indexdef(pg_attribute.attrelid, pg_attribute.attnum, TRUE))
OR (pg_stats.tablename = ind_atts.idxname AND pg_stats.attname = pg_attribute.attname))
WHERE pg_attribute.attnum > 0
GROUP BY 1, 2, 3, 4, 5, 6
) est
LIMIT 512;
COMMENT ON VIEW monitor.pg_index_bloat IS 'postgres index bloat estimate (btree-only)';
----------------------------------------------------------------------
-- table bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_table_bloat_human AS
SELECT nspname || '.' || relname AS name,
pg_size_pretty(size) AS size,
pg_size_pretty((size * ratio)::BIGINT) AS wasted,
round(100 * ratio::NUMERIC, 2) as ratio
FROM monitor.pg_table_bloat ORDER BY wasted DESC NULLS LAST;
COMMENT ON VIEW monitor.pg_table_bloat_human IS 'postgres table bloat pretty';
----------------------------------------------------------------------
-- index bloat pretty
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_index_bloat_human AS
SELECT nspname || '.' || relname AS name,
pg_size_pretty(size) AS size,
pg_size_pretty((size * ratio)::BIGINT) AS wasted,
round(100 * ratio::NUMERIC, 2) as ratio
FROM monitor.pg_index_bloat;
COMMENT ON VIEW monitor.pg_index_bloat_human IS 'postgres index bloat pretty';
----------------------------------------------------------------------
-- pg session
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_session AS
SELECT coalesce(datname, 'all') AS datname,
numbackends,
active,
idle,
ixact,
max_duration,
max_tx_duration,
max_conn_duration
FROM (
SELECT datname,
count(*) AS numbackends,
count(*) FILTER ( WHERE state = 'active' ) AS active,
count(*) FILTER ( WHERE state = 'idle' ) AS idle,
count(*) FILTER ( WHERE state = 'idle in transaction'
OR state = 'idle in transaction (aborted)' ) AS ixact,
max(extract(epoch from now() - state_change))
FILTER ( WHERE state = 'active' ) AS max_duration,
max(extract(epoch from now() - xact_start)) AS max_tx_duration,
max(extract(epoch from now() - backend_start)) AS max_conn_duration
FROM pg_stat_activity
WHERE backend_type = 'client backend'
AND pid <> pg_backend_pid()
GROUP BY ROLLUP (1)
ORDER BY 1 NULLS FIRST
) t;
COMMENT ON VIEW monitor.pg_session IS 'postgres session stats';
----------------------------------------------------------------------
-- pg kill
----------------------------------------------------------------------
CREATE OR REPLACE VIEW monitor.pg_kill AS
SELECT pid,
pg_terminate_backend(pid) AS killed,
datname AS dat,
usename AS usr,
application_name AS app,
client_addr AS addr,
state,
extract(epoch from now() - state_change) AS query_time,
extract(epoch from now() - xact_start) AS xact_time,
extract(epoch from now() - backend_start) AS conn_time,
substring(query, 1, 40) AS query
FROM pg_stat_activity
WHERE backend_type = 'client backend'
AND pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_kill IS 'kill all backend session';
----------------------------------------------------------------------
-- quick cancel view
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_cancel;
CREATE OR REPLACE VIEW monitor.pg_cancel AS
SELECT pid,
pg_cancel_backend(pid) AS cancel,
datname AS dat,
usename AS usr,
application_name AS app,
client_addr AS addr,
state,
extract(epoch from now() - state_change) AS query_time,
extract(epoch from now() - xact_start) AS xact_time,
extract(epoch from now() - backend_start) AS conn_time,
substring(query, 1, 40)
FROM pg_stat_activity
WHERE state = 'active'
AND backend_type = 'client backend'
and pid <> pg_backend_pid();
COMMENT ON VIEW monitor.pg_cancel IS 'cancel backend queries';
----------------------------------------------------------------------
-- seq scan
----------------------------------------------------------------------
DROP VIEW IF EXISTS monitor.pg_seq_scan;
CREATE OR REPLACE VIEW monitor.pg_seq_scan AS
SELECT schemaname AS nspname,
relname,
seq_scan,
seq_tup_read,
seq_tup_read / seq_scan AS seq_tup_avg,
idx_scan,
n_live_tup + n_dead_tup AS tuples,
n_live_tup / (n_live_tup + n_dead_tup) AS dead_ratio
FROM pg_stat_user_tables
WHERE seq_scan > 0
and (n_live_tup + n_dead_tup) > 0
ORDER BY seq_tup_read DESC
LIMIT 50;
COMMENT ON VIEW monitor.pg_seq_scan IS 'table that have seq scan';
----------------------------------------------------------------------
-- pg_shmem auxiliary function
-- PG 13 ONLY!
----------------------------------------------------------------------
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';
--==================================================================--
-- Customize Logic --
--==================================================================--
-- This script will be executed on primary instance among a newly created
-- postgres cluster. it will be executed as dbsu on template1 database
-- put your own customize logic here
-- make sure they are idempotent
```
</details>
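5.2.5 - Customizing Business ACL
Configuring business users in Pigsty
In PostgreSQL, access control (ACL) has two parts: the privilege system (Privileges) and Host-Based Authentication (HBA).
Pigsty ships a default access control system that users can customize further. The ACL-related configuration items include:
HBA Rules
Users can customize the Postgres HBA rules through pg_hba_rules and pg_hba_rules_extra, and the Pgbouncer HBA rules through pgbouncer_hba_rules and pgbouncer_hba_rules_extra.
An HBA rule is an object with three required fields: title, role, and rules.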
title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
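title is a description of the rule and is rendered as a comment.
role is the scope in which the rule applies.
rules is an array of concrete HBA rules; each element is one five-field rule record. See the official PostgreSQL documentation for details.
A rule like this is rendered into the /pg/data/pg_hba.conf file: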
# allow intranet password access
host all all 10.0.0.0/8 md5
host all all 172.16.0.0/12 md5
host all all 192.168.0.0/16 md5
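The mapping from a rule object to pg_hba.conf text can be sketched in a few lines of Python. This is only an illustration, not Pigsty's actual template: the function name render_hba_rule is hypothetical, and the exact wording of the generated comment line may differ from what Pigsty emits.

```python
def render_hba_rule(rule: dict) -> str:
    """Render one HBA rule object as pg_hba.conf text (illustrative sketch).

    The title becomes a comment line, and every entry of `rules`
    is written verbatim on its own line.
    """
    # The three fields below are the required fields of a rule object.
    for field in ("title", "role", "rules"):
        if field not in rule:
            raise ValueError(f"HBA rule is missing required field: {field}")
    lines = [f"# {rule['title']}"]
    lines.extend(rule["rules"])
    return "\n".join(lines)


# The sample rule from the text above.
intranet_rule = {
    "title": "intranet password access",
    "role": "common",
    "rules": [
        "host all all 10.0.0.0/8 md5",
        "host all all 172.16.0.0/12 md5",
        "host all all 192.168.0.0/16 md5",
    ],
}

print(render_hba_rule(intranet_rule))
```

Note that role does not appear in the rendered text; it only decides which instances receive the rule.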
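Rule Scope
The role of a rule controls where the rule is installed.
HBA rule groups with role = common are installed on all instances. Any other value, for example role: primary, is installed only on instances with pg_role = primary. Users can therefore define flexible HBA rules through the role system.
As a special case, HBA rules with role: offline are installed not only on instances with pg_role == 'offline' but also on instances with pg_offline_query == true, so that offline users can access them.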
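The scoping logic described above fits in a small predicate. The following Python sketch assumes a hypothetical helper named hba_rule_applies; it mirrors the three cases in the text (common, offline, and any other role) and is not taken from Pigsty's source.

```python
def hba_rule_applies(rule_role: str, pg_role: str, pg_offline_query: bool = False) -> bool:
    """Decide whether an HBA rule group is installed on an instance.

    rule_role        -- the `role` field of the HBA rule
    pg_role          -- the role of the target instance
    pg_offline_query -- whether the instance also serves offline queries
    """
    if rule_role == "common":
        # common rules go to every instance
        return True
    if rule_role == "offline":
        # special case: offline rules also go to instances flagged pg_offline_query
        return pg_role == "offline" or pg_offline_query
    # any other rule only goes to instances with the same role
    return rule_role == pg_role


print(hba_rule_applies("primary", "replica"))                         # rule for primaries, replica instance
print(hba_rule_applies("offline", "replica", pg_offline_query=True))  # replica that accepts offline queries
```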
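Rule Order
The defined HBA rules take effect in the following order:
Special Note
In production, HBA is usually differentiated and refined by instance role, so Pigsty does not recommend managing HBA configuration through Patroni. If HBA rules are configured in Patroni, the database's HBA will be overwritten by Patroni on restart.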
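5.3 - Running Playbooks
How to complete a full initialization with the playbooks that Pigsty provides.
Pigsty uses a declarative interface: once configuration is done, running the fixed playbooks is all it takes to complete the deployment.
Basic deployment
Sandbox deployment
Monitoring-only deployment
Routine management
Pigsty also provides several ready-made playbooks for day-to-day operations: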
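5.3.1 - Infrastructure Initialization
How to initialize the infrastructure with a playbook
Overview
Infrastructure initialization is done by infra.yml. This playbook installs and deploys the infrastructure on the meta node.
infra.yml targets the meta node (default group name: meta).
Notes
❗️Meta node initialization must be completed before database nodes can be initialized properly.
infra.yml always acts on the group named meta in the configuration file.
The meta node can be reused as a normal node, that is, PostgreSQL databases can also be defined and created on it.
Pigsty recommends keeping the default configuration and creating a pg-meta metadata database cluster on the meta node to host Pigsty's advanced features.
A full initialization run may take 2 to 8 minutes, depending on the machine.
Selective Execution
Users can run a subset of the playbook through Ansible's tag mechanism.
For example, to run only the local repo initialization, use the following command:
./infra.yml --tags=repo
See Task Details for the specific tags.
Some commonly used task subsets include: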
./infra.yml --tags=repo -e repo_rebuild=true # force a rebuild of the local repo
./infra.yml --tags=prometheus_reload # reload the Prometheus configuration
./infra.yml --tags=nginx_haproxy # regenerate the Nginx HAProxy index page
./infra.yml --tags=prometheus_targets,prometheus_reload # regenerate the Prometheus static target files and apply them
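To make the effect of --tags concrete, here is a simplified Python model of tag selection: a task runs when at least one of its tags is among the requested tags. The task names and tags are copied from this playbook's task listing; the helper select_tasks is hypothetical, and the model deliberately ignores Ansible's special tags such as always and never.

```python
from typing import List, Set, Tuple

# (task name, tags) pairs copied from the infra.yml task listing
TASKS: List[Tuple[str, Set[str]]] = [
    ("repo : Create local repo directory", {"repo", "repo_dir"}),
    ("nginx : Templating /etc/nginx/haproxy.conf", {"meta", "nginx", "nginx_haproxy"}),
    ("prometheus : Render prometheus targets in cluster mode", {"meta", "prometheus", "prometheus_targets"}),
    ("prometheus : Reload prometheus service", {"meta", "prometheus", "prometheus_reload"}),
]


def select_tasks(tasks: List[Tuple[str, Set[str]]], tags: str) -> List[str]:
    """Return the names of tasks that share at least one tag with `tags`.

    `tags` is a comma-separated string, as passed to --tags.
    """
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    return [name for name, task_tags in tasks if task_tags & wanted]


for name in select_tasks(TASKS, "prometheus_targets,prometheus_reload"):
    print(name)
```

Because parent tags such as meta or prometheus are attached to every task of the role, a broad tag selects the whole role while a narrow tag such as prometheus_reload selects a single step.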
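Playbook Description
infra.yml mainly does the following:
- Deploy and enable the local repo
- Initialize the meta node
- Initialize the meta node infrastructure
  - CA infrastructure
  - DNS nameserver
  - Nginx
  - Prometheus & Alertmanager
  - Grafana
- Copy Pigsty itself to the meta node
- Initialize a database on the meta node (optional; users can reuse the meta node through the standard database cluster initialization flow)
Original Content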
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : infra.yml
# Ctime : 2020-04-13
# Mtime : 2020-07-23
# Desc : init infrastructure on meta nodes
# Path : infra.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
#------------------------------------------------------------------------------
# init local yum repo (only run on meta nodes)
#------------------------------------------------------------------------------
- name: Init local repo
  become: yes
  hosts: meta
  gather_facts: no
  tags: repo
  roles:
    - repo
#------------------------------------------------------------------------------
# provision nodes
#------------------------------------------------------------------------------
- name: Provision Node
  become: yes
  hosts: meta
  gather_facts: no
  tags: node
  roles:
    - node
#------------------------------------------------------------------------------
# init meta service (only run on meta nodes)
#------------------------------------------------------------------------------
- name: Init meta service
  become: yes
  hosts: meta
  gather_facts: no
  tags: meta
  roles:
    - role: ca
      tags: ca
    - role: nameserver
      tags: nameserver
    - role: nginx
      tags: nginx
    - role: prometheus
      tags: prometheus
    - role: grafana
      tags: grafana
#------------------------------------------------------------------------------
# init dcs on nodes
#------------------------------------------------------------------------------
- name: Init dcs
  become: yes
  hosts: meta
  gather_facts: no
  roles:
    - role: consul
      tags: dcs
#------------------------------------------------------------------------------
# copy scripts to meta node
#------------------------------------------------------------------------------
- name: Copy ansible scripts
  become: yes
  hosts: meta
  gather_facts: no
  ignore_errors: yes
  tags: ansible
  tasks:
    - name: Copy ansible scritps
      when: node_admin_setup is defined and node_admin_setup|bool and node_admin_username != ''
      block:
        # create copy of this repo
        - name: Create ansible tarball
          become: no
          connection: local
          run_once: true
          command:
            cmd: tar -cf files/meta.tgz roles templates ansible.cfg infra.yml pgsql.yml pgsql-remove.yml pgsql-createdb.yml pgsql-createuser.yml pgsql-service.yml pgsql-monitor.yml pigsty.yml Makefile
            chdir: "{{ playbook_dir }}"
        - name: Create ansible directory
          file: path="/home/{{ node_admin_username }}/meta" state=directory owner={{ node_admin_username }}
        - name: Copy ansible tarball
          copy: src="meta.tgz" dest="/home/{{ node_admin_username }}/meta/meta.tgz" owner={{ node_admin_username }}
        - name: Extract tarball
          shell: |
            cd /home/{{ node_admin_username }}/meta/
            tar -xf meta.tgz
            chown -R {{ node_admin_username }} /home/{{ node_admin_username }}
            rm -rf meta.tgz
            chmod a+x *.yml
#------------------------------------------------------------------------------
# meta node database (optional)
#------------------------------------------------------------------------------
# this play will create database clusters on meta nodes.
# it's good to reuse meta node as normal database nodes too
# but it's always better to leave it be.
#------------------------------------------------------------------------------
#- name: Pgsql Initialization
#  become: yes
#  hosts: meta
#  gather_facts: no
#  roles:
#    - role: postgres # init postgres
#      tags: [pgsql, postgres]
#
#    - role: monitor # init monitor system
#      tags: [pgsql, monitor]
#
#    - role: service # init haproxy
#      tags: [service]
...
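Task Details
The following command lists all tasks that infrastructure initialization will run, along with the tags that can be used:
./infra.yml --list-tasks
The default tasks are as follows: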
playbook: ./infra.yml
play #1 (meta): Init local repo TAGS: [repo]
tasks:
repo : Create local repo directory TAGS: [repo, repo_dir]
repo : Backup & remove existing repos TAGS: [repo, repo_upstream]
repo : Add required upstream repos TAGS: [repo, repo_upstream]
repo : Check repo pkgs cache exists TAGS: [repo, repo_prepare]
repo : Set fact whether repo_exists TAGS: [repo, repo_prepare]
repo : Move upstream repo to backup TAGS: [repo, repo_prepare]
repo : Add local file system repos TAGS: [repo, repo_prepare]
repo : Remake yum cache if not exists TAGS: [repo, repo_prepare]
repo : Install repo bootstrap packages TAGS: [repo, repo_boot]
repo : Render repo nginx server files TAGS: [repo, repo_nginx]
repo : Disable selinux for repo server TAGS: [repo, repo_nginx]
repo : Launch repo nginx server TAGS: [repo, repo_nginx]
repo : Waits repo server online TAGS: [repo, repo_nginx]
repo : Download web url packages TAGS: [repo, repo_download]
repo : Download repo packages TAGS: [repo, repo_download]
repo : Download repo pkg deps TAGS: [repo, repo_download]
repo : Create local repo index TAGS: [repo, repo_download]
repo : Copy bootstrap scripts TAGS: [repo, repo_download, repo_script]
repo : Mark repo cache as valid TAGS: [repo, repo_download]
play #2 (meta): Provision Node TAGS: [node]
tasks:
node : Update node hostname TAGS: [node, node_name]
node : Add new hostname to /etc/hosts TAGS: [node, node_name]
node : Write static dns records TAGS: [node, node_dns]
node : Get old nameservers TAGS: [node, node_resolv]
node : Truncate resolv file TAGS: [node, node_resolv]
node : Write resolv options TAGS: [node, node_resolv]
node : Add new nameservers TAGS: [node, node_resolv]
node : Append old nameservers TAGS: [node, node_resolv]
node : Node configure disable firewall TAGS: [node, node_firewall]
node : Node disable selinux by default TAGS: [node, node_firewall]
node : Backup existing repos TAGS: [node, node_repo]
node : Install upstream repo TAGS: [node, node_repo]
node : Install local repo TAGS: [node, node_repo]
node : Install node basic packages TAGS: [node, node_pkgs]
node : Install node extra packages TAGS: [node, node_pkgs]
node : Install meta specific packages TAGS: [node, node_pkgs]
node : Install node basic packages TAGS: [node, node_pkgs]
node : Install node extra packages TAGS: [node, node_pkgs]
node : Install meta specific packages TAGS: [node, node_pkgs]
node : Node configure disable numa TAGS: [node, node_feature]
node : Node configure disable swap TAGS: [node, node_feature]
node : Node configure unmount swap TAGS: [node, node_feature]
node : Node setup static network TAGS: [node, node_feature]
node : Node configure disable firewall TAGS: [node, node_feature]
node : Node configure disk prefetch TAGS: [node, node_feature]
node : Enable linux kernel modules TAGS: [node, node_kernel]
node : Enable kernel module on reboot TAGS: [node, node_kernel]
node : Get config parameter page count TAGS: [node, node_tuned]
node : Get config parameter page size TAGS: [node, node_tuned]
node : Tune shmmax and shmall via mem TAGS: [node, node_tuned]
node : Create tuned profiles TAGS: [node, node_tuned]
node : Render tuned profiles TAGS: [node, node_tuned]
node : Active tuned profile TAGS: [node, node_tuned]
node : Change additional sysctl params TAGS: [node, node_tuned]
node : Copy default user bash profile TAGS: [node, node_profile]
node : Setup node default pam ulimits TAGS: [node, node_ulimit]
node : Create os user group admin TAGS: [node, node_admin]
node : Create os user admin TAGS: [node, node_admin]
node : Grant admin group nopass sudo TAGS: [node, node_admin]
node : Add no host checking to ssh config TAGS: [node, node_admin]
node : Add admin ssh no host checking TAGS: [node, node_admin]
node : Fetch all admin public keys TAGS: [node, node_admin]
node : Exchange all admin ssh keys TAGS: [node, node_admin]
node : Install public keys TAGS: [node, node_admin]
node : Install ntp package TAGS: [node, ntp_install]
node : Install chrony package TAGS: [node, ntp_install]
node : Setup default node timezone TAGS: [node, ntp_config]
node : Copy the ntp.conf file TAGS: [node, ntp_config]
node : Copy the chrony.conf template TAGS: [node, ntp_config]
node : Launch ntpd service TAGS: [node, ntp_launch]
node : Launch chronyd service TAGS: [node, ntp_launch]
play #3 (meta): Init meta service TAGS: [meta]
tasks:
ca : Create local ca directory TAGS: [ca, ca_dir, meta]
ca : Copy ca cert from local files TAGS: [ca, ca_copy, meta]
ca : Check ca key cert exists TAGS: [ca, ca_create, meta]
ca : Create self-signed CA key-cert TAGS: [ca, ca_create, meta]
nameserver : Make sure dnsmasq package installed TAGS: [meta, nameserver]
nameserver : Copy dnsmasq /etc/dnsmasq.d/config TAGS: [meta, nameserver]
nameserver : Add dynamic dns records to meta TAGS: [meta, nameserver]
nameserver : Launch meta dnsmasq service TAGS: [meta, nameserver]
nameserver : Wait for meta dnsmasq online TAGS: [meta, nameserver]
nameserver : Register consul dnsmasq service TAGS: [meta, nameserver]
nameserver : Reload consul TAGS: [meta, nameserver]
nginx : Make sure nginx installed TAGS: [meta, nginx, nginx_install]
nginx : Create local html directory TAGS: [meta, nginx, nginx_content]
nginx : Create nginx config directory TAGS: [meta, nginx, nginx_content]
nginx : Update default nginx index page TAGS: [meta, nginx, nginx_content]
nginx : Copy nginx default config TAGS: [meta, nginx, nginx_config]
nginx : Copy nginx upstream conf TAGS: [meta, nginx, nginx_config]
nginx : Templating /etc/nginx/haproxy.conf TAGS: [meta, nginx, nginx_haproxy]
nginx : Render haproxy upstream in cluster mode TAGS: [meta, nginx, nginx_haproxy]
nginx : Render haproxy location in cluster mode TAGS: [meta, nginx, nginx_haproxy]
nginx : Templating haproxy cluster index TAGS: [meta, nginx, nginx_haproxy]
nginx : Templating haproxy cluster index TAGS: [meta, nginx, nginx_haproxy]
nginx : Restart meta nginx service TAGS: [meta, nginx, nginx_restart]
nginx : Wait for nginx service online TAGS: [meta, nginx, nginx_restart]
nginx : Make sure nginx exporter installed TAGS: [meta, nginx, nginx_exporter]
nginx : Config nginx_exporter options TAGS: [meta, nginx, nginx_exporter]
nginx : Restart nginx_exporter service TAGS: [meta, nginx, nginx_exporter]
nginx : Wait for nginx exporter online TAGS: [meta, nginx, nginx_exporter]
nginx : Register cosnul nginx service TAGS: [meta, nginx, nginx_register]
nginx : Register consul nginx-exporter service TAGS: [meta, nginx, nginx_register]
nginx : Reload consul TAGS: [meta, nginx, nginx_register]
prometheus : Install prometheus and alertmanager TAGS: [meta, prometheus]
prometheus : Wipe out prometheus config dir TAGS: [meta, prometheus, prometheus_clean]
prometheus : Wipe out existing prometheus data TAGS: [meta, prometheus, prometheus_clean]
prometheus : Create postgres directory structure TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy prometheus bin scripts TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy prometheus rules scripts TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy altermanager config TAGS: [meta, prometheus, prometheus_config]
prometheus : Render prometheus config TAGS: [meta, prometheus, prometheus_config]
prometheus : Config /etc/prometheus opts TAGS: [meta, prometheus, prometheus_config]
prometheus : Launch prometheus service TAGS: [meta, prometheus, prometheus_launch]
prometheus : Launch alertmanager service TAGS: [meta, prometheus, prometheus_launch]
prometheus : Wait for prometheus online TAGS: [meta, prometheus, prometheus_launch]
prometheus : Wait for alertmanager online TAGS: [meta, prometheus, prometheus_launch]
prometheus : Render prometheus targets in cluster mode TAGS: [meta, prometheus, prometheus_targets]
prometheus : Reload prometheus service TAGS: [meta, prometheus, prometheus_reload]
prometheus : Copy prometheus service definition TAGS: [meta, prometheus, prometheus_register]
prometheus : Copy alertmanager service definition TAGS: [meta, prometheus, prometheus_register]
prometheus : Reload consul to register prometheus TAGS: [meta, prometheus, prometheus_register]
grafana : Make sure grafana is installed TAGS: [grafana, grafana_install, meta]
grafana : Check grafana plugin cache exists TAGS: [grafana, grafana_plugin, meta]
grafana : Provision grafana plugins via cache TAGS: [grafana, grafana_plugin, meta]
grafana : Download grafana plugins from web TAGS: [grafana, grafana_plugin, meta]
grafana : Download grafana plugins from web TAGS: [grafana, grafana_plugin, meta]
grafana : Create grafana plugins cache TAGS: [grafana, grafana_plugin, meta]
grafana : Copy /etc/grafana/grafana.ini TAGS: [grafana, grafana_config, meta]
grafana : Remove grafana provision dir TAGS: [grafana, grafana_config, meta]
grafana : Copy provisioning content TAGS: [grafana, grafana_config, meta]
grafana : Copy pigsty dashboards TAGS: [grafana, grafana_config, meta]
grafana : Copy pigsty icon image TAGS: [grafana, grafana_config, meta]
grafana : Replace grafana icon with pigsty TAGS: [grafana, grafana_config, grafana_customize, meta]
grafana : Launch grafana service TAGS: [grafana, grafana_launch, meta]
grafana : Wait for grafana online TAGS: [grafana, grafana_launch, meta]
grafana : Update grafana default preferences TAGS: [grafana, grafana_provision, meta]
grafana : Register consul grafana service TAGS: [grafana, grafana_register, meta]
grafana : Reload consul TAGS: [grafana, grafana_register, meta]
play #4 (meta): Init dcs TAGS: []
tasks:
consul : Check for existing consul TAGS: [consul_check, dcs]
consul : Consul exists flag fact set TAGS: [consul_check, dcs]
consul : Abort due to consul exists TAGS: [consul_check, dcs]
consul : Clean existing consul instance TAGS: [consul_clean, dcs]
consul : Stop any running consul instance TAGS: [consul_clean, dcs]
consul : Remove existing consul dir TAGS: [consul_clean, dcs]
consul : Recreate consul dir TAGS: [consul_clean, dcs]
consul : Make sure consul is installed TAGS: [consul_install, dcs]
consul : Make sure consul dir exists TAGS: [consul_config, dcs]
consul : Get dcs server node names TAGS: [consul_config, dcs]
consul : Get dcs node name from var TAGS: [consul_config, dcs]
consul : Get dcs node name from var TAGS: [consul_config, dcs]
consul : Fetch hostname as dcs node name TAGS: [consul_config, dcs]
consul : Get dcs name from hostname TAGS: [consul_config, dcs]
consul : Copy /etc/consul.d/consul.json TAGS: [consul_config, dcs]
consul : Copy consul agent service TAGS: [consul_config, dcs]
consul : Get dcs bootstrap expect quroum TAGS: [consul_server, dcs]
consul : Copy consul server service unit TAGS: [consul_server, dcs]
consul : Launch consul server service TAGS: [consul_server, dcs]
consul : Wait for consul server online TAGS: [consul_server, dcs]
consul : Launch consul agent service TAGS: [consul_agent, dcs]
consul : Wait for consul agent online TAGS: [consul_agent, dcs]
play #5 (meta): Copy ansible scripts TAGS: [ansible]
tasks:
Create ansible tarball TAGS: [ansible]
Create ansible directory TAGS: [ansible]
Copy ansible tarball TAGS: [ansible]
Extract tarball TAGS: [ansible]
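5.3.2 - Database Cluster Initialization
How to define and bring up a PostgreSQL database cluster
Playbook overview
Once the infrastructure has been initialized, use pgsql.yml to initialize database clusters.
First define the database cluster in the Pigsty configuration file, then run pgsql.yml to apply the change to the actual environment.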
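./pgsql.yml # initialize database clusters on every machine in the inventory (dangerous!)
./pgsql.yml -l pg-test # initialize only the machines in the pg-test group (recommended!)
./pgsql.yml -l pg-meta,pg-test # initialize the pg-meta and pg-test clusters together
./pgsql.yml -l 10.10.10.11 # initialize the database instance on the machine 10.10.10.11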
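Notes
- Running pgsql.yml without arguments is convenient, but it is a high-risk operation in production. We strongly recommend adding the -l option to restrict which hosts the playbook runs against.
- A meta node can be reused as an ordinary node; that is, you can define and create PostgreSQL databases on the meta node. In the default sandbox environment, running ./pgsql.yml initializes both pg-meta and pg-test.
- When initializing a replica on its own, you must make sure the cluster primary has already been initialized. This requirement does not apply when the primary and its replicas are initialized in the same run.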
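The first note above can be enforced mechanically. The following is a minimal sketch, not part of Pigsty: has_limit and guarded_pgsql are hypothetical helper names, and the wrapper only prints the command it would run so that it is safe to try. It relies on nothing beyond the standard ansible-playbook -l / --limit option.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Pigsty): succeed only if an
# ansible --limit / -l option appears among the arguments.
has_limit() {
    local arg
    for arg in "$@"; do
        case "$arg" in
            -l|-l?*|--limit|--limit=*) return 0 ;;
        esac
    done
    return 1
}

# Hypothetical wrapper: print the pgsql.yml command line if it is
# restricted to some hosts, otherwise refuse with a non-zero status.
guarded_pgsql() {
    if has_limit "$@"; then
        echo "./pgsql.yml $*"
    else
        echo "refusing to run pgsql.yml without -l <target>" >&2
        return 1
    fi
}
```

For example, guarded_pgsql -l pg-test prints the command, while guarded_pgsql --tags=service is rejected because nothing limits its scope. To actually run the playbook, the echo could be replaced by the real invocation.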
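Protection mechanism
pgsql.yml provides a protection mechanism controlled by the configuration parameter pg_exists_action. If a PostgreSQL instance is already running on a target machine when the playbook runs, Pigsty acts according to the value of pg_exists_action, which is one of abort|clean|skip.
- abort: recommended as the default. If an existing instance is found, abort the playbook to avoid deleting a database by mistake.
- clean: recommended for the local sandbox. If an existing instance is found, wipe the existing database.
- skip: run the subsequent logic directly on the existing database cluster.
- You can override the configuration file option with ./pgsql.yml -e pg_exists_action=clean to force-wipe existing instances.
The pg_disable_purge option provides a second layer of protection. When it is enabled, pg_exists_action is forced to abort, so a running database instance is never wiped under any circumstances.
dcs_exists_action and dcs_disable_purge behave the same as the two options above, but apply to DCS (Consul Agent) instances.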
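The interaction between the two options can be summarized as a small decision function. This is a sketch of the rule as described above, not the role's actual task code; effective_exists_action is a hypothetical name, and passing the purge switch as the string true or false is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compute the action that takes effect, given
#   $1 = pg_exists_action (abort|clean|skip, defaults to abort)
#   $2 = pg_disable_purge (true|false, defaults to false)
effective_exists_action() {
    local action="${1:-abort}" disable_purge="${2:-false}"
    case "$action" in
        abort|clean|skip) ;;
        *) echo "invalid action: $action" >&2; return 2 ;;
    esac
    if [ "$disable_purge" = "true" ]; then
        # the purge guard forces abort regardless of the requested action
        echo abort
    else
        echo "$action"
    fi
}
```

With the guard enabled, even an explicit -e pg_exists_action=clean resolves to abort, which is the point of the double protection. The same reasoning applies to dcs_exists_action and dcs_disable_purge.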
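Selective execution
Ansible's tag mechanism lets you run only a subset of the playbook.
For example, to run only the service initialization part, use:
./pgsql.yml --tags=service
Commonly used subsets:
./pgsql.yml --tags=infra # initialize infrastructure: node initialization plus DCS deployment
./pgsql.yml --tags=node # initialize machine nodes
./pgsql.yml --tags=dcs # initialize the DCS (consul/etcd)
./pgsql.yml --tags=dcs -e dcs_exists_action=clean # initialize consul/etcd, wiping any existing consul agent
./pgsql.yml --tags=pgsql # deploy the database and monitoring
./pgsql.yml --tags=postgres # deploy the database
./pgsql.yml --tags=monitor # deploy monitoring
./pgsql.yml --tags=service # deploy load balancing, including Haproxy and VIP
./pgsql.yml --tags=haproxy_config,haproxy_reload # update the Haproxy configuration and apply it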
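Playbook description
pgsql.yml mainly does the following:
- Initializes the database node infrastructure (node)
- Initializes the DCS agent service, or the DCS server on a meta node (consul)
- Installs, deploys, and initializes PostgreSQL, Pgbouncer, and Patroni (postgres)
- Installs the PostgreSQL monitoring system (monitor)
- Installs and deploys Haproxy and VIP to expose services externally (service)
For task-level tags, see Task details.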
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql.yml
# Ctime : 2020-05-12
# Mtime : 2021-03-15
# Desc : initialize pigsty cluster
# Path : pgsql.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
#------------------------------------------------------------------------------
# init node and database
#------------------------------------------------------------------------------
- name: Pgsql Initialization
become: yes
hosts: all
gather_facts: no
roles:
- role: node # init node
tags: [infra, node]
- role: consul # init consul
tags: [infra, dcs]
- role: postgres # init postgres
tags: [pgsql, postgres]
- role: monitor # init monitor system
tags: [pgsql, monitor]
- role: service # init service
tags: [service]
...
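To see how a role-level tag maps onto the play above, the role/tag pairs can be written out as a lookup table. This is an illustrative sketch only: roles_for_tag is a hypothetical helper, the table is copied by hand from the roles section of pgsql.yml, and it covers role-level tags only (task-level tags such as haproxy_config are not included).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the roles in pgsql.yml that carry the
# given role-level tag, in the order they appear in the play.
roles_for_tag() {
    local want="$1" role tags
    while read -r role tags; do
        case ",$tags," in
            *",$want,"*) echo "$role" ;;
        esac
    done <<'EOF'
node infra,node
consul infra,dcs
postgres pgsql,postgres
monitor pgsql,monitor
service service
EOF
}
```

For instance, roles_for_tag infra lists node and consul, and roles_for_tag pgsql lists postgres and monitor, which matches the command subsets described earlier.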
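Task details
The following command lists every task in database cluster initialization, along with the tags that can be used:
./pgsql.yml --list-tasks
The default tasks are: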
playbook: ./pgsql.yml
play #1 (all): Pgsql Initialization TAGS: []
tasks:
node : Update node hostname TAGS: [infra, node, node_name]
node : Add new hostname to /etc/hosts TAGS: [infra, node, node_name]
node : Write static dns records TAGS: [infra, node, node_dns]
node : Get old nameservers TAGS: [infra, node, node_resolv]
node : Truncate resolv file TAGS: [infra, node, node_resolv]
node : Write resolv options TAGS: [infra, node, node_resolv]
node : Add new nameservers TAGS: [infra, node, node_resolv]
node : Append old nameservers TAGS: [infra, node, node_resolv]
node : Node configure disable firewall TAGS: [infra, node, node_firewall]
node : Node disable selinux by default TAGS: [infra, node, node_firewall]
node : Backup existing repos TAGS: [infra, node, node_repo]
node : Install upstream repo TAGS: [infra, node, node_repo]
node : Install local repo TAGS: [infra, node, node_repo]
node : Install node basic packages TAGS: [infra, node, node_pkgs]
node : Install node extra packages TAGS: [infra, node, node_pkgs]
node : Install meta specific packages TAGS: [infra, node, node_pkgs]
node : Install node basic packages TAGS: [infra, node, node_pkgs]
node : Install node extra packages TAGS: [infra, node, node_pkgs]
node : Install meta specific packages TAGS: [infra, node, node_pkgs]
node : Node configure disable numa TAGS: [infra, node, node_feature]
node : Node configure disable swap TAGS: [infra, node, node_feature]
node : Node configure unmount swap TAGS: [infra, node, node_feature]
node : Node setup static network TAGS: [infra, node, node_feature]
node : Node configure disable firewall TAGS: [infra, node, node_feature]
node : Node configure disk prefetch TAGS: [infra, node, node_feature]
node : Enable linux kernel modules TAGS: [infra, node, node_kernel]
node : Enable kernel module on reboot TAGS: [infra, node, node_kernel]
node : Get config parameter page count TAGS: [infra, node, node_tuned]
node : Get config parameter page size TAGS: [infra, node, node_tuned]
node : Tune shmmax and shmall via mem TAGS: [infra, node, node_tuned]
node : Create tuned profiles TAGS: [infra, node, node_tuned]
node : Render tuned profiles TAGS: [infra, node, node_tuned]
node : Active tuned profile TAGS: [infra, node, node_tuned]
node : Change additional sysctl params TAGS: [infra, node, node_tuned]
node : Copy default user bash profile TAGS: [infra, node, node_profile]
node : Setup node default pam ulimits TAGS: [infra, node, node_ulimit]
node : Create os user group admin TAGS: [infra, node, node_admin]
node : Create os user admin TAGS: [infra, node, node_admin]
node : Grant admin group nopass sudo TAGS: [infra, node, node_admin]
node : Add no host checking to ssh config TAGS: [infra, node, node_admin]
node : Add admin ssh no host checking TAGS: [infra, node, node_admin]
node : Fetch all admin public keys TAGS: [infra, node, node_admin]
node : Exchange all admin ssh keys TAGS: [infra, node, node_admin]
node : Install public keys TAGS: [infra, node, node_admin]
node : Install ntp package TAGS: [infra, node, ntp_install]
node : Install chrony package TAGS: [infra, node, ntp_install]
node : Setup default node timezone TAGS: [infra, node, ntp_config]
node : Copy the ntp.conf file TAGS: [infra, node, ntp_config]
node : Copy the chrony.conf template TAGS: [infra, node, ntp_config]
node : Launch ntpd service TAGS: [infra, node, ntp_launch]
node : Launch chronyd service TAGS: [infra, node, ntp_launch]
consul : Check for existing consul TAGS: [consul_check, dcs, infra]
consul : Consul exists flag fact set TAGS: [consul_check, dcs, infra]
consul : Abort due to consul exists TAGS: [consul_check, dcs, infra]
consul : Clean existing consul instance TAGS: [consul_clean, dcs, infra]
consul : Stop any running consul instance TAGS: [consul_clean, dcs, infra]
consul : Remove existing consul dir TAGS: [consul_clean, dcs, infra]
consul : Recreate consul dir TAGS: [consul_clean, dcs, infra]
consul : Make sure consul is installed TAGS: [consul_install, dcs, infra]
consul : Make sure consul dir exists TAGS: [consul_config, dcs, infra]
consul : Get dcs server node names TAGS: [consul_config, dcs, infra]
consul : Get dcs node name from var TAGS: [consul_config, dcs, infra]
consul : Get dcs node name from var TAGS: [consul_config, dcs, infra]
consul : Fetch hostname as dcs node name TAGS: [consul_config, dcs, infra]
consul : Get dcs name from hostname TAGS: [consul_config, dcs, infra]
consul : Copy /etc/consul.d/consul.json TAGS: [consul_config, dcs, infra]
consul : Copy consul agent service TAGS: [consul_config, dcs, infra]
consul : Get dcs bootstrap expect quroum TAGS: [consul_server, dcs, infra]
consul : Copy consul server service unit TAGS: [consul_server, dcs, infra]
consul : Launch consul server service TAGS: [consul_server, dcs, infra]
consul : Wait for consul server online TAGS: [consul_server, dcs, infra]
consul : Launch consul agent service TAGS: [consul_agent, dcs, infra]
consul : Wait for consul agent online TAGS: [consul_agent, dcs, infra]
postgres : Create os group postgres TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Make sure dcs group exists TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Create dbsu {{ pg_dbsu }} TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Grant dbsu nopass sudo TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Grant dbsu all sudo TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Grant dbsu limited sudo TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Config patroni watchdog support TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Add dbsu ssh no host checking TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Fetch dbsu public keys TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Exchange dbsu ssh keys TAGS: [instal, pg_dbsu, pgsql, postgres]
postgres : Install offical pgdg yum repo TAGS: [instal, pg_install, pgsql, postgres]
postgres : Install pg packages TAGS: [instal, pg_install, pgsql, postgres]
postgres : Install pg extensions TAGS: [instal, pg_install, pgsql, postgres]
postgres : Link /usr/pgsql to current version TAGS: [instal, pg_install, pgsql, postgres]
postgres : Add pg bin dir to profile path TAGS: [instal, pg_install, pgsql, postgres]
postgres : Fix directory ownership TAGS: [instal, pg_install, pgsql, postgres]
postgres : Remove default postgres service TAGS: [instal, pg_install, pgsql, postgres]
postgres : Check necessary variables exists TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Fetch variables via pg_cluster TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Set cluster basic facts for hosts TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Assert cluster primary singleton TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Setup cluster primary ip address TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Setup repl upstream for primary TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Setup repl upstream for replicas TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Debug print instance summary TAGS: [always, pg_preflight, pgsql, postgres, preflight]
postgres : Check for existing postgres instance TAGS: [pg_check, pgsql, postgres, prepare]
postgres : Set fact whether pg port is open TAGS: [pg_check, pgsql, postgres, prepare]
postgres : Abort due to existing postgres instance TAGS: [pg_check, pgsql, postgres, prepare]
postgres : Clean existing postgres instance TAGS: [pg_check, pgsql, postgres, prepare]
postgres : Shutdown existing postgres service TAGS: [pg_clean, pgsql, postgres, prepare]
postgres : Remove registerd consul service TAGS: [pg_clean, pgsql, postgres, prepare]
postgres : Remove postgres metadata in consul TAGS: [pg_clean, pgsql, postgres, prepare]
postgres : Remove existing postgres data TAGS: [pg_clean, pgsql, postgres, prepare]
postgres : Make sure main and backup dir exists TAGS: [pg_dir, pgsql, postgres, prepare]
postgres : Create postgres directory structure TAGS: [pg_dir, pgsql, postgres, prepare]
postgres : Create pgbouncer directory structure TAGS: [pg_dir, pgsql, postgres, prepare]
postgres : Create links from pgbkup to pgroot TAGS: [pg_dir, pgsql, postgres, prepare]
postgres : Create links from current cluster TAGS: [pg_dir, pgsql, postgres, prepare]
postgres : Copy pg_cluster to /pg/meta/cluster TAGS: [pg_meta, pgsql, postgres, prepare]
postgres : Copy pg_version to /pg/meta/version TAGS: [pg_meta, pgsql, postgres, prepare]
postgres : Copy pg_instance to /pg/meta/instance TAGS: [pg_meta, pgsql, postgres, prepare]
postgres : Copy pg_seq to /pg/meta/sequence TAGS: [pg_meta, pgsql, postgres, prepare]
postgres : Copy pg_role to /pg/meta/role TAGS: [pg_meta, pgsql, postgres, prepare]
postgres : Copy postgres scripts to /pg/bin/ TAGS: [pg_scripts, pgsql, postgres, prepare]
postgres : Copy alias profile to /etc/profile.d TAGS: [pg_scripts, pgsql, postgres, prepare]
postgres : Copy psqlrc to postgres home TAGS: [pg_scripts, pgsql, postgres, prepare]
postgres : Setup hostname to pg instance name TAGS: [pg_hostname, pgsql, postgres, prepare]
postgres : Copy consul node-meta definition TAGS: [pg_nodemeta, pgsql, postgres, prepare]
postgres : Restart consul to load new node-meta TAGS: [pg_nodemeta, pgsql, postgres, prepare]
postgres : Config patroni watchdog support TAGS: [pg_watchdog, pgsql, postgres, prepare]
postgres : Get config parameter page count TAGS: [pg_config, pgsql, postgres]
postgres : Get config parameter page size TAGS: [pg_config, pgsql, postgres]
postgres : Tune shared buffer and work mem TAGS: [pg_config, pgsql, postgres]
postgres : Hanlde small size mem occasion TAGS: [pg_config, pgsql, postgres]
postgres : Calculate postgres mem params TAGS: [pg_config, pgsql, postgres]
postgres : create patroni config dir TAGS: [pg_config, pgsql, postgres]
postgres : use predefined patroni template TAGS: [pg_config, pgsql, postgres]
postgres : Render default /pg/conf/patroni.yml TAGS: [pg_config, pgsql, postgres]
postgres : Link /pg/conf/patroni to /pg/bin/ TAGS: [pg_config, pgsql, postgres]
postgres : Link /pg/bin/patroni.yml to /etc/patroni/ TAGS: [pg_config, pgsql, postgres]
postgres : Config patroni watchdog support TAGS: [pg_config, pgsql, postgres]
postgres : Copy patroni systemd service file TAGS: [pg_config, pgsql, postgres]
postgres : create patroni systemd drop-in dir TAGS: [pg_config, pgsql, postgres]
postgres : Copy postgres systemd service file TAGS: [pg_config, pgsql, postgres]
postgres : Drop-In consul dependency for patroni TAGS: [pg_config, pgsql, postgres]
postgres : Render default initdb scripts TAGS: [pg_config, pgsql, postgres]
postgres : Launch patroni on primary instance TAGS: [pg_primary, pgsql, postgres]
postgres : Wait for patroni primary online TAGS: [pg_primary, pgsql, postgres]
postgres : Wait for postgres primary online TAGS: [pg_primary, pgsql, postgres]
postgres : Check primary postgres service ready TAGS: [pg_primary, pgsql, postgres]
postgres : Check replication connectivity to primary TAGS: [pg_primary, pgsql, postgres]
postgres : Render init roles sql TAGS: [pg_init, pg_init_role, pgsql, postgres]
postgres : Render init template sql TAGS: [pg_init, pg_init_tmpl, pgsql, postgres]
postgres : Render default pg-init scripts TAGS: [pg_init, pg_init_main, pgsql, postgres]
postgres : Execute initialization scripts TAGS: [pg_init, pg_init_exec, pgsql, postgres]
postgres : Check primary instance ready TAGS: [pg_init, pg_init_exec, pgsql, postgres]
postgres : Add dbsu password to pgpass if exists TAGS: [pg_pass, pgsql, postgres]
postgres : Add system user to pgpass TAGS: [pg_pass, pgsql, postgres]
postgres : Check replication connectivity to primary TAGS: [pg_replica, pgsql, postgres]
postgres : Launch patroni on replica instances TAGS: [pg_replica, pgsql, postgres]
postgres : Wait for patroni replica online TAGS: [pg_replica, pgsql, postgres]
postgres : Wait for postgres replica online TAGS: [pg_replica, pgsql, postgres]
postgres : Check replica postgres service ready TAGS: [pg_replica, pgsql, postgres]
postgres : Render hba rules TAGS: [pg_hba, pgsql, postgres]
postgres : Reload hba rules TAGS: [pg_hba, pgsql, postgres]
postgres : Pause patroni TAGS: [pg_patroni, pgsql, postgres]
postgres : Stop patroni on replica instance TAGS: [pg_patroni, pgsql, postgres]
postgres : Stop patroni on primary instance TAGS: [pg_patroni, pgsql, postgres]
postgres : Launch raw postgres on primary TAGS: [pg_patroni, pgsql, postgres]
postgres : Launch raw postgres on primary TAGS: [pg_patroni, pgsql, postgres]
postgres : Wait for postgres online TAGS: [pg_patroni, pgsql, postgres]
postgres : Check pgbouncer is installed TAGS: [pgbouncer, pgbouncer_check, pgsql, postgres]
postgres : Stop existing pgbouncer service TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
postgres : Remove existing pgbouncer dirs TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
postgres : Recreate dirs with owner postgres TAGS: [pgbouncer, pgbouncer_clean, pgsql, postgres]
postgres : Copy /etc/pgbouncer/pgbouncer.ini TAGS: [pgbouncer, pgbouncer_config, pgbouncer_ini, pgsql, postgres]
postgres : Copy /etc/pgbouncer/pgb_hba.conf TAGS: [pgbouncer, pgbouncer_config, pgbouncer_hba, pgsql, postgres]
postgres : Touch userlist and database list TAGS: [pgbouncer, pgbouncer_config, pgsql, postgres]
postgres : Add default users to pgbouncer TAGS: [pgbouncer, pgbouncer_config, pgsql, postgres]
postgres : Copy pgbouncer systemd service TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
postgres : Launch pgbouncer pool service TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
postgres : Wait for pgbouncer service online TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
postgres : Check pgbouncer service is ready TAGS: [pgbouncer, pgbouncer_launch, pgsql, postgres]
include_tasks TAGS: [pg_user, pgsql, postgres]
include_tasks TAGS: [pg_db, pgsql, postgres]
postgres : Reload pgbouncer to add db and users TAGS: [pgbouncer_reload, pgsql, postgres]
postgres : Copy pg service definition to consul TAGS: [pg_register, pgsql, postgres, register]
postgres : Reload postgres consul service TAGS: [pg_register, pgsql, postgres, register]
postgres : Render grafana datasource definition TAGS: [pg_grafana, pgsql, postgres, register]
postgres : Register datasource to grafana TAGS: [pg_grafana, pgsql, postgres, register]
monitor : Install exporter yum repo TAGS: [exporter_install, exporter_yum_install, monitor, pgsql]
monitor : Install node_exporter and pg_exporter TAGS: [exporter_install, exporter_yum_install, monitor, pgsql]
monitor : Copy node_exporter binary TAGS: [exporter_binary_install, exporter_install, monitor, pgsql]
monitor : Copy pg_exporter binary TAGS: [exporter_binary_install, exporter_install, monitor, pgsql]
monitor : Create /etc/pg_exporter conf dir TAGS: [monitor, pg_exporter, pgsql]
monitor : Copy default pg_exporter.yaml TAGS: [monitor, pg_exporter, pgsql]
monitor : Config /etc/default/pg_exporter TAGS: [monitor, pg_exporter, pgsql]
monitor : Config pg_exporter service unit TAGS: [monitor, pg_exporter, pgsql]
monitor : Launch pg_exporter systemd service TAGS: [monitor, pg_exporter, pgsql]
monitor : Wait for pg_exporter service online TAGS: [monitor, pg_exporter, pgsql]
monitor : Register pg-exporter consul service TAGS: [monitor, pg_exporter_register, pgsql]
monitor : Reload pg-exporter consul service TAGS: [monitor, pg_exporter_register, pgsql]
monitor : Config pgbouncer_exporter opts TAGS: [monitor, pgbouncer_exporter, pgsql]
monitor : Config pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter, pgsql]
monitor : Launch pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter, pgsql]
monitor : Wait for pgbouncer_exporter online TAGS: [monitor, pgbouncer_exporter, pgsql]
monitor : Register pgb-exporter consul service TAGS: [monitor, node_exporter_register, pgsql]
monitor : Reload pgb-exporter consul service TAGS: [monitor, node_exporter_register, pgsql]
monitor : Copy node_exporter systemd service TAGS: [monitor, node_exporter, pgsql]
monitor : Config default node_exporter options TAGS: [monitor, node_exporter, pgsql]
monitor : Launch node_exporter service unit TAGS: [monitor, node_exporter, pgsql]
monitor : Wait for node_exporter online TAGS: [monitor, node_exporter, pgsql]
monitor : Register node-exporter service to consul TAGS: [monitor, node_exporter_register, pgsql]
monitor : Reload node-exporter consul service TAGS: [monitor, node_exporter_register, pgsql]
service : Make sure haproxy is installed TAGS: [haproxy_install, service]
service : Create haproxy directory TAGS: [haproxy_install, service]
service : Copy haproxy systemd service file TAGS: [haproxy_install, haproxy_unit, service]
service : Fetch postgres cluster memberships TAGS: [haproxy_config, service]
service : Templating /etc/haproxy/haproxy.cfg TAGS: [haproxy_config, service]
service : Launch haproxy load balancer service TAGS: [haproxy_launch, haproxy_restart, service]
service : Wait for haproxy load balancer online TAGS: [haproxy_launch, service]
service : Reload haproxy load balancer service TAGS: [haproxy_reload, service]
service : Copy haproxy exporter definition TAGS: [haproxy_register, service]
service : Copy haproxy service definition TAGS: [haproxy_register, service]
service : Reload haproxy consul service TAGS: [haproxy_register, service]
service : Make sure vip-manager is installed TAGS: [service, vip_l2_install]
service : Copy vip-manager systemd service file TAGS: [service, vip_l2_install]
service : create vip-manager systemd drop-in dir TAGS: [service, vip_l2_install]
service : create vip-manager systemd drop-in file TAGS: [service, vip_l2_install]
service : Templating /etc/default/vip-manager.yml TAGS: [service, vip_l2_config, vip_manager_config]
service : Launch vip-manager TAGS: [service, vip_l2_reload]
service : Fetch postgres cluster memberships TAGS: [service, vip_l4_config]
service : Render L4 VIP configs TAGS: [service, vip_l4_config]
include_tasks TAGS: [service, vip_l4_reload]
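5.3.3 - Sandbox Initialization
How to deploy the sandbox environment quickly
The regular initialization procedure first initializes the meta node and infrastructure, then initializes the other database nodes.
To speed up sandbox initialization, Pigsty provides a sandbox-specific playbook, sandbox.yml, which interleaves the two phases and initializes the infrastructure meta node and the ordinary nodes in a single pass. This approach is fast, but it is not recommended for production.
Playbook overview
Run sandbox.yml directly, or use the make init shortcut, to initialize the sandbox environment in one step.
Notes
The notes for sandbox initialization are the same as those for infrastructure deployment and PG cluster deployment.
Playbook description
sandbox.yml interleaves the work of infra.yml and pgsql.yml, as shown below: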
#------------------------------------------------------------------------------
# init local yum repo on meta node
#------------------------------------------------------------------------------
- name: Init local repo
become: yes
hosts: meta
gather_facts: no
tags: repo
roles:
- repo
#------------------------------------------------------------------------------
# provision all nodes
#------------------------------------------------------------------------------
# node provision depends on existing repo on meta node
- name: Provision Node
become: yes
hosts: all
gather_facts: no
tags: node
roles:
- node
#------------------------------------------------------------------------------
# init meta service on meta node
#------------------------------------------------------------------------------
# meta provision depends on node provision. You'll have to provision node on meta node
# then provision meta infrastructure on meta node
- name: Init meta service
become: yes
hosts: meta
gather_facts: no
tags: meta
roles:
- role: ca
tags: ca
- role: nameserver
tags: nameserver
- role: nginx
tags: nginx
- role: prometheus
tags: prometheus
- role: grafana
tags: grafana
#------------------------------------------------------------------------------
# init dcs on nodes
#------------------------------------------------------------------------------
# typically you'll have to bootstrap dcs on meta node first (or use external dcs)
# but pigsty allows you to setup server and agent at the same time.
- name: Init dcs
become: yes
hosts: all # provision all nodes or just meta nodes
gather_facts: no
roles:
- role: consul
tags: dcs
#------------------------------------------------------------------------------
# create or recreate postgres database clusters
#------------------------------------------------------------------------------
- name: Init database cluster
become: yes
hosts: all
gather_facts: false
roles:
- role: postgres # init postgres
tags: postgres
- role: monitor # init monitor system
tags: monitor
- role: haproxy # init haproxy
tags: haproxy
- role: vip # init vip-manager
tags: vip
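The interleaving becomes clearer when the plays are grouped by the hosts they target. The sketch below is illustrative only: sandbox_plays_for is a hypothetical helper, and the table is copied by hand from the name and hosts fields of the five plays in sandbox.yml. A meta node belongs to both the meta group and all, so it runs every play; an ordinary node runs only the plays that target all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the sandbox.yml plays that run on a node,
# where $1 is "meta" for a meta node or "node" for an ordinary node.
sandbox_plays_for() {
    local kind="$1" name hosts
    while IFS='|' read -r name hosts; do
        if [ "$hosts" = "all" ] || [ "$kind" = "meta" ]; then
            echo "$name"
        fi
    done <<'EOF'
Init local repo|meta
Provision Node|all
Init meta service|meta
Init dcs|all
Init database cluster|all
EOF
}
```

Calling sandbox_plays_for node shows the three plays an ordinary node participates in, while sandbox_plays_for meta shows all five in execution order.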
Default tasks
The following command lists every task that sandbox initialization runs, along with the tags that can be used:
./sandbox.yml --list-tasks
The task list is:
playbook: ./sandbox.yml
play #1 (meta): Init local repo TAGS: [repo]
tasks:
repo : Create local repo directory TAGS: [repo, repo_dir]
repo : Backup & remove existing repos TAGS: [repo, repo_upstream]
repo : Add required upstream repos TAGS: [repo, repo_upstream]
repo : Check repo pkgs cache exists TAGS: [repo, repo_prepare]
repo : Set fact whether repo_exists TAGS: [repo, repo_prepare]
repo : Move upstream repo to backup TAGS: [repo, repo_prepare]
repo : Add local file system repos TAGS: [repo, repo_prepare]
repo : Remake yum cache if not exists TAGS: [repo, repo_prepare]
repo : Install repo bootstrap packages TAGS: [repo, repo_boot]
repo : Render repo nginx server files TAGS: [repo, repo_nginx]
repo : Disable selinux for repo server TAGS: [repo, repo_nginx]
repo : Launch repo nginx server TAGS: [repo, repo_nginx]
repo : Waits repo server online TAGS: [repo, repo_nginx]
repo : Download web url packages TAGS: [repo, repo_download]
repo : Download repo packages TAGS: [repo, repo_download]
repo : Download repo pkg deps TAGS: [repo, repo_download]
repo : Create local repo index TAGS: [repo, repo_download]
repo : Copy bootstrap scripts TAGS: [repo, repo_download, repo_script]
repo : Mark repo cache as valid TAGS: [repo, repo_download]
play #2 (all): Provision Node TAGS: [node]
tasks:
node : Update node hostname TAGS: [node, node_name]
node : Add new hostname to /etc/hosts TAGS: [node, node_name]
node : Write static dns records TAGS: [node, node_dns]
node : Get old nameservers TAGS: [node, node_resolv]
node : Truncate resolv file TAGS: [node, node_resolv]
node : Write resolv options TAGS: [node, node_resolv]
node : Add new nameservers TAGS: [node, node_resolv]
node : Append old nameservers TAGS: [node, node_resolv]
node : Node configure disable firewall TAGS: [node, node_firewall]
node : Node disable selinux by default TAGS: [node, node_firewall]
node : Backup existing repos TAGS: [node, node_repo]
node : Install upstream repo TAGS: [node, node_repo]
node : Install local repo TAGS: [node, node_repo]
node : Install node basic packages TAGS: [node, node_pkgs]
node : Install node extra packages TAGS: [node, node_pkgs]
node : Install meta specific packages TAGS: [node, node_pkgs]
node : Install node basic packages TAGS: [node, node_pkgs]
node : Install node extra packages TAGS: [node, node_pkgs]
node : Install meta specific packages TAGS: [node, node_pkgs]
node : Node configure disable numa TAGS: [node, node_feature]
node : Node configure disable swap TAGS: [node, node_feature]
node : Node configure unmount swap TAGS: [node, node_feature]
node : Node setup static network TAGS: [node, node_feature]
node : Node configure disable firewall TAGS: [node, node_feature]
node : Node configure disk prefetch TAGS: [node, node_feature]
node : Enable linux kernel modules TAGS: [node, node_kernel]
node : Enable kernel module on reboot TAGS: [node, node_kernel]
node : Get config parameter page count TAGS: [node, node_tuned]
node : Get config parameter page size TAGS: [node, node_tuned]
node : Tune shmmax and shmall via mem TAGS: [node, node_tuned]
node : Create tuned profiles TAGS: [node, node_tuned]
node : Render tuned profiles TAGS: [node, node_tuned]
node : Active tuned profile TAGS: [node, node_tuned]
node : Change additional sysctl params TAGS: [node, node_tuned]
node : Copy default user bash profile TAGS: [node, node_profile]
node : Setup node default pam ulimits TAGS: [node, node_ulimit]
node : Create os user group admin TAGS: [node, node_admin]
node : Create os user admin TAGS: [node, node_admin]
node : Grant admin group nopass sudo TAGS: [node, node_admin]
node : Add no host checking to ssh config TAGS: [node, node_admin]
node : Add admin ssh no host checking TAGS: [node, node_admin]
node : Fetch all admin public keys TAGS: [node, node_admin]
node : Exchange all admin ssh keys TAGS: [node, node_admin]
node : Install public keys TAGS: [node, node_admin]
node : Install ntp package TAGS: [node, ntp_install]
node : Install chrony package TAGS: [node, ntp_install]
node : Setup default node timezone TAGS: [node, ntp_config]
node : Copy the ntp.conf file TAGS: [node, ntp_config]
node : Copy the chrony.conf template TAGS: [node, ntp_config]
node : Launch ntpd service TAGS: [node, ntp_launch]
node : Launch chronyd service TAGS: [node, ntp_launch]
play #3 (meta): Init meta service TAGS: [meta]
tasks:
ca : Create local ca directory TAGS: [ca, ca_dir, meta]
ca : Copy ca cert from local files TAGS: [ca, ca_copy, meta]
ca : Check ca key cert exists TAGS: [ca, ca_create, meta]
ca : Create self-signed CA key-cert TAGS: [ca, ca_create, meta]
nameserver : Make sure dnsmasq package installed TAGS: [meta, nameserver]
nameserver : Copy dnsmasq /etc/dnsmasq.d/config TAGS: [meta, nameserver]
nameserver : Add dynamic dns records to meta TAGS: [meta, nameserver]
nameserver : Launch meta dnsmasq service TAGS: [meta, nameserver]
nameserver : Wait for meta dnsmasq online TAGS: [meta, nameserver]
nameserver : Register consul dnsmasq service TAGS: [meta, nameserver]
nameserver : Reload consul TAGS: [meta, nameserver]
nginx : Make sure nginx package installed TAGS: [meta, nginx, nginx_install]
nginx : Create local html directory TAGS: [meta, nginx, nginx_dir]
nginx : Update default nginx index page TAGS: [meta, nginx, nginx_dir]
nginx : Copy nginx default config TAGS: [meta, nginx, nginx_config]
nginx : Copy nginx upstream conf TAGS: [meta, nginx, nginx_config]
nginx : Fetch haproxy facts TAGS: [meta, nginx, nginx_config, nginx_haproxy]
nginx : Templating /etc/nginx/haproxy.conf TAGS: [meta, nginx, nginx_config, nginx_haproxy]
nginx : Templating haproxy.html TAGS: [meta, nginx, nginx_config, nginx_haproxy]
nginx : Launch nginx server TAGS: [meta, nginx, nginx_reload]
nginx : Restart meta nginx service TAGS: [meta, nginx, nginx_launch]
nginx : Wait for nginx service online TAGS: [meta, nginx, nginx_launch]
nginx : Make sure nginx exporter installed TAGS: [meta, nginx, nginx_exporter]
nginx : Config nginx_exporter options TAGS: [meta, nginx, nginx_exporter]
nginx : Restart nginx_exporter service TAGS: [meta, nginx, nginx_exporter]
nginx : Wait for nginx exporter online TAGS: [meta, nginx, nginx_exporter]
nginx : Register cosnul nginx service TAGS: [meta, nginx, nginx_register]
nginx : Register consul nginx-exporter service TAGS: [meta, nginx, nginx_register]
nginx : Reload consul TAGS: [meta, nginx, nginx_register]
prometheus : Install prometheus and alertmanager TAGS: [meta, prometheus, prometheus_install]
prometheus : Wipe out prometheus config dir TAGS: [meta, prometheus, prometheus_clean]
prometheus : Wipe out existing prometheus data TAGS: [meta, prometheus, prometheus_clean]
prometheus : Create postgres directory structure TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy prometheus bin scripts TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy prometheus rules scripts TAGS: [meta, prometheus, prometheus_config]
prometheus : Copy altermanager config TAGS: [meta, prometheus, prometheus_config]
prometheus : Render prometheus config TAGS: [meta, prometheus, prometheus_config]
prometheus : Config /etc/prometheus opts TAGS: [meta, prometheus, prometheus_config]
prometheus : Fetch prometheus static monitoring targets TAGS: [meta, prometheus, prometheus_config, prometheus_targets]
prometheus : Render prometheus static targets TAGS: [meta, prometheus, prometheus_config, prometheus_targets]
prometheus : Launch prometheus service TAGS: [meta, prometheus, prometheus_launch]
prometheus : Launch alertmanager service TAGS: [meta, prometheus, prometheus_launch]
prometheus : Wait for prometheus online TAGS: [meta, prometheus, prometheus_launch]
prometheus : Wait for alertmanager online TAGS: [meta, prometheus, prometheus_launch]
prometheus : Reload prometheus service TAGS: [meta, prometheus, prometheus_reload]
prometheus : Copy prometheus service definition TAGS: [meta, prometheus, prometheus_register]
prometheus : Copy alertmanager service definition TAGS: [meta, prometheus, prometheus_register]
prometheus : Reload consul to register prometheus TAGS: [meta, prometheus, prometheus_register]
grafana : Make sure grafana is installed TAGS: [grafana, grafana_install, meta]
grafana : Check grafana plugin cache exists TAGS: [grafana, grafana_plugin, meta]
grafana : Provision grafana plugins via cache TAGS: [grafana, grafana_plugin, meta]
grafana : Download grafana plugins from web TAGS: [grafana, grafana_plugin, meta]
grafana : Download grafana plugins from web TAGS: [grafana, grafana_plugin, meta]
grafana : Create grafana plugins cache TAGS: [grafana, grafana_plugin, meta]
grafana : Copy /etc/grafana/grafana.ini TAGS: [grafana, grafana_config, meta]
grafana : Remove grafana provision dir TAGS: [grafana, grafana_config, meta]
grafana : Copy provisioning content TAGS: [grafana, grafana_config, meta]
grafana : Copy pigsty dashboards TAGS: [grafana, grafana_config, meta]
grafana : Copy pigsty icon image TAGS: [grafana, grafana_config, meta]
grafana : Replace grafana icon with pigsty TAGS: [grafana, grafana_config, grafana_customize, meta]
grafana : Launch grafana service TAGS: [grafana, grafana_launch, meta]
grafana : Wait for grafana online TAGS: [grafana, grafana_launch, meta]
grafana : Update grafana default preferences TAGS: [grafana, grafana_provision, meta]
grafana : Register consul grafana service TAGS: [grafana, grafana_register, meta]
grafana : Reload consul TAGS: [grafana, grafana_register, meta]
play #4 (all): Init dcs TAGS: []
tasks:
consul : Check for existing consul TAGS: [consul_check, dcs]
consul : Consul exists flag fact set TAGS: [consul_check, dcs]
consul : Abort due to consul exists TAGS: [consul_check, dcs]
consul : Clean existing consul instance TAGS: [consul_clean, dcs]
consul : Stop any running consul instance TAGS: [consul_clean, dcs]
consul : Remove existing consul dir TAGS: [consul_clean, dcs]
consul : Recreate consul dir TAGS: [consul_clean, dcs]
consul : Make sure consul is installed TAGS: [consul_install, dcs]
consul : Make sure consul dir exists TAGS: [consul_config, dcs]
consul : Get dcs server node names TAGS: [consul_config, dcs]
consul : Get dcs node name from var TAGS: [consul_config, dcs]
consul : Get dcs node name from var TAGS: [consul_config, dcs]
consul : Fetch hostname as dcs node name TAGS: [consul_config, dcs]
consul : Get dcs name from hostname TAGS: [consul_config, dcs]
consul : Copy /etc/consul.d/consul.json TAGS: [consul_config, dcs]
consul : Copy consul agent service TAGS: [consul_config, dcs]
consul : Get dcs bootstrap expect quroum TAGS: [consul_server, dcs]
consul : Copy consul server service unit TAGS: [consul_server, dcs]
consul : Launch consul server service TAGS: [consul_server, dcs]
consul : Wait for consul server online TAGS: [consul_server, dcs]
consul : Launch consul agent service TAGS: [consul_agent, dcs]
consul : Wait for consul agent online TAGS: [consul_agent, dcs]
play #5 (all): Init database cluster TAGS: []
tasks:
postgres : Create os group postgres TAGS: [instal, pg_dbsu, postgres]
postgres : Make sure dcs group exists TAGS: [instal, pg_dbsu, postgres]
postgres : Create dbsu {{ pg_dbsu }} TAGS: [instal, pg_dbsu, postgres]
postgres : Grant dbsu nopass sudo TAGS: [instal, pg_dbsu, postgres]
postgres : Grant dbsu all sudo TAGS: [instal, pg_dbsu, postgres]
postgres : Grant dbsu limited sudo TAGS: [instal, pg_dbsu, postgres]
postgres : Config patroni watchdog support TAGS: [instal, pg_dbsu, postgres]
postgres : Add dbsu ssh no host checking TAGS: [instal, pg_dbsu, postgres]
postgres : Fetch dbsu public keys TAGS: [instal, pg_dbsu, postgres]
postgres : Exchange dbsu ssh keys TAGS: [instal, pg_dbsu, postgres]
postgres : Install offical pgdg yum repo TAGS: [instal, pg_install, postgres]
postgres : Install pg packages TAGS: [instal, pg_install, postgres]
postgres : Install pg extensions TAGS: [instal, pg_install, postgres]
postgres : Link /usr/pgsql to current version TAGS: [instal, pg_install, postgres]
postgres : Add pg bin dir to profile path TAGS: [instal, pg_install, postgres]
postgres : Fix directory ownership TAGS: [instal, pg_install, postgres]
postgres : Remove default postgres service TAGS: [instal, pg_install, postgres]
postgres : Check necessary variables exists TAGS: [always, pg_preflight, postgres, preflight]
postgres : Fetch variables via pg_cluster TAGS: [always, pg_preflight, postgres, preflight]
postgres : Set cluster basic facts for hosts TAGS: [always, pg_preflight, postgres, preflight]
postgres : Assert cluster primary singleton TAGS: [always, pg_preflight, postgres, preflight]
postgres : Setup cluster primary ip address TAGS: [always, pg_preflight, postgres, preflight]
postgres : Setup repl upstream for primary TAGS: [always, pg_preflight, postgres, preflight]
postgres : Setup repl upstream for replicas TAGS: [always, pg_preflight, postgres, preflight]
postgres : Debug print instance summary TAGS: [always, pg_preflight, postgres, preflight]
postgres : Check for existing postgres instance TAGS: [pg_check, postgres, prepare]
postgres : Set fact whether pg port is open TAGS: [pg_check, postgres, prepare]
postgres : Abort due to existing postgres instance TAGS: [pg_check, postgres, prepare]
postgres : Clean existing postgres instance TAGS: [pg_check, postgres, prepare]
postgres : Shutdown existing postgres service TAGS: [pg_clean, postgres, prepare]
postgres : Remove registerd consul service TAGS: [pg_clean, postgres, prepare]
postgres : Remove postgres metadata in consul TAGS: [pg_clean, postgres, prepare]
postgres : Remove existing postgres data TAGS: [pg_clean, postgres, prepare]
postgres : Make sure main and backup dir exists TAGS: [pg_dir, postgres, prepare]
postgres : Create postgres directory structure TAGS: [pg_dir, postgres, prepare]
postgres : Create pgbouncer directory structure TAGS: [pg_dir, postgres, prepare]
postgres : Create links from pgbkup to pgroot TAGS: [pg_dir, postgres, prepare]
postgres : Create links from current cluster TAGS: [pg_dir, postgres, prepare]
postgres : Copy pg_cluster to /pg/meta/cluster TAGS: [pg_meta, postgres, prepare]
postgres : Copy pg_version to /pg/meta/version TAGS: [pg_meta, postgres, prepare]
postgres : Copy pg_instance to /pg/meta/instance TAGS: [pg_meta, postgres, prepare]
postgres : Copy pg_seq to /pg/meta/sequence TAGS: [pg_meta, postgres, prepare]
postgres : Copy pg_role to /pg/meta/role TAGS: [pg_meta, postgres, prepare]
postgres : Copy postgres scripts to /pg/bin/ TAGS: [pg_scripts, postgres, prepare]
postgres : Copy alias profile to /etc/profile.d TAGS: [pg_scripts, postgres, prepare]
postgres : Copy psqlrc to postgres home TAGS: [pg_scripts, postgres, prepare]
postgres : Setup hostname to pg instance name TAGS: [pg_hostname, postgres, prepare]
postgres : Copy consul node-meta definition TAGS: [pg_nodemeta, postgres, prepare]
postgres : Restart consul to load new node-meta TAGS: [pg_nodemeta, postgres, prepare]
postgres : Config patroni watchdog support TAGS: [pg_watchdog, postgres, prepare]
postgres : Get config parameter page count TAGS: [pg_config, postgres]
postgres : Get config parameter page size TAGS: [pg_config, postgres]
postgres : Tune shared buffer and work mem TAGS: [pg_config, postgres]
postgres : Hanlde small size mem occasion TAGS: [pg_config, postgres]
postgres : Calculate postgres mem params TAGS: [pg_config, postgres]
postgres : create patroni config dir TAGS: [pg_config, postgres]
postgres : use predefined patroni template TAGS: [pg_config, postgres]
postgres : Render default /pg/conf/patroni.yml TAGS: [pg_config, postgres]
postgres : Link /pg/conf/patroni to /pg/bin/ TAGS: [pg_config, postgres]
postgres : Link /pg/bin/patroni.yml to /etc/patroni/ TAGS: [pg_config, postgres]
postgres : Config patroni watchdog support TAGS: [pg_config, postgres]
postgres : create patroni systemd drop-in dir TAGS: [pg_config, postgres]
postgres : Copy postgres systemd service file TAGS: [pg_config, postgres]
postgres : create patroni systemd drop-in file TAGS: [pg_config, postgres]
postgres : Render default initdb scripts TAGS: [pg_config, postgres]
postgres : Launch patroni on primary instance TAGS: [pg_primary, postgres]
postgres : Wait for patroni primary online TAGS: [pg_primary, postgres]
postgres : Wait for postgres primary online TAGS: [pg_primary, postgres]
postgres : Check primary postgres service ready TAGS: [pg_primary, postgres]
postgres : Check replication connectivity to primary TAGS: [pg_primary, postgres]
postgres : Render default pg-init scripts TAGS: [pg_init, pg_init_config, postgres]
postgres : Render template init script TAGS: [pg_init, pg_init_config, postgres]
postgres : Execute initialization scripts TAGS: [pg_init, postgres]
postgres : Check primary instance ready TAGS: [pg_init, postgres]
postgres : Add dbsu password to pgpass if exists TAGS: [pg_pass, postgres]
postgres : Add system user to pgpass TAGS: [pg_pass, postgres]
postgres : Check replication connectivity to primary TAGS: [pg_replica, postgres]
postgres : Launch patroni on replica instances TAGS: [pg_replica, postgres]
postgres : Wait for patroni replica online TAGS: [pg_replica, postgres]
postgres : Wait for postgres replica online TAGS: [pg_replica, postgres]
postgres : Check replica postgres service ready TAGS: [pg_replica, postgres]
postgres : Render hba rules TAGS: [pg_hba, postgres]
postgres : Reload hba rules TAGS: [pg_hba, postgres]
postgres : Pause patroni TAGS: [pg_patroni, postgres]
postgres : Stop patroni on replica instance TAGS: [pg_patroni, postgres]
postgres : Stop patroni on primary instance TAGS: [pg_patroni, postgres]
postgres : Launch raw postgres on primary TAGS: [pg_patroni, postgres]
postgres : Launch raw postgres on primary TAGS: [pg_patroni, postgres]
postgres : Wait for postgres online TAGS: [pg_patroni, postgres]
postgres : Check pgbouncer is installed TAGS: [pgbouncer, pgbouncer_check, postgres]
postgres : Stop existing pgbouncer service TAGS: [pgbouncer, pgbouncer_clean, postgres]
postgres : Remove existing pgbouncer dirs TAGS: [pgbouncer, pgbouncer_clean, postgres]
postgres : Recreate dirs with owner postgres TAGS: [pgbouncer, pgbouncer_clean, postgres]
postgres : Copy /etc/pgbouncer/pgbouncer.ini TAGS: [pgbouncer, pgbouncer_config, pgbouncer_ini, postgres]
postgres : Copy /etc/pgbouncer/pgb_hba.conf TAGS: [pgbouncer, pgbouncer_config, pgbouncer_hba, postgres]
postgres : Touch userlist and database list TAGS: [pgbouncer, pgbouncer_config, postgres]
postgres : Add default users to pgbouncer TAGS: [pgbouncer, pgbouncer_config, postgres]
postgres : Copy pgbouncer systemd service TAGS: [pgbouncer, pgbouncer_launch, postgres]
postgres : Launch pgbouncer pool service TAGS: [pgbouncer, pgbouncer_launch, postgres]
postgres : Wait for pgbouncer service online TAGS: [pgbouncer, pgbouncer_launch, postgres]
postgres : Check pgbouncer service is ready TAGS: [pgbouncer, pgbouncer_launch, postgres]
postgres : Render business init script TAGS: [business, pg_biz_config, pg_biz_init, postgres]
postgres : Render database baseline sql TAGS: [business, pg_biz_config, pg_biz_init, postgres]
postgres : Execute business init script TAGS: [business, pg_biz_init, postgres]
postgres : Execute database baseline sql TAGS: [business, pg_biz_init, postgres]
postgres : Add pgbouncer busniess users TAGS: [business, pg_biz_pgbouncer, postgres]
postgres : Add pgbouncer busniess database TAGS: [business, pg_biz_pgbouncer, postgres]
postgres : Restart pgbouncer TAGS: [business, pg_biz_pgbouncer, postgres]
postgres : Copy pg service definition to consul TAGS: [pg_register, postgres, register]
postgres : Reload postgres consul service TAGS: [pg_register, postgres, register]
postgres : Render grafana datasource definition TAGS: [pg_grafana, postgres, register]
postgres : Register datasource to grafana TAGS: [pg_grafana, postgres, register]
monitor : Create /etc/pg_exporter conf dir TAGS: [monitor, pg_exporter]
monitor : Copy default pg_exporter.yaml TAGS: [monitor, pg_exporter]
monitor : Config /etc/default/pg_exporter TAGS: [monitor, pg_exporter]
monitor : Copy pg_exporter binary TAGS: [monitor, pg_exporter, pg_exporter_binary]
monitor : Config pg_exporter service unit TAGS: [monitor, pg_exporter]
monitor : Launch pg_exporter systemd service TAGS: [monitor, pg_exporter]
monitor : Wait for pg_exporter service online TAGS: [monitor, pg_exporter]
monitor : Register pg-exporter consul service TAGS: [monitor, pg_exporter_register]
monitor : Reload pg-exporter consul service TAGS: [monitor, pg_exporter_register]
monitor : Config pgbouncer_exporter opts TAGS: [monitor, pgbouncer_exporter]
monitor : Config pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter]
monitor : Launch pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter]
monitor : Wait for pgbouncer_exporter online TAGS: [monitor, pgbouncer_exporter]
monitor : Register pgb-exporter consul service TAGS: [monitor, node_exporter_register]
monitor : Reload pgb-exporter consul service TAGS: [monitor, node_exporter_register]
monitor : Copy node_exporter binary TAGS: [monitor, node_exporter, node_exporter_binary]
monitor : Copy node_exporter systemd service TAGS: [monitor, node_exporter]
monitor : Config default node_exporter options TAGS: [monitor, node_exporter]
monitor : Launch node_exporter service unit TAGS: [monitor, node_exporter]
monitor : Wait for node_exporter online TAGS: [monitor, node_exporter]
monitor : Register node-exporter service to consul TAGS: [monitor, node_exporter_register]
monitor : Reload node-exporter consul service TAGS: [monitor, node_exporter_register]
haproxy : Make sure haproxy is installed TAGS: [haproxy, haproxy_install]
haproxy : Create haproxy directory TAGS: [haproxy, haproxy_install]
haproxy : Copy haproxy systemd service file TAGS: [haproxy, haproxy_install, haproxy_unit]
haproxy : Fetch postgres cluster memberships TAGS: [haproxy, haproxy_config]
haproxy : Templating /etc/haproxy/haproxy.cfg TAGS: [haproxy, haproxy_config]
haproxy : Launch haproxy load balancer service TAGS: [haproxy, haproxy_launch, haproxy_restart]
haproxy : Wait for haproxy load balancer online TAGS: [haproxy, haproxy_launch]
haproxy : Reload haproxy load balancer service TAGS: [haproxy, haproxy_reload]
haproxy : Copy haproxy service definition TAGS: [haproxy, haproxy_register]
haproxy : Reload haproxy consul service TAGS: [haproxy, haproxy_register]
vip : Templating /etc/default/vip-manager.yml TAGS: [vip]
vip : create vip-manager. systemd drop-in dir TAGS: [vip]
vip : create vip-manager systemd drop-in file TAGS: [vip]
vip : Launch vip-manager TAGS: [vip]
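5.3.4 - Removing Database Clusters
How to take PostgreSQL database clusters and instances offline
Playbook Overview
Database removal: removes an existing database cluster or instance and reclaims its nodes: pgsql-remove.yml
Daily Administration
./pgsql-remove.yml -l pg-test # remove the pg-test cluster
./pgsql-remove.yml -l pg-test -l 10.10.10.13 # remove one instance from the pg-test cluster
Playbook Description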
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql-remove.yml
# Ctime : 2020-05-12
# Mtime : 2021-03-15
# Desc : remove postgres & consul services
# Path : pgsql-remove.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# this playbook removes postgres, consul and related services
# from existing instances, so that the node can be recycled for
# re-initialization or for other database clusters.
#------------------------------------------------------------------------------
# Remove load balancer
#------------------------------------------------------------------------------
- name: Remove load balancer
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-lb
  tasks:
    - name: Stop load balancer
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - vip-manager
        - haproxy
        # - keepalived
#------------------------------------------------------------------------------
# Remove pg monitor
#------------------------------------------------------------------------------
- name: Remove monitor
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-monitor
  tasks:
    - name: Stop monitor service
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - pg_exporter
        - pgbouncer_exporter
    - name: Deregister exporter service
      ignore_errors: true
      file: path=/etc/consul.d/svc-{{ item }}.json state=absent
      with_items:
        - haproxy
        - pg-exporter
        - pgbouncer-exporter
    - name: Reload consul
      systemd: name=consul state=reloaded
#------------------------------------------------------------------------------
# Remove watchdog owner
#------------------------------------------------------------------------------
- name: Remove monitor
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-watchdog
  tasks:
    # - watchdog owner - #
    - name: Remove patroni watchdog ownership
      ignore_errors: true
      file: path=/dev/watchdog owner=root group=root
#------------------------------------------------------------------------------
# Remove postgres service
#------------------------------------------------------------------------------
- name: Remove Postgres service
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-pg
  tasks:
    - name: Remove postgres replica services
      when: pg_role != 'primary'
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - patroni
        - postgres
        - pgbouncer
    # if in resume mode, postgres will not be stopped
    - name: Force stop postgres non-primary process
      become_user: "{{ pg_dbsu }}"
      when: pg_role != 'primary'
      ignore_errors: true
      shell: |
        {{ pg_bin_dir }}/pg_ctl -D {{ pg_data }} stop -m immediate
        exit 0
    - name: Remove postgres primary services
      when: pg_role == 'primary'
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - patroni
        - postgres
        - pgbouncer
    - name: Force stop postgres primary process
      become_user: "{{ pg_dbsu }}"
      when: pg_role == 'primary'
      ignore_errors: true
      shell: |
        {{ pg_bin_dir }}/pg_ctl -D {{ pg_data }} stop -m immediate
        exit 0
    - name: Deregister postgres services
      ignore_errors: true
      file: path=/etc/consul.d/svc-{{ item }}.json state=absent
      with_items:
        - postgres
        - pgbouncer
        - patroni
#------------------------------------------------------------------------------
# Remove infrastructure (consul & node_exporter)
#------------------------------------------------------------------------------
- name: Remove Infrastructure
  become: yes
  hosts: all
  serial: 1
  gather_facts: no
  tags: rm-infra
  tasks:
    - name: Consul leave cluster
      ignore_errors: true
      command: /usr/bin/consul leave
    - name: Stop consul and node_exporter
      ignore_errors: true
      systemd: name={{ item }} state=stopped enabled=no daemon_reload=yes
      with_items:
        - node_exporter
        - consul
#------------------------------------------------------------------------------
# Uninstall postgres and consul
#------------------------------------------------------------------------------
- name: Uninstall Packages
  become: yes
  hosts: all
  gather_facts: no
  tags: rm-pkgs
  tasks:
    - name: Uninstall postgres and consul
      when: yum_remove is defined and yum_remove|bool
      shell: |
        yum remove -y consul
        yum remove -y postgresql{{ pg_version }}*
...
Usage Example
./pgsql-remove.yml -l pg-test
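The final play of the playbook (tag rm-pkgs) only runs when `yum_remove` is defined and truthy, so packages stay installed by default. A minimal sketch of switching it on for one cluster through inventory variables follows. The `all.children` layout, the host entries, the `replica` role value, and the choice to set the variable at cluster level are assumptions for illustration; the variable could presumably also be passed on the command line with `-e yum_remove=true`.

```yaml
# Hypothetical inventory excerpt (layout and host values assumed for illustration)
all:
  children:
    pg-test:
      hosts:
        10.10.10.11: {pg_seq: 1, pg_role: primary}
        10.10.10.12: {pg_seq: 2, pg_role: replica}
        10.10.10.13: {pg_seq: 3, pg_role: replica}
      vars:
        pg_cluster: pg-test
        yum_remove: true   # assumption: set per cluster; lets the rm-pkgs play run `yum remove`
```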
Execution Result
Task Details
The default tasks are as follows:
playbook: ./pgsql-remove.yml
play #1 (all): Remove load balancer TAGS: [rm-lb]
tasks:
Stop load balancer TAGS: [rm-lb]
play #2 (all): Remove monitor TAGS: [rm-monitor]
tasks:
Stop monitor service TAGS: [rm-monitor]
Deregister exporter service TAGS: [rm-monitor]
Reload consul TAGS: [rm-monitor]
play #3 (all): Remove monitor TAGS: [rm-watchdog]
tasks:
Remove patroni watchdog ownership TAGS: [rm-watchdog]
play #4 (all): Remove Postgres service TAGS: [rm-pg]
tasks:
Remove postgres replica services TAGS: [rm-pg]
Force stop postgres non-primary process TAGS: [rm-pg]
Remove postgres primary services TAGS: [rm-pg]
Force stop postgres primary process TAGS: [rm-pg]
Deregister postgres services TAGS: [rm-pg]
play #5 (all): Remove Infrastructure TAGS: [rm-infra]
tasks:
Consul leave cluster TAGS: [rm-infra]
Stop consul and node_exporter TAGS: [rm-infra]
play #6 (all): Uninstall Packages TAGS: [rm-pkgs]
tasks:
Uninstall postgres and consul TAGS: [rm-pkgs]
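5.3.5 - Monitor-Only Deployment
How to deploy the Pigsty monitoring system on its own?
Playbook Overview
Deploy monitoring: deploys the Pigsty monitoring components onto existing database clusters: pgsql-monitor.yml
Daily Administration
# Deploy monitoring on the pg-test cluster
./pgsql-monitor.yml -l pg-test
Playbook Description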
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql-monitor.yml
# Ctime : 2021-02-23
# Mtime : 2021-02-27
# Desc : deploy monitor components only
# Path : pgsql-monitor.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# this is pgsql monitor setup playbook for MONITOR ONLY mode
# MONITOR-ONLY (monly) mode is a special deployment mode for
# integration with exterior provisioning solution or existing
# postgres clusters.
# with limited functionalities
# For monly deployment, The infra part is still the same.
# You MUST use static services discovery for prometheus
# You CAN NOT use services_registry
#------------------------------------------------------------------------------
# Deploy monitor on selected targets
#------------------------------------------------------------------------------
- name: Monitor Only Deployment
  become: yes
  hosts: all
  gather_facts: no
  tags: monitor
  roles:
    - role: monitor # init monitor system
  vars:
    #------------------------------------------------------------------------------
    # RECOMMEND CHANGES
    #------------------------------------------------------------------------------
    # You'd better change those options in your main config file
    # prometheus_sd_method: static # MUST use static sd for monitor only mode
    service_registry: none # MUST NOT register services
    exporter_install: binary # none|yum|binary, none by default
    # exporter_install controls how node_exporter & pg_exporter are installed
    # none : I've already installed manually
    # yum : Use yum install, `exporter_repo_url` will be added if specified
    # binary : Copy binary to /usr/bin. You must have binary in your `files` dir
    #------------------------------------------------------------------------------
    # MONITOR PROVISION
    #------------------------------------------------------------------------------
    # - install - #
    # exporter_install: none # none|yum|binary, none by default
    # exporter_repo_url: '' # if set, repo will be added to /etc/yum.repos.d/ before yum installation
    # - collect - #
    # exporter_metrics_path: /metrics # default metric path for pg related exporter
    # - node exporter - #
    # node_exporter_enabled: true # setup node_exporter on instance
    # node_exporter_port: 9100 # default port for node exporter
    # node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
    # - pg exporter - #
    # pg_exporter_config: pg_exporter-demo.yaml # default config files for pg_exporter
    # pg_exporter_enabled: true # setup pg_exporter on instance
    # pg_exporter_port: 9630 # default port for pg exporter
    # pg_exporter_url: '' # optional, if not set, generate from reference parameters
    # - pgbouncer exporter - #
    # pgbouncer exporter require pgbouncer to work, so it is disabled by default in monitor-only mode
    # pgbouncer_exporter_enabled: false # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
    # pgbouncer_exporter_port: 9631 # default port for pgbouncer exporter
    # pgbouncer_exporter_url: '' # optional, if not set, generate from reference parameters
    # - postgres variables reference - #
    # pg_dbsu: postgres
    # pg_port: 5432 # postgres port (5432 by default)
    # pgbouncer_port: 6432 # pgbouncer port (6432 by default)
    # pg_localhost: /var/run/postgresql # localhost unix socket dir for connection
    # pg_default_database: postgres # default database will be used as primary monitor target
    # pg_monitor_username: dbuser_monitor # system monitor username, for postgres and pgbouncer
    # pg_monitor_password: DBUser.Monitor # system monitor user's password
    # service_registry: consul # none | consul | etcd | both
#------------------------------------------------------------------------------
# update static inventory in meta node and reload
#------------------------------------------------------------------------------
- name: Update prometheus static sd files
  become: yes
  hosts: meta
  tags: prometheus
  gather_facts: no
  vars:
    #------------------------------------------------------------------------------
    # RECOMMEND CHANGES
    #------------------------------------------------------------------------------
    prometheus_sd_method: static # service discovery method: static|consul|etcd
  tasks:
    - include_tasks: roles/prometheus/tasks/targets.yml
    - include_tasks: roles/prometheus/tasks/reload.yml
...
Usage Example
./pgsql-monitor.yml -l pg-test
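The comments in the playbook recommend setting the monitor-only options in the main config file rather than relying on the playbook's own `vars`. A minimal sketch of what that might look like, assuming global variables live under `all.vars` in the inventory (the placement is an assumption; the variable names and values are the ones listed in the playbook above):

```yaml
# Hypothetical excerpt of the main config file (placement under all.vars is assumed)
all:
  vars:
    prometheus_sd_method: static        # MUST use static service discovery in monitor-only mode
    service_registry: none              # MUST NOT register services
    exporter_install: binary            # none|yum|binary; binary copies files from the `files` dir
    pgbouncer_exporter_enabled: false   # disable when the target has no pgbouncer
    pg_monitor_username: dbuser_monitor # system monitor username
    pg_monitor_password: DBUser.Monitor # system monitor user's password
```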
Execution Result
$ ./pgsql-monitor.yml -l pg-test -e pg_user=test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Create user in cluster] *****************************************************************************************************************************************************
TASK [Check parameter pg_user] ****************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Fetch user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
ok: [10.10.10.12] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
ok: [10.10.10.13] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
TASK [Check user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/postgres/tasks/monitor.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Render user test creation sql] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Execute user test creation sql on primary] **********************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Add user to pgbouncer] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [Reload pgbouncer to add user] ***********************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11 : ok=9 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
10.10.10.12 : ok=7 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
10.10.10.13 : ok=7 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Task Details
The default tasks are as follows:
playbook: ./pgsql-monitor.yml
play #1 (all): Monitor Only Deployment TAGS: [monitor]
tasks:
monitor : Install exporter yum repo TAGS: [exporter_install, exporter_yum_install, monitor]
monitor : Install node_exporter and pg_exporter TAGS: [exporter_install, exporter_yum_install, monitor]
monitor : Copy node_exporter binary TAGS: [exporter_binary_install, exporter_install, monitor]
monitor : Copy pg_exporter binary TAGS: [exporter_binary_install, exporter_install, monitor]
monitor : Create /etc/pg_exporter conf dir TAGS: [monitor, pg_exporter]
monitor : Copy default pg_exporter.yaml TAGS: [monitor, pg_exporter]
monitor : Config /etc/default/pg_exporter TAGS: [monitor, pg_exporter]
monitor : Config pg_exporter service unit TAGS: [monitor, pg_exporter]
monitor : Launch pg_exporter systemd service TAGS: [monitor, pg_exporter]
monitor : Wait for pg_exporter service online TAGS: [monitor, pg_exporter]
monitor : Register pg-exporter consul service TAGS: [monitor, pg_exporter_register]
monitor : Reload pg-exporter consul service TAGS: [monitor, pg_exporter_register]
monitor : Config pgbouncer_exporter opts TAGS: [monitor, pgbouncer_exporter]
monitor : Config pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter]
monitor : Launch pgbouncer_exporter service TAGS: [monitor, pgbouncer_exporter]
monitor : Wait for pgbouncer_exporter online TAGS: [monitor, pgbouncer_exporter]
monitor : Register pgb-exporter consul service TAGS: [monitor, node_exporter_register]
monitor : Reload pgb-exporter consul service TAGS: [monitor, node_exporter_register]
monitor : Copy node_exporter systemd service TAGS: [monitor, node_exporter]
monitor : Config default node_exporter options TAGS: [monitor, node_exporter]
monitor : Launch node_exporter service unit TAGS: [monitor, node_exporter]
monitor : Wait for node_exporter online TAGS: [monitor, node_exporter]
monitor : Register node-exporter service to consul TAGS: [monitor, node_exporter_register]
monitor : Reload node-exporter consul service TAGS: [monitor, node_exporter_register]
play #2 (meta): Update prometheus static sd files TAGS: [prometheus]
tasks:
include_tasks TAGS: [prometheus]
include_tasks TAGS: [prometheus]
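Play #2 regenerates the static target files for Prometheus on the meta node and then reloads Prometheus. For orientation, Prometheus file-based service discovery accepts YAML of the following general shape. This is a generic illustration and not the file Pigsty renders: the grouping and the label name `cls` are hypothetical, and only the IP addresses and the default exporter ports (9100 for node_exporter, 9630 for pg_exporter) come from this section.

```yaml
# Generic Prometheus file_sd target list (label names are hypothetical)
- targets:
    - 10.10.10.11:9100   # node_exporter
    - 10.10.10.11:9630   # pg_exporter
  labels:
    cls: pg-test
- targets:
    - 10.10.10.12:9100
    - 10.10.10.12:9630
  labels:
    cls: pg-test
```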
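5.3.6 - Creating Business Users
How to create or modify business users in a database cluster?
Playbook Overview
Create business users: creates a new user or modifies an existing user in an existing cluster: pgsql-createuser.yml
Daily Administration
# Create a user named test in the pg-test cluster
./pgsql-createuser.yml -l pg-test -e pg_user=test
Note that the user named by pg_user must already exist in the cluster's pg_users definition; otherwise the playbook fails. In other words, you must define the user first and create it afterwards.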
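A minimal sketch of such a definition follows, assuming the cluster's variables sit under `vars` in the inventory (the playbook header refers to this location as `<cluster>.vars.pg_users`). The field names and values mirror the `test` user definition printed in the debug output earlier in this document; the surrounding inventory layout is an assumption.

```yaml
# Hypothetical inventory excerpt (layout assumed; user fields taken from the debug output)
pg-test:
  vars:
    pg_cluster: pg-test
    pg_users:
      - name: test
        password: test
        comment: default test user for production usage
        pgbouncer: true        # also add the user to pgbouncer on all cluster members
        roles:
          - dbrole_readwrite
```

With an entry like this in place, `./pgsql-createuser.yml -l pg-test -e pg_user=test` can look the definition up by name.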
Playbook Description
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql-createuser.yml
# Ctime : 2021-02-27
# Mtime : 2021-02-27
# Desc : create user on running cluster
# Path : pgsql-createuser.yml
# Deps : templates/pg-user.sql
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
#=============================================================================#
# How to create user ?
# 1. define user in your configuration file! <cluster>.vars.pg_users
# 2. execute this playbook with pg_user set to your new user.name
# 3. run playbook on target cluster
# It essentially does:
# 1. create sql file in /pg/tmp/pg-user-{{ user.name }}.sql
# 2. create user on primary instance with that sql
# 3. if {{ user.pgbouncer }}, add to all cluster members and reload
#=============================================================================#
- name: Create user in cluster
become: yes
hosts: all
gather_facts: no
vars:
##################################################################################
# IMPORTANT: Change this or use cli-arg to specify target user in inventory #
##################################################################################
pg_user: test
tasks:
#------------------------------------------------------------------------------
# pre-flight check: validate pg_user and user definition
# ------------------------------------------------------------------------------
- name: Preflight
block:
- name: Check parameter pg_user
connection: local
assert:
that:
- pg_user is defined
- pg_user != ''
- pg_user != 'postgres'
fail_msg: variable 'pg_user' should be specified to create target user
- name: Fetch user definition
connection: local
set_fact:
pg_user_definition={{ pg_users | json_query(pg_user_definition_query) }}
vars:
pg_user_definition_query: "[?name=='{{ pg_user }}'] | [0]"
# print user definition
- debug:
msg: "{{ pg_user_definition }}"
- name: Check user definition
assert:
that:
- pg_user_definition is defined
- pg_user_definition != None
- pg_user_definition != ''
- pg_user_definition != {}
fail_msg: user definition for {{ pg_user }} should exists in pg_users
#------------------------------------------------------------------------------
# Create user on cluster primary and add pgbouncer entry to cluster members
#------------------------------------------------------------------------------
# create user according to user definition
- include_tasks: roles/postgres/tasks/createuser.yml
vars:
user: "{{ pg_user_definition }}"
#------------------------------------------------------------------------------
# Pgbouncer Reload (entire cluster)
#------------------------------------------------------------------------------
- name: Reload pgbouncer to add user
when: pg_user_definition.pgbouncer is defined and pg_user_definition.pgbouncer|bool
tags: pgbouncer_reload
systemd: name=pgbouncer state=reloaded enabled=yes daemon_reload=yes
...
Usage Example
./pgsql-createuser.yml -l pg-test -e pg_user=test
Execution Result
$ ./pgsql-createuser.yml -l pg-test -e pg_user=test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Create user in cluster] *****************************************************************************************************************************************************
TASK [Check parameter pg_user] ****************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Fetch user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
ok: [10.10.10.12] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
ok: [10.10.10.13] => {
"msg": {
"comment": "default test user for production usage",
"name": "test",
"password": "test",
"pgbouncer": true,
"roles": [
"dbrole_readwrite"
]
}
}
TASK [Check user definition] ******************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Render user test creation sql] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Execute user test creation sql on primary] **********************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Add user to pgbouncer] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [Reload pgbouncer to add user] ***********************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11 : ok=9 changed=4 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
10.10.10.12 : ok=7 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
10.10.10.13 : ok=7 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Task Details
The default tasks are as follows:
playbook: ./pgsql-createuser.yml
play #1 (all): Create user in cluster TAGS: []
tasks:
Check parameter pg_user TAGS: []
Fetch user definition TAGS: []
debug TAGS: []
Check user definition TAGS: []
include_tasks TAGS: []
Reload pgbouncer to add user TAGS: [pgbouncer_reload]
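5.3.7 - Create and Modify Services
How to create or modify services in a database cluster?
Playbook Overview
Create or modify services: create new services or modify existing services in an existing cluster: pgsql-service.yml
Routine Administration
# Create all services in the pg-test cluster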
./pgsql-service.yml -l pg-test
Playbook Description
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql-service.yml
# Ctime : 2021-03-12
# Mtime : 2021-03-12
# Desc : reload service for postgres clusters
# Path : pgsql-service.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# PLEASE USE COMPLETE INVENTORY (at least contains a complete cluster definition!)
#------------------------------------------------------------------------------
# haproxy reload
# will not reload if haproxy_reload=false
#------------------------------------------------------------------------------
- name: Reload haproxy
become: yes
hosts: all
gather_facts: no
tags: haproxy
tasks:
- include_tasks: roles/service/tasks/haproxy_config.yml
when: haproxy_enabled
- include_tasks: roles/service/tasks/haproxy_reload.yml
when: haproxy_enabled and haproxy_reload|bool
#------------------------------------------------------------------------------
# l2-vip reload
# will only config without reload if vip_reload=false
#------------------------------------------------------------------------------
- name: Reload l2 VIP
become: yes
hosts: all
gather_facts: no
tags: vip_l2
tasks:
- include_tasks: roles/service/tasks/vip_l2_config.yml
when: vip_mode == 'l2'
- include_tasks: roles/service/tasks/vip_l2_reload.yml
when: vip_mode == 'l2' and vip_reload|bool
#------------------------------------------------------------------------------
# l4-vip reload
# will not reload if vip_reload=false
#------------------------------------------------------------------------------
- name: Reload l4 VIP
become: yes
hosts: all
gather_facts: no
tags: vip_l4
tasks:
- include_tasks: roles/service/tasks/vip_l4_config.yml
- include_tasks: roles/service/tasks/vip_l4_reload.yml
...
Usage Example
./pgsql-service.yml -l pg-test
Execution Result
$ ./pgsql-service.yml -l pg-test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Reload haproxy] *************************************************************************************************************************************************************
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/haproxy_config.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Fetch postgres cluster memberships] *****************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [Templating /etc/haproxy/haproxy.cfg] ****************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/haproxy_reload.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Reload haproxy load balancer service] ***************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]
PLAY [Reload l2 VIP] **************************************************************************************************************************************************************
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/vip_l2_config.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Templating /etc/default/vip-manager.yml] ************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.13]
ok: [10.10.10.12]
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/service/tasks/vip_l2_reload.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [Launch vip-manager] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
PLAY [Reload l4 VIP] **************************************************************************************************************************************************************
TASK [include_tasks] **************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [include_tasks] **************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11 : ok=9 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
10.10.10.12 : ok=9 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
10.10.10.13 : ok=9 changed=2 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
Task Details
The default tasks are as follows:
playbook: ./pgsql-service.yml
play #1 (all): Reload haproxy TAGS: [haproxy]
tasks:
include_tasks TAGS: [haproxy]
include_tasks TAGS: [haproxy]
play #2 (all): Reload l2 VIP TAGS: [vip_l2]
tasks:
include_tasks TAGS: [vip_l2]
include_tasks TAGS: [vip_l2]
play #3 (all): Reload l4 VIP TAGS: [vip_l4]
tasks:
include_tasks TAGS: [vip_l4]
include_tasks TAGS: [vip_l4]
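5.3.8 - Create Business Databases
How to create or modify business databases in a database cluster?
Playbook Overview
Create business databases: create new databases or modify existing databases in an existing cluster: pgsql-createdb.yml
Routine Administration
# Create a database named test in the pg-test cluster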
./pgsql-createdb.yml -l pg-test -e pg_database=test
Playbook Description
#!/usr/bin/env ansible-playbook
---
#==============================================================#
# File : pgsql-createdb.yml
# Ctime : 2021-02-27
# Mtime : 2021-02-27
# Desc : create database on running cluster
# Deps : templates/pg-db.sql
# Path : pgsql-createdb.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
#=============================================================================#
# How to create database ?
# 1. define database in your configuration file! <cluster>.vars.pg_databases
# 2. execute this playbook with pg_database set to your new database.name
# 3. run playbook on target cluster
# It essentially does:
# 1. create sql file in /pg/tmp/pg-db-{{ database.name }}.sql
# 2. create database on primary instance with that sql
# 3. if {{ database.pgbouncer }}, add to all cluster members and reload
#=============================================================================#
- name: Create Database In Cluster
become: yes
hosts: all
gather_facts: no
vars:
##################################################################################
# IMPORTANT: Change this or use cli-arg to specify target database in inventory #
##################################################################################
pg_database: test
tasks:
#------------------------------------------------------------------------------
# pre-flight check: validate pg_database and database definition
# ------------------------------------------------------------------------------
- name: Preflight
block:
- name: Check parameter pg_database
connection: local
assert:
that:
- pg_database is defined
- pg_database != ''
- pg_database != 'postgres'
fail_msg: variable 'pg_database' should be specified to create target database
- name: Fetch database definition
connection: local
set_fact:
pg_database_definition={{ pg_databases | json_query(pg_database_definition_query) }}
vars:
pg_database_definition_query: "[?name=='{{ pg_database }}'] | [0]"
# print database definition
- debug:
msg: "{{ pg_database_definition }}"
- name: Check database definition
assert:
that:
- pg_database_definition is defined
- pg_database_definition != None
- pg_database_definition != ''
- pg_database_definition != {}
fail_msg: database definition for {{ pg_database }} should exists in pg_databases
#------------------------------------------------------------------------------
# Create database on cluster primary and add pgbouncer entry to cluster members
#------------------------------------------------------------------------------
# create database according to database definition
- include_tasks: roles/postgres/tasks/createdb.yml
vars:
database: "{{ pg_database_definition }}"
#------------------------------------------------------------------------------
# Pgbouncer Reload (entire cluster)
#------------------------------------------------------------------------------
- name: Reload pgbouncer to add database
when: pg_database_definition.pgbouncer is not defined or pg_database_definition.pgbouncer|bool
tags: pgbouncer_reload
systemd: name=pgbouncer state=reloaded enabled=yes daemon_reload=yes
...
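The two creation playbooks guard their final pgbouncer reload with different conditions: pgsql-createuser.yml reloads only when `pgbouncer` is defined and true, while pgsql-createdb.yml reloads unless `pgbouncer` is explicitly false. The Python sketch below restates those two `when` expressions. It is an illustration, not Pigsty code; `ansible_bool` is a hypothetical helper that only approximates Ansible's `bool` filter.

```python
# Illustrative sketch only: restates the `when` conditions of the two playbooks.
# ansible_bool is a hypothetical stand-in that approximates Ansible's `bool` filter.

def ansible_bool(value):
    """Coerce common truthy spellings to True; everything else to False."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in ("yes", "on", "1", "true")


def user_triggers_reload(user):
    # pgsql-createuser.yml: pgbouncer is defined and pgbouncer|bool
    return "pgbouncer" in user and ansible_bool(user["pgbouncer"])


def database_triggers_reload(database):
    # pgsql-createdb.yml: pgbouncer is not defined or pgbouncer|bool
    return "pgbouncer" not in database or ansible_bool(database["pgbouncer"])


print(user_triggers_reload({"name": "test"}))
print(database_triggers_reload({"name": "test"}))
print(database_triggers_reload({"name": "test", "pgbouncer": False}))
```

With a bare `{name: test}` definition, the user case skips the reload while the database case performs it, which matches the "false by default" and "true by default" comments in the sandbox configuration.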
Usage Example
./pgsql-createdb.yml -l pg-test -e pg_database=test
Execution Result
$ ./pgsql-createdb.yml -l pg-test -e pg_database=test
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Create Database In Cluster] *************************************************************************************************************************************************
TASK [Check parameter pg_database] ************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [Fetch database definition] **************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
"msg": {
"name": "test"
}
}
ok: [10.10.10.12] => {
"msg": {
"name": "test"
}
}
ok: [10.10.10.13] => {
"msg": {
"name": "test"
}
}
TASK [Check database definition] **************************************************************************************************************************************************
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [include_tasks] **************************************************************************************************************************************************************
included: /Volumes/Data/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13
TASK [debug] **********************************************************************************************************************************************************************
ok: [10.10.10.11] => {
"msg": {
"name": "test"
}
}
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [Render database test creation sql] ******************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Render database test baseline sql] ******************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [Execute database test creation command] *************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Execute database test creation sql] *****************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [Execute database test creation sql] *****************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [Add pgbouncer busniess database] ********************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [Reload pgbouncer to add database] *******************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
PLAY RECAP ************************************************************************************************************************************************************************
10.10.10.11 : ok=11 changed=5 unreachable=0 failed=0 skipped=2 rescued=0 ignored=0
10.10.10.12 : ok=7 changed=2 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
10.10.10.13 : ok=7 changed=2 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
Task Details
The default tasks are as follows:
playbook: ./pgsql-createdb.yml
play #1 (all): Create Database In Cluster TAGS: []
tasks:
Check parameter pg_database TAGS: []
Fetch database definition TAGS: []
debug TAGS: []
Check database definition TAGS: []
include_tasks TAGS: []
Reload pgbouncer to add database TAGS: [pgbouncer_reload]
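5.4 - Deployment Examples
Several examples of deploying Pigsty in real environments
A few typical deployment examples are given here, for reference only.
5.4.1 - Vagrant Sandbox Environment
Example Pigsty configuration for the local Vagrant sandbox
Overview
This is the configuration file used by the sandbox environment that ships with Pigsty.
Original location on GitHub: https://github.com/Vonng/pigsty/blob/master/pigsty.yml
This configuration file serves as a standard learning example. When deploying to virtual machines of the same specification, it usually needs only a few changes: replace 10.10.10.10 with the IP of your meta node, replace 10.10.10.* with the IPs of your database nodes, and modify or remove the ansible_host connection parameters so that they provide correct connection information. Pigsty can then be deployed to a group of virtual machines.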
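The IP substitution described above might be scripted as follows. This is a minimal sketch, not a Pigsty tool: `retarget_inventory` and the `192.168.1.*` replacement addresses are hypothetical, and the `ansible_host` entries still have to be adjusted by hand.

```python
import re

# Illustrative sketch only: retarget_inventory is a hypothetical helper.
# It matches whole IPv4 literals so that 10.10.10.1 never matches inside 10.10.10.11.
IPV4 = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def retarget_inventory(text, ip_map):
    """Replace every IPv4 literal found in ip_map; leave all other text untouched."""
    return IPV4.sub(lambda m: ip_map.get(m.group(0), m.group(0)), text)


# A trimmed excerpt in the style of the sandbox pigsty.yml.
sample = """\
meta:
  hosts: {10.10.10.10: {ansible_host: meta}}
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
  vars:
    vip_address: 10.10.10.3
"""

# Hypothetical target addresses; substitute the IPs of your own nodes.
ip_map = {
    "10.10.10.10": "192.168.1.10",  # meta node
    "10.10.10.11": "192.168.1.11",  # database node
}

print(retarget_inventory(sample, ip_map))
```

All replacements happen in a single pass, so an address that is both a source and a target in `ip_map` is never rewritten twice. Addresses not listed in the map, such as the `vip_address` above, are left for manual review.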
Configuration File
---
######################################################################
# File : pigsty.yml
# Path : pigsty.yml
# Desc : Pigsty Configuration file
# Note : follow ansible inventory file format
# Ctime : 2020-05-22
# Mtime : 2021-03-16
# Copyright (C) 2018-2021 Ruohang Feng
######################################################################
######################################################################
# Development Environment Inventory #
######################################################################
all: # top-level namespace, match all hosts
#==================================================================#
# Clusters #
#==================================================================#
# postgres database clusters are defined as kv pair in `all.children`
# where the key is cluster name and the value is the object consist
# of cluster members (hosts) and ad-hoc variables (vars)
# meta node are defined in special group "meta" with `meta_node=true`
children:
#-----------------------------
# meta controller
#-----------------------------
meta: # special group 'meta' defines the main controller machine
vars:
meta_node: true # mark node as meta controller
ansible_group_priority: 99 # meta group is top priority
# nodes in meta group
hosts: {10.10.10.10: {ansible_host: meta}}
#-----------------------------
# cluster: pg-meta
#-----------------------------
pg-meta:
# - cluster members - #
hosts:
10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}
# - cluster configs - #
vars:
pg_cluster: pg-meta # define actual cluster name
pg_version: 13 # define installed pgsql version
node_tune: tiny # tune node into oltp|olap|crit|tiny mode
pg_conf: tiny.yml # tune pgsql into oltp/olap/crit/tiny mode
patroni_mode: pause # enter maintenance mode, {default|pause|remove}
patroni_watchdog_mode: off # disable watchdog (require|automatic|off)
pg_lc_ctype: en_US.UTF8 # enabled pg_trgm i18n char support
pg_users:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
# simple example for personal user definition
- name: dbuser_vonng2 # personal user example with access limited to offline instances
password: DBUser.Vonng # or instances explicitly marked with `pg_offline_query = true`
roles: [dbrole_offline] # personal/stats/ETL users should be granted dbrole_offline
expire_in: 365 # expire in 365 days since creation
pgbouncer: false # personal user should NOT be allowed to login with pgbouncer
comment: example personal user for interactive queries
pg_databases:
- name: meta # name is the only required field for a database
# owner: postgres # optional, database owner
# template: template1 # optional, template1 by default
# encoding: UTF8 # optional, UTF8 by default, must be the same as the template database, leave blank to use the db default
# locale: C # optional, C by default, must be the same as the template database, leave blank to use the db default
# lc_collate: C # optional, C by default, must be the same as the template database, leave blank to use the db default
# lc_ctype: C # optional, C by default, must be the same as the template database, leave blank to use the db default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
# tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
pg_default_database: meta # default database will be used as primary monitor target
# proxy settings
vip_mode: l2 # enable/disable vip (require members in same LAN)
vip_address: 10.10.10.2 # virtual ip address
vip_cidrmask: 8 # cidr network mask length
vip_interface: eth1 # interface to add virtual ip
#-----------------------------
# cluster: pg-test
#-----------------------------
pg-test: # define cluster named 'pg-test'
# - cluster members - #
hosts:
10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
# - cluster configs - #
vars:
# basic settings
pg_cluster: pg-test # define actual cluster name
pg_version: 13 # define installed pgsql version
node_tune: tiny # tune node into oltp|olap|crit|tiny mode
pg_conf: tiny.yml # tune pgsql into oltp/olap/crit/tiny mode
# business users, adjust on your own needs
pg_users:
- name: test # example production user have read-write access
password: test # example user's password
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
pgbouncer: true # production user that access via pgbouncer
comment: default test user for production usage
pg_databases: # create a business database 'test'
- name: test # use the simplest form
pg_default_database: test # default database will be used as primary monitor target
# proxy settings
vip_mode: l2 # enable/disable vip (require members in same LAN)
vip_address: 10.10.10.3 # virtual ip address
vip_cidrmask: 8 # cidr network mask length
vip_interface: eth1 # interface to add virtual ip
#==================================================================#
# Globals #
#==================================================================#
vars:
#------------------------------------------------------------------------------
# CONNECTION PARAMETERS
#------------------------------------------------------------------------------
# this section defines connection parameters
# ansible_user: vagrant # admin user with ssh access and sudo privilege
proxy_env: # global proxy env when downloading packages
no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
# http_proxy: ''
# https_proxy: ''
# all_proxy: ''
#------------------------------------------------------------------------------
# REPO PROVISION
#------------------------------------------------------------------------------
# this section defines how to build a local repo
# - repo basic - #
repo_enabled: true # build local yum repo on meta nodes?
repo_name: pigsty # local repo name
repo_address: yum.pigsty # repo external address (ip:port or url)
repo_port: 80 # listen port, must match repo_address
repo_home: /www # default repo dir location
repo_rebuild: false # force re-download packages
repo_remove: true # remove existing repos
# - where to download - #
repo_upstreams:
- name: base
description: CentOS-$releasever - Base - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
gpgcheck: no
failovermethod: priority
- name: updates
description: CentOS-$releasever - Updates - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
gpgcheck: no
failovermethod: priority
- name: extras
description: CentOS-$releasever - Extras - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
gpgcheck: no
failovermethod: priority
- name: epel
description: CentOS $releasever - EPEL - Aliyun Mirror
baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
gpgcheck: no
failovermethod: priority
- name: grafana
description: Grafana - TsingHua Mirror
gpgcheck: no
baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm
- name: prometheus
description: Prometheus and exporters
gpgcheck: no
baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
# consider using ZJU PostgreSQL mirror in mainland china
- name: pgdg-common
description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
gpgcheck: no
# baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch
baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch
- name: pgdg13
description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch
gpgcheck: no
# baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch
baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch
- name: centos-sclo
description: CentOS-$releasever - SCLo
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo
- name: centos-sclo-rh
description: CentOS-$releasever - SCLo rh
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh
- name: nginx
description: Nginx Official Yum Repo
skip_if_unavailable: true
gpgcheck: no
baseurl: http://nginx.org/packages/centos/$releasever/$basearch/
- name: haproxy
description: Copr repo for haproxy
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/
# for latest consul & kubernetes
- name: harbottle
description: Copr repo for main owned by harbottle
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/
# - what to download - #
repo_packages:
# repo bootstrap packages
- epel-release nginx wget yum-utils yum createrepo # bootstrap packages
# node basic packages
- ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned # basic system util
- readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates # basic pg dependency
- numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet # system utils
# dcs & monitor packages
- grafana prometheus2 pushgateway alertmanager # monitor and ui
- node_exporter postgres_exporter nginx_exporter blackbox_exporter # exporter
- consul consul_exporter consul-template etcd # dcs
# python3 dependencies
- ansible python python-pip python-psycopg2 audit # ansible & python
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul # python3
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography # python3 patroni extra deps
# proxy and load balancer
- haproxy keepalived dnsmasq # proxy and dns
# postgres common Packages
- patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity # major components
- pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel # other common utils
- barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail
# postgres 13 packages
- postgresql13* postgis31* citus_13 timescaledb_13 # pgrouting_13 # postgres 13 and postgis 31
- pg_repack13 pg_squeeze13 # maintenance extensions
- pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13 # stats extensions
- plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13 # PL extensions
- hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13 # FDW extensions
- wal2json13 count_distinct13 ddlx_13 geoip13 orafce13 # MISC extensions
- rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
- pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
- pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
- pguint13 pguri13 prefix13 safeupdate_13 semver13 table_version13 tdigest13
repo_url_packages:
- https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm
- https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
- http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm
#------------------------------------------------------------------------------
# NODE PROVISION
#------------------------------------------------------------------------------
# this section defines how to provision nodes
# nodename: # if defined, node's hostname will be overwritten
# - node dns - #
node_dns_hosts: # static dns records in /etc/hosts
- 10.10.10.10 yum.pigsty
node_dns_server: add # add (default) | none (skip) | overwrite (remove old settings)
node_dns_servers: # dynamic nameserver in /etc/resolv.conf
- 10.10.10.10
node_dns_options: # dns resolv options
- options single-request-reopen timeout:1 rotate
- domain service.consul
# - node repo - #
node_repo_method: local # none|local|public (use local repo for production env)
node_repo_remove: true # whether remove existing repo
node_local_repo_url: # local repo url (if method=local, make sure firewall is configured or disabled)
- http://yum.pigsty/pigsty.repo
# - node packages - #
node_packages: # common packages for all nodes
- wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,unzip,git,readline,zlib,openssl
- numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,pv,jq
- python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
- python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
- node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
node_extra_packages: # extra packages for all nodes
- patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
node_meta_packages: # packages for meta nodes only
- grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
- dnsmasq,nginx,ansible,pgbadger,polysh
# - node features - #
node_disable_numa: false # disable numa, important for production database, reboot required
node_disable_swap: false # disable swap, important for production database
node_disable_firewall: true # disable firewall (required if using kubernetes)
node_disable_selinux: true # disable selinux (required if using kubernetes)
node_static_network: true # keep dns resolver settings after reboot
node_disk_prefetch: false # setup disk prefetch on HDD to increase performance
# - node kernel modules - #
node_kernel_modules:
- softdog
- br_netfilter
- ip_vs
- ip_vs_rr
- ip_vs_wrr
- ip_vs_sh
- nf_conntrack_ipv4
# - node tuned - #
node_tune: tiny # install and activate tuned profile: none|oltp|olap|crit|tiny
node_sysctl_params: # set additional sysctl parameters, k:v format
net.bridge.bridge-nf-call-iptables: 1 # for kubernetes
# - node user - #
node_admin_setup: true # set up a default admin user?
node_admin_uid: 88 # uid and gid for admin user
node_admin_username: admin # default admin user
node_admin_ssh_exchange: true # exchange ssh key among cluster ?
node_admin_pks: # public key list that will be installed
- 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'
# - node ntp - #
node_ntp_service: ntp # ntp or chrony
node_ntp_config: true # overwrite existing ntp config?
node_timezone: Asia/Shanghai # default node timezone
node_ntp_servers: # default NTP servers
- pool cn.pool.ntp.org iburst
- pool pool.ntp.org iburst
- pool time.pool.aliyun.com iburst
- server 10.10.10.10 iburst
#------------------------------------------------------------------------------
# META PROVISION
#------------------------------------------------------------------------------
# - ca - #
ca_method: create # create|copy|recreate
ca_subject: "/CN=root-ca" # self-signed CA subject
ca_homedir: /ca # ca cert directory
ca_cert: ca.crt # ca public key/cert
ca_key: ca.key # ca private key
# - nginx - #
nginx_upstream:
- { name: home, host: pigsty, url: "127.0.0.1:3000"}
- { name: consul, host: c.pigsty, url: "127.0.0.1:8500" }
- { name: grafana, host: g.pigsty, url: "127.0.0.1:3000" }
- { name: prometheus, host: p.pigsty, url: "127.0.0.1:9090" }
- { name: alertmanager, host: a.pigsty, url: "127.0.0.1:9093" }
- { name: haproxy, host: h.pigsty, url: "127.0.0.1:9091" }
# - nameserver - #
dns_records: # dynamic dns record resolved by dnsmasq
- 10.10.10.2 pg-meta # sandbox vip for pg-meta
- 10.10.10.3 pg-test # sandbox vip for pg-test
- 10.10.10.10 meta-1 # sandbox node meta-1 (node-0)
- 10.10.10.11 node-1 # sandbox node node-1
- 10.10.10.12 node-2 # sandbox node node-2
- 10.10.10.13 node-3 # sandbox node node-3
- 10.10.10.10 pigsty
- 10.10.10.10 y.pigsty yum.pigsty
- 10.10.10.10 c.pigsty consul.pigsty
- 10.10.10.10 g.pigsty grafana.pigsty
- 10.10.10.10 p.pigsty prometheus.pigsty
- 10.10.10.10 a.pigsty alertmanager.pigsty
- 10.10.10.10 n.pigsty ntp.pigsty
- 10.10.10.10 h.pigsty haproxy.pigsty
# - prometheus - #
prometheus_data_dir: /export/prometheus/data # prometheus data dir
prometheus_options: '--storage.tsdb.retention=30d'
prometheus_reload: false # reload prometheus instead of recreate it
prometheus_sd_method: consul # service discovery method: static|consul|etcd
prometheus_scrape_interval: 2s # global scrape & evaluation interval
prometheus_scrape_timeout: 1s # scrape timeout
prometheus_sd_interval: 2s # service discovery refresh interval
# - grafana - #
grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
grafana_admin_password: admin # default grafana admin user password
grafana_plugin: install # none|install|reinstall
grafana_cache: /www/pigsty/grafana/plugins.tar.gz # path to grafana plugins tarball
grafana_customize: true # customize grafana resources
grafana_plugins: # default grafana plugins list
- redis-datasource
- simpod-json-datasource
- fifemon-graphql-datasource
- sbueringer-consul-datasource
- camptocamp-prometheus-alertmanager-datasource
- ryantxu-ajax-panel
- marcusolsson-hourly-heatmap-panel
- michaeldmoore-multistat-panel
- marcusolsson-treemap-panel
- pr0ps-trackmap-panel
- dalvany-image-panel
- magnesium-wordcloud-panel
- cloudspout-button-panel
- speakyourcode-button-panel
- jdbranham-diagram-panel
- grafana-piechart-panel
- snuids-radar-panel
- digrich-bubblechart-panel
grafana_git_plugins:
- https://github.com/Vonng/grafana-echarts
#------------------------------------------------------------------------------
# DCS PROVISION
#------------------------------------------------------------------------------
service_registry: consul # where to register services: none | consul | etcd | both
dcs_type: consul # consul | etcd | both
dcs_name: pigsty # consul dc name | etcd initial cluster token
dcs_servers: # dcs server dict in name:ip format
meta-1: 10.10.10.10 # you could use existing dcs cluster
# meta-2: 10.10.10.11 # hosts whose IP is listed here will be initialized as dcs servers
# meta-3: 10.10.10.12 # 3 or 5 dcs nodes are recommended for production environments
dcs_exists_action: clean # abort|skip|clean if dcs server already exists
dcs_disable_purge: false # set to true to disable purge functionality for good (force dcs_exists_action = abort)
consul_data_dir: /var/lib/consul # consul data dir (/var/lib/consul by default)
etcd_data_dir: /var/lib/etcd # etcd data dir (/var/lib/etcd by default)
#------------------------------------------------------------------------------
# POSTGRES INSTALLATION
#------------------------------------------------------------------------------
# - dbsu - #
pg_dbsu: postgres # os user for database, postgres by default (changing it is not recommended!)
pg_dbsu_uid: 26 # os dbsu uid and gid, 26 for default postgres users and groups
pg_dbsu_sudo: limit # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
pg_dbsu_home: /var/lib/pgsql # home directory of the database superuser
pg_dbsu_ssh_exchange: false # exchange ssh key among same cluster
# - postgres packages - #
pg_version: 13 # default postgresql version
pgdg_repo: false # use official pgdg yum repo (disable if you have local mirror)
pg_add_repo: false # add postgres related repo before install (useful if you want a simple install)
pg_bin_dir: /usr/pgsql/bin # postgres binary dir
pg_packages:
- postgresql${pg_version}*
- postgis31_${pg_version}*
- pgbouncer patroni pg_exporter pgbadger
- patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography
pg_extensions:
- pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
# - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
# - count_distinct${pg_version} ddlx_${pg_version} geoip${pg_version} orafce${pg_version} # popular features
# - hypopg_${pg_version} ip4r${pg_version} jsquery_${pg_version} logerrors_${pg_version} periods_${pg_version} pg_auto_failover_${pg_version} pg_catcheck${pg_version}
# - pg_fkpart${pg_version} pg_jobmon${pg_version} pg_partman${pg_version} pg_prioritize_${pg_version} pg_track_settings${pg_version} pgaudit15_${pg_version}
# - pgcryptokey${pg_version} pgexportdoc${pg_version} pgimportdoc${pg_version} pgmemcache-${pg_version} pgmp${pg_version} pgq-${pg_version} pgquarrel pgrouting_${pg_version}
# - pguint${pg_version} pguri${pg_version} prefix${pg_version} safeupdate_${pg_version} semver${pg_version} table_version${pg_version} tdigest${pg_version}
#------------------------------------------------------------------------------
# POSTGRES PROVISION
#------------------------------------------------------------------------------
# - identity - #
# pg_cluster: # [REQUIRED] cluster name (validated during pg_preflight)
# pg_seq: 0 # [REQUIRED] instance seq (validated during pg_preflight)
# pg_role: replica # [REQUIRED] service role (validated during pg_preflight)
pg_hostname: false # overwrite node hostname with pg instance name
pg_nodename: true # overwrite consul nodename with pg instance name
# - retention - #
# pg_exists_action, available options: abort|clean|skip
# - abort: abort entire play's execution (default)
# - clean: remove existing cluster (dangerous)
# - skip: end current play for this host
# pg_exists: false # auxiliary flag variable (DO NOT SET THIS)
pg_exists_action: clean
pg_disable_purge: false # set to true to disable pg purge functionality for good (force pg_exists_action = abort)
# - storage - #
pg_data: /pg/data # postgres data directory
pg_fs_main: /export # data disk mount point /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
pg_fs_bkup: /var/backups # backup disk mount point /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*
# - connection - #
pg_listen: '0.0.0.0' # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
pg_port: 5432 # postgres port (5432 by default)
pg_localhost: /var/run/postgresql # localhost unix socket dir for connection
# - patroni - #
# patroni_mode, available options: default|pause|remove
# - default: default ha mode
# - pause: into maintenance mode
# - remove: remove patroni after bootstrap
patroni_mode: default # pause|default|remove
pg_namespace: /pg # top level key namespace in dcs
patroni_port: 8008 # default patroni port
patroni_watchdog_mode: automatic # watchdog mode: off|automatic|required
pg_conf: tiny.yml # user provided patroni config template path
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
# - pgbouncer - #
pgbouncer_port: 6432 # pgbouncer port (6432 by default)
pgbouncer_poolmode: transaction # pooling mode: (transaction pooling by default)
pgbouncer_max_db_conn: 100 # important! do not set this larger than postgres max conn or conn limit
#------------------------------------------------------------------------------
# POSTGRES TEMPLATE
#------------------------------------------------------------------------------
# - template - #
pg_init: pg-init # init script for cluster template
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_admin # system admin user
pg_admin_password: DBUser.Admin # system admin password
# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
pg_default_roles:
# common production readonly user
- name: dbrole_readonly # production read-only roles
login: false
comment: role for global readonly access
# common production read-write user
- name: dbrole_readwrite # production read-write roles
login: false
roles: [dbrole_readonly] # read-write includes read-only access
comment: role for global read-write access
# offline have same privileges as readonly, but with limited hba access on offline instance only
# for the purpose of running slow queries, interactive queries and perform ETL tasks
- name: dbrole_offline
login: false
comment: role for restricted read-only access (offline instance)
# admin have the privileges to issue DDL changes
- name: dbrole_admin
login: false
bypassrls: true
comment: role for object creation
roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# dbsu, name is designated by `pg_dbsu`. It's not recommended to set a password for dbsu
- name: postgres
superuser: true
comment: system superuser
# default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
- name: replicator
replication: true # for replication user
bypassrls: true # logical replication require bypassrls
roles: [pg_monitor, dbrole_readonly] # logical replication require select privileges
comment: system replicator
# default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
- name: dbuser_monitor
connlimit: 16
comment: system monitor user
roles: [pg_monitor, dbrole_readonly]
# default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
- name: dbuser_admin
bypassrls: true
superuser: true
comment: system admin user
roles: [dbrole_admin]
# default stats user, for ETL and slow queries
- name: dbuser_stats
password: DBUser.Stats
comment: business offline user for offline queries and ETL
roles: [dbrole_offline]
# - privileges - #
# object created by dbsu and admin will have their privileges properly set
pg_default_privileges:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE ON SCHEMAS TO dbrole_offline
- GRANT SELECT ON TABLES TO dbrole_offline
- GRANT SELECT ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
# - schemas - #
pg_default_schemas: [monitor] # default schemas to be created
# - extension - #
pg_default_extensions: # default extensions to be created
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
# - hba - #
pg_offline_query: false # set to true to enable offline query on instance
pg_reload: true # reload postgres after hba changes
pg_hba_rules: # postgres host-based authentication rules
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read/write (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
- title: allow offline query (ETL,SAGA,Interactive) on offline instance
role: offline
rules:
- host all +dbrole_offline 10.0.0.0/8 md5
- host all +dbrole_offline 172.16.0.0/12 md5
- host all +dbrole_offline 192.168.0.0/16 md5
pg_hba_rules_extra: [] # extra hba rules (for cluster/instance overwrite)
pgbouncer_hba_rules: # pgbouncer host-based authentication rules
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
pgbouncer_hba_rules_extra: [] # extra pgbouncer hba rules (for cluster/instance overwrite)
# pg_users: [] # business users
# pg_databases: [] # business databases
#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none # none|yum|binary, none by default
exporter_repo_url: '' # if set, repo will be added to /etc/yum.repos.d/ before yum installation
# - collect - #
exporter_metrics_path: /metrics # default metric path for pg related exporter
# - node exporter - #
node_exporter_enabled: true # setup node_exporter on instance
node_exporter_port: 9100 # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml # default config files for pg_exporter
pg_exporter_enabled: true # setup pg_exporter on instance
pg_exporter_port: 9630 # default port for pg exporter
pg_exporter_url: '' # optional, if not set, generate from reference parameters
# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631 # default port for pgbouncer exporter
pgbouncer_exporter_url: '' # optional, if not set, generate from reference parameters
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100 # default load balance weight (instance level)
# - service - #
pg_services: # how to expose postgres service in cluster?
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
pg_services_extra: [] # extra services to be added
# - haproxy - #
haproxy_enabled: true # enable haproxy on every cluster member
haproxy_reload: true # reload haproxy after config
haproxy_admin_auth_enabled: false # enable authentication for haproxy admin?
haproxy_admin_username: admin # default haproxy admin username
haproxy_admin_password: admin # default haproxy admin password
haproxy_exporter_port: 9101 # default admin/exporter port
haproxy_client_timeout: 3h # client side connection timeout
haproxy_server_timeout: 3h # server side connection timeout
# - vip - #
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
# vip_address: 127.0.0.1 # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24 # virtual ip address cidr mask (l2 only)
# vip_interface: eth0 # virtual ip network interface (l2 only)
...
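The pg_services defaults in the configuration above expose four ports on every cluster member: 5433 (primary, via pgbouncer), 5434 (replica, via pgbouncer), 5436 (default, directly to the primary postgres) and 5438 (offline, directly to an offline instance). The following is a minimal sketch of how a client might pick a port by service name. The helper name pg_service_url, the database names and the choice of users are assumptions made for illustration; only the ports and the pg-test cluster name come from the configuration.

```shell
#!/bin/sh
# Sketch only: map a service name to the port defined in pg_services and
# print a libpq-style connection URL.
# pg_service_url is a hypothetical helper, not part of Pigsty.
pg_service_url() {
    host=$1; service=$2; user=$3; db=$4
    case "$service" in
        primary) port=5433 ;;   # read-write, through pgbouncer
        replica) port=5434 ;;   # read-only, through pgbouncer
        default) port=5436 ;;   # direct connection to the primary postgres
        offline) port=5438 ;;   # direct connection to an offline instance
        *) echo "unknown service: $service" >&2; return 1 ;;
    esac
    printf 'postgres://%s@%s:%s/%s\n' "$user" "$host" "$port" "$db"
}

# Example: offline (ETL / slow query) access to the sandbox cluster pg-test;
# the database name "test" is an assumption.
pg_service_url pg-test offline dbuser_stats test
```

If pg_services or pg_services_extra is customized, the port table in the case statement has to be kept in sync by hand.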
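5.4.2 - Tencent Cloud VPC Deployment
Deploy Pigsty on Tencent Cloud VPC virtual machines
This example deploys Pigsty on Tencent Cloud VPC.
Resource Preparation
Provision Virtual Machines
Purchase a few virtual machines, as shown in the figure below. The machine whose address ends in .11 (172.21.0.11) serves as the meta node and has a public IP address. The three database nodes can be ordinary 1-core / 1 GB instances.
Configure SSH Remote Login
Assume the admin username is vonng (that is me!). First, set up passwordless SSH access from the meta node to the other three nodes.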
# vonng@172.21.0.11 # meta
ssh-copy-id root@172.21.0.3 # pg-test-1
ssh-copy-id root@172.21.0.4 # pg-test-2
ssh-copy-id root@172.21.0.16 # pg-test-3
scp ~/.ssh/id_rsa.pub root@172.21.0.3:/tmp/
scp ~/.ssh/id_rsa.pub root@172.21.0.4:/tmp/
scp ~/.ssh/id_rsa.pub root@172.21.0.16:/tmp/
ssh root@172.21.0.3 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'
ssh root@172.21.0.4 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'
ssh root@172.21.0.16 'useradd vonng; mkdir -m 700 -p /home/vonng/.ssh; mv /tmp/id_rsa.pub /home/vonng/.ssh/authorized_keys; chown -R vonng /home/vonng; chmod 0600 /home/vonng/.ssh/authorized_keys;'
Then grant this user passwordless sudo privileges:
ssh root@172.21.0.3 "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"
ssh root@172.21.0.4 "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"
ssh root@172.21.0.16 "echo '%vonng ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/vonng"
# verify that the configuration works
ssh 172.21.0.3 'sudo ls'
ssh 172.21.0.4 'sudo ls'
ssh 172.21.0.16 'sudo ls'
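The per-node commands above are identical apart from the address, so they can be written as a loop. The following is a minimal sketch under stated assumptions: the DRY_RUN switch and the helper names run and setup_nodes are hypothetical, and by default the script only prints the commands for review rather than executing them. The printed lines lose their shell quoting, so they are meant for inspection, not for copy and paste.

```shell
#!/bin/sh
# Sketch only: apply the node setup steps in a loop instead of repeating
# them per address. DRY_RUN, run and setup_nodes are hypothetical names.
NODES="172.21.0.3 172.21.0.4 172.21.0.16"   # database nodes from this example
ADMIN_USER="vonng"                          # admin user from this example

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"     # dry run: show the command only
    else
        "$@"                   # real run: execute it
    fi
}

setup_nodes() {
    for ip in $NODES; do
        # install the key for root, then create the admin user with the same key
        run ssh-copy-id "root@${ip}"
        run scp "${HOME}/.ssh/id_rsa.pub" "root@${ip}:/tmp/"
        run ssh "root@${ip}" "useradd ${ADMIN_USER}; mkdir -m 700 -p /home/${ADMIN_USER}/.ssh; mv /tmp/id_rsa.pub /home/${ADMIN_USER}/.ssh/authorized_keys; chown -R ${ADMIN_USER} /home/${ADMIN_USER}; chmod 0600 /home/${ADMIN_USER}/.ssh/authorized_keys"
        # grant passwordless sudo to the admin user's group
        run ssh "root@${ip}" "echo '%${ADMIN_USER} ALL=(ALL) NOPASSWD: ALL' > /etc/sudoers.d/${ADMIN_USER}"
    done
}

setup_nodes
```

Setting DRY_RUN=0 before calling setup_nodes would execute the commands for real; review the dry-run output first.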
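Download the Project
# clone the code from GitHub
git clone https://github.com/Vonng/pigsty && cd pigsty
# if you cannot access GitHub, download the source tarball from the Pigsty CDN instead
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pigsty.tar.gz -o pigsty.tgz && tar -xf pigsty.tgz && cd pigsty
Download the Offline Installation Package
# download from the GitHub Releases page
# https://github.com/Vonng/pigsty
# if you cannot access GitHub, download the offline package from the Pigsty CDN instead
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/latest/pkg.tgz -o files/pkg.tgz
# extract the offline package to the expected location on the meta node (sudo may be required)
# back up any existing repo directory first (mv has no -r option)
[ -d /www/pigsty ] && mv /www/pigsty /www/pigsty-backup
mkdir -p /www/pigsty
tar -xf files/pkg.tgz --strip-components=1 -C /www/pigsty/
Adjust the Configuration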
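We can start from the Pigsty sandbox configuration file. Because these are all ordinary low-spec virtual machines, no substantive configuration changes are needed; only the connection parameters and node information have to change. In short, just change the IP addresses!
Now replace every sandbox IP address with the actual address in the cloud environment. (If you use an L2 VIP, the VIP must also be replaced with a suitable address.)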
| Description | Sandbox IP | VM IP |
|---|---|---|
| Meta node | 10.10.10.10 | 172.21.0.11 |
| Database node 1 | 10.10.10.11 | 172.21.0.3 |
| Database node 2 | 10.10.10.12 | 172.21.0.4 |
| Database node 3 | 10.10.10.13 | 172.21.0.16 |
| pg-meta VIP | 10.10.10.2 | 172.21.0.8 |
| pg-test VIP | 10.10.10.3 | 172.21.0.9 |
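Edit the configuration file pigsty.yml. If the virtual machines all have similar specs, you usually only need to change the IP addresses. Note in particular that the sandbox connects through SSH aliases (such as meta and node-1), so remember to remove every ansible_host setting; we will connect to the target nodes directly by IP address.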
# dots are escaped so that they match literally; longer addresses are replaced first
sed -e 's/10\.10\.10\.10/172.21.0.11/g' \
    -e 's/10\.10\.10\.11/172.21.0.3/g' \
    -e 's/10\.10\.10\.12/172.21.0.4/g' \
    -e 's/10\.10\.10\.13/172.21.0.16/g' \
    -e 's/10\.10\.10\.2/172.21.0.8/g' \
    -e 's/10\.10\.10\.3/172.21.0.9/g' \
    -e 's/, ansible_host: meta//g' \
    -e 's/ansible_host: meta//g' \
    -e 's/, ansible_host: node-[123]//g' \
    -e 's/vip_interface: eth1/vip_interface: eth0/g' \
    -e 's/vip_cidrmask: 8/vip_cidrmask: 24/g' \
    pigsty.yml > pigsty2.yml
mv pigsty.yml pigsty-backup.yml; mv pigsty2.yml pigsty.yml
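The substitution above relies on replacing the longer addresses before the shorter ones. As a hedged alternative, the sketch below builds the sed script from an address table, escapes the dots, and only replaces an address when it is followed by a non-digit or the end of the line, so 10.10.10.2 can never match inside an address such as 10.10.10.20. The helper name remap_ips is hypothetical; the address pairs are the ones from the table in this section. It does not guard the leading side of the address, which is assumed to be sufficient for this configuration file.

```shell
#!/bin/sh
# Sketch only: rewrite sandbox addresses to VPC addresses.
# remap_ips is a hypothetical helper; it reads stdin and writes stdout.
remap_ips() {
    script=''
    while read -r old new; do
        # escape the dots so that "." matches only a literal dot
        esc=$(printf '%s' "$old" | sed 's/\./\\./g')
        # one rule for "address followed by a non-digit", one for "address at end of line"
        script="${script}s/${esc}\\([^0-9]\\)/${new}\\1/g
s/${esc}\$/${new}/
"
    done <<'EOF'
10.10.10.10 172.21.0.11
10.10.10.11 172.21.0.3
10.10.10.12 172.21.0.4
10.10.10.13 172.21.0.16
10.10.10.2 172.21.0.8
10.10.10.3 172.21.0.9
EOF
    sed "$script"
}

# Usage (not executed here): remap_ips < pigsty.yml > pigsty2.yml
```

The ansible_host, vip_interface and vip_cidrmask edits are not covered by this helper and still need the substitutions shown earlier.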
That's it?
Yes, the configuration file is now fully updated! Let's see exactly what changed:
$ diff pigsty.yml pigsty-backup.yml
38c38
< hosts: {172.21.0.11: {}}
---
> hosts: {10.10.10.10: {ansible_host: meta}}
46c46
< 172.21.0.11: {pg_seq: 1, pg_role: primary}
---
> 10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}
109,111c109,111
< vip_address: 172.21.0.8 # virtual ip address
< vip_cidrmask: 24 # cidr network mask length
< vip_interface: eth0 # interface to add virtual ip
---
> vip_address: 10.10.10.2 # virtual ip address
> vip_cidrmask: 8 # cidr network mask length
> vip_interface: eth1 # interface to add virtual ip
120,122c120,122
< 172.21.0.3: {pg_seq: 1, pg_role: primary}
< 172.21.0.4: {pg_seq: 2, pg_role: replica}
< 172.21.0.16: {pg_seq: 3, pg_role: offline}
---
> 10.10.10.11: {pg_seq: 1, pg_role: primary, ansible_host: node-1}
> 10.10.10.12: {pg_seq: 2, pg_role: replica, ansible_host: node-2}
> 10.10.10.13: {pg_seq: 3, pg_role: offline, ansible_host: node-3}
147,149c147,149
< vip_address: 172.21.0.9 # virtual ip address
< vip_cidrmask: 24 # cidr network mask length
< vip_interface: eth0 # interface to add virtual ip
---
> vip_address: 10.10.10.3 # virtual ip address
> vip_cidrmask: 8 # cidr network mask length
> vip_interface: eth1 # interface to add virtual ip
326c326
< - 172.21.0.11 yum.pigsty
---
> - 10.10.10.10 yum.pigsty
329c329
< - 172.21.0.11
---
> - 10.10.10.10
393c393
< - server 172.21.0.11 iburst
---
> - server 10.10.10.10 iburst
417,430c417,430
< - 172.21.0.8 pg-meta # sandbox vip for pg-meta
< - 172.21.0.9 pg-test # sandbox vip for pg-test
< - 172.21.0.11 meta-1 # sandbox node meta-1 (node-0)
< - 172.21.0.3 node-1 # sandbox node node-1
< - 172.21.0.4 node-2 # sandbox node node-2
< - 172.21.0.16 node-3 # sandbox node node-3
< - 172.21.0.11 pigsty
< - 172.21.0.11 y.pigsty yum.pigsty
< - 172.21.0.11 c.pigsty consul.pigsty
< - 172.21.0.11 g.pigsty grafana.pigsty
< - 172.21.0.11 p.pigsty prometheus.pigsty
< - 172.21.0.11 a.pigsty alertmanager.pigsty
< - 172.21.0.11 n.pigsty ntp.pigsty
< - 172.21.0.11 h.pigsty haproxy.pigsty
---
> - 10.10.10.2 pg-meta # sandbox vip for pg-meta
> - 10.10.10.3 pg-test # sandbox vip for pg-test
> - 10.10.10.10 meta-1 # sandbox node meta-1 (node-0)
> - 10.10.10.11 node-1 # sandbox node node-1
> - 10.10.10.12 node-2 # sandbox node node-2
> - 10.10.10.13 node-3 # sandbox node node-3
> - 10.10.10.10 pigsty
> - 10.10.10.10 y.pigsty yum.pigsty
> - 10.10.10.10 c.pigsty consul.pigsty
> - 10.10.10.10 g.pigsty grafana.pigsty
> - 10.10.10.10 p.pigsty prometheus.pigsty
> - 10.10.10.10 a.pigsty alertmanager.pigsty
> - 10.10.10.10 n.pigsty ntp.pigsty
> - 10.10.10.10 h.pigsty haproxy.pigsty
442c442
< grafana_url: http://admin:admin@172.21.0.11:3000 # grafana url
---
> grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
478,480c478,480
< meta-1: 172.21.0.11 # you could use existing dcs cluster
< # meta-2: 172.21.0.3 # host which have their IP listed here will be init as server
< # meta-3: 172.21.0.4 # 3 or 5 dcs nodes are recommend for production environment
---
> meta-1: 10.10.10.10 # you could use existing dcs cluster
> # meta-2: 10.10.10.11 # host which have their IP listed here will be init as server
> # meta-3: 10.10.10.12 # 3 or 5 dcs nodes are recommend for production environment
692c692
< - host all all 172.21.0.11/32 md5
---
> - host all all 10.10.10.10/32 md5
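Run the Playbooks
You can use the same sandbox initialization procedure to initialize the infrastructure and the database clusters.
Apart from the IP addresses, the output is identical to that of the sandbox. See the reference output.
Access the Demo
You can now reach the services on the meta node through its public IP address! Be sure to take proper security precautions.
Unlike the sandbox, if you want to access the Pigsty management UI from the public network, you must add the configured domain names to /etc/hosts yourself, or use a domain name you have actually registered.
Otherwise you can only connect directly by IP address and port, for example: http://<meta_node_public_ip>:3000.
The domain names that Nginx listens on can be configured through the nginx_upstream option.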
nginx_upstream:
- { name: home, host: pigsty.cc, url: "127.0.0.1:3000"}
- { name: consul, host: c.pigsty.cc, url: "127.0.0.1:8500" }
- { name: grafana, host: g.pigsty.cc, url: "127.0.0.1:3000" }
- { name: prometheus, host: p.pigsty.cc, url: "127.0.0.1:9090" }
- { name: alertmanager, host: a.pigsty.cc, url: "127.0.0.1:9093" }
- { name: haproxy, host: h.pigsty.cc, url: "127.0.0.1:9091" }
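For access by domain name without a registered domain, the client machine needs a matching /etc/hosts entry. The following is a minimal sketch, assuming the default nginx_upstream host names (pigsty, c.pigsty, g.pigsty, p.pigsty, a.pigsty, h.pigsty) are kept rather than the pigsty.cc names shown above. The helper name hosts_entry is hypothetical, and 203.0.113.10 is a documentation placeholder, not a real meta node address.

```shell
#!/bin/sh
# Sketch only: print an /etc/hosts line that maps the default Pigsty
# domain names to the meta node's public IP address.
# hosts_entry is a hypothetical helper.
hosts_entry() {
    printf '%s pigsty c.pigsty g.pigsty p.pigsty a.pigsty h.pigsty\n' "$1"
}

# 203.0.113.10 stands in for <meta_node_public_ip>
hosts_entry 203.0.113.10

# To apply it on the machine you browse from (not executed here):
#   hosts_entry <meta_node_public_ip> | sudo tee -a /etc/hosts
```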
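5.4.3 - Production Deployment
Production deployment on high-spec hardware
This example is based on a real production environment.
The environment consists of 200 high-spec x86 physical machines: Dell R740, 64 CPU cores / 400 GB RAM / 4 TB PCIe SSD / dual 10 GbE NICs.
Resource Preparation
Adjust the Configuration
Run the Playbooks
Access the Services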
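5.4.4 - Alibaba Cloud MyBase Integration
How to deploy the Pigsty monitoring system on its own to monitor Alibaba Cloud MyBase for PostgreSQL
Pigsty has a built-in database provisioning solution, but it can also serve purely as a monitoring system integrated with an external provisioning solution, such as Alibaba Cloud MyBase for PostgreSQL.
When integrating with an external system, users only need to deploy one meta node to host the monitoring infrastructure. In addition, Node Exporter and PG Exporter must be installed on the monitored machines to collect metrics.
Pigsty provides a static service discovery mechanism and a binary deployment mode for exporters, to reduce intrusion into the external system.
The following walks through a real example of monitoring Alibaba Cloud MyBase with Pigsty.
Request Resources
Deploy the Monitoring Infrastructure
Deploy the Monitoring Exporters
Manage Instance Identity
Update the Instance List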
5.5 - 仅监控部署
如何将Pigsty与外部供给方案相集成,只使用Pigsty的监控系统部分。
如果用户只希望使用Pigsty的监控系统部分,比如希望使用Pigsty监控系统监控已有的PostgreSQL实例,那么可以使用 仅监控部署(monitor only) 模式。
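The monitor-only deployment flow is largely the same as the standard one, but skips many steps:
- Infrastructure initialization on the meta node is identical to the standard flow.
- Edit the configuration file. In monitor-only mode you usually only need to change the monitoring-system parameters.
- Run the dedicated playbook ./pgsql-monitor.yml to complete the monitor-only deployment on the database nodes.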
部署说明
监控用户
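Pigsty creates the monitoring user during the PG provisioning stage. Monitor-only mode skips that stage, so you must create the monitoring user on the target database cluster yourself, along with the monitoring schema and extensions (only pg_stat_statements is mandatory). Run the following SQL on the database instance to be monitored.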
-- create the monitoring user
CREATE USER "dbuser_monitor" ;
ALTER ROLE "dbuser_monitor" PASSWORD 'DBUser.Monitor';
ALTER USER "dbuser_monitor" CONNECTION LIMIT 16;
GRANT "pg_monitor" TO "dbuser_monitor";
GRANT "dbrole_readonly" TO "dbuser_monitor";
-- create the monitoring schema and extension
CREATE SCHEMA IF NOT EXISTS monitor;
GRANT USAGE ON SCHEMA monitor TO "dbuser_monitor";
CREATE EXTENSION IF NOT EXISTS "pg_stat_statements" WITH SCHEMA "monitor";
-- extra monitoring function for shared-memory metrics; only needed on PG13 and later
CREATE OR REPLACE FUNCTION monitor.pg_shmem() RETURNS SETOF
pg_shmem_allocations AS $$ SELECT * FROM pg_shmem_allocations;$$ LANGUAGE SQL SECURITY DEFINER;
COMMENT ON FUNCTION monitor.pg_shmem() IS 'security wrapper for pg_shmem';
监控连接串
默认情况下,Pigsty会尝试使用以下规则生成数据库与连接池的连接串。
PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pg_port }}/{{ pg_default_database }}?host={{ pg_localhost }}&sslmode=disable'
PGBOUNCER_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pgbouncer_port }}/pgbouncer?host={{ pg_localhost }}&sslmode=disable'
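If the connection string for your monitoring role cannot be generated by this rule, you can set the connection information for the database and the connection pool directly with the pg_exporter_url and pgbouncer_exporter_url parameters.
For example, in the sandbox the connection string from the meta node to the database is: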
PG_EXPORTER_URL='postgres://dbuser_monitor:DBUser.Monitor@:5432/meta?host=/var/run/postgresql&sslmode=disable'
懒人方案
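If security and privileges are not much of a concern, you can also monitor directly as the dbsu (for example the postgres user) through ident authentication.
pg_exporter runs as the dbsu user by default. If the dbsu is allowed passwordless local access to the database through ident authentication (the Pigsty default), you can monitor the database as the superuser. Pigsty strongly discourages this kind of deployment, but it is convenient: there is no new user to create and no privileges to configure.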
PG_EXPORTER_URL='postgres:///postgres?host=/var/run/postgresql&sslmode=disable'
相关参数
使用仅监控部署时,只会用到Pigsty参数的一个子集。
基础设施部分
基础设施与元节点仍然与常规部署保持一致,除了以下两个参数必须强制使用指定的配置选项。
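service_registry: none # service registration must be disabled, because the target environment may have no DCS infrastructure
prometheus_sd_method: static # static file service discovery must be used, because the target instances may not use service discovery or registration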
目标节点部分
目标节点的身份参数仍然为必选项,除此之外,通常只有监控系统参数需要调整。
---
#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none # none|yum|binary, none by default
exporter_repo_url: '' # if set, repo will be added to /etc/yum.repos.d/ before yum installation
# - collect - #
exporter_metrics_path: /metrics # default metric path for pg related exporter
# - node exporter - #
node_exporter_enabled: true # setup node_exporter on instance
node_exporter_port: 9100 # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml # default config files for pg_exporter
pg_exporter_enabled: true # setup pg_exporter on instance
pg_exporter_port: 9630 # default port for pg exporter
pg_exporter_url: '' # optional, if not set, generate from reference parameters
# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631 # default port for pgbouncer exporter
pgbouncer_exporter_url: '' # optional, if not set, generate from reference parameters
# - postgres variables reference - #
pg_dbsu: postgres
pg_port: 5432 # postgres port (5432 by default)
pgbouncer_port: 6432 # pgbouncer port (6432 by default)
pg_localhost: /var/run/postgresql # localhost unix socket dir for connection
pg_default_database: postgres # default database will be used as primary monitor target
pg_monitor_username: dbuser_monitor # system monitor username, for postgres and pgbouncer
pg_monitor_password: DBUser.Monitor # system monitor user's password
service_registry: consul # none | consul | etcd | both
...
通常来说,需要调整的参数包括:
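exporter_install: binary # none|yum|binary; installing exporters by copying binaries is recommended
pgbouncer_exporter_enabled: false # disable Pgbouncer monitoring if the target instance has no associated Pgbouncer
pg_exporter_url: '' # URL for connecting to Postgres; set it if the default URL assembly rule does not apply
pgbouncer_exporter_url: '' # URL for connecting to Pgbouncer; set it if the default URL assembly rule does not apply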
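As a rough sketch of how these overrides fit together, the inventory entry below describes an externally provisioned cluster in monitor-only mode. The cluster name pg-ext, the IP addresses, and the assumption that the cluster has no Pgbouncer are hypothetical; only the parameter names and the URL shape come from this section.

```yaml
# Hypothetical monitor-only cluster; names and addresses are placeholders.
pg-ext:
  vars:
    pg_cluster: pg-ext                  # identity parameter, still required
    exporter_install: binary            # copy exporter binaries instead of using yum
    pgbouncer_exporter_enabled: false   # assumed: this cluster has no Pgbouncer
    # explicit URL, for the case where the default assembly rule does not fit
    pg_exporter_url: 'postgres://dbuser_monitor:DBUser.Monitor@:5432/postgres?host=/var/run/postgresql&sslmode=disable'
  hosts:
    10.10.10.21: {pg_seq: 1, pg_role: primary}
    10.10.10.22: {pg_seq: 2, pg_role: replica}
```

Setting the overrides at the cluster level keeps them away from any other cluster defined in the same inventory.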
局限性
Pigsty监控系统 与 Pigsty供给方案 配合紧密,原装的总是最好的。尽管Pigsty并不推荐拆分使用,但这样做确实是可行的,只是存在一些局限性。
指标缺失
Pigsty会集成多种来源的指标,包括机器节点,数据库,Pgbouncer连接池,Haproxy负载均衡器。如果用户自己的供给方案中缺少这些组件,则相应指标也会发生缺失。
通常Node与PG的监控指标总是存在,而PGbouncer与Haproxy的缺失通常会导致100~200个不等的指标损失。
特别是,Pgbouncer监控指标中包含极其重要的PG QPS,TPS,RT,而这些指标是无法从PostgreSQL本身获取的。
工作假设
Pigsty监控系统 如果要与外部供给方案配合,监控已有数据库集群,需要一些工作假设:
- 数据库采用独占式部署,与节点存在一一对应关系。只有这样,节点指标才能有意义地与数据库指标关联。
- 目标节点可以被Ansible管理(NOPASS SSH与NOPASS SUDO),一些云厂商RDS产品并不允许这样做。
- 数据库需要创建可用于访问监控指标的监控用户,安装必须的监控模式与扩展,并合理配置其访问控制权限。
服务发现
外部供给方案通常拥有自己的身份管理机制,因此Pigsty不会越俎代庖地部署DCS用于服务发现。这意味着用户只能采用 静态配置文件 的方式管理监控对象的身份,通常这并不是一个问题。
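In the Pigsty sandbox, when an instance's role changes, the system promptly corrects the instance's role information through callbacks and an anti-entropy process, for example changing primary to replica, or another role to primary.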
pg_up{cls="pg-meta", ins="pg-meta-1", instance="10.10.10.10:9630", ip="10.10.10.10", job="pg", role="primary", svc="pg-meta-primary"}
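When integrating with an external provisioning solution, however, the monitoring system cannot know that a failover has happened unless you explicitly notify it (or trigger a callback) so that the configuration file is regenerated from the latest role definitions. In the sample metric above, the role and svc labels are affected when roles are not updated promptly, so the accuracy of Service-level monitoring data suffers (the pg:svc:* family of metrics, such as service QPS). Metrics and dashboards at other levels are not affected by failover, so the overall impact is limited, and there are other ways to work around it.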
管理权限
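Pigsty's monitoring metrics are collected by node_exporter and pg_exporter.
Although pg_exporter can be deployed so that the exporter pulls information from a remote database instance, node_exporter must be deployed on the node that hosts the database.
This means you need SSH login and sudo privileges on the database machine to complete the deployment. In other words, the target node must be manageable by Ansible, and cloud vendors' RDS offerings usually do not grant such access.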
6 - 配置
Pigsty提供的配置参数与定制选项
Pigsty采用声明式配置:用户配置描述状态,而Pigsty负责将真实组件调整至所期待的状态。
Pigsty配置文件遵循Ansible规则,采用YAML格式,详见配置文件 。
Pigsty包含了168个配置项,分为十类五级,详见配置项。
绝大多数配置参数无需修改,可直接使用默认值;定义新数据库集群只有三个必选身份参数。
6.1 - 配置文件
Pigsty配置文件的结构,内容,合并与拆分方式。
Pigsty配置文件遵循Ansible规则,采用YAML格式,默认使用单一配置文件,参考范例。
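The default Pigsty configuration file is pigsty.yml. It is used together with Ansible, a popular DevOps tool.
You can set the default configuration file path in ansible.cfg in the current directory, or specify it explicitly with the command-line argument -i pigsty.yml when running a playbook.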
配置文件结构
Pigsty的配置文件采用Ansible YAML Inventory格式,顶层结构如下:
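all: # top-level object all
vars: <123 keys> # global configuration all.vars
children: # group definitions: each entry under all.children defines one database cluster
meta: <2 keys>...
pg-meta: <2 keys>...
pg-test: <2 keys>... # detailed definition of one specific database cluster, pg-test
...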
每一个具体的数据库集群,以Ansible Group的形式存在,如下所示:
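pg-test: # the cluster name is used as the group name by default
vars: # cluster-level variables
pg_cluster: pg-test # a mandatory entry defined at the cluster level, identical across pg-test
hosts: # cluster members
10.10.10.11: {pg_seq: 1, pg_role: primary} # a database instance member
10.10.10.12: {pg_seq: 2, pg_role: replica} # the identity parameters pg_role and pg_seq are mandatory
10.10.10.13: {pg_seq: 3, pg_role: offline} # instance-level variables can be set here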
配置项
在Pigsty的配置文件中,配置项 可以出现在三种位置:
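Level | Scope | Priority | Description | Location |
Global | global | low | consistent within one deployment environment | all.vars.xxx |
Cluster | cluster | medium | consistent within one cluster | all.children.<cls>.vars.xxx |
Instance | instance | high | the finest-grained configuration level | all.children.<cls>.hosts.<ins>.xxx |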
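Each config entry is a key-value pair: the key is the entry's name and the value is its content. Value types vary; see Config Entries for details.
Entries defined in a cluster's vars override global entries with the same key, and entries defined on an instance override both cluster and global entries. This lets you tune specific clusters and instances precisely, at whatever level and granularity is appropriate.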
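To make the override order concrete, here is a minimal sketch that sets the same key at all three levels. Using node_tune for this is only an illustration, and the chosen values are hypothetical; the structure follows the inventory layout shown earlier.

```yaml
all:
  vars:
    node_tune: tiny              # global: applies everywhere unless overridden
  children:
    pg-test:
      vars:
        pg_cluster: pg-test
        node_tune: oltp          # cluster: overrides the global value for pg-test
      hosts:
        10.10.10.11: {pg_seq: 1, pg_role: primary}
        10.10.10.12: {pg_seq: 2, pg_role: replica}
        # instance: overrides both the cluster and the global value
        10.10.10.13: {pg_seq: 3, pg_role: offline, node_tune: olap}
```

Under this sketch, 10.10.10.11 and 10.10.10.12 resolve node_tune to oltp, 10.10.10.13 resolves it to olap, and hosts in any other cluster fall back to tiny.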
分立式配置文件
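Sometimes you may prefer one configuration file per database cluster rather than one huge shared inventory.
The benefit is that a mistake stays confined to that one cluster instead of becoming an environment-wide incident. For example, when decommissioning a cluster, specifying the wrong execution scope could delete every database in the environment.
You may use any layout that satisfies Ansible's rules and Pigsty's variable-level semantics, but Pigsty recommends splitting the configuration as follows:
- group_vars/all.yml: all global variables.
- group_vars/<pg_cluster>.yml: cluster variables for the database cluster <pg_cluster>.
- pgsql/<pg_cluster>.yml: instance members of the database cluster <pg_cluster>, plus instance variables.
- host_vars/<pg_instance>.yml: a separate file for a single instance whose configuration is particularly complex.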
采用分立式配置文件的Pigsty沙箱目录结构如下所示:
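pigsty
|
^- group_vars # global / cluster config entries (this directory name is fixed)
| ^------ all.yml # global config entries
| ^------ meta.yml # meta node config entries
| ^------ pg-meta.yml # pg-meta cluster config entries (override global definitions)
| ^------ pg-test.yml # pg-test cluster config entries (override global definitions)
| ^------ <cluster>.yml # <pg_cluster> cluster config entries (override global definitions)
|
^- host_vars # [optional] instance-level variables split out
| ^------ 10.10.10.10. # instance-level config entries for 10.10.10.10 (override global/cluster definitions)
|
^- pgsql # cluster members / instance-level config entries (this directory name is arbitrary)
^------ pg-meta.yml # pg-meta members and instance config entries (override global/cluster definitions)
^------ pg-test.yml # pg-test members and instance config entries (override global/cluster definitions)
^------ <cluster>.yml # <pg_cluster> members and instance config entries (override global/cluster definitions)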
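As an illustration of what the split files might contain, the sketch below shows the pg-test cluster from the single-file example divided into two files. Both files are shown in one block, separated by a YAML document marker; the exact contents are an assumption based on the layout above, not a copy of the shipped sandbox files.

```yaml
# group_vars/pg-test.yml : cluster-level variables for pg-test
pg_cluster: pg-test
---
# pgsql/pg-test.yml : members and instance-level variables for pg-test
all:
  children:
    pg-test:
      hosts:
        10.10.10.11: {pg_seq: 1, pg_role: primary}
        10.10.10.12: {pg_seq: 2, pg_role: replica}
        10.10.10.13: {pg_seq: 3, pg_role: offline}
```

With this layout, a playbook run against pgsql/pg-test.yml only ever sees the hosts of pg-test, which is what limits the blast radius of a mistake.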
6.2 - 配置项
介绍Pigsty中的配置项及其分类
配置项分类
Pigsty的配置项总计168个,按照领域分为以下10大类。
配置项粒度
Pigsty的参数可以在不同的粒度进行配置。
Pigsty默认提供三种粒度:全局,集群,实例。
在Pigsty的配置文件中,配置项 可以出现在三种位置。
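Granularity | Scope | Priority | Description | Location |
Global | global | low | consistent within one deployment environment | all.vars.xxx |
Cluster | cluster | medium | consistent within one cluster | all.children.<cls>.vars.xxx |
Instance | instance | high | the finest-grained configuration level | all.children.<cls>.hosts.<ins>.xxx |
Each config entry is a key-value pair: the key is the entry's name and the value is its content. Value types vary.
Entries defined in a cluster's vars override global entries with the same key, and entries defined on an instance override both cluster and global entries. This lets you tune specific clusters and instances precisely, at whatever level and granularity is appropriate.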
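Besides the three granularities described above, Pigsty config entries have two additional priority levels:
- Default: when an entry appears at none of the global, cluster, or instance levels, its default value is used. Defaults have the lowest priority, and every entry has one.
- Argument: when you pass a parameter on the command line, it takes the highest priority and overrides every other level. Some entries can only be specified through command-line arguments.
Level | Source | Priority | Description | Location |
Default | default | lowest | default value defined in code | roles/<role>/defaults/main.yml |
Global | global | low | consistent within one deployment environment | all.vars.xxx |
Cluster | cluster | medium | consistent within one cluster | all.children.<cls>.vars.xxx |
Instance | instance | high | the finest-grained configuration level | all.children.<cls>.hosts.<ins>.xxx |
Argument | argument | highest | passed as a command-line argument | -e |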
配置项列表
6.3 - 连接参数
Pigsty中与连接、代理有关的参数
参数概览
参数详解
proxy_env
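In regions subject to internet blocking, downloads of some software are affected.
For example, downloading from the official PostgreSQL repository from mainland China may run at only a few KB/s, while a suitable HTTP proxy can bring that up to several MB/s. If you have a proxy server, configure it through proxy_env, for example: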
proxy_env: # global proxy env when downloading packages
http_proxy: 'http://username:password@proxy.address.com'
https_proxy: 'http://username:password@proxy.address.com'
all_proxy: 'http://username:password@proxy.address.com'
no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn"
ansible_host
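If your environment uses a jump host, or has been customized so that hosts cannot be reached with a plain ssh <ip>, consider using Ansible's connection parameters. ansible_host is the most typical of them.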
Ansible中关于SSH连接的参数
- ansible_host: The name of the host to connect to, if different from the alias you wish to give to it.
- ansible_port: The ssh port number, if not 22.
- ansible_user: The default ssh user name to use.
- ansible_ssh_pass: The ssh password to use (never store this variable in plain text; always use a vault. See Variables and Vaults).
- ansible_ssh_private_key_file: Private key file used by ssh. Useful if using multiple keys and you don’t want to use SSH agent.
- ansible_ssh_common_args: This setting is always appended to the default command line for sftp, scp, and ssh. Useful to configure a ProxyCommand for a certain host (or group).
- ansible_sftp_extra_args: This setting is always appended to the default sftp command line.
- ansible_scp_extra_args: This setting is always appended to the default scp command line.
- ansible_ssh_extra_args: This setting is always appended to the default ssh command line.
- ansible_ssh_pipelining: Determines whether or not to use SSH pipelining. This can override the pipelining setting in ansible.cfg.
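The simplest usage is to set ansible_host to an SSH alias: as long as you can reach the target machine with ssh <name>, set ansible_host to <name>.
Note that these are all instance-level variables.
Caveat
The default sandbox configuration uses SSH aliases as connection parameters, because the vagrant host accesses the virtual machines through SSH alias configuration. In production, connecting directly by IP is recommended.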
pg-meta:
hosts:
10.10.10.10: {pg_seq: 1, pg_role: primary, ansible_host: meta}
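For the jump-host case mentioned above, a sketch along the following lines may help. The port, the login user, and the jump host name jump.example.com are hypothetical, and the example assumes an OpenSSH client recent enough to support the ProxyJump option; with older clients a ProxyCommand would be needed instead.

```yaml
pg-test:
  hosts:
    10.10.10.11:
      pg_seq: 1
      pg_role: primary
      ansible_host: 10.10.10.11       # address Ansible actually connects to
      ansible_port: 2222              # hypothetical non-default SSH port
      ansible_user: dba               # hypothetical login user
      # appended to the ssh, scp and sftp command lines; jump host name is a placeholder
      ansible_ssh_common_args: '-o ProxyJump=jump.example.com'
```

Because these are instance-level variables, each host can carry its own connection settings without affecting the rest of the cluster.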
6.4 - 本地仓库
Pigsty中关于本地Yum源的配置项
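Pigsty is a complex software system. To keep it stable, Pigsty downloads all required packages from the internet during initialization and builds a local Yum repo.
The packages total roughly 1GB, and download speed depends on your network. Pigsty uses mirror sites where possible, but a few packages may still be slowed down considerably by firewalls. You can set a download proxy through the proxy_env config entry for the first download, or download the pre-built offline installation package directly.
When building the local Yum repo, if the {{ repo_home }}/{{ repo_name }} directory already exists and contains the marker file repo_complete, Pigsty assumes the local Yum repo is already initialized and skips the download stage, which speeds things up considerably. The offline installation package is simply that whole {{ repo_home }}/{{ repo_name }} directory packed into a tarball.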
参数概览
默认参数
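repo_enabled: true # whether to enable the local repo
repo_name: pigsty # local repo name
repo_address: yum.pigsty # externally reachable repo address (ip:port or url)
repo_port: 80 # port the repo HTTP server listens on
repo_home: /www # default root directory
repo_rebuild: false # force re-download of packages
repo_remove: true # remove existing yum repos
repo_upstreams: [...] # upstream Yum repos
repo_packages: [...] # packages to download
repo_url_packages: [...] # packages downloaded by URL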
参数详解
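repo_enabled
If true (the default), the normal local Yum repo build runs; otherwise the build is skipped.
repo_name
The name of the local Yum repo, pigsty by default. You can change it to any name you like, such as pgsql-rhel7.
repo_address
The address at which the local Yum repo is served, either a domain name or an IP address; yum.pigsty by default.
If you use a domain name, make sure that within the environment it resolves to the server hosting the local repo, that is, the meta node.
If the local Yum repo does not use the standard port 80, include the port in the address and keep it consistent with repo_port.
You can use the static DNS settings in the node parameters to write the local repo's domain name to every node in the environment; the sandbox resolves the default yum.pigsty domain this way.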
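A minimal sketch of serving the local repo on a non-standard port follows. The port 8080 is a hypothetical choice, and the node_local_repo_url line assumes that nodes fetch the repo file from the same address, as the default http://yum.pigsty/pigsty.repo suggests.

```yaml
repo_address: yum.pigsty:8080     # hypothetical port; the address must carry it
repo_port: 8080                   # must match the port in repo_address
node_local_repo_url:              # assumed to need the same port
  - http://yum.pigsty:8080/pigsty.repo
```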
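repo_port
The HTTP port used by the local Yum repo, 80 by default.
repo_home
The root directory of the local Yum repo, /www by default.
This directory is exposed as the root of the HTTP server.
repo_rebuild
If false (the default), no forced rebuild takes place. If true, the repo is rebuilt unconditionally.
repo_remove
Whether to remove all existing repos in /etc/yum.repos.d while initializing the local repo. true by default.
The original repo files are backed up to /etc/yum.repos.d/backup.
Because the contents of the operating system's pre-existing repos are not under your control, it is recommended to remove them and configure upstreams explicitly through repo_upstreams.
repo_upstreams
The Yum repos added to /etc/yum.repos.d. Pigsty downloads packages from these repos.
By default Pigsty uses the Aliyun CentOS 7 mirror, the Tsinghua University Grafana mirror, the PackageCloud Prometheus repo, the official PostgreSQL repo, and the SCLo, Harbottle, Nginx, and Haproxy repos.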
- name: base
description: CentOS-$releasever - Base - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
gpgcheck: no
failovermethod: priority
- name: updates
description: CentOS-$releasever - Updates - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
gpgcheck: no
failovermethod: priority
- name: extras
description: CentOS-$releasever - Extras - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
gpgcheck: no
failovermethod: priority
- name: epel
description: CentOS $releasever - EPEL - Aliyun Mirror
baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
gpgcheck: no
failovermethod: priority
- name: grafana
description: Grafana - TsingHua Mirror
gpgcheck: no
baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm
- name: prometheus
description: Prometheus and exporters
gpgcheck: no
baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
- name: pgdg-common
description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
gpgcheck: no
baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch
- name: pgdg13
description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch - Updates testing
gpgcheck: no
baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch
- name: centos-sclo
description: CentOS-$releasever - SCLo
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo
- name: centos-sclo-rh
description: CentOS-$releasever - SCLo rh
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh
- name: nginx
description: Nginx Official Yum Repo
skip_if_unavailable: true
gpgcheck: no
baseurl: http://nginx.org/packages/centos/$releasever/$basearch/
- name: haproxy
description: Copr repo for haproxy
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/
# for latest consul & kubernetes
- name: harbottle
description: Copr repo for main owned by harbottle
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/
repo_packages
需要下载的rpm安装包列表,默认下载的软件包如下所示:
# - what to download - #
repo_packages:
# repo bootstrap packages
- epel-release nginx wget yum-utils yum createrepo # bootstrap packages
# node basic packages
- ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned # basic system util
- readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates # basic pg dependency
- numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet # system utils
# dcs & monitor packages
- grafana prometheus2 pushgateway alertmanager # monitor and ui
- node_exporter postgres_exporter nginx_exporter blackbox_exporter # exporter
- consul consul_exporter consul-template etcd # dcs
# python3 dependencies
- ansible python python-pip python-psycopg2 # ansible & python
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul # python3
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography # python3 patroni extra deps
# proxy and load balancer
- haproxy keepalived dnsmasq # proxy and dns
# postgres common Packages
- patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity # major components
- pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel # other common utils
- barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail
# postgres 13 packages
- postgresql13* postgis31* citus_13 pgrouting_13 # postgres 13 and postgis 31
- pg_repack13 pg_squeeze13 # maintenance extensions
- pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13 # stats extensions
- plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13 # PL extensions
- hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13 # FDW extensions
- wal2json13 count_distinct13 ddlx_13 geoip13 orafce13 # MISC extensions
- rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
- pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
- pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
- pguint13 pguri13 prefix13 safeupdate_13 semver13 table_version13 tdigest13
repo_url_packages
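Packages downloaded directly by URL rather than through yum. You can add links to your own packages here.
By default Pigsty downloads three packages by URL:
- pg_exporter (required; core component of the monitoring system)
- vip-manager (optional; required when VIP is enabled)
- polysh (optional; a convenience tool for managing multiple machines)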
repo_url_packages:
- https://github.com/Vonng/pg_exporter/releases/download/v0.3.1/pg_exporter-0.3.1-1.el7.x86_64.rpm
- https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
- http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm
6.5 - 节点供给
Pigsty中关于机器与操作系统、基础设施的配置参数
参数概览
默认配置
#------------------------------------------------------------------------------
# NODE PROVISION
#------------------------------------------------------------------------------
# this section defines how to provision nodes
# nodename: # if defined, node's hostname will be overwritten
# - node dns - #
node_dns_hosts: # static dns records in /etc/hosts
- 10.10.10.10 yum.pigsty
node_dns_server: add # add (default) | none (skip) | overwrite (remove old settings)
node_dns_servers: # dynamic nameserver in /etc/resolv.conf
- 10.10.10.10
node_dns_options: # dns resolv options
- options single-request-reopen timeout:1 rotate
- domain service.consul
# - node repo - #
node_repo_method: local # none|local|public (use local repo for production env)
node_repo_remove: true # whether remove existing repo
node_local_repo_url: # local repo url (if method=local, make sure firewall is configured or disabled)
- http://yum.pigsty/pigsty.repo
# - node packages - #
node_packages: # common packages for all nodes
- wget,yum-utils,sshpass,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
- numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
- python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
- python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
- node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
node_extra_packages: # extra packages for all nodes
- patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
node_meta_packages: # packages for meta nodes only
- grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
- dnsmasq,nginx,ansible,pgbadger,polysh
# build & devel packages (add to repo_packages too if you want build database & extensions from source)
# - gcc,gcc-c++,clang,coreutils,diffutils,rpm-build,rpm-devel,rpmlint,rpmdevtools
# - zlib-devel,openssl-libs,openssl-devel,pam-devel,libxml2-devel,libxslt-devel,openldap-devel,systemd-devel,tcl-devel,python-devel
# - node features - #
node_disable_numa: false # disable numa, important for production database, reboot required
node_disable_swap: false # disable swap, important for production database
node_disable_firewall: true # disable firewall (required if using kubernetes)
node_disable_selinux: true # disable selinux (required if using kubernetes)
node_static_network: true # keep dns resolver settings after reboot
node_disk_prefetch: false # setup disk prefetch on HDD to increase performance
# - node kernel modules - #
node_kernel_modules:
- softdog
- br_netfilter
- ip_vs
- ip_vs_rr
- ip_vs_wrr
- ip_vs_sh
- nf_conntrack_ipv4
# - node tuned - #
node_tune: tiny # install and activate tuned profile: none|oltp|olap|crit|tiny
node_sysctl_params: {} # set additional sysctl parameters, k:v format
# net.bridge.bridge-nf-call-iptables: 1 # example kv parameters
# - node user - #
node_admin_setup: true # setup an default admin user ?
node_admin_uid: 88 # uid and gid for admin user
node_admin_username: dba # default admin user: dba
node_admin_ssh_exchange: true # exchange admin's ssh key among cluster ?
node_admin_pk_current: false # add current user's ~/.ssh/id_rsa.pub to admin pk
node_admin_pks: # ssh public keys to be added to admin user
- 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'
# - node ntp - #
node_ntp_service: ntp # ntp or chrony
node_ntp_config: true # overwrite existing ntp config?
node_timezone: Asia/Shanghai # default node timezone
node_ntp_servers: # default NTP servers
- pool cn.pool.ntp.org iburst
- pool pool.ntp.org iburst
- pool time.pool.aliyun.com iburst
- server 10.10.10.10 iburst
参数详解
nodename
If set, the instance's HOSTNAME is overwritten with this name.
Use this option to name a node explicitly. To use the PG instance name as the node name, use the pg_hostname option.
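node_dns_hosts
Default static DNS records for the node. Each record is written to /etc/hosts when the node is initialized, which makes this well suited to configuring infrastructure addresses.
node_dns_hosts is an array. Each element is a string of the form ip domain_name and represents one DNS record.
By default Pigsty writes 10.10.10.10 yum.pigsty to /etc/hosts, so that the local Yum repo can be reached by domain name before the DNS nameserver is up.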
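node_dns_server
How the node's dynamic DNS servers are configured. There are three modes:
- add: append the records in node_dns_servers to /etc/resolv.conf and keep the existing DNS servers (default).
- overwrite: overwrite /etc/resolv.conf with the records in node_dns_servers.
- none: skip DNS server configuration.
node_dns_servers
If node_dns_server is set to add or overwrite, the records in node_dns_servers are appended to, or overwrite, /etc/resolv.conf. See the Linux documentation on /etc/resolv.conf for the exact format.
By default Pigsty adds the meta node as a DNS server; DNSMASQ on the meta node answers DNS queries in the environment.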
node_dns_servers: # dynamic nameserver in /etc/resolv.conf
- 10.10.10.10
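node_dns_options
If node_dns_server is set to add or overwrite, the records in node_dns_options are appended to, or overwrite, /etc/resolv.conf. See the Linux documentation on /etc/resolv.conf for the exact format.
The resolver options Pigsty adds by default are: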
- options single-request-reopen timeout:1 rotate
- domain service.consul
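node_repo_method
How Yum repos are configured on the node. There are three modes:
- local: use the local Yum repo on the meta node. This is the default and the recommended mode.
- public: install directly from internet repos; the public repos in repo_upstreams are written to /etc/yum.repos.d/.
- none: do not configure or modify the node's repos.
node_repo_remove
Whether to remove the Yum repos that already exist on the node.
By default Pigsty removes the existing configuration files in /etc/yum.repos.d and backs them up to /etc/yum.repos.d/backup.
node_local_repo_url
If node_repo_method is set to local, the repo files at the URLs listed here are downloaded into /etc/yum.repos.d.
This is an array of repo file URLs. By default Pigsty adds the local Yum repo on the meta node to the machine's repo configuration.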
node_local_repo_url:
- http://yum.pigsty/pigsty.repo
node_packages
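The list of packages installed through yum.
The list is an array, but each element may contain several comma-separated packages. The packages Pigsty installs by default are: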
node_packages: # common packages for all nodes
- wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
- numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
- python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
- python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
- node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
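node_extra_packages
An additional list of packages installed through yum.
It is similar to node_packages, but while node_packages is usually configured uniformly at the global level, node_extra_packages is meant for exceptions on specific nodes. For example, you can install extra tooling on nodes that run PG. This variable is usually overridden at the cluster or instance level.
The extra packages Pigsty installs by default are: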
- patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
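As a sketch of the cluster-level override described for node_extra_packages, the fragment below extends the default list for one cluster. The two added package names are taken from the repo_packages list and are only an illustrative choice; any package named here has to be available in the repo the node uses.

```yaml
pg-test:
  vars:
    pg_cluster: pg-test
    node_extra_packages:      # cluster-level override of the global list
      - patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
      - pgcenter,pg_cli       # hypothetical additions for this cluster only
```

Note that Ansible replaces a list variable wholesale when it is overridden, so the default entries are repeated here rather than assumed to be merged in.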
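node_meta_packages
The list of packages installed through yum on meta nodes.
It is similar to node_packages and node_extra_packages, but the packages listed in node_meta_packages are installed only on meta nodes, so they are typically monitoring software, admin tools, and build tools.
The meta node packages Pigsty installs by default are: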
node_meta_packages: # packages for meta nodes only
- grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
- dnsmasq,nginx,ansible,pgbadger,polysh
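node_disable_numa
Whether to disable NUMA. Note that this option takes effect only after a reboot. NUMA is not disabled by default, but disabling it is strongly recommended in production.
node_disable_swap
Whether to disable swap. If you have enough memory and the database has the node to itself, disabling swap is recommended for performance. The default is false, so swap is left enabled.
node_disable_firewall
Whether to disable the firewall. It is a frequent nuisance and disabling it is recommended; it is disabled by default.
node_disable_selinux
Whether to disable SELinux. It is a frequent nuisance and disabling it is recommended; it is disabled by default.
node_static_network
Whether to use static network configuration; enabled by default.
With static networking enabled, your DNS resolver settings are not overwritten by a reboot or a NIC change. Enabling it is recommended.
node_disk_prefetch
Whether to enable disk prefetch.
It can improve throughput for instances deployed on HDDs; disabled by default.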
node_kernel_modules
The kernel modules to load.
Pigsty enables the following kernel modules by default:
node_kernel_modules: [softdog, ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh]
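node_tune
The predefined tuning profile applied to the machine: none, oltp, olap, crit, or tiny.
node_sysctl_params
Additional sysctl system parameters to set,
given as a dictionary of key-value pairs.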
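A short sketch of the dictionary form follows. The first entry is the example that appears commented out in the default configuration; the second is a hypothetical addition whose value is illustrative only.

```yaml
node_sysctl_params:
  net.bridge.bridge-nf-call-iptables: 1   # example from the default configuration
  vm.swappiness: 0                        # hypothetical extra entry, value is illustrative
```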
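node_admin_setup
Whether to create an admin user (with passwordless sudo and ssh) on every node; created by default.
By default Pigsty creates an admin user named dba (uid=88), who can SSH from the meta node to the other nodes in the environment without a password and run sudo without a password.
node_admin_uid
The uid of the admin user, 88 by default.
node_admin_username
The name of the admin user, dba by default.
node_admin_ssh_exchange
Whether to exchange the admin user's SSH keys among the machines the current command runs against.
Keys are exchanged by default, so the admin can hop between machines quickly.
node_admin_pks
The keys written to the admin user's ~/.ssh/authorized_keys.
Anyone holding the corresponding private key can log in as the admin user.
node_admin_pk_current
Boolean, usually passed as a command-line argument. Copies the current user's SSH public key (~/.ssh/id_rsa.pub) into the admin user's authorized_keys. Not copied by default.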
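node_ntp_service
The type of NTP service the system uses:
- ntp: the traditional NTP service
- chrony: the default time service on CentOS 7/8
node_ntp_config
Whether to overwrite the existing NTP configuration.
Boolean; overwritten by default.
node_timezone
The default time zone.
Pigsty uses Asia/Shanghai by default; adjust it to your situation.
node_ntp_servers
NTP server addresses.
Pigsty uses the following NTP servers by default: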
- pool cn.pool.ntp.org iburst
- pool pool.ntp.org iburst
- pool time.pool.aliyun.com iburst
- server 10.10.10.10 iburst
6.6 - 基础设施
Pigsty中关于基础设施的配置参数:CA,DNS,Nginx,Prometheus,Grafana
这一节定义了部署于元节点上的 基础设施 ,包括:
参数概览
默认参数
#------------------------------------------------------------------------------
# META PROVISION
#------------------------------------------------------------------------------
# - ca - #
ca_method: create # create|copy|recreate
ca_subject: "/CN=root-ca" # self-signed CA subject
ca_homedir: /ca # ca cert directory
ca_cert: ca.crt # ca public key/cert
ca_key: ca.key # ca private key
# - nginx - #
nginx_upstream:
- { name: home, host: pigsty, url: "127.0.0.1:3000"}
- { name: consul, host: c.pigsty, url: "127.0.0.1:8500" }
- { name: grafana, host: g.pigsty, url: "127.0.0.1:3000" }
- { name: prometheus, host: p.pigsty, url: "127.0.0.1:9090" }
- { name: alertmanager, host: a.pigsty, url: "127.0.0.1:9093" }
- { name: haproxy, host: h.pigsty, url: "127.0.0.1:9091" }
# - nameserver - #
dns_records: # dynamic dns record resolved by dnsmasq
- 10.10.10.2 pg-meta # sandbox vip for pg-meta
- 10.10.10.3 pg-test # sandbox vip for pg-test
- 10.10.10.10 meta-1 # sandbox node meta-1 (node-0)
- 10.10.10.11 node-1 # sandbox node node-1
- 10.10.10.12 node-2 # sandbox node node-2
- 10.10.10.13 node-3 # sandbox node node-3
- 10.10.10.10 pigsty
- 10.10.10.10 y.pigsty yum.pigsty
- 10.10.10.10 c.pigsty consul.pigsty
- 10.10.10.10 g.pigsty grafana.pigsty
- 10.10.10.10 p.pigsty prometheus.pigsty
- 10.10.10.10 a.pigsty alertmanager.pigsty
- 10.10.10.10 n.pigsty ntp.pigsty
- 10.10.10.10 h.pigsty haproxy.pigsty
# - prometheus - #
prometheus_data_dir: /export/prometheus/data # prometheus data dir
prometheus_options: '--storage.tsdb.retention=30d'
prometheus_reload: false # reload prometheus instead of recreate it
prometheus_sd_method: consul # service discovery method: static|consul|etcd
prometheus_scrape_interval: 5s # global scrape & evaluation interval
prometheus_scrape_timeout: 4s # scrape timeout
prometheus_sd_interval: 5s # service discovery refresh interval
# - grafana - #
grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
grafana_admin_password: admin # default grafana admin user password
grafana_plugin: install # none|install|reinstall
grafana_cache: /www/pigsty/grafana/plugins.tar.gz # path to grafana plugins tarball
grafana_customize: false # customize grafana resources
grafana_plugins: # default grafana plugins list
- redis-datasource
- simpod-json-datasource
- fifemon-graphql-datasource
- sbueringer-consul-datasource
- camptocamp-prometheus-alertmanager-datasource
- ryantxu-ajax-panel
- marcusolsson-hourly-heatmap-panel
- michaeldmoore-multistat-panel
- marcusolsson-treemap-panel
- pr0ps-trackmap-panel
- dalvany-image-panel
- magnesium-wordcloud-panel
- cloudspout-button-panel
- speakyourcode-button-panel
- jdbranham-diagram-panel
- grafana-piechart-panel
- snuids-radar-panel
- digrich-bubblechart-panel
grafana_git_plugins:
- https://github.com/Vonng/grafana-echarts
# - loki - #
loki_clean: false # whether remove existing loki data
loki_data_dir: /export/loki # default loki data dir
参数详解
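ca_method
- create: create a new key pair for the CA
- copy: copy an existing CA key pair to build the CA
(The open-source edition of Pigsty does not yet use the advanced security features of the CA infrastructure.)
ca_subject
The subject of the self-signed CA.
The default subject is:
"/CN=root-ca"
ca_homedir
The root directory for CA files.
/ca by default.
ca_cert
The file name of the CA public key certificate.
ca.crt by default.
ca_key
The file name of the CA private key.
ca.key by default.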
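nginx_upstream
The URLs and domain names of the Nginx upstream services.
Nginx routes traffic by the Host header, so make sure you use the correct domain names when accessing Pigsty infrastructure services.
Do not modify the name fields.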
nginx_upstream:
- { name: home, host: pigsty, url: "127.0.0.1:3000"}
- { name: consul, host: c.pigsty, url: "127.0.0.1:8500" }
- { name: grafana, host: g.pigsty, url: "127.0.0.1:3000" }
- { name: prometheus, host: p.pigsty, url: "127.0.0.1:9090" }
- { name: alertmanager, host: a.pigsty, url: "127.0.0.1:9093" }
- { name: haproxy, host: h.pigsty, url: "127.0.0.1:9091" }
dns_records
Dynamic DNS records.
Each record is written to /etc/hosts on the meta node and resolved by the name server running there.
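prometheus_data_dir
The Prometheus data directory.
/export/prometheus/data by default.
prometheus_options
Prometheus command-line options.
The default is --storage.tsdb.retention=30d, which keeps 30 days of monitoring data.
This parameter supersedes prometheus_retention, which is deprecated since v0.6.
prometheus_reload
If true, running the Prometheus tasks does not wipe the existing data directory.
The default is false, meaning that running the prometheus playbook wipes existing monitoring data.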
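prometheus_sd_method
The service discovery mechanism used by Prometheus, consul by default. Options:
- consul: service discovery based on Consul
- static: service discovery based on local configuration files
Pigsty recommends consul service discovery: when a failover occurs, the monitoring system automatically corrects the identity registered for the target instance.
static service discovery relies on the configuration in /etc/prometheus/targets/*.yml. Its advantage is that it does not depend on Consul, so it is less intrusive when the Pigsty monitoring system is integrated with an external provisioning solution. The drawback is that you must maintain instance role information yourself when a failover occurs within a cluster.
When maintaining it manually, you can generate the target definition files Prometheus needs from the configuration file with the following command:
./infra.yml --tags=prometheus_targets,prometheus_reload
For details, see: Service Discovery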
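For orientation, a static target file in Prometheus's file-based service discovery format might look like the sketch below. The label set mirrors the pg_up sample metric shown in the monitor-only section; the exact file that Pigsty generates is an assumption here and may differ.

```yaml
# Assumed contents of a static target file such as /etc/prometheus/targets/all.yml
- targets: ['10.10.10.10:9630']   # pg_exporter endpoint of the instance
  labels:
    cls: pg-meta                  # cluster name
    ins: pg-meta-1                # instance name
    ip: 10.10.10.10
    role: primary                 # has to be corrected by hand after a failover
    svc: pg-meta-primary
```

The role and svc labels are the ones that go stale after a failover, which is why static mode requires regenerating or editing these files whenever roles change.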
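prometheus_sd_target (obsolete)
Prometheus service discovery targets in Pigsty are now managed uniformly per cluster, and this option is no longer configurable.
It previously controlled how target definition files were managed when prometheus_sd_method == 'static':
- batch: a single, batch-managed configuration file: /etc/prometheus/targets/all.yml
- single: one configuration file per instance: /etc/prometheus/targets/{{ pg_instance }}.yml
The single batch file is simple to manage, but it is only available with the default single-inventory layout, in which all database clusters are defined in one configuration file.
With split configuration files (one per cluster), you need the single mode, in which every new database instance gets its own definition file under /etc/prometheus/targets/ on the meta node.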
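prometheus_scrape_interval
The Prometheus scrape interval.
The default is 5s; 15s is recommended for production.
prometheus_scrape_timeout
The Prometheus scrape timeout.
The default is 4s; 10s is recommended for production, or configure it according to your needs.
prometheus_sd_interval
The interval at which Prometheus refreshes its service discovery list.
The default is 5s; a longer interval is recommended for production, or configure it according to your needs.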
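Put together, a production-leaning override of these three parameters could look like the sketch below. The first two values are the ones recommended in the text; the service discovery interval is a hypothetical choice, since the text only asks for something longer than the default.

```yaml
prometheus_scrape_interval: 15s   # production value recommended in the text
prometheus_scrape_timeout: 10s    # kept below the scrape interval
prometheus_sd_interval: 30s       # hypothetical; the text only recommends "longer"
```

Prometheus rejects a scrape timeout that exceeds the scrape interval, so the two are best changed together.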
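prometheus_metrics_path (deprecated)
The URL path at which Prometheus scrapes exporters, /metrics by default.
It has been replaced by the external variable exporter_metrics_path and no longer has any effect.
prometheus_retention (deprecated)
The Prometheus data retention period, 30 days by default.
Its function is covered by prometheus_options; deprecated since v0.6.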
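grafana_url
The endpoint at which Grafana is served, including the username and password.
Grafana provisioning uses this URL to call the Grafana API.
grafana_admin_password
The password of the Grafana admin user.
admin by default.
grafana_plugin
How Grafana plugins are provisioned:
- none: do not install plugins
- install: install Grafana plugins (default)
- reinstall: force reinstallation of Grafana plugins
Grafana needs internet access to download a number of plugins. If your meta node has no internet access, the offline installation package already contains all of them. After the plugins have been downloaded, Pigsty rebuilds the plugin cache tarball.
grafana_cache
The path of the Grafana plugin cache file.
The offline installation package already contains all Grafana plugins, downloaded and packed. If the plugin package already exists, Pigsty does not try to download the plugins from the internet again.
The default cache path is /www/pigsty/grafana/plugins.tar.gz (assuming the local Yum repo is named pigsty).
grafana_customize
A flag indicating whether to customize Grafana.
If enabled, the Grafana logo is replaced with the Pigsty logo.
grafana_plugins
The list of Grafana plugins.
An array in which each element is a plugin name.
Plugins are installed with grafana-cli plugins install.
The plugins installed by default are: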
grafana_plugins: # default grafana plugins list
- redis-datasource
- simpod-json-datasource
- fifemon-graphql-datasource
- sbueringer-consul-datasource
- camptocamp-prometheus-alertmanager-datasource
- ryantxu-ajax-panel
- marcusolsson-hourly-heatmap-panel
- michaeldmoore-multistat-panel
- marcusolsson-treemap-panel
- pr0ps-trackmap-panel
- dalvany-image-panel
- magnesium-wordcloud-panel
- cloudspout-button-panel
- speakyourcode-button-panel
- jdbranham-diagram-panel
- grafana-piechart-panel
- snuids-radar-panel
- digrich-bubblechart-panel
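grafana_git_plugins
Grafana plugins installed from Git.
Some plugins cannot be downloaded with the official command-line tool but can be fetched with git clone; use this parameter for them.
An array in which each element is the Git repository URL of a plugin.
Plugins are installed with cd /var/lib/grafana/plugins && git clone.
One visualization plugin is downloaded by default: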
grafana_git_plugins:
- https://github.com/Vonng/grafana-echarts
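loki_clean
Boolean, command-line argument: whether to clean the Loki data directory before installing Loki.
Loki is not one of the monitoring components installed by default; this parameter is currently used only by the infra-loki.yml playbook.
loki_data_dir
String, file system path: location of the Loki data directory.
Defaults to /export/loki/.
Loki is not one of the monitoring components installed by default; this parameter is currently used only by the infra-loki.yml playbook.
6.7 - Meta Database
Configuration parameters for the meta database (Consul/Etcd) in Pigsty
Pigsty uses a DCS (Distributed Configuration Store) as its meta database. The DCS serves three important purposes:
- Leader election: Patroni relies on the DCS for election and failover
- Configuration management: Patroni uses the DCS to manage Postgres configuration
- Identity management: the monitoring system relies on the DCS to manage and maintain the identity of database instances.
The DCS is critical to database stability. For demonstration purposes, Pigsty provides basic Consul and Etcd support and deploys the DCS service on the meta node. In production, deploying a dedicated DCS cluster on dedicated machines is recommended.
Parameter Overview
Default Parameters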
#------------------------------------------------------------------------------
# DCS PROVISION
#------------------------------------------------------------------------------
service_registry: consul # where to register services: none | consul | etcd | both
dcs_type: consul # consul | etcd | both
dcs_name: pigsty # consul dc name | etcd initial cluster token
dcs_servers: # dcs server dict in name:ip format
meta-1: 10.10.10.10 # you could use existing dcs cluster
# meta-2: 10.10.10.11 # host which have their IP listed here will be init as server
# meta-3: 10.10.10.12 # 3 or 5 dcs nodes are recommend for production environment
dcs_exists_action: skip # abort|skip|clean if dcs server already exists
dcs_disable_purge: false # set to true to disable purge functionality for good (force dcs_exists_action = abort)
consul_data_dir: /var/lib/consul # consul data dir (/var/lib/consul by default)
etcd_data_dir: /var/lib/etcd # etcd data dir (/var/lib/etcd by default)
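Parameter Details
service_registry
Where services are registered; referenced by multiple components.
none: do not register services (none must be specified for a monitor-only deployment)
consul: register services to Consul
etcd: register services to Etcd (not supported yet)
dcs_type
DCS type. Two options are available: consul (the default) and etcd.
dcs_name
DCS cluster name.
Defaults to pigsty.
In Consul it is the datacenter name.
dcs_servers
Names and addresses of the DCS servers, as a dictionary: each key is a DCS server instance name and each value is the corresponding IP address.
You can use an existing external DCS cluster, or initialize new DCS servers on the target machines.
When initializing new DCS instances, it is recommended to finish DCS initialization on all DCS servers (usually the meta nodes) first.
You can also initialize all DCS servers and DCS agents in one pass, but that full initialization must then include every server. Any target machine whose IP address matches an entry in dcs_servers is initialized as a DCS server during DCS initialization.
An odd number of DCS servers is strongly recommended: a single DCS server is fine for demo environments, while 3 to 5 are recommended in production to keep the DCS available.
You must configure the DCS servers explicitly according to your situation. In the sandbox environment, for example, you can enable either 1 or 3 DCS nodes.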
dcs_servers:
meta-1: 10.10.10.10
meta-2: 10.10.10.11
meta-3: 10.10.10.12
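dcs_exists_action
Safety guard: the action to take when a DCS instance (for example Consul) already exists on the target.
- abort: abort the entire playbook (described as the default behavior; note that the default configuration listed above sets skip)
- clean: wipe the existing DCS instance and continue (extremely dangerous)
- skip: skip targets where a DCS instance already exists (ending the play on those hosts) and continue on the other target machines.
If you really need to force the removal of an existing DCS instance, it is recommended to first take the cluster and its instances offline and destroy them with pgsql-rm.yml, and then run the initialization again. Otherwise, override the parameter on the command line with -e dcs_exists_action=clean to force the existing instance to be wiped during initialization.
dcs_disable_purge
A second safety guard, false by default. If set to true, dcs_exists_action is forced to abort.
This effectively turns off the cleanup capability of dcs_exists_action and guarantees that DCS instances are never wiped under any circumstances.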
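For a production inventory, the two safeguards might be combined as in the sketch below. It uses only the parameters described in this section; whether this policy suits your environment is an assumption to verify.

```yaml
dcs_exists_action: abort   # stop the playbook if a DCS instance already exists
dcs_disable_purge: true    # forces dcs_exists_action to abort so existing DCS instances are never wiped
```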
consul_data_dir
Consul data directory.
Defaults to /var/lib/consul.
etcd_data_dir
Etcd data directory.
Defaults to /var/lib/etcd.
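6.8 - PG Install
Parameters for installing Postgres in Pigsty
The PG Install part installs all PostgreSQL dependencies on a machine that already has the basic software. You can configure the name, ID, privileges and access of the database superuser, the repo used for installation, the installation path, the version to install, and the required packages and extensions.
Most of these parameters only need to change when upgrading the database major version as a whole. Use pg_version to specify the version to install; it can be overridden at the cluster level to install different database versions for different clusters.
Parameter Overview
Default Parameters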
#------------------------------------------------------------------------------
# POSTGRES INSTALLATION
#------------------------------------------------------------------------------
# - dbsu - #
pg_dbsu: postgres # os user for database, postgres by default (change it is not recommended!)
pg_dbsu_uid: 26 # os dbsu uid and gid, 26 for default postgres users and groups
pg_dbsu_sudo: limit # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
pg_dbsu_home: /var/lib/pgsql # home directory of the dbsu
pg_dbsu_ssh_exchange: false # exchange ssh key among same cluster
# - postgres packages - #
pg_version: 13 # default postgresql version
pgdg_repo: false # use official pgdg yum repo (disable if you have local mirror)
pg_add_repo: false # add postgres related repo before install (useful if you want a simple install)
pg_bin_dir: /usr/pgsql/bin # postgres binary dir
pg_packages:
- postgresql${pg_version}*
- postgis31_${pg_version}*
- pgbouncer patroni pg_exporter pgbadger
- patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography
pg_extensions:
- pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
# - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
# - count_distinct${version} ddlx_${version} geoip${version} orafce${version} # popular features
# - hypopg_${version} ip4r${version} jsquery_${version} logerrors_${version} periods_${version} pg_auto_failover_${version} pg_catcheck${version}
# - pg_fkpart${version} pg_jobmon${version} pg_partman${version} pg_prioritize_${version} pg_track_settings${version} pgaudit15_${version}
# - pgcryptokey${version} pgexportdoc${version} pgimportdoc${version} pgmemcache-${version} pgmp${version} pgq-${version} pgquarrel pgrouting_${version}
# - pguint${version} pguri${version} prefix${version} safeupdate_${version} semver${version} table_version${version} tdigest${version}
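Parameter Details
pg_dbsu
Name of the operating system user (superuser) used by the database, postgres by default. Changing it is not recommended.
pg_dbsu_uid
UID of the operating system user (superuser) used by the database, 26 by default.
This matches the official PostgreSQL RPM packages on CentOS; changing it is not recommended.
pg_dbsu_sudo
Default sudo privilege of the database superuser:
none: no sudo privilege
limit: limited sudo privilege, allowed to run systemctl commands for database-related components (default)
all: full sudo privilege, password required
nopass: full sudo privilege without a password (not recommended)
pg_dbsu_home
Home directory of the database superuser, /var/lib/pgsql by default.
pg_dbsu_ssh_exchange
Whether to exchange the superuser's SSH public and private keys among the machines the playbook runs on.
pg_version
PostgreSQL version to install, 13 by default.
Override this variable at the cluster level as needed.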
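A cluster-level override of pg_version might look like the sketch below. The cluster name pg-test and its IP address follow the examples used elsewhere in this document; the choice of version 12 is purely illustrative.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
  vars:
    pg_cluster: pg-test
    pg_version: 12      # overrides the global default (13) for this cluster only
```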
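pgdg_repo
Flag: whether to use the official PostgreSQL (PGDG) repo. Disabled by default.
With this option, PostgreSQL packages can be downloaded and installed directly from the official repo on the Internet when no local repo is available.
pg_add_repo
If enabled, the official PGDG repo is added before PostgreSQL is installed.
With this option you can run database initialization directly, without having run infrastructure initialization first. It may be slow, but it is particularly useful when no infrastructure is available.
pg_bin_dir
PostgreSQL binary directory.
Defaults to /usr/pgsql/bin/. /usr/pgsql is a symbolic link created during installation that points to the directory of the installed Postgres version, for example /usr/pgsql -> /usr/pgsql-13.
pg_packages
PostgreSQL packages installed by default.
${pg_version} in a package name is replaced with the PostgreSQL version actually being installed.
pg_extensions
PostgreSQL extension packages to install.
${pg_version} in a package name is replaced with the PostgreSQL version actually being installed.
The extensions installed by default are:
pg_repack${pg_version}
pg_qualstats${pg_version}
pg_stat_kcache${pg_version}
wal2json${pg_version}
Enable them as needed, but installing the pg_repack extension is strongly recommended.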
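6.9 - PG Provision
Parameters that define how a database cluster is brought up in Pigsty
PG provisioning is the process of creating and starting a database on a machine where Postgres has been installed. It includes:
- Defining the cluster identity, cleaning up existing instances, creating the directory structure, copying tools and scripts, and setting environment variables
- Rendering the Patroni configuration template, bootstrapping the primary with Patroni, and bootstrapping the replicas with Patroni
- Configuring Pgbouncer, initializing business users and databases, and registering the database and data source services to the DCS.
Parameter Overview
Default Parameters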
#------------------------------------------------------------------------------
# POSTGRES PROVISION
#------------------------------------------------------------------------------
# - identity - #
# pg_cluster: # [REQUIRED] cluster name (validated during pg_preflight)
# pg_seq: 0 # [REQUIRED] instance seq (validated during pg_preflight)
# pg_role: replica # [REQUIRED] service role (validated during pg_preflight)
pg_hostname: false # overwrite node hostname with pg instance name
pg_nodename: true # overwrite consul nodename with pg instance name
# - retention - #
# pg_exists_action, available options: abort|clean|skip
# - abort: abort entire play's execution (default)
# - clean: remove existing cluster (dangerous)
# - skip: end current play for this host
# pg_exists: false # auxiliary flag variable (DO NOT SET THIS)
pg_exists_action: clean
pg_disable_purge: false # set to true to disable pg purge functionality for good (force pg_exists_action = abort)
# - storage - #
pg_data: /pg/data # postgres data directory
pg_fs_main: /export # data disk mount point /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
pg_fs_bkup: /var/backups # backup disk mount point /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*
# - connection - #
pg_listen: '0.0.0.0' # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
pg_port: 5432 # postgres port (5432 by default)
pg_localhost: /var/run/postgresql # localhost unix socket dir for connection
# - patroni - #
# patroni_mode, available options: default|pause|remove
# - default: default ha mode
# - pause: into maintenance mode
# - remove: remove patroni after bootstrap
patroni_mode: default # pause|default|remove
pg_namespace: /pg # top level key namespace in dcs
patroni_port: 8008 # default patroni port
patroni_watchdog_mode: automatic # watchdog mode: off|automatic|required
pg_conf: tiny.yml # user provided patroni config template path
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
# - pgbouncer - #
pgbouncer_port: 6432 # pgbouncer port (6432 by default)
pgbouncer_poolmode: transaction # pooling mode: (transaction pooling by default)
pgbouncer_max_db_conn: 100 # important! do not set this larger than postgres max conn or conn limit
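Identity Parameters
pg_cluster, pg_role and pg_seq are identity parameters.
Apart from the IP address, these three parameters are the minimum set required to define a new database cluster, as shown in the configuration below.
All other parameters can be inherited from the global or default configuration, but identity parameters must be specified explicitly and assigned by hand.
pg_cluster identifies the cluster by name and is configured at the cluster level.
pg_role is configured at the instance level and identifies the role of the instance. Only the primary role receives special treatment; if omitted, the role defaults to replica. There are also the special delayed and offline roles.
pg_seq identifies an instance within the cluster, usually an integer counting up from 0 or 1; once assigned it never changes.
{{ pg_cluster }}-{{ pg_seq }} uniquely identifies an instance, i.e. pg_instance.
{{ pg_cluster }}-{{ pg_role }} identifies a service within the cluster, i.e. pg_service.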
pg-test:
hosts:
10.10.10.11: {pg_seq: 1, pg_role: replica}
10.10.10.12: {pg_seq: 2, pg_role: primary}
10.10.10.13: {pg_seq: 3, pg_role: replica}
vars:
pg_cluster: pg-test
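Parameter Details
pg_cluster
Name of the PG database cluster; it serves as the namespace for resources within the cluster.
Cluster names must match the pattern [a-z][a-z0-9-]* so that they satisfy the identifier constraints of the different systems involved.
Identity parameter, required, cluster level.
pg_seq
Sequence number of the database instance, unique within the cluster. It distinguishes and identifies the instances of a cluster and is assigned starting from 0 or 1.
Identity parameter, required, instance level.
pg_role
Role of the database instance. The default roles are primary and replica.
Additional optional roles are offline and delayed.
Identity parameter, required, instance level.
pg_shard
Only sharded clusters need this parameter.
When several database clusters serve the same business through horizontal sharding, Pigsty calls this group of clusters a Sharding Cluster. pg_shard is the name of the sharding cluster that a database cluster belongs to. Any name can be used, but Pigsty recommends a meaningful naming convention.
For example, a cluster taking part in a sharding cluster can build its name from the sharding cluster name pg_shard + shard + its shard number pg_sindex: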
shard: test
pg-testshard1
pg-testshard2
pg-testshard3
pg-testshard4
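Identity parameter, optional, cluster level.
pg_sindex
Number of the cluster within its sharding cluster, usually assigned sequentially from 0 or 1.
Only sharded clusters need this parameter.
Identity parameter, optional, cluster level.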
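As a sketch, two members of the sharding cluster test from the naming example could be declared as follows. The IP addresses are hypothetical placeholders, and each cluster is reduced to a single primary for brevity.

```yaml
pg-testshard1:
  hosts:
    10.10.10.21: {pg_seq: 1, pg_role: primary}   # hypothetical address
  vars:
    pg_cluster: pg-testshard1
    pg_shard: test        # name of the sharding cluster
    pg_sindex: 1          # shard number of this cluster

pg-testshard2:
  hosts:
    10.10.10.22: {pg_seq: 1, pg_role: primary}   # hypothetical address
  vars:
    pg_cluster: pg-testshard2
    pg_shard: test
    pg_sindex: 2
```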
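pg_hostname
Whether to register the PG instance name pg_instance as the hostname. Disabled by default.
pg_nodename
Whether to register the PG instance name as the node name in Consul. Enabled by default.
pg_exists
Flag indicating whether a PG instance exists. Not configurable.
pg_exists_action
Safety guard: the action to take when a PostgreSQL instance already exists.
- abort: abort the entire playbook (described as the default behavior; note that the default configuration listed above sets clean)
- clean: wipe the existing instance and continue (extremely dangerous)
- skip: skip targets where an instance already exists (ending the play on those hosts) and continue on the other target machines.
If you really need to force the removal of an existing database instance, it is recommended to first take the cluster and its instances offline and destroy them with pgsql-rm.yml, and then run the initialization again. Otherwise, override the parameter on the command line with -e pg_exists_action=clean to force the existing instance to be wiped during initialization.
pg_disable_purge
A second safety guard, false by default. If set to true, pg_exists_action is forced to abort.
This effectively turns off the cleanup capability of pg_exists_action and guarantees that Postgres instances are never wiped under any circumstances.
It also means that existing instances must be cleaned up with the dedicated decommissioning playbook pgsql-rm.yml before the database can be initialized again on the cleaned node.
pg_data
Data directory, /pg/data by default.
pg_fs_main
Main data disk directory, /export by default.
Pigsty's default directory layout assumes that the system has a main data disk mount point that holds the database directory.
pg_fs_bkup
Archive and backup disk directory, /var/backups by default.
Pigsty's default directory layout assumes that the system has a backup disk mount point that holds backups and archived data. A backup disk is not mandatory: if the system has none, you can point this parameter at a subdirectory on the main data disk instead.
pg_listen
IP address the database listens on. Defaults to all IPv4 addresses, 0.0.0.0; use * to include all IPv6 addresses as well.
pg_port
Port the database listens on, 5432 by default. Changing it is not recommended.
pg_localhost
Unix socket directory that holds the Unix socket files of PostgreSQL and Pgbouncer.
Defaults to /var/run/postgresql.
pg_upstream
Instance-level option whose value is an IP address or hostname identifying the upstream node for streaming replication.
When set on a replica, the IP address must belong to another node in the same cluster. The instance replicates from that node, which can be used to build cascading replication.
When set on the primary of a cluster, the whole cluster runs as a Standby Cluster and receives changes from the upstream node. The primary of the cluster then acts as the standby leader.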
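The two uses of pg_upstream could be sketched as below. The first cluster follows the pg-test naming used in this section and adds a cascading replica; the second cluster, pg-test2, and its address are hypothetical names introduced only for illustration.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica}
    10.10.10.13: {pg_seq: 3, pg_role: replica, pg_upstream: 10.10.10.12}  # cascades from instance 2
  vars:
    pg_cluster: pg-test

pg-test2:                                   # hypothetical standby cluster
  hosts:
    10.10.10.14: {pg_seq: 1, pg_role: primary, pg_upstream: 10.10.10.11}  # acts as standby leader
  vars:
    pg_cluster: pg-test2
```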
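pg_backup
Flag, instance-level option: instances with this flag are used to store base backups (not available in the open source edition of Pigsty).
pg_delay
Delay applied when the instance is a delayed replica (not available in the open source edition of Pigsty).
Uses the interval string format accepted by PG, such as 1h or 30min.
patroni_mode
Working mode of Patroni:
default: enable Patroni
pause: enable Patroni, but enter maintenance mode automatically once initialization is complete (no automatic failover)
remove: still use Patroni to initialize the cluster, but remove Patroni after initialization
pg_namespace
Top-level namespace of the KV store that Patroni uses in the DCS.
Defaults to /pg.
patroni_port
Port the Patroni API server listens on.
Defaults to 8008.
patroni_watchdog_mode
During a failover, Patroni tries to shut down the primary before promoting a replica. If the primary has still not shut down within the given timeout, Patroni uses the Linux kernel feature softdog, depending on this setting, to fence the node by shutting it down.
off: do not use the watchdog
automatic: use the watchdog if the kernel has softdog enabled, without enforcing it (default)
required: enforce the watchdog and refuse to start if softdog is not enabled on the system
pg_conf
Patroni template used to bring up the Postgres cluster. Pigsty ships four templates:
oltp.yml: regular OLTP template, the default configuration
olap.yml: OLAP template with higher parallelism, optimized for throughput and for long-running queries
crit.yml: critical business template, based on the OLTP template and optimized for safety and data integrity; it uses synchronous replication and enforces data checksums
tiny.yml: tiny database template, optimized for low-resource scenarios such as demo database clusters running in virtual machines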
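Selecting a template is a matter of setting pg_conf, typically at the cluster level. A minimal sketch, loosely based on the pg-test cluster used in this section; picking crit.yml here is only an illustration.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica}
  vars:
    pg_cluster: pg-test
    pg_conf: crit.yml     # critical business template: synchronous replication, data checksums
```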
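pg_encoding
Character set encoding used when the PostgreSQL instance is initialized.
Defaults to UTF8. Changing this parameter is not recommended unless you have special requirements.
pg_locale
Locale used when the PostgreSQL instance is initialized.
Defaults to C. Changing this parameter is not recommended unless you have special requirements.
pg_lc_collate
Collation (string sort order) used when the PostgreSQL instance is initialized.
Defaults to C. Changing this parameter is strongly discouraged unless you have special requirements. Localized sorting is always available through the COLLATE expression, whereas a wrong collation can make some operations several times slower, so change this parameter only when you genuinely need localization.
pg_lc_ctype
Character classification (ctype) locale used when the PostgreSQL instance is initialized.
Defaults to en_US.UTF8. Some PG extensions (such as pg_trgm) need the additional character classification definitions to work correctly with international characters, so Pigsty uses en_US.UTF8 by default. Changing this parameter is not recommended.
pgbouncer_port
Port the Pgbouncer connection pool listens on.
Defaults to 6432.
pgbouncer_poolmode
Pool mode used by the Pgbouncer connection pool.
Defaults to transaction, i.e. transaction-level pooling. The other options are session and statement.
pgbouncer_max_db_conn
Maximum number of connections allowed between the connection pool and a single database.
Defaults to 100.
With transaction pooling, the number of active server connections usually stays in the single digits. With session pooling, this parameter can be raised accordingly.
6.10 - PG Template
Parameters for customizing Postgres templates in Pigsty
PG Provision brings up a brand-new Postgres cluster; PG Template then builds on PG Provision and creates the default objects in that new database cluster, including:
- Basic roles: read-only role, read-write role, admin role
- Basic users: replication user, superuser, monitor user, admin user
- Default privileges in the template database
- Default schemas
- Default extensions
- HBA allow/deny rules
For more information on customizing database clusters, see Customize Cluster.
Parameter Overview
Default Parameters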
#------------------------------------------------------------------------------
# POSTGRES TEMPLATE
#------------------------------------------------------------------------------
# - template - #
pg_init: pg-init # init script for cluster template
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_admin # system admin user
pg_admin_password: DBUser.Admin # system admin password
# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more detail
pg_default_roles:
# common production readonly user
- name: dbrole_readonly # production read-only roles
login: false
comment: role for global readonly access
# common production read-write user
- name: dbrole_readwrite # production read-write roles
login: false
roles: [dbrole_readonly] # read-write includes read-only access
comment: role for global read-write access
# offline have same privileges as readonly, but with limited hba access on offline instance only
# for the purpose of running slow queries, interactive queries and perform ETL tasks
- name: dbrole_offline
login: false
comment: role for restricted read-only access (offline instance)
# admin have the privileges to issue DDL changes
- name: dbrole_admin
login: false
bypassrls: true
comment: role for object creation
roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# dbsu, name is designated by `pg_dbsu`. It's not recommend to set password for dbsu
- name: postgres
superuser: true
comment: system superuser
# default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
- name: replicator
replication: true
roles: [pg_monitor, dbrole_readonly]
comment: system replicator
# default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
- name: dbuser_monitor
connlimit: 16
comment: system monitor user
roles: [pg_monitor, dbrole_readonly]
# default admin user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
- name: dbuser_admin
bypassrls: true
comment: system admin user
roles: [dbrole_admin]
# default stats user, for ETL and slow queries
- name: dbuser_stats
password: DBUser.Stats
comment: business offline user for offline queries and ETL
roles: [dbrole_offline]
# - privileges - #
# object created by dbsu and admin will have their privileges properly set
pg_default_privileges:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE ON SCHEMAS TO dbrole_offline
- GRANT SELECT ON TABLES TO dbrole_offline
- GRANT SELECT ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
# - schemas - #
pg_default_schemas: [monitor] # default schemas to be created
# - extension - #
pg_default_extensions: # default extensions to be created
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
# - hba - #
pg_offline_query: false # set to true to enable offline query on instance
pg_hba_rules: # postgres host-based authentication rules
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read/write (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
- title: allow offline query (ETL,SAGA,Interactive) on offline instance
role: offline
rules:
- host all +dbrole_offline 10.0.0.0/8 md5
- host all +dbrole_offline 172.16.0.0/12 md5
- host all +dbrole_offline 192.168.0.0/16 md5
pg_hba_rules_extra: [] # extra hba rules (for cluster/instance overwrite)
pgbouncer_hba_rules: # pgbouncer host-based authentication rules
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
pgbouncer_hba_rules_extra: [] # extra pgbouncer hba rules (for cluster/instance overwrite)
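Parameter Details
pg_init
Location of the shell script used to initialize the database template, pg-init by default. The script is copied to /pg/bin/pg-init and then executed.
The default pg-init is just a wrapper around pre-rendered SQL commands: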
# system default roles
psql postgres -qAXwtf /pg/tmp/pg-init-roles.sql
# system default template
psql template1 -qAXwtf /pg/tmp/pg-init-template.sql
# make postgres same as templated database (optional)
psql postgres -qAXwtf /pg/tmp/pg-init-template.sql
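You can add your own cluster initialization logic in a custom pg-init script.
pg_replication_username
Database username used for PostgreSQL streaming replication.
Defaults to replicator.
pg_replication_password
Password of the database user used for PostgreSQL streaming replication; it must be given in plain text.
Defaults to DBUser.Replicator. Changing it is strongly recommended!
pg_monitor_username
Database username used for PostgreSQL and Pgbouncer monitoring tasks.
Defaults to dbuser_monitor.
pg_monitor_password
Password of the database user used for PostgreSQL and Pgbouncer monitoring tasks; it must be given in plain text.
Defaults to DBUser.Monitor. Changing it is strongly recommended!
pg_admin_username
Database username used for PostgreSQL administration tasks (DDL changes).
Defaults to dbuser_admin.
pg_admin_password
Password of the database user used for PostgreSQL administration tasks (DDL changes); it must be given in plain text.
Defaults to DBUser.Admin. Changing it is strongly recommended!
pg_default_roles
Defines the default roles and users in PostgreSQL as an array of objects, each of which defines one user or role.
Every user or role must specify name; all other fields are optional.
- password is optional. If left empty, no password is set. An MD5-hashed password may be used.
- login, superuser, createdb, createrole, inherit, replication and bypassrls are booleans that set user attributes. If unset, the system defaults apply.
- Users are created with CREATE USER and therefore have the login attribute by default. To create a role, specify login: false.
- expire_at and expire_in control when the user expires. expire_at takes a date in the form YYYY-mm-DD; expire_in takes a number of days from now. If expire_in is present, it overrides expire_at.
- New users are not added to the Pgbouncer user list by default. A user is added to the Pgbouncer user list only if pgbouncer: true is set explicitly.
- Users and roles are created in order, so a user defined later can belong to a role defined earlier.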
pg_users:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly|dbrole_offline
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
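Pigsty defines a basic access control system made up of four default roles and four default users; see Access Control for details.
pg_default_privileges
Defines the default privileges in the database template.
Any object created by {{ pg_dbsu }} or {{ pg_admin_username }} gets the following default privileges: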
pg_default_privileges:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE ON SCHEMAS TO dbrole_offline
- GRANT SELECT ON TABLES TO dbrole_offline
- GRANT SELECT ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
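See Access Control for details.
pg_default_schemas
Default schemas created in the template database.
By default Pigsty creates a schema named monitor for installing monitoring extensions.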
pg_default_schemas: [monitor] # default schemas to be created
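pg_default_extensions
Extensions installed in the template database by default, as an array of objects.
If the schema field is not specified, the extension is installed into the schema determined by the current search_path.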
pg_default_extensions:
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
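pg_offline_query
Instance-level variable, boolean, false by default.
When set to true, members of the dbrole_offline group can connect to this instance and run offline queries regardless of the instance's role.
This is practical when there are few instances (for example one primary and one replica): the only replica can be marked with pg_offline_query = true so that it accepts ETL, slow queries and interactive access. See Access Control - Offline Users for details.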
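The one-primary-one-replica case could be written as the sketch below. The cluster name and addresses follow the pg-test examples in this document and are otherwise arbitrary.

```yaml
pg-test:
  hosts:
    10.10.10.11: {pg_seq: 1, pg_role: primary}
    10.10.10.12: {pg_seq: 2, pg_role: replica, pg_offline_query: true}  # also accepts dbrole_offline users
  vars:
    pg_cluster: pg-test
```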
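pg_reload
Command-line argument, boolean, true by default.
When set to true, Pigsty runs pg_ctl reload right after generating the HBA rules to apply them.
If you want to generate the pg_hba.conf file and compare it by hand before applying it, pass -e pg_reload=false to disable the reload.
pg_hba_rules
Client IP allow/deny rules for the database, as an array of objects in which each object represents one rule.
Each rule consists of three parts:
title: rule title, converted into a comment in the HBA file
role: the role the rule applies to. common applies the rule to all instances; other values (such as replica or offline) install it only on instances with the matching role. For example, role='replica' means the rule is applied only to instances with pg_role == 'replica'.
rules: array of strings, each of which is written to pg_hba.conf as one rule.
As a special case, HBA rules with role == 'offline' are additionally installed on instances with pg_offline_query == true.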
pg_hba_rules:
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read-write access (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readwrite md5
- host all +dbrole_readwrite 127.0.0.1/32 md5
- title: allow read-only user (stats, personal) password directly access
role: replica
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
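It is recommended to configure a uniform pg_hba_rules globally and to use pg_hba_rules_extra for additional customization of specific clusters.
pg_hba_rules_extra
Similar to pg_hba_rules, but typically used for HBA rules at the cluster level.
The rules in pg_hba_rules_extra are appended to pg_hba.conf in the same way.
To overwrite a cluster's HBA rules completely, without inheriting the global HBA configuration, configure pg_hba_rules at the cluster level to override the global configuration.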
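A cluster-level extra rule might look like the sketch below. It follows the rule format described for pg_hba_rules (title, role, rules); the title, the user name dbuser_test and the 10.10.20.0/24 subnet are hypothetical and only illustrate the shape of an entry.

```yaml
pg-test:
  vars:
    pg_cluster: pg-test
    pg_hba_rules_extra:
      - title: allow test user password access from the app subnet   # hypothetical rule
        role: common
        rules:
          - host all dbuser_test 10.10.20.0/24 md5
```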
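pgbouncer_hba_rules
Similar to pg_hba_rules, but for the HBA rules of Pgbouncer.
The default Pgbouncer HBA rules are simple and fairly permissive; customize them to your needs:
- Allow password login from the local host
- Allow password login from intranet network segments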
pgbouncer_hba_rules:
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
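pgbouncer_hba_rules_extra
Similar to pg_hba_rules_extra, but for additional Pgbouncer HBA rules at the cluster level.
Business Template
The following two parameters belong to the business template; this is where you define the business users and business databases you need.
The users and databases defined here are applied by the two tasks below, which cover not only the users and databases in the database itself but also the corresponding configuration in the Pgbouncer connection pool.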
./pgsql.yml --tags=pg_biz_init,pg_biz_pgbouncer
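pg_users
Typically used to define business users at the database cluster level, in the same form as pg_default_roles.
An array of objects, each defining one business user. The name field is required; the password may be an MD5-hashed password.
Use the roles field to add a business user to the default privilege groups:
dbrole_readonly: default production read-only role with global read-only access (read-only production access)
dbrole_offline: default offline read-only role with read-only access on specific instances (offline queries, personal accounts, ETL)
dbrole_readwrite: default production read-write role with global CRUD access (regular production use)
dbrole_admin: default production admin role with the privilege to run DDL changes (administrators)
Production accounts should set pgbouncer: true so that they can access the database through the connection pool; ordinary users should not access the database through the connection pool.
Below is an example of creating business accounts: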
pg_users:
# complete example of user/role definition for production user
- name: dbuser_meta # example production user have read-write access
password: DBUser.Meta # example user's password, can be encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
connlimit: -1 # connection limit, -1 disable limit
expire_at: '2030-12-31' # 'timestamp' when this role is expired
expire_in: 365 # now + n days when this role is expired (OVERWRITE expire_at)
roles: [dbrole_readwrite] # dbrole_admin|dbrole_readwrite|dbrole_readonly
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
parameters: # user's default search path
search_path: public
comment: test user
# simple example for personal user definition
- name: dbuser_vonng2 # personal user example which only have limited access to offline instance
password: DBUser.Vonng # or instance with explict mark `pg_offline_query = true`
roles: [dbrole_offline] # personal/stats/ETL user should be grant with dbrole_offline
expire_in: 365 # expire in 365 days since creation
pgbouncer: false # personal user should NOT be allowed to login with pgbouncer
comment: example personal user for interactive queries
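pg_databases
An array of objects, each defining one business database. In each definition the database name, name, is required and everything else is optional.
name: database name, required.
owner: database owner, postgres by default.
template: template used to create the database, template1 by default.
encoding: default character encoding of the database, UTF8 by default, consistent with the instance. It is best left unset and unchanged.
locale: default locale of the database, C by default. It is best left unset so that it stays consistent with the instance.
lc_collate: default collation of the database. It defaults to the instance setting and must match the template database. It is strongly recommended to leave it unset or set it to C.
lc_ctype: default ctype locale of the database. It defaults to the instance setting and must match the template database. If set, C or en_US.UTF8 is recommended.
allowconn: whether connections to the database are allowed, true by default. Changing it is not recommended.
revokeconn: whether to revoke the privilege to connect to the database, false by default. If true, the PUBLIC CONNECT privilege on the database is revoked and only the default users (dbsu|monitor|admin|replicator|owner) can connect. In addition, admin|owner hold the GRANT OPTION and can grant the connect privilege to other users.
tablespace: tablespace associated with the database, pg_default by default.
connlimit: connection limit of the database, -1 by default, meaning no limit.
extensions: array of objects, each defining an extension in the database and the schema it is installed into.
parameters: key-value object; each entry defines a parameter to be changed for the database with ALTER DATABASE.
pgbouncer: boolean, whether to add the database to Pgbouncer. All databases are added to Pgbouncer unless pgbouncer: false is set explicitly.
comment: comment on the database.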
pg_databases:
- name: meta # name is the only required field for a database
owner: postgres # optional, database owner
template: template1 # optional, template1 by default
encoding: UTF8 # optional, UTF8 by default
locale: C # optional, C by default
lc_collate: C # optional, C by default , must same as template database, leave blank to set to db default
lc_ctype: C # optional, C by default , must same as template database, leave blank to set to db default
allowconn: true # optional, true by default, false disable connect at all
revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
tablespace: pg_default # optional, 'pg_default' is the default tablespace
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
extensions: # optional, extension name and where to create
- {name: postgis, schema: public}
parameters: # optional, extra parameters with ALTER DATABASE
enable_partitionwise_join: true
pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
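6.11 - Monitoring System
Parameters related to the monitoring system in Pigsty
The Pigsty monitoring system has two components: Node Exporter and PG Exporter.
Node Exporter exposes the metrics of machine nodes, and PG Exporter collects the metrics of the database and the Pgbouncer connection pool. In addition, Haproxy exposes its metrics directly through its admin port.
By default all exporters are registered to Consul, and Prometheus manages these targets through service discovery. You can instead set prometheus_sd_method to static to switch to static service discovery and manage all exporters through configuration files. This approach is recommended when monitoring existing database instances.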
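When monitoring existing instances, the relevant switches described in this reference could be combined roughly as follows. This is a sketch of one plausible combination, not a verified monitor-only configuration; exporter_install: binary in particular assumes that the target machines cannot install the exporters from a yum repo.

```yaml
prometheus_sd_method: static   # manage scrape targets via /etc/prometheus/targets/*.yml instead of Consul
service_registry: none         # monitor-only deployments must not register services
exporter_install: binary       # assumption: copy exporter binaries from files/ to the targets
```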
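Promtail collects the Postgres, Patroni and Pgbouncer logs. It is currently in beta and is an optional, additionally installed component.
Parameter Overview
Default Parameters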
#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none # none|yum|binary, none by default
exporter_repo_url: '' # if set, repo will be added to /etc/yum.repos.d/ before yum installation
# - collect - #
exporter_metrics_path: /metrics # default metric path for pg related exporter
# - node exporter - #
node_exporter_enabled: true # setup node_exporter on instance
node_exporter_port: 9100 # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml # default config files for pg_exporter
pg_exporter_enabled: true # setup pg_exporter on instance
pg_exporter_port: 9630 # default port for pg exporter
pg_exporter_url: '' # optional, if not set, generate from reference parameters
# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631 # default port for pgbouncer exporter
pgbouncer_exporter_url: '' # optional, if not set, generate from reference parameters
# - promtail - # # promtail is a beta feature which requires manual deployment
promtail_enabled: true # enable promtail logging collector?
promtail_clean: false # remove promtail status file? false by default
promtail_port: 9080 # default listen port for promtail
promtail_status_file: /tmp/promtail-status.yml
promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push # loki url to receive logs
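Parameter Details
exporter_install
How the exporters are installed:
none: do not install (default; the exporters have already been installed earlier by the node.pkgs task)
yum: install with yum (node_exporter and pg_exporter are installed with yum before the exporters are deployed)
binary: install by copying binaries (the node_exporter and pg_exporter binaries are copied directly from files)
With yum, if exporter_repo_url is specified (non-empty), the REPO file at that URL is first installed into /etc/yum.repos.d before installation. This allows exporters to be installed directly in environments where node infrastructure initialization has not been run.
With binary, you must make sure that the Linux binaries of node_exporter and pg_exporter have been placed in the files directory: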
<meta>:<pigsty>/files/node_exporter -> <target>:/usr/bin/node_exporter
<meta>:<pigsty>/files/pg_exporter -> <target>:/usr/bin/pg_exporter
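exporter_binary_install (deprecated)
This parameter has been superseded by exporter_install.
Whether to install Node Exporter and PG Exporter by copying binaries, false by default.
This option is mainly used when integrating with an external provisioning solution, to reduce the assumptions made about the existing system. When enabled, the Linux binaries are copied directly to the target machines: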
<meta>:<pigsty>/files/node_exporter -> <target>:/usr/bin/node_exporter
<meta>:<pigsty>/files/pg_exporter -> <target>:/usr/bin/pg_exporter
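To enable this option, first download the Linux binaries from Github into the files directory with files/download-exporter.sh.
exporter_metrics_path
The URL path under which every exporter exposes its metrics; defaults to /metrics.
This variable is referenced by the external prometheus role; Prometheus applies it to the monitoring targets with job = pg.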
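node_exporter_enabled
Whether to install and configure node_exporter; defaults to true.
node_exporter_port
The port node_exporter listens on; defaults to 9100.
node_exporter_options
Extra command-line options passed to node_exporter.
This option is mainly used to customize which metric collectors node_exporter enables. For the list of collectors supported by Node Exporter, see: Node Exporter Collectors
The default value is: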
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
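pg_exporter_config
The default configuration file used by pg_exporter, which defines the metrics collected in Pigsty.
Pigsty ships with two configuration files by default:
If you use a different Prometheus architecture, review and adjust the pg_exporter configuration file accordingly.
The PG Exporter configuration used by Pigsty supports PostgreSQL 10.0 and later, currently up to the latest PG 13.
pg_exporter_enabled
Whether to install and configure pg_exporter; defaults to true.
pg_exporter_url
The PGURL that PG Exporter uses to connect to the database.
Optional; defaults to an empty string.
By default Pigsty generates the target URL with the rule below. If pg_exporter_url is set, that URL is used directly as the connection string.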
PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pg_port }}/{{ pg_default_database }}?host={{ pg_localhost }}&sslmode=disable'
This option is set as an environment variable in /etc/default/pg_exporter.
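To make the generation rule concrete, here is a minimal sketch that renders the default connection string from its parts. The function name render_pg_exporter_url and the sample values (dbuser_monitor, secret, the socket directory) are assumptions for illustration and are not taken from Pigsty itself.

```bash
# Sketch: render the connection string pg_exporter would use by default.
# Argument order mirrors the template variables; sample values are assumptions.
render_pg_exporter_url() {
    local user="$1" password="$2" port="$3" database="$4" localhost="$5"
    printf 'postgres://%s:%s@:%s/%s?host=%s&sslmode=disable\n' \
        "$user" "$password" "$port" "$database" "$localhost"
}

# An explicitly configured pg_exporter_url takes precedence over the generated one.
pg_exporter_url=''
echo "${pg_exporter_url:-$(render_pg_exporter_url dbuser_monitor secret 5432 postgres /var/run/postgresql)}"
```

Passwords containing URL-reserved characters would need percent-encoding before being placed in such a URL.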
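pgbouncer_exporter_enabled
Whether to install and configure pgbouncer_exporter; defaults to true.
pg_exporter_port
The port pg_exporter listens on; defaults to 9630.
pgbouncer_exporter_port
The port pgbouncer_exporter listens on; defaults to 9631.
pgbouncer_exporter_url
The URL that PGBouncer Exporter uses to connect to the database.
Optional; defaults to an empty string.
By default Pigsty generates the target URL with the rule below. If pgbouncer_exporter_url is set, that URL is used directly as the connection string.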
PG_EXPORTER_URL='postgres://{{ pg_monitor_username }}:{{ pg_monitor_password }}@:{{ pgbouncer_port }}/pgbouncer?host={{ pg_localhost }}&sslmode=disable'
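This option is set as an environment variable in /etc/default/pgbouncer_exporter.
promtail_enabled
Boolean, global|cluster variable. Whether to enable the Promtail log collection service; enabled by default.
Note that Loki and Promtail are currently optional add-on modules. They are not installed by the Monitor part of pgsql.yml and are only used in the pgsql-promtail.yml playbook.
promtail_clean
Boolean, command-line argument.
Whether to remove existing state when installing promtail. The state file at promtail_status_file records the consumption offset of every log; it is not cleaned by default.
promtail_port
The port promtail listens on; defaults to 9080.
promtail_status_file
String, cluster|global variable. Location of the file that stores Promtail state; defaults to /tmp/promtail-status.yml.
promtail_send_url
HTTP URL of the loki service endpoint that receives the logs.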
6.12 - Service Provisioning
Parameters related to traffic proxying and load balancing in Pigsty
Overview
Default Parameters
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100 # default load balance weight (instance level)
# - service - #
pg_services: # how to expose postgres service in cluster?
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}_primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}_replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}_offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
pg_services_extra: [] # extra services to be added
# - haproxy - #
haproxy_enabled: true # enable haproxy among every cluster members
haproxy_reload: true # reload haproxy after config
haproxy_admin_auth_enabled: false # enable authentication for haproxy admin?
haproxy_admin_username: admin # default haproxy admin username
haproxy_admin_password: admin # default haproxy admin password
haproxy_exporter_port: 9101 # default admin/exporter port
haproxy_client_timeout: 3h # client side connection timeout
haproxy_server_timeout: 3h # server side connection timeout
# - vip - #
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
# vip_address: 127.0.0.1 # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24 # virtual ip address cidr mask (l2 only)
# vip_interface: eth0 # virtual ip network interface (l2 only)
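Parameter Details
pg_weight
Relative weight of a database instance when load balancing; defaults to 100.
pg_services
An array of service definition objects that defines the services each database cluster exposes.
Each cluster can define multiple services. Each service contains any number of cluster members, and services are distinguished by port.
A service definition is structured as in the following example: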
- name: default # service's actual name is {{ pg_cluster }}-{{ service.name }}
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
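Required fields
- Name (service.name):
The service name. The full service name uses the cluster name as prefix and service.name as suffix, joined by -. For example, the service with name=primary in the pg-test cluster has the full name pg-test-primary.
- Port (service.src_port):
In Pigsty, services are exposed as NodePort by default, so the exposed port is required. If you use an external load balancer for access, you can distinguish services in other ways.
- Selector (service.selector):
The selector specifies the instance members of the service. It is a JMESPath expression that filters the cluster's instance members. The default [] selector selects all cluster members.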
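Optional fields
- Backup selector (service.selector_backup):
The optional backup selector service.selector_backup selects or marks the instances used as backups for the service: backup instances take over only when all other members of the service have failed. For example, the primary instance can be added to the backup set of the replica service, so that the primary can still carry the cluster's read-only traffic after all replicas fail.
- Source IP (service.src_ip):
The IP address the service binds to. Defaults to *, meaning all IP addresses on the host. vip uses the value of the vip_address variable; you can also enter a specific IP address supported by the network interface.
- Destination port (service.dst_port):
Which port on the target instance the service traffic is routed to. postgres points to the port the database listens on, pgbouncer points to the port the connection pool listens on; a fixed port number can also be given.
- Health check method (service.check_method):
How the service checks instance health. Only HTTP is supported for now.
- Health check port (service.check_port):
Which port the service probes for instance health. patroni probes Patroni (8008 by default), pg_exporter probes PG Exporter (9630 by default); a custom port number can also be given.
- Health check path (service.check_url):
The URL path used for the HTTP check. / is used by default. PG Exporter and Patroni provide several health check endpoints that can be used to separate primary and replica traffic. For example, /primary succeeds only on the primary, /replica succeeds only on replicas, and /read-only succeeds on any instance that can serve read-only queries (including the primary).
- Health check code (service.check_code):
The HTTP status code the health check expects; defaults to 200.
- Haproxy-specific configuration (service.haproxy):
Settings specific to the service provisioning software (HAProxy).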
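The way check_port and check_url combine into a probe target can be sketched as follows. The helper names resolve_check_port and health_check_url are hypothetical; the default ports 8008 and 9630 are the ones stated above.

```bash
# Sketch: resolve a service's health check settings into the URL to probe.
resolve_check_port() {
    case "$1" in
        patroni)     echo 8008 ;;   # Patroni default port
        pg_exporter) echo 9630 ;;   # PG Exporter default port
        *)           echo "$1" ;;   # anything else is taken as a literal port number
    esac
}

health_check_url() {
    local host="$1" check_port="${2:-patroni}" check_url="${3:-/}"
    echo "http://${host}:$(resolve_check_port "$check_port")${check_url}"
}

health_check_url 10.10.10.11 patroni /primary
health_check_url 10.10.10.12 patroni /read-only
```

A probe could then be issued by hand with curl -s -o /dev/null -w '%{http_code}' "<url>" and the printed status compared against check_code.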
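pg_services_extra
An array of service definition objects, defined at the cluster level and appended to the global service definitions.
Use this option to create special services for a particular database cluster, for example a dedicated service for a cluster that has a delayed replica.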
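haproxy_enabled
Whether to enable the Haproxy component.
By default Pigsty deploys Haproxy on all database nodes. You can override this variable at the instance level to enable the Haproxy load balancer only on specific instances/nodes.
haproxy_admin_auth_enabled
Whether to enable basic authentication for the Haproxy admin page.
Disabled by default. Enabling it in production is recommended, or add access control in Nginx or another access layer.
haproxy_admin_username
Username for Haproxy admin page authentication, when enabled; defaults to admin.
haproxy_admin_password
Password for Haproxy admin page authentication, when enabled; defaults to admin.
haproxy_client_timeout
Haproxy client-side connection timeout; defaults to 3 hours.
haproxy_server_timeout
Haproxy server-side connection timeout; defaults to 3 hours.
haproxy_exporter_port
The port on which the Haproxy admin page and metrics endpoint listen.
Defaults to 9101.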
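vip_mode
VIP mode, an enum. Available values:
- none: no VIP is configured
- l2: a layer-2 VIP bound to the primary (all members must be in the same layer-2 broadcast domain)
- l4: traffic is distributed by an external L4 load balancer (not part of the current Pigsty implementation)
The VIP keeps the read-write service and the load balancer highly available. With an L2 VIP, the VIP is managed by vip-manager and is bound to the cluster primary.
This means you can always reach the cluster primary through the VIP, or reach the load balancer on the primary through the VIP (which may add performance pressure if the primary is already heavily loaded).
Note that all VIP candidate instances must be in the same layer-2 network (VLAN, switch).
vip_address
The VIP address; used for both L2 and L4 VIPs.
vip_address has no default value. A VIP address must be explicitly assigned to every cluster.
vip_cidrmask
The CIDR prefix length of the VIP network; required only for an L2 VIP.
vip_cidrmask has no default value. The network CIDR of the VIP must be explicitly specified for every cluster.
vip_interface
The network interface the VIP is bound to; required only for an L2 VIP.
Defaults to eth0. Set it per cluster/instance whenever the VIP should use a different interface.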
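Deprecated Parameters
These parameters are now defined within services and are no longer used.
haproxy_policy
The load balancing algorithm used by haproxy. Available policies are roundrobin and leastconn; defaults to roundrobin.
haproxy_check_port
The port on which Haproxy health-checks the backend PostgreSQL process.
Defaults to 8008, the Patroni port.
Another option is 9630, which uses pg_exporter for health checks.
haproxy_primary_port
Default port of the cluster read-write service in Haproxy. All client connections to this port are forwarded to the corresponding port on the primary instance.
The default read-write service port is 5433.
haproxy_replica_port
Default port of the cluster read-only service in Haproxy. All client connections to this port are forwarded to the corresponding port on replica instances.
The default read-only service port is 5434.
haproxy_backend_port
The backend port Haproxy forwards client connections to: 5432 or 6432.
Defaults to 6432, meaning Haproxy forwards traffic to the connection pool port 6432; change it to 5432 to forward traffic directly to the database.
haproxy_weight
Standard weight used by Haproxy when load balancing; defaults to 100. Overriding it at the instance level is recommended.
haproxy_weight_fallback
Controls the weight with which the primary carries read-only traffic.
If haproxy_weight_fallback is 0, the primary does not carry any read-only traffic (traffic sent to haproxy_replica_port).
If haproxy_weight_fallback is 1 (or higher), the primary carries a small share of traffic in the replica service set while the cluster works normally (1 / total weight when set to 1); when every read-only instance in the replica set fails, read-only traffic can drift to the primary.
This setting is very useful for a one-primary-one-replica cluster. If you have multiple replicas, setting it to 0 is recommended.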
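As a worked example of the "share of total weight" arithmetic, the sketch below computes the primary's approximate share of read-only traffic in per-mille, assuming traffic is split in proportion to weight. The function name primary_share_permille is hypothetical.

```bash
# Sketch: primary's share of read-only traffic, in per-mille (integer division).
# Usage: primary_share_permille <fallback_weight> <replica_weight>...
primary_share_permille() {
    local fallback="$1" total="$1" w
    shift
    for w in "$@"; do
        total=$(( total + w ))
    done
    [ "$total" -gt 0 ] || { echo 0; return 0; }
    echo $(( fallback * 1000 / total ))
}

primary_share_permille 1 100        # one replica at weight 100
primary_share_permille 1 100 100    # two replicas at weight 100
primary_share_permille 0 100 100    # fallback disabled
```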
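7 - Tasks
HA drills, database trials, and other tasks you can explore in Pigsty
Once Pigsty is configured, you can use it for some interesting exploration and experiments.
7.1 - Database Migration Based on Logical Replication
This article uses the Pigsty sandbox to demonstrate, with a worked example, database migration based on PostgreSQL logical replication.
Based on instances in the Pigsty sandbox, this article covers the principles, details, and caveats of switchover and database migration using logical replication.
For background on logical replication, see the article Postgres逻辑复制详解 (Postgres Logical Replication in Detail).
0 Migration with Logical Replication
Logical replication is commonly used for online PostgreSQL upgrades across major versions and operating systems, for example from PG 10 to PG 13, or from Windows to Linux.
0.1 Advantages of logical migration
Compared with an in-place pg_upgrade or a pg_dump upgrade, logical replication has these benefits:
- Online: the migration can run online, with no downtime or only a very small downtime window.
- Flexible: the structure of the target database may differ from the source, for example a regular table becomes a partitioned table, or columns are added. It also works across major versions.
- Safe: unlike physical replication, the target database is writable, so before the final switchover you can test freely and rebuild it.
- Fast: the downtime window is short, on the order of seconds to minutes.
0.2 Limitations of logical migration
The main limitations of logical replication are that setup is relatively tedious, the initial data copy is slower than with physical replication, and an instance with multiple databases has to be migrated once per database. Large objects and sequences must be synchronized manually during migration.
Overall, these are problems that can be solved or tolerated.
0.3 Basic procedure
Overall, a migration based on logical replication follows these steps:
The preparation and initial data migration take the longest, but require no downtime and do not affect production.
The switchover needs a short downtime window; automated scripts can keep the downtime within seconds to minutes.
The sections below describe the details of each step using the Pigsty sandbox.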
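1 Preparation
1.1 Prepare the source and destination clusters
Before migrating, first make sure the source cluster and the destination cluster are configured correctly.
The standard Pigsty sandbox consists of four nodes and two database clusters.
The two database clusters, pg-meta and pg-test, serve as the source (SRC) and destination (DST) of logical replication respectively.
In this example pg-meta-1 is the publisher and pg-test-1 is the subscriber; the pgbench tables are migrated from pg-meta to pg-test.
1.1.1 Users
Migration usually requires two users on both the source and the destination, one for administration and one for replication.
CREATE USER dbuser_admin SUPERUSER; -- superuser, creates publications and subscriptions
CREATE USER replicator REPLICATION BYPASSRLS; -- replication user, subscribes to changes
1.1.2 HBA rules
HBA rules must also be configured to allow the replication user to connect between the source and destination clusters.
In addition, the migration is usually driven from a control node, so the admin user should be allowed to access both clusters from there.
Because creating a subscription requires superuser privileges, it is recommended to grant the admin user SUPERUSER (permanently or temporarily).
1.1.3 Configuration
The one mandatory setting is wal_level: it must be set to logical on the source to enable logical replication.
Other replication-related parameters should also be tuned sensibly, but apart from wal_level their default values do not prevent logical replication from working, so they are optional.
Using the same settings on the source and the destination is recommended. Reference values for a 64-core machine: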
wal_level: logical # MANDATORY!
max_worker_processes: 64 # default 8 -> 64, set to CPU CORE 64
max_parallel_workers: 32 # default 8 -> 32, limit by max_worker_processes
max_parallel_maintenance_workers: 16 # default 2 -> 16, limit by parallel worker
max_parallel_workers_per_gather: 0 # default 2 -> 0, disable parallel query on OLTP instance
# max_parallel_workers_per_gather: 16 # default 2 -> 16, enable parallel query on OLAP instance
max_wal_senders: 24 # 10 -> 24
max_replication_slots: 16 # 10 -> 16
max_logical_replication_workers: 8 # 4 -> 8, 6 sync worker + 1~2 apply worker
max_sync_workers_per_subscription: 6 # 2 -> 6, 6 sync worker
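For the database itself, also check that the encoding and locale settings are correct; using C.UTF8 consistently is generally recommended.
1.1.4 Connection info
To run administrative commands, you need connection strings for the primaries of the source and destination clusters.
Avoid plaintext passwords in connection strings. Passwords can be managed through ~/.pgpass, ~/.pg_service.conf, environment variables, and so on; the commands below do not show passwords.
PGSRC='postgres://dbuser_admin@10.10.10.10/meta' # source publisher (SU)
PGDST='postgres://dbuser_admin@10.10.10.11/test' # destination subscriber (SU)
Run the migration commands from the control/meta node and keep both variables set for the whole procedure.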
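Entries in ~/.pgpass use the format hostname:port:database:username:password, with any : or \ inside a field escaped by a backslash. A minimal helper for producing such lines might look like this; pgpass_escape and pgpass_line are hypothetical names and the password is a placeholder.

```bash
# Sketch: build one ~/.pgpass line, escaping ':' and '\' inside fields.
pgpass_escape() {
    # Backslashes first, then colons, so the added escapes are not re-escaped.
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

pgpass_line() {
    local host="$1" port="$2" db="$3" user="$4" password="$5"
    printf '%s:%s:%s:%s:%s\n' \
        "$(pgpass_escape "$host")" "$port" "$(pgpass_escape "$db")" \
        "$(pgpass_escape "$user")" "$(pgpass_escape "$password")"
}

pgpass_line 10.10.10.10 5432 meta dbuser_admin 'change:me'
```

The output can be appended to ~/.pgpass; libpq ignores that file unless its permissions are 0600 or stricter.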
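1.2 Decide what to migrate
Compared with physical replication, logical replication gives finer control over what is replicated and how. You can replicate just a subset of a database; in this example, however, we replicate the whole database.
This example uses the tables provided by pgbench as the migration target, so initialize them on the source cluster with pgbench.
To broaden test coverage, we also create an extra test table (used to test sequence migration).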
psql ${PGSRC} -qAXtw <<-EOF
DROP TABLE IF EXISTS pgbench_extras;
CREATE TABLE IF NOT EXISTS pgbench_extras
(id BIGSERIAL PRIMARY KEY,v TIMESTAMP NOT NULL UNIQUE);
EOF
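Note that only base tables (including partitioned tables) can take part in logical replication. Other kinds of objects, including views, materialized views, foreign tables, indexes, and sequences, cannot be added. The following query lists the fully qualified names of the tables in the current database that can be added to logical replication.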
SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
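During preparation, pick the tables you want to replicate. During the initial migration, sync the definitions of those tables to the destination cluster and set up logical replication on them.
1.3 Fix replica identities
Not every table can be added to logical replication and work as is. Before migrating, check every table to be migrated and confirm that its replica identity is configured correctly.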
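| Replica identity mode \ Constraint on table | Primary key (p) | Non-null unique index (u) | Neither (n) |
|---|---|---|---|
| default | valid | x | x |
| index | x | valid | x |
| full | inefficient | inefficient | inefficient |
| nothing | x | x | x |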
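- If the table has a primary key, REPLICA IDENTITY default is used by default. This is the best case and needs no change.
- If the table has no primary key, create one if you can. If you cannot, a unique index on a set of non-null columns serves the same purpose; in that case the table must be explicitly configured with REPLICA IDENTITY USING INDEX <tbl_unique_key_idx_name>.
- If the table has neither a primary key nor a unique index, you can configure REPLICA IDENTITY FULL, which uses the entire row as the replica identity.
The FULL identity performs very poorly: updates and deletes on the publisher and subscriber sides both lead to sequential scans of the table, so use it only as a last resort.
Another option is REPLICA IDENTITY NOTHING, in which case any UPDATE|DELETE on this table on the publisher fails with an error.
The following query lists the fully qualified name of every table, its replica identity setting, and whether it has a primary key or unique index.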
SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name, con.ri AS keys,
CASE relreplident WHEN 'd' THEN 'default' WHEN 'n' THEN 'nothing' WHEN 'f' THEN 'full' WHEN 'i' THEN 'index' END AS identity
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid, LATERAL (SELECT array_agg(contype) AS ri FROM pg_constraint WHERE conrelid = c.oid) con
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
ORDER BY 2,3;
Taking the test scenario from 1.2 as an example:
name | keys | identity
-------------------------+-------+----------
public.spatial_ref_sys | {c,p} | default
public.pgbench_accounts | {p} | default
public.pgbench_branches | {p} | default
public.pgbench_tellers | {p} | default
public.pgbench_extras | {p,u} | default
public.pgbench_history | NULL | default
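If the table has only a unique index, check that the index meets the requirements: all columns are non-null, and the index is not deferrable and not partial. If it does, use the following command to change the table's replica identity to index mode.
-- Example: pgbench_extras has a primary key, but a unique index can still be used as the replica identity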
ALTER TABLE pgbench_extras REPLICA IDENTITY USING INDEX pgbench_extras_v_key;
If the table has neither a primary key nor a unique constraint, like pgbench_history above, set its replica identity to FULL or NOTHING with a command like the following.
ALTER TABLE pgbench_history REPLICA IDENTITY FULL;
After the fix, every table should have a suitable replica identity:
name | keys | identity
-------------------------+-------+----------
public.spatial_ref_sys | {c,p} | default
public.pgbench_accounts | {p} | default
public.pgbench_branches | {p} | default
public.pgbench_tellers | {p} | default
public.pgbench_extras | {p,u} | index
public.pgbench_history | NULL | full
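The decision rules above can be summarized as a small helper that prints the statement to run for a table, given the constraint types reported by the query. This is only a sketch: replica_identity_sql is a hypothetical name, some_table and some_table_v_key are made-up identifiers, and it does not verify that the unique index is non-null, non-deferrable, and non-partial.

```bash
# Sketch: choose a REPLICA IDENTITY statement from a table's constraint types.
# $1 table name, $2 constraint types as shown in the "keys" column (e.g. '{p,u}'),
# $3 name of a qualifying unique index (only used when there is no primary key).
replica_identity_sql() {
    local table="$1" keys="$2" unique_index="$3"
    case "$keys" in
        *p*) echo "-- ${table}: has a primary key, keep REPLICA IDENTITY DEFAULT" ;;
        *u*) echo "ALTER TABLE ${table} REPLICA IDENTITY USING INDEX ${unique_index};" ;;
        *)   echo "ALTER TABLE ${table} REPLICA IDENTITY FULL;" ;;
    esac
}

replica_identity_sql public.pgbench_accounts '{p}'
replica_identity_sql public.some_table '{u}' some_table_v_key
replica_identity_sql public.pgbench_history ''
```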
2 Initial Migration
2.1 Sync the database schema
2.1.1 Dump
Use the following command to dump all object definitions and apply them on the destination.
pg_dump ${PGSRC} --schema-only -n public | psql ${PGDST}
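The -n and -t options of pg_dump give flexible control so that only the required objects are dumped. For example, if you only need the pgbench tables in the public schema, dump them with: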
pg_dump ${PGSRC} --schema-only -n public -t 'pgbench_*' | psql ${PGDST}
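2.1.2 Verify
After syncing, the schema usually needs to be verified:
- Have all target tables and their indexes and sequences been created?
- Do functions, types, schemas, users, and privileges all match expectations?
The schema has to be synced and verified according to your own needs; there is no universal method.
2.2 Create a publication on the source
The source cluster primary acts as the publisher: create a publication and add the required tables to it.
2.2.1 How to create a publication
The syntax for creating a publication is: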
CREATE PUBLICATION name
[ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
| FOR ALL TABLES ]
[ WITH ( publication_parameter [= value] [, ... ] ) ]
Create a publication for all tables (requires superuser privileges):
CREATE PUBLICATION "pg_meta_pub" FOR ALL TABLES;
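Note that names of both publications and subscriptions should follow the PostgreSQL identifier naming rules ([a-z][0-9a-z_]+); in particular, do not use - in names. This avoids needless trouble, such as the replication slot named after the subscription failing to be created because of a non-conforming name.
To control which event types are published (uncommon), use the publish parameter; it defaults to insert, update, delete, truncate.
If the source has partitioned tables, the publish_via_partition_root parameter controls how they are replicated: as a single table (using the replica identity of the partition root) or as multiple child tables (using the replica identity of each partition). Enabling it treats a partitioned table logically as one table (the partition root) rather than a series of partitions, so the subscriber only needs a table with the same name as the partition root for replication to work. This option was introduced in version 13. It defaults to false, meaning that when a partitioned table is replicated, every partition on the source must also exist on the subscriber.
Extra parameters are passed like this: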
CREATE PUBLICATION "pg_meta_pub" FOR ALL TABLES
WITH(publish = 'insert', publish_via_partition_root = true);
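2.2.2 What to publish
If you do not want to publish all tables, name the tables explicitly in the publication.
In this example, spatial_ref_sys is a constant table used by the postgis extension and does not need to be migrated, so we can exclude it. The following SQL builds the CREATE PUBLICATION command directly in the database: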
-- only tables whose name matches pgbench are replicated; \gexec runs the generated SQL in psql
SELECT E'CREATE PUBLICATION pg_meta_pub FOR TABLE\n' ||
string_agg(quote_ident(nspname) || '.' || quote_ident(relname), E',\n') || ';' AS sql
FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')
AND relname ~ 'pgbench'
\gexec
In this example, the command actually generated and executed is:
psql ${PGSRC} -Xtw <<-EOF
CREATE PUBLICATION pg_meta_pub FOR TABLE
public.pgbench_accounts,
public.pgbench_branches,
public.pgbench_tellers,
public.pgbench_history,
public.pgbench_extras;
EOF
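When the table list is already known on the shell side, the same statement can be assembled without querying the catalog. The sketch below also enforces the naming rule mentioned earlier ([a-z][0-9a-z_]+); publication_sql is a hypothetical helper, not part of Pigsty.

```bash
# Sketch: print a CREATE PUBLICATION statement for an explicit list of tables.
# Usage: publication_sql <pubname> <schema.table>...
publication_sql() {
    local pubname="$1"
    shift
    # Must start with a lowercase letter ...
    case "$pubname" in
        [a-z]*) ;;
        *) echo "invalid publication name: $pubname" >&2; return 1 ;;
    esac
    # ... and contain only lowercase letters, digits, and underscores.
    case "$pubname" in
        *[!0-9a-z_]*) echo "invalid publication name: $pubname" >&2; return 1 ;;
    esac
    local IFS=','   # "$*" joins the remaining arguments with commas
    echo "CREATE PUBLICATION ${pubname} FOR TABLE $*;"
}

publication_sql pg_meta_pub public.pgbench_accounts public.pgbench_history
```

The output could be piped to psql ${PGSRC}; a name such as pg-meta-pub is rejected before any SQL is produced.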
2.2.3 Confirm the publication
Once created, the publication is visible in the pg_publication view.
$ psql ${PGSRC} -Xxwc 'table pg_publication;'
-[ RECORD 1 ]+------------
oid | 24679
pubname | pg_meta_pub
pubowner | 10
puballtables | f
pubinsert | t
pubupdate | t
pubdelete | t
pubtruncate | t
pubviaroot | f
pg_publication_tables shows which tables are included in the publication.
$ psql ${PGSRC} -Xwc 'table pg_publication_tables;'
pubname | schemaname | tablename
-------------+------------+------------------
pg_meta_pub | public | pgbench_history
pg_meta_pub | public | pgbench_tellers
pg_meta_pub | public | pgbench_accounts
pg_meta_pub | public | pgbench_branches
pg_meta_pub | public | pgbench_extras
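Once this checks out, the publisher side is done. Next, create a subscription on the destination cluster primary that subscribes to this publication on the source cluster primary.
2.3 Create a subscription on the destination
The destination cluster primary acts as the subscriber: create a subscription to receive the required changes from the publisher.
2.3.1 Create the subscription
Creating a subscription requires the SUPERUSER privilege. The syntax is: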
CREATE SUBSCRIPTION subscription_name
CONNECTION 'conninfo'
PUBLICATION publication_name [, ...]
[ WITH ( subscription_parameter [= value] [, ... ] ) ]
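A subscription must specify the publisher's connection info with the CONNECTION clause and the publication name with the PUBLICATION clause. Here the replicator user connects to the publisher; its password has already been written to ~/.pgpass on the destination instance, so it can be omitted from the connection string.
Subscriptions accept a few other parameters, which usually need changing only when replication slots are managed manually:
copy_data (default true): whether to copy the existing data when replication starts.
create_slot (default true): whether the subscription creates a replication slot on the publisher.
enabled (default true): whether the subscription starts replicating immediately.
connect (default true): whether to connect to the publisher at all; if not, the options above are all reset to false.
The actual command to create the subscription here is: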
psql ${PGDST} -Xtw <<-EOF
CREATE SUBSCRIPTION "pg_test_sub"
CONNECTION 'host=10.10.10.10 user=replicator dbname=meta'
PUBLICATION "pg_meta_pub";
EOF
2.3.2 Confirm the subscription
Once created, the subscription is visible in the pg_subscription view.
$ psql ${PGDST} -Xxwc 'TABLE pg_subscription;'
-[ RECORD 1 ]---+---------------------------------------------
oid | 20759
subdbid | 19351
subname | pg_test_sub
subowner | 16390
subenabled | t
subconninfo | host=10.10.10.10 user=replicator dbname=meta
subslotname | pg_test_sub
subsynccommit | off
subpublications | {pg_meta_pub}
pg_subscription_rel shows which tables are covered by the subscription and their replication state.
$ psql ${PGDST} -Xwc 'table pg_subscription_rel;'
srsubid | srrelid | srsubstate | srsublsn
---------+---------+------------+------------
20759 | 20742 | r | 0/B0BC1FB8
20759 | 20734 | r | 0/B0BC20B0
20759 | 20737 | r | 0/B0BC20B0
20759 | 20745 | r | 0/B0BC20B0
20759 | 20731 | r | 0/B0BC20B0
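2.4 Wait for logical replication to sync
After creating the subscription, first monitor the database logs on both the publisher and the subscriber to make sure no errors occur.
2.4.1 The logical replication state machine
If all is well, logical replication starts automatically and runs the replication state machine for every table in the subscription, as shown below.
When all tables have finished copying and entered the r (ready) state, the initial sync phase is complete and the publisher and subscriber are in sync as a whole.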
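stateDiagram-v2
[*] --> init : table is added to the subscription set
init --> data : start copying the table's initial snapshot
data --> sync : initial data copy complete
sync --> ready : changes made during the copy are applied, table is ready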
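When a subscription is created or refreshed, tables are added to the subscription set. Every table in the set has a corresponding row in the pg_subscription_rel view showing its current replication state. A table that has just been added starts in state i (initialize).
If the subscription's copy_data option is true (the default) and the worker pool has a free worker, PostgreSQL assigns a sync worker to the table to copy its existing data, and the table enters state d (data is being copied). Syncing a table is similar to taking a basebackup of a database cluster: the sync worker creates a temporary replication slot on the publisher, takes a snapshot of the table, and completes the base data sync with COPY.
Once the base data copy is finished, the table enters state s (sync), in which the sync worker catches up on the changes made during the copy. When it has caught up, the sync worker marks the table as r (ready) and hands change handling over to the main logical replication apply worker, meaning the table is now in normal replication.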
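For scripting around these states, a small lookup like the following can translate the one-letter codes from srsubstate and decide whether every table is ready. It covers only the four states described above; substate_name and all_ready are hypothetical names.

```bash
# Sketch: map srsubstate codes to the state names used in the text.
substate_name() {
    case "$1" in
        i) echo init ;;
        d) echo data ;;
        s) echo sync ;;
        r) echo ready ;;
        *) echo unknown ;;   # any code not described above
    esac
}

# Succeeds only if every given state code is 'r'.
all_ready() {
    local s
    for s in "$@"; do
        [ "$s" = r ] || return 1
    done
    return 0
}

for state in i d s r; do
    echo "$state -> $(substate_name "$state")"
done
all_ready r r r && echo "all tables ready"
```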
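2.4.2 Tracking sync progress
The data copy (d) phase can take a while, depending on the NIC, the network, the disks, the size and distribution of the tables, and the number of sync workers.
As a reference point, a 1TB database with 20 tables, including a 250GB large table, on dual 10GbE NICs takes roughly 6~8 hours to copy with 6 sync workers.
During the copy, every table sync task creates a temporary replication slot on the source. Avoid putting unnecessary heavy write load on the source primary during the initial sync, so that retained WAL does not fill the disk.
pg_stat_replication and pg_replication_slots on the publisher, and pg_stat_subscription and pg_subscription_rel on the subscriber, provide information on the state of logical replication and should be watched.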
psql ${PGDST} -Xxw <<-'EOF'
SELECT subname, json_object_agg(srsubstate, cnt) FROM
pg_subscription s JOIN
(SELECT srsubid, srsubstate, count(*) AS cnt FROM pg_subscription_rel
GROUP BY srsubid, srsubstate) sr
ON s.oid = sr.srsubid GROUP BY subname;
EOF
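The SQL above summarizes the state of the tables in each subscription. If every table shows r, logical replication has been established successfully and the subscriber can be used for switchover.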
subname | json_object_agg
-------------+-----------------
pg_test_sub | { "r" : 5 }
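Of course, the best approach is always to track replication state through the monitoring system.
3 Switchover
3.1 Preparation
A good engineering practice is to run a few checkpoints on both the source and the destination before making changes, so that later steps are not slowed down by flushing dirty pages.
You can also run ANALYZE to refresh statistics, which helps with a quick comparison of data integrity later.
psql ${PGSRC} -Xxwc 'CHECKPOINT;ANALYZE;CHECKPOINT;'
psql ${PGDST} -Xxwc 'CHECKPOINT;ANALYZE;CHECKPOINT;'
Every step from here on happens while the service is unavailable, so proceed as quickly as possible. Completing them within minutes is usually appropriate.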
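3.2 Stop write traffic on the source
3.2.1 Choose how to stop traffic
There are several ways to pause writes on the source; choose and combine them to fit your workload:
- Ask the application owners to stop traffic
- Stop resolving the domain name of the source primary
- Stop or pause traffic forwarding on the load balancer (Haproxy | VIP)
- Stop or pause the Pgbouncer connection pool
- Stop or pause the Postgres instance
- Change a parameter on the primary to make read-only the default transaction mode
- Change the HBA rules on the primary to reject application access
Changing HBA rules, the connection pool, or the load balancer is the generally recommended way to stop write traffic to the primary.
Whichever way you choose, keep PostgreSQL running and make sure the admin user and the replication user can still connect to the source primary.
3.2.2 Confirm that writes have stopped on the source
Once the source primary has stopped accepting writes, first verify, by watching pg_stat_replication, that the logical subscriber has caught up with the publisher.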
psql ${PGSRC} -Xxw <<-EOF
SELECT application_name AS name,
pg_current_wal_lsn() AS lsn,
pg_current_wal_lsn() - replay_lsn AS lag
FROM pg_stat_replication;
EOF
-[ RECORD 1 ]-----
name | pg_test_sub
lsn | 0/B0C24918
lag | 0
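Run the command above repeatedly. If the lsn field stays unchanged and lag stays at 0, write traffic to the primary has stopped and the logical replica has no replication lag, so it can be used for switchover.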
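If the two LSN values are collected separately (for example from two different psql calls), the lag can also be computed on the shell side. An LSN is written as two hexadecimal numbers hi/lo, and its byte position is hi * 2^32 + lo. The helpers lsn_to_bytes and lsn_lag_bytes below are a hypothetical sketch of that arithmetic.

```bash
# Sketch: convert an LSN such as 0/B0C24918 into a byte position.
lsn_to_bytes() {
    local hi="${1%%/*}" lo="${1##*/}"
    echo $(( $(printf '%d' "0x${hi}") * 4294967296 + $(printf '%d' "0x${lo}") ))
}

# Lag in bytes between the current LSN ($1) and the replayed LSN ($2).
lsn_lag_bytes() {
    echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

lsn_lag_bytes 0/B0C24918 0/B0C24918   # identical positions, no lag
lsn_lag_bytes 1/0 0/FFFFFFF0          # crosses the 4GB boundary
```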
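3.2.3 Set up reverse logical replication (optional)
If the business must be able to roll back at any time after a failed migration, you can set up reverse logical replication once writes on the source have stopped, so that later changes on the subscriber (the new primary) are synced back to the original publisher (the old primary). This procedure re-syncs the data, however, which takes a long time. It is usually suitable only when the data is very important and either the data volume is small or the downtime window is long enough.
First stop the existing logical subscription on the destination. This is required before continuing; otherwise the two directions form a replication loop.
With writes on the source stopped there is no point in keeping the subscription running, but it is better to disable it than to drop it, so it remains available for rollback if the migration fails.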
psql ${PGDST} -qAXtwc 'ALTER SUBSCRIPTION pg_test_sub DISABLE;'
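Then set up the reverse logical replication following the same procedure as above. Only the commands are given here:
# create publication pg_test_pub on the destination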
psql ${PGDST} -Xtw <<-EOF
CREATE PUBLICATION pg_test_pub FOR TABLE
public.pgbench_accounts,
public.pgbench_branches,
public.pgbench_tellers,
public.pgbench_history,
public.pgbench_extras;
TABLE pg_publication;
EOF
# create the subscription on the source
psql ${PGSRC} -Xtw <<-EOF
CREATE SUBSCRIPTION "pg_meta_sub"
CONNECTION 'host=10.10.10.11 user=replicator dbname=test'
PUBLICATION "pg_test_pub";
TABLE pg_subscription;
EOF
# truncate all related tables on the source (dangerous); optionally wait for the sync to finish
psql ${PGSRC} -Xtw <<-EOF
TRUNCATE TABLE
public.pgbench_accounts,
public.pgbench_branches,
public.pgbench_tellers,
public.pgbench_history,
public.pgbench_extras;
TABLE pg_publication;
EOF
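One possible alternative, not part of the procedure above, is to skip the initial copy with the copy_data = false subscription option. This only makes sense under the assumption that both sides already hold identical data and that the destination has not accepted any writes before the subscription is created; in that case the source tables would not need to be truncated. The helper subscription_sql is a hypothetical sketch that prints such a statement.

```bash
# Sketch: print a CREATE SUBSCRIPTION statement with an explicit copy_data setting.
# Usage: subscription_sql <subname> <conninfo> <pubname> [copy_data]
subscription_sql() {
    local subname="$1" conninfo="$2" pubname="$3" copy_data="${4:-true}"
    cat <<EOF
CREATE SUBSCRIPTION ${subname}
    CONNECTION '${conninfo}'
    PUBLICATION ${pubname}
    WITH (copy_data = ${copy_data});
EOF
}

# Assumption: data on both sides is already identical, so no initial copy is requested.
subscription_sql pg_meta_sub 'host=10.10.10.11 user=replicator dbname=test' pg_test_pub false
```

Whether this shortcut is acceptable depends on how strictly the two sides were verified to match; the full re-sync described above remains the conservative choice.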
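3.3 Sync sequences and other objects
Logical replication does not replicate sequences, so when failing over with logical replication, sequence values must be synced manually before the switchover.
3.3.1 Sync sequence values from the source
If all your sequences were created automatically from SERIAL column definitions and the destination only subscribes from the source, syncing sequences is fairly simple. Find all the sequences that need syncing on the subscriber: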
PGSRC='postgres://dbuser_admin@10.10.10.10/meta' # source publisher (SU)
PGDST='postgres://dbuser_admin@10.10.10.11/test' # destination subscriber (SU)
# query the subscriber to generate the shell command that syncs the SEQUENCEs
psql ${PGDST} -qAXtw <<-'EOF'
SELECT 'pg_dump ${PGSRC} -a ' ||
string_agg('-t ' || quote_ident(schemaname) || '.' || quote_ident(sequencename), ' ') ||
' | grep setval | psql -qAXtw ${PGDST}'
FROM pg_sequences;
EOF
In this example, only pgbench_extras.id has a corresponding SEQUENCE, pgbench_extras_id_seq. The generated sync command is:
pg_dump ${PGSRC} -a -t public.pgbench_extras_id_seq | grep setval | psql -qAXtw ${PGDST}
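In more complex cases you need to write this command by hand, listing each sequence to dump with -t.
3.3.2 Set sequence values from business data
Another way to manage sequences is to set their values directly from the data in the tables, with no sync from the source.
For example, if the maximum value of pgbench_extras.id is 100, setting pgbench_extras_id_seq on the subscriber to a sufficiently large value, such as 100 + 1000 = 1100, guarantees that new ids allocated from the sequence after migration do not collide with existing data.
With this approach the sequences can be set before the switchover, which shortens the downtime. It may leave gaps in the allocated values, though, and some edge cases and unusual uses of sequences need special care, for example: a sequence that has never been used, a sequence with a negative increment, or a Sequence called from a custom ID-generating function.
The command to set a sequence directly looks like this: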
psql ${PGDST} -qAXtw <<-'EOF'
SELECT pg_catalog.setval('public.pgbench_extras_id_seq', (SELECT max(id) + 1000 FROM pgbench_extras));
EOF
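With many sequences, the statement can be generated per sequence. The sketch below prints a setval call with a configurable safety margin and uses coalesce so that an empty table does not pass NULL to setval; setval_sql and its argument order are assumptions for illustration.

```bash
# Sketch: print a setval statement that moves a sequence past the current max value.
# Usage: setval_sql <sequence> <table> [column] [margin]
setval_sql() {
    local seq="$1" table="$2" column="${3:-id}" margin="${4:-1000}"
    echo "SELECT pg_catalog.setval('${seq}', (SELECT coalesce(max(${column}), 0) + ${margin} FROM ${table}));"
}

setval_sql public.pgbench_extras_id_seq public.pgbench_extras id 1000
```

The output can be piped to psql -qAXtw ${PGDST}, in the same way as the command above.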
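3.3.3 Sync other objects
Objects that logical replication cannot handle also need to be synced at this point.
Examples: refreshing materialized views, manually migrating large objects, and so on. These features are rarely used, so they are not covered in detail here.
3.4 Verify data consistency
If logical replication is working properly, the data usually does not need verifying; you can run comparisons several times during step 2 to build confidence in logical replication.
During the downtime window, only simple basic checks are recommended, such as comparing row counts and the max/min primary key values.
The following function performs this check: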
function compare_relation(){
local relname=$1
local identity=${2-'id'}
psql ${3-${PGSRC}} -AXtwc "SELECT count(*) AS cnt, max($identity) AS max, min($identity) AS min FROM ${relname};"
psql ${4-${PGDST}} -AXtwc "SELECT count(*) AS cnt, max($identity) AS max, min($identity) AS min FROM ${relname};"
}
compare_relation pgbench_accounts aid
compare_relation pgbench_branches bid
compare_relation pgbench_history tid
compare_relation pgbench_tellers tid
# Variant: compare row counts of every table between source and destination
function compare_relation() {
local src_url=${1}
local dst_url=${2}
local relname=${3}
res1=$(psql "${src_url}" -AXtwc "SELECT count(*) AS cnt FROM ${relname};")
res2=$(psql "${dst_url}" -AXtwc "SELECT count(*) AS cnt FROM ${relname};")
if [[ "${res1}" == "${res2}" ]]; then
echo -e "[ok] ${relname}\t\t\t${res1}\t${res2}"
else
echo -e "[xx] ${relname}\t\t\t${res1}\t${res2}"
fi
}
function compare_all() {
local src_url=${1}
local dst_url=${2}
tables=$(psql ${src_url} -AXtwc "SELECT quote_ident(nspname) || '.' || quote_ident(relname) AS name FROM pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid WHERE relkind = 'r' AND nspname NOT IN ('pg_catalog', 'information_schema', 'monitor', 'repack', 'pg_toast')")
for tbl in $tables; do
result=$(compare_relation "${src_url}" "${dst_url}" ${tbl})
echo ${result}
done
}
compare_all ${PGSRC} ${PGDST}
You can also go over the sequences synced in 3.3 and confirm that their values match.
psql ${PGSRC} -qwXtc "SELECT schemaname || '.' || sequencename AS name, last_value AS v FROM pg_sequences;"
psql ${PGDST} -qwXtc "SELECT schemaname || '.' || sequencename AS name, last_value AS v FROM pg_sequences;"
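Verify any other objects synced manually in 3.3.3 as needed. Any additional business-side checks also belong here. The downtime window is precious, though: the longer you spend here, the longer the service is unavailable.
Once verification is done, you can proceed to the final traffic switch.
3.5 Traffic switch and cleanup
After verifying the data, switch the traffic.
How to switch depends on your access method, and usually mirrors the way traffic was stopped in 3.2. For example:
- Change the application connection strings and apply them
- Resolve the source primary's domain name to the new primary
- Forward load balancer (Haproxy | VIP) traffic to the new primary
- Forward traffic from the Pgbouncer connection pool on the old primary to the new primary
Once you have confirmed, through the monitoring system or otherwise, that write traffic is being applied on the new primary (the subscriber), the migration based on logical replication is complete.
Do not forget the cleanup: disable and drop the subscription on the subscriber, and drop the publication on the publisher.
Also keep the old primary rejecting new writes, so that leftover traffic does not keep hitting it because of a configuration mistake.
# drop the subscription on the subscriber side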
psql ${PGDST} -qAXtw <<-'EOF'
ALTER SUBSCRIPTION pg_test_sub DISABLE;
DROP SUBSCRIPTION pg_test_sub;
EOF
# drop the publication on the publisher side
psql ${PGSRC} -qAXtw <<-'EOF'
DROP PUBLICATION pg_meta_pub;
EOF
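This completes the migration based on logical replication.
7.2 - Slow Query Optimization
An example of optimizing a slow query with Pigsty
Using the sandbox that ships with Pigsty, this section walks through handling a slow query with the Pigsty monitoring system.
Slow query: simulation
There is no real business system here, so we simulate a slow query quickly and simply, using the TPC-B-like workload built into pgbench.
Run the following command on the primary: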
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey ;
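This command drops the primary key of the pgbench_accounts table, which slows down the related queries and overloads the system almost instantly.
Figure 1: QPS of a single replica instance drops from 500 to 7, and Query RT rises to 300ms
Figure 2: System load reaches 200%, triggering the alert rules for high machine load and long query response time.
Slow query: locating
First, use the PG Cluster dashboard to find the instance where the slow query runs; here we take pg-test-2 as the example.
Then use the PG Query dashboard to find the specific slow query: its ID is -6041100154778468427
Figure 3: An abnormally slow query found in the query overview
The query shows:
- A marked increase in response time: from 17us to 280ms
- A marked drop in QPS: from 500 to 7
- A marked increase in the share of time spent on this query
So this is the query that slowed down.
Next, use the PG Stat Statements dashboard or PG Query Detail to find the exact statement by its query ID.
Figure 4: The query is SELECT abalance FROM pgbench_accounts WHERE aid = $1
Slow query: hypothesis
Next we need to infer why the query is slow.
SELECT abalance FROM pgbench_accounts WHERE aid = $1
The query filters the pgbench_accounts table by aid. When a query this simple slows down, the most likely cause is a problem with the indexes on the table.
It is obviously a missing index here, since we dropped it ourselves!
After analyzing the query, our hypothesis is: the query is slow because the aid column of pgbench_accounts has no index.
Next, open the PG Table Detail dashboard and check the access pattern on pgbench_accounts to verify the hypothesis.
Figure 5: Access on the pgbench_accounts table
We observe that index scans on the table have dropped to zero while sequential scans have risen correspondingly. This confirms the hypothesis.
Slow query: fix
With the root cause identified, we can fix it.
Try adding an index on the aid column of pgbench_accounts and see whether that solves the problem.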
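The text does not show the exact statement used. One way to restore the dropped primary key without blocking writes for the whole build is to create a unique index concurrently and then promote it to the primary key. The helper restore_pkey_sql below is a hypothetical sketch that prints those two statements.

```bash
# Sketch: print SQL that rebuilds a primary key from a concurrently built unique index.
# Usage: restore_pkey_sql <table> <column>
restore_pkey_sql() {
    local table="$1" column="$2"
    local index="${table}_pkey"
    cat <<EOF
CREATE UNIQUE INDEX CONCURRENTLY ${index} ON ${table} (${column});
ALTER TABLE ${table} ADD CONSTRAINT ${index} PRIMARY KEY USING INDEX ${index};
EOF
}

restore_pkey_sql pgbench_accounts aid
```

CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so the output should be fed to psql without the single-transaction option.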
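After adding the index, the effect is immediate.
Figure 6: Query response time and QPS are back to normal.
Figure 7: System load is back to normal as well
Slow query: example
With this tutorial you have learned the general approach to slow query optimization.
Figure 8: A real-world slow query optimization that brought system saturation down from 40% to 4%
7.3 - HA Drill
Simulate several common production failures to test the self-healing ability of Pigsty's highly available database clusters.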
Patroni quick start
Use patronictl to control the database cluster. Pigsty already provides the shortcut pt:
alias pt='patronictl -c /pg/bin/patroni.yml'
alias pt-up='sudo systemctl start patroni' # start Patroni
alias pt-dw='sudo systemctl stop patroni' # stop Patroni
alias pt-st='systemctl status patroni' # report Patroni status
alias pt-ps='ps aux | grep patroni' # show Patroni processes
alias pt-log='tail -f /pg/log/patroni.log' # follow the Patroni log
Patroni commands must be run as the database superuser (dbsu = postgres).
$ pt --help
Usage: patronictl [OPTIONS] COMMAND [ARGS]...
Options:
-c, --config-file TEXT Configuration file
-d, --dcs TEXT Use this DCS
-k, --insecure Allow connections to SSL sites without certs
--help Show this message and exit.
Commands:
configure Create configuration file
dsn Generate a dsn for the provided member,...
edit-config Edit cluster configuration
failover Failover to a replica
flush Discard scheduled events
history Show the history of failovers/switchovers
list List the Patroni members for a given Patroni
pause Disable auto failover
query Query a Patroni PostgreSQL member
reinit Reinitialize cluster member
reload Reload cluster member configuration
remove Remove cluster from DCS
restart Restart cluster member
resume Resume auto failover
scaffold Create a structure for the cluster in DCS
show-config Show cluster configuration
switchover Switchover to a replica
topology Prints ASCII topology for given cluster
version Output version of patronictl command or a...
Scenario 1: Switchover
A switchover is a planned, operator-initiated change of the cluster leader.
$ pt switchover
Master [pg-test-3]: pg-test-3
Candidate ['pg-test-1', 'pg-test-2'] []: pg-test-1
When should the switchover take place (e.g. 2020-10-23T17:06 ) [now]: now
Current cluster topology
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | running | 2 | 0 | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running | 2 | 0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Leader | running | 2 | | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to switchover cluster pg-test, demoting current master pg-test-3? [y/N]: y
2020-10-23 16:06:11.76252 Successfully switched over to "pg-test-1"
Scenario 2: Failover
# run as postgres @ any member of cluster `pg-test`
$ pt failover
Candidate ['pg-test-2', 'pg-test-3'] []: pg-test-3
Current cluster topology
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Leader | running | 1 | | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running | 1 | 0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Replica | running | 1 | 0 | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
Are you sure you want to failover cluster pg-test, demoting current master pg-test-1? [y/N]: y
+ Cluster: pg-test (6886641621295638555) -----+----+-----------+-----------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-----------+-------------+---------+---------+----+-----------+-----------------+
| pg-test-1 | 10.10.10.11 | Replica | running | 2 | 0 | clonefrom: true |
| pg-test-2 | 10.10.10.12 | Replica | running | 2 | 0 | clonefrom: true |
| pg-test-3 | 10.10.10.13 | Leader | running | 2 | | clonefrom: true |
+-----------+-------------+---------+---------+----+-----------+-----------------+
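Scenario 3: Patroni/Postgres down on a replica
Scenario 4: Patroni/Postgres down on the primary
Scenario 5: DCS unavailable
Scenario 6: Maintenance mode
Discussion
Key question: how is the SLA of the DCS guaranteed?
In automatic failover mode, if the DCS goes down, the current primary demotes itself to a replica after retry_timeout, leaving every cluster unwritable.
As distributed consensus stores, Consul/Etcd are quite robust, but you must still make sure the SLA of the DCS is higher than the SLA of the database.
Solution: configure a sufficiently large retry_timeout, and address the problem operationally in one of the following ways:
- The SLA guarantees that the yearly unavailable time of the DCS is shorter than that duration.
- Operators can resolve a DCS service outage within retry_timeout.
- DBAs can turn off automatic failover for the cluster (enable maintenance mode) within retry_timeout.
Possible improvement: add a P2P check that bypasses the DCS, so that a primary that knows it is still in the majority partition does not demote itself.
Key question: HA policy, RPO first or RTO first?
Which comes first, availability or consistency? For example, ordinary databases favor RTO, while financial and payment databases favor RPO.
Ordinary databases may lose a very small amount of data in an emergency failover (the threshold is configurable, for example the last 1M of writes).
Databases that deal with money must not lose data, so failover needs more careful checks or manual intervention.
Key question: fencing, is shutting down the machine allowed?
Normally, Patroni fences the old primary first when the leader changes, by killing the PG process.
In some extreme cases, such as a paused VM, a software bug, or extremely high load, this may not succeed, and the machine has to be rebooted to settle the matter. Is that acceptable? How does the system behave under such extreme conditions?
Key operation: after leader election
Remember to checkpoint after a new leader is elected. Run a manual Checkpoint to be safe.
Key question: how to switch traffic, at layer 2, 4, or 7
- Layer 2: VIP failover
- Layer 4: Haproxy distribution
- Layer 7: DNS resolution
Key question: the special case of one primary and one replica
- Layer 2: VIP failover
- Layer 4: Haproxy distribution
- Layer 7: DNS resolution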
HA Procedure
Failure Detection
https://patroni.readthedocs.io/en/latest/SETTINGS.html#dynamic-configuration-settings
Fencing
https://patroni.readthedocs.io/en/latest/watchdog.html
Bad Cases
Traffic Routing
DNS
VIP
HAproxy
Pgbouncer
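7.4 - Database Application
Using the ISD dataset as an example of loading data into the database
If you have a database and do not know what to do with it, have a look at another open-source project by the author: Vonng/isd
You can reuse the Grafana from the monitoring system to interactively browse sub-hourly weather data from nearly 30,000 surface weather stations over the past 120 years.
ISD - Integrated Surface Data
It contains all the tools needed to download, parse, process, and visualize the NOAA ISD dataset.
It lets you browse sub-hourly weather data from nearly 30,000 surface weather stations over the past 120 years, and get a full taste of PostgreSQL's data analysis and processing capabilities.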
SYNOPSIS
Download, parse, and visualize the Integrated Surface Dataset.
Includes 30,000 meteorological stations with sub-hourly observation records from 1900 to 2020.
Quick Start
- Clone repo
git clone https://github.com/Vonng/isd && cd isd
- Prepare a postgres database
Connect via a connection string such as isd or postgres://user:pass@host/dbname.
# skip this if you already have a viable database
PGURL=postgres
psql ${PGURL} -c 'CREATE DATABASE isd;'
# database connection string, something like `isd` or `postgres://user:pass@host/dbname`
PGURL='isd'
psql ${PGURL} -AXtwc 'CREATE EXTENSION postgis;'
# create tables, partitions, functions
psql ${PGURL} -AXtwf 'sql/schema.sql'
- Download data
  - ISD Station: station metadata (id, name, location, country, etc.)
  - ISD History: station observation records (observation count per month)
  - ISD Hourly: yearly archived station (sub-)hourly observation records
  - ISD Daily: yearly archived station daily aggregated summary
git clone https://github.com/Vonng/isd && cd isd
bin/get-isd-station.sh # download isd station from noaa (proxy makes it faster)
bin/get-isd-history.sh # download isd history observation from noaa
bin/get-isd-hourly.sh <year> # download isd hourly data (yearly tarball 1901-2020)
bin/get-isd-daily.sh <year> # download isd daily data (yearly tarball 1929-2020)
- Build Parser
There are two ISD dataset parsers written in Go: isdh for the ISD hourly dataset and isdd for the ISD daily dataset.
make isdh and make isdd build them and copy them to bin. These parsers are required for loading data into the database.
You can download pre-compiled binaries to the bin/ directory to skip this phase.
- Load data
Metadata includes world_fences, china_fences, isd_elements, isd_mwcode, isd_station, and isd_history. These are gzipped CSV files located in data/meta/. world_fences, china_fences, isd_elements, and isd_mwcode are constant dictionary tables, but isd_station and isd_history are updated frequently, so you have to download them from NOAA before loading them.
# load metadata: fences, dicts, station, history,...
bin/load-meta.sh
# load a year's daily data to database
bin/load-isd-daily <year>
# load a year's hourly data to database
bin/load-isd-hourly <year>
Note that the original isd_daily dataset contains some uncleansed data; refer to the caveat for details.
Data
Dataset
Hourly Data: original tarball size 105GB, table size 1TB (+600GB indexes).
Daily Data: original tarball size 3.2GB, table size 24GB.
2TB of storage is recommended for a full installation, and at least 40GB for a daily-data-only installation.
Schema
Data schema definition
Station
CREATE TABLE public.isd_station
(
station VARCHAR(12) PRIMARY KEY,
usaf VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
wban VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
name VARCHAR(32),
country VARCHAR(2),
province VARCHAR(2),
icao VARCHAR(4),
location GEOMETRY(POINT),
longitude NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
latitude NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
elevation NUMERIC,
period daterange,
begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
end_date DATE GENERATED ALWAYS AS (upper(period)) STORED
);
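The usaf and wban generated columns are fixed-width slices of the station identifier. SQL substring is 1-based, so substring(station, 7, 5) corresponds to the 0-based Python slice [6:11]. A small sketch of the same split follows; the sample identifier is only an illustrative value.

```python
def split_station(station):
    """Mirror the generated columns: usaf = characters 1-6, wban = characters 7-11 (1-based)."""
    usaf = station[0:6]   # substring(station, 1, 6)
    wban = station[6:11]  # substring(station, 7, 5)
    return usaf, wban


print(split_station("54511099999"))
```

Because the split is positional, an identifier shorter than 11 characters yields a truncated or empty wban, just as the SQL expression would.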
Hourly Data
CREATE TABLE public.isd_hourly
(
station VARCHAR(11) NOT NULL,
ts TIMESTAMP NOT NULL,
temp NUMERIC(3, 1),
dewp NUMERIC(3, 1),
slp NUMERIC(5, 1),
stp NUMERIC(5, 1),
vis NUMERIC(6),
wd_angle NUMERIC(3),
wd_speed NUMERIC(4, 1),
wd_gust NUMERIC(4, 1),
wd_code VARCHAR(1),
cld_height NUMERIC(5),
cld_code VARCHAR(2),
sndp NUMERIC(5, 1),
prcp NUMERIC(5, 1),
prcp_hour NUMERIC(2),
prcp_code VARCHAR(1),
mw_code VARCHAR(2),
aw_code VARCHAR(2),
pw_code VARCHAR(1),
pw_hour NUMERIC(2),
data JSONB
) PARTITION BY RANGE (ts);
Daily Data
CREATE TABLE public.isd_daily
(
station VARCHAR(12) NOT NULL,
ts DATE NOT NULL,
temp_mean NUMERIC(3, 1),
temp_min NUMERIC(3, 1),
temp_max NUMERIC(3, 1),
dewp_mean NUMERIC(3, 1),
slp_mean NUMERIC(5, 1),
stp_mean NUMERIC(5, 1),
vis_mean NUMERIC(6),
wdsp_mean NUMERIC(4, 1),
wdsp_max NUMERIC(4, 1),
gust NUMERIC(4, 1),
prcp_mean NUMERIC(5, 1),
prcp NUMERIC(5, 1),
sndp NUMERIC(5, 1),
is_foggy BOOLEAN,
is_rainy BOOLEAN,
is_snowy BOOLEAN,
is_hail BOOLEAN,
is_thunder BOOLEAN,
is_tornado BOOLEAN,
temp_count SMALLINT,
dewp_count SMALLINT,
slp_count SMALLINT,
stp_count SMALLINT,
wdsp_count SMALLINT,
visib_count SMALLINT,
temp_min_f BOOLEAN,
temp_max_f BOOLEAN,
prcp_flag CHAR,
PRIMARY KEY (ts, station)
) PARTITION BY RANGE (ts);
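Both isd_hourly and isd_daily are declared with PARTITION BY RANGE (ts), so child partitions must exist before rows can be loaded. The sketch below generates DDL for yearly partitions. The one-partition-per-year layout and the child table naming are assumptions made for illustration; the project's own sql/schema.sql may partition differently.

```python
from datetime import date


def yearly_partition_ddl(parent, year):
    """Build DDL for one yearly range partition; the upper bound is exclusive."""
    lower = date(year, 1, 1)
    upper = date(year + 1, 1, 1)
    child = f"{parent}_{year}"  # hypothetical naming scheme
    return (
        f"CREATE TABLE IF NOT EXISTS {child} PARTITION OF {parent} "
        f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
    )


for year in (2019, 2020):
    print(yearly_partition_ddl("isd_daily", year))
```

Range bounds in PostgreSQL are inclusive at the lower end and exclusive at the upper end, which is why each partition ends on January 1 of the following year.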
Update
The ISD Daily and ISD Hourly datasets are updated on a rolling basis each day. Run the following scripts to load the latest data into the database.
# download, clean, reload latest hourly dataset
bin/get-isd-hourly.sh
bin/load-isd-hourly.sh
# download, clean, reload latest daily dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh
# recalculate latest partition of monthly and yearly
bin/refresh-latest.sh
Parser
There are two parsers, isdd and isdh, which take the original NOAA yearly tarball as input and generate CSV as output (which can be consumed directly by the PostgreSQL COPY command).
NAME
isdh -- Integrated Surface Dataset Hourly Parser
SYNOPSIS
isdh [-i <input|stdin>] [-o <output|st>] -p -d -c -v
DESCRIPTION
The isdh program takes an ISD hourly yearly tarball as input
and generates CSV as output.
OPTIONS
-i <input> input file, stdin by default
-o <output> output file, stdout by default
-p <profpath> pprof file path (disable by default)
-v verbose progress report
-d de-duplicate rows (raw, ts-first, hour-first)
-c add comma separated extra columns
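Since the parser writes CSV to stdout by default, its output can be piped into a COPY command. The sketch below only composes such a pipeline as a string. The tarball path is hypothetical, and it assumes that the parser's column order matches the target table; the project's own load scripts may work differently.

```python
import shlex


def hourly_load_command(tarball, pgurl="isd", table="isd_hourly"):
    """Compose a shell pipeline: parse a yearly tarball with isdh, then COPY the CSV into a table."""
    parse = ["bin/isdh", "-i", tarball]  # -i <input>; output goes to stdout by default
    copy_sql = f"COPY {table} FROM STDIN CSV"
    load = ["psql", pgurl, "-AXtwc", copy_sql]
    # Quote every argument so that the SQL text survives the shell intact.
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in argv) for argv in (parse, load)
    )


print(hourly_load_command("data/hourly/2020.tar.gz"))
```

Building the command as argument lists and quoting each argument keeps the SQL statement a single shell word.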
UI
ISD Station
ISD Monthly
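8.1 - Configuration File
A detailed introduction to the configuration parameters
Below is the default configuration file used for the sandbox environment: pigsty.yml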
---
######################################################################
# File : pigsty.yml
# Desc : Pigsty Configuration Example
# Note : Pigsty Sandbox Demo
# Link : https://pigsty.cc/zh/docs/config/
# Ctime : 2020-05-22
# Mtime : 2021-04-19
# Copyright (C) 2018-2021 Ruohang Feng
######################################################################
######################################################################
# Development Environment Inventory #
######################################################################
all: # top-level namespace
#==================================================================#
# Clusters #
#==================================================================#
# postgres database clusters are defined as kv pair in `all.children`
# where the key is cluster name and the value is the object consist
# of cluster members (hosts) and cluster specific variables (vars)
# meta nodes are defined in special group "meta" with `meta_node=true`
children:
#-----------------------------
# meta controller
#-----------------------------
meta: # special group 'meta' defines the main controller machine
vars:
meta_node: true # mark node as meta controller
ansible_group_priority: 99 # meta group has top priority
hosts:
10.10.10.10: {}
#-----------------------------
# cluster: pg-meta
#-----------------------------
# pg-meta is a single-node pgsql cluster deployed on meta node (10.10.10.10)
pg-meta:
# - cluster members - #
hosts:
10.10.10.10: {pg_seq: 1, pg_role: primary, pg_offline_query: true}
# - cluster configs - #
vars:
pg_cluster: pg-meta # define actual cluster name
pg_version: 13 # define installed pgsql version
node_tune: tiny # tune node into oltp|olap|crit|tiny mode
pg_conf: tiny.yml # tune pgsql into oltp|olap|crit|tiny mode
patroni_mode: pause # enter maintenance mode, {default|pause|remove}
patroni_watchdog_mode: off # disable watchdog (required|automatic|off)
pg_lc_ctype: en_US.UTF8 # enabled pg_trgm i18n char support
# - defining business users - #
pg_users:
# default production read-write user dbuser_meta
- name: dbuser_meta # user's name is required
password: md5d3d10d8cad606308bdb180148bf663e1 # md5 password is acceptable
pgbouncer: true # add user to pgbouncer userlist
roles: [dbrole_readwrite] # grant roles to user
comment: default production read-write user for meta database
# default production read-only user for grafana direct access
- name: dbuser_grafana
password: DBUser.Grafana
pgbouncer: true
roles: [dbrole_readonly]
comment: default readonly access for grafana datasource
# complete example of user/role definition
- name: dbuser_pigsty # pigsty user have admin access (DDL|DML)
password: DBUser.Pigsty # example user's password, can be md5 encrypted
login: true # can login, true by default (should be false for role)
superuser: false # is superuser? false by default
createdb: false # can create database? false by default
createrole: false # can create role? false by default
inherit: true # can this role use inherited privileges?
replication: false # can this role do replication? false by default
bypassrls: false # can this role bypass row level security? false by default
pgbouncer: true # add this user to pgbouncer? false by default (true for production user)
connlimit: -1 # connection limit, -1 disable limit
expire_in: 3650 # now + n days when this role is expired (OVERWRITE expire_at)
expire_at: '2030-12-31' # 'timestamp' when this role is expired (OVERWRITTEN by expire_in)
comment: pigsty admin user # comment on user/role
roles: [dbrole_admin] # dbrole_{admin,readonly,readwrite,offline}
parameters: # additional role level parameters with ALTER ROLE SET
search_path: pigsty,public # add pigsty schema into search_path
# - defining business databases - #
pg_databases:
- name: meta # name is the only required field for a database
# baseline: metadb/schema.sql # pigsty meta database baseline
# owner: postgres # optional, database owner
# template: template1 # optional, template1 by default
# encoding: UTF8 # optional, UTF8 by default , must same as template database, leave blank to set to db default
# locale: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_collate: C # optional, C by default , must same as template database, leave blank to set to db default
# lc_ctype: C # optional, C by default , must same as template database, leave blank to set to db default
# tablespace: pg_default # optional, 'pg_default' is the default tablespace
# allowconn: true # optional, true by default, false disable connect at all
# revokeconn: false # optional, false by default, true revoke connect from public # (only default user and owner have connect privilege on database)
# pgbouncer: true # optional, add this database to pgbouncer list? true by default
comment: pigsty meta database # optional, comment string for database
connlimit: -1 # optional, connection limit, -1 or none disable limit (default)
schemas: [pigsty] # optional, create additional schema
extensions: # optional, extension name and which schema to create
- {name: adminpack, schema: pg_catalog}
parameters: # optional, extra parameters with ALTER DATABASE
search_path: 'pigsty,public' # add pigsty to search_path
pg_default_database: meta # default database will be used as primary monitor target
vip_mode: l2 # none|l2|l4, l2 vip are used in sandbox demo
vip_address: 10.10.10.2 # virtual ip address
vip_cidrmask: 8 # cidr network mask length
vip_interface: eth1 # interface to add virtual ip
#-----------------------------
# cluster: pg-test
#-----------------------------
# uncomment this for complete 4-node sandbox demo environment
#pg-test: # define cluster named 'pg-test'
# # - cluster members - #
# hosts:
# 10.10.10.11: {pg_seq: 1, pg_role: primary}
# 10.10.10.12: {pg_seq: 2, pg_role: replica}
# 10.10.10.13: {pg_seq: 3, pg_role: offline}
#
# # - cluster configs - #
# vars:
# # basic settings
# pg_cluster: pg-test # define actual cluster name
# pg_version: 13 # define installed pgsql version
# node_tune: tiny # tune node into oltp|olap|crit|tiny mode
# pg_conf: tiny.yml # tune pgsql into oltp|olap|crit|tiny mode
# pg_users:
# - name: test # admin user for pg-test, have DDL
# password: test
# roles: [dbrole_admin]
# pgbouncer: true
# comment: default admin user for test database
#
# - name: dbuser_test # production rw-user
# password: DBUser.Test
# roles: [dbrole_readwrite]
# pgbouncer: true
# comment: default test user for production usage
#
# pg_databases: # create a business database 'test'
# - name: test # use the simplest form
# extensions: # install postgis to test database
# - {name: postgis, schema: public}
# pg_default_database: test # default database will be used as primary monitor target
#
# # extra service settings
# pg_services_extra: # extra services to be added
# - name: standby # service name pg-test-standby
# src_ip: "*"
# src_port: 5435 # 5435 routes to sync replica
# dst_port: postgres
# check_url: /sync # use /sync health check
# selector: "[]" # jmespath to filter instances
# selector_backup: "[? pg_role == `primary`]" # primary used as backup server for standby service
#
# # proxy settings
# vip_mode: l2 # enable/disable vip (require members in same LAN)
# vip_address: 10.10.10.3 # virtual ip address
# vip_cidrmask: 8 # cidr network mask length
# vip_interface: eth1 # interface to add virtual ip
#==================================================================#
# Globals #
#==================================================================#
vars:
#------------------------------------------------------------------------------
# CONNECTION PARAMETERS
#------------------------------------------------------------------------------
# this section defines connection parameters
# ansible_user: vagrant # admin user with ssh access and sudo privilege
proxy_env: # global proxy env when downloading packages
no_proxy: "localhost,127.0.0.1,10.0.0.0/8,192.168.0.0/16,*.pigsty,*.aliyun.com,mirrors.aliyuncs.com,mirrors.tuna.tsinghua.edu.cn,mirrors.zju.edu.cn,*.myqcloud.com"
# http_proxy: ''
# https_proxy: ''
# all_proxy: ''
#------------------------------------------------------------------------------
# REPO PROVISION
#------------------------------------------------------------------------------
# this section defines how to build a local repo
# - repo basic - #
repo_enabled: true # build local yum repo on meta nodes?
repo_name: pigsty # local repo name
repo_address: yum.pigsty # repo external address (ip:port or url)
repo_port: 80 # listen port, must match the port used in repo_address
repo_home: /www # default repo dir location
repo_rebuild: false # force re-download packages
repo_remove: true # remove existing repos
# - where to download - #
repo_upstreams:
- name: base
description: CentOS-$releasever - Base - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/
gpgcheck: no
failovermethod: priority
- name: updates
description: CentOS-$releasever - Updates - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/
gpgcheck: no
failovermethod: priority
- name: extras
description: CentOS-$releasever - Extras - Aliyun Mirror
baseurl:
- http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
- http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/
- http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/
gpgcheck: no
failovermethod: priority
- name: epel
description: CentOS $releasever - EPEL - Aliyun Mirror
baseurl: http://mirrors.aliyun.com/epel/$releasever/$basearch
gpgcheck: no
failovermethod: priority
- name: grafana
description: Grafana - TsingHua Mirror
gpgcheck: no
baseurl: https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm
- name: prometheus
description: Prometheus and exporters
gpgcheck: no
baseurl: https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch
# consider using ZJU PostgreSQL mirror in mainland china
- name: pgdg-common
description: PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch
gpgcheck: no
baseurl: https://download.postgresql.org/pub/repos/yum/common/redhat/rhel-$releasever-$basearch
# baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch
- name: pgdg13
description: PostgreSQL 13 for RHEL/CentOS $releasever - $basearch
gpgcheck: no
baseurl: https://download.postgresql.org/pub/repos/yum/13/redhat/rhel-$releasever-$basearch
# baseurl: http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch
- name: centos-sclo
description: CentOS-$releasever - SCLo
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo
- name: centos-sclo-rh
description: CentOS-$releasever - SCLo rh
gpgcheck: no
mirrorlist: http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh
- name: nginx
description: Nginx Official Yum Repo
skip_if_unavailable: true
gpgcheck: no
baseurl: http://nginx.org/packages/centos/$releasever/$basearch/
- name: haproxy
description: Copr repo for haproxy
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/
# for latest consul & kubernetes
- name: harbottle
description: Copr repo for main owned by harbottle
skip_if_unavailable: true
gpgcheck: no
baseurl: https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/
# - what to download - #
repo_packages:
# repo bootstrap packages
- epel-release nginx wget yum-utils yum createrepo sshpass unzip # bootstrap packages
# node basic packages
- ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget git tuned # basic system util
- readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates # basic pg dependency
- numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet # system utils
# dcs & monitor packages
- grafana prometheus2 pushgateway alertmanager # monitor and ui
- node_exporter postgres_exporter nginx_exporter blackbox_exporter # exporter
- consul consul_exporter consul-template etcd # dcs
# python3 dependencies
- ansible python python-pip python-psycopg2 audit # ansible & python
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul # python3
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography # patroni extra deps
# proxy and load balancer
- haproxy keepalived dnsmasq # proxy and dns
# postgres common Packages
- patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity # major components
- pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel # other common utils
- barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail
# postgres 13 packages
- postgresql13* postgis31* citus_13 timescaledb_13 # pgrouting_13 # postgres 13 and postgis 31
- pg_repack13 pg_squeeze13 # maintenance extensions
- pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13 # stats extensions
- plr13 plsh13 plpgsql_check_13 plproxy13 pldebugger13 # PL extensions
- hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13 # FDW extensions
- wal2json13 count_distinct13 ddlx_13 geoip13 orafce13 # MISC extensions
- rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13
- pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13
- pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13
- pguint13 pguri13 prefix13 safeupdate_13 semver13 table_version13 tdigest13
repo_url_packages:
# additional rpm packages
- https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm
- https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm
- http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm
# tar.gz and zip binary packages
- https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz # monitor binary
- https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter_v0.3.2_linux-amd64.tar.gz
- https://github.com/grafana/loki/releases/download/v2.2.1/loki-linux-amd64.zip # loki binary
- https://github.com/grafana/loki/releases/download/v2.2.1/promtail-linux-amd64.zip
- https://github.com/grafana/loki/releases/download/v2.2.1/logcli-linux-amd64.zip
- https://github.com/grafana/loki/releases/download/v2.2.1/loki-canary-linux-amd64.zip
# mirror in mainland china (use commented packages to install from official site)
# - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/pg_exporter-0.3.2-1.el7.x86_64.rpm
# - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/vip-manager_0.6-1_amd64.rpm
# - http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg/polysh-0.4-1.noarch.rpm
#------------------------------------------------------------------------------
# NODE PROVISION
#------------------------------------------------------------------------------
# this section defines how to provision nodes
# nodename: # if defined, node's hostname will be overwritten
# - node dns - #
node_dns_hosts: # static dns records in /etc/hosts
- 10.10.10.10 yum.pigsty
node_dns_server: add # add (default) | none (skip) | overwrite (remove old settings)
node_dns_servers: # dynamic nameserver in /etc/resolv.conf
- 10.10.10.10
node_dns_options: # dns resolv options
- options single-request-reopen timeout:1 rotate
- domain service.consul
# - node repo - #
node_repo_method: local # none|local|public (use local repo for production env)
node_repo_remove: true # whether remove existing repo
node_local_repo_url: # local repo url (if method=local, make sure firewall is configured or disabled)
- http://yum.pigsty/pigsty.repo
# - node packages - #
node_packages: # common packages for all nodes
- wget,yum-utils,sshpass,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl
- numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq
- python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul
- python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography
- node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager
node_extra_packages: # extra packages for all nodes
- patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity
node_meta_packages: # packages for meta nodes only
- grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway
- dnsmasq,nginx,ansible,pgbadger,polysh
# build & devel packages (add to repo_packages too if you want build database & extensions from source)
# - gcc,gcc-c++,clang,coreutils,diffutils,rpm-build,rpm-devel,rpmlint,rpmdevtools
# - zlib-devel,openssl-libs,openssl-devel,pam-devel,libxml2-devel,libxslt-devel,openldap-devel,systemd-devel,tcl-devel,python-devel
# - node features - #
node_disable_numa: false # disable numa, important for production database, reboot required
node_disable_swap: false # disable swap, important for production database
node_disable_firewall: true # disable firewall (required if using kubernetes)
node_disable_selinux: true # disable selinux (required if using kubernetes)
node_static_network: true # keep dns resolver settings after reboot
node_disk_prefetch: false # setup disk prefetch on HDD to increase performance
# - node kernel modules - #
node_kernel_modules:
- softdog
- br_netfilter
- ip_vs
- ip_vs_rr
- ip_vs_wrr
- ip_vs_sh
- nf_conntrack_ipv4
# - node tuned - #
node_tune: tiny # install and activate tuned profile: none|oltp|olap|crit|tiny
node_sysctl_params: {} # set additional sysctl parameters, k:v format
# net.bridge.bridge-nf-call-iptables: 1 # example kv parameters
# - node user - #
node_admin_setup: true # setup an default admin user ?
node_admin_uid: 88 # uid and gid for admin user
node_admin_username: dba # default admin user: dba
node_admin_ssh_exchange: true # exchange admin's ssh key among cluster ?
node_admin_pk_current: false # add current user's ~/.ssh/id_rsa.pub to admin pk
node_admin_pks: # ssh public keys to be added to admin user
- 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com'
# - node ntp - #
node_ntp_service: ntp # ntp or chrony
node_ntp_config: true # overwrite existing ntp config?
node_timezone: Asia/Shanghai # default node timezone
node_ntp_servers: # default NTP servers
- pool cn.pool.ntp.org iburst
- pool pool.ntp.org iburst
- pool time.pool.aliyun.com iburst
- server 10.10.10.10 iburst
#------------------------------------------------------------------------------
# META PROVISION
#------------------------------------------------------------------------------
# - ca - #
ca_method: create # create|copy|recreate
ca_subject: "/CN=root-ca" # self-signed CA subject
ca_homedir: /ca # ca cert directory
ca_cert: ca.crt # ca public key/cert
ca_key: ca.key # ca private key
# - nginx - #
nginx_upstream:
- { name: home, host: pigsty, url: "127.0.0.1:3000"}
- { name: consul, host: c.pigsty, url: "127.0.0.1:8500" }
- { name: grafana, host: g.pigsty, url: "127.0.0.1:3000" }
- { name: prometheus, host: p.pigsty, url: "127.0.0.1:9090" }
- { name: alertmanager, host: a.pigsty, url: "127.0.0.1:9093" }
- { name: haproxy, host: h.pigsty, url: "127.0.0.1:9091" }
# - nameserver - #
dns_records: # dynamic dns record resolved by dnsmasq
- 10.10.10.2 pg-meta # sandbox vip for pg-meta
- 10.10.10.3 pg-test # sandbox vip for pg-test
- 10.10.10.10 meta-1 # sandbox node meta-1 (node-0)
- 10.10.10.11 node-1 # sandbox node node-1
- 10.10.10.12 node-2 # sandbox node node-2
- 10.10.10.13 node-3 # sandbox node node-3
- 10.10.10.10 pigsty
- 10.10.10.10 y.pigsty yum.pigsty
- 10.10.10.10 c.pigsty consul.pigsty
- 10.10.10.10 g.pigsty grafana.pigsty
- 10.10.10.10 p.pigsty prometheus.pigsty
- 10.10.10.10 a.pigsty alertmanager.pigsty
- 10.10.10.10 n.pigsty ntp.pigsty
- 10.10.10.10 h.pigsty haproxy.pigsty
# - prometheus - #
prometheus_data_dir: /export/prometheus/data # prometheus data dir
prometheus_options: '--storage.tsdb.retention=30d'
prometheus_reload: false # reload prometheus instead of recreate it
prometheus_sd_method: consul # service discovery method: static|consul|etcd
prometheus_scrape_interval: 5s # global scrape & evaluation interval
prometheus_scrape_timeout: 4s # scrape timeout
prometheus_sd_interval: 5s # service discovery refresh interval
# - grafana - #
grafana_url: http://admin:admin@10.10.10.10:3000 # grafana url
grafana_admin_password: admin # default grafana admin user password
grafana_plugin: install # none|install|reinstall
grafana_cache: /www/pigsty/grafana/plugins.tgz # path to grafana plugins tarball
grafana_customize: false # customize grafana resources
grafana_plugins: # default grafana plugins list
- redis-datasource
- simpod-json-datasource
- fifemon-graphql-datasource
- sbueringer-consul-datasource
- camptocamp-prometheus-alertmanager-datasource
- ryantxu-ajax-panel
- marcusolsson-hourly-heatmap-panel
- michaeldmoore-multistat-panel
- marcusolsson-treemap-panel
- pr0ps-trackmap-panel
- dalvany-image-panel
- magnesium-wordcloud-panel
- cloudspout-button-panel
- speakyourcode-button-panel
- jdbranham-diagram-panel
- grafana-piechart-panel
- snuids-radar-panel
- digrich-bubblechart-panel
grafana_git_plugins:
- https://github.com/Vonng/grafana-echarts
# - loki - #
loki_clean: false # whether remove existing loki data
loki_data_dir: /export/loki # default loki data dir
#------------------------------------------------------------------------------
# DCS PROVISION
#------------------------------------------------------------------------------
service_registry: consul # where to register services: none | consul | etcd | both
dcs_type: consul # consul | etcd | both
dcs_name: pigsty # consul dc name | etcd initial cluster token
dcs_servers: # dcs server dict in name:ip format
meta-1: 10.10.10.10 # you could use existing dcs cluster
# meta-2: 10.10.10.11 # hosts whose IP is listed here will be initialized as dcs servers
# meta-3: 10.10.10.12 # 3 or 5 dcs nodes are recommended for production environment
dcs_exists_action: clean # abort|skip|clean if dcs server already exists
dcs_disable_purge: false # set to true to disable purge functionality for good (force dcs_exists_action = abort)
consul_data_dir: /var/lib/consul # consul data dir (/var/lib/consul by default)
etcd_data_dir: /var/lib/etcd # etcd data dir (/var/lib/etcd by default)
#------------------------------------------------------------------------------
# POSTGRES INSTALLATION
#------------------------------------------------------------------------------
# - dbsu - #
pg_dbsu: postgres # os user for database, postgres by default (changing it is not recommended!)
pg_dbsu_uid: 26 # os dbsu uid and gid, 26 for default postgres users and groups
pg_dbsu_sudo: limit # none|limit|all|nopass (Privilege for dbsu, limit is recommended)
pg_dbsu_home: /var/lib/pgsql # home directory of the dbsu
pg_dbsu_ssh_exchange: false # exchange ssh key among same cluster
# - postgres packages - #
pg_version: 13 # default postgresql version
pgdg_repo: false # use official pgdg yum repo (disable if you have local mirror)
pg_add_repo: false # add postgres related repo before install (useful if you want a simple install)
pg_bin_dir: /usr/pgsql/bin # postgres binary dir
pg_packages:
- postgresql${pg_version}*
- postgis31_${pg_version}*
- pgbouncer patroni pg_exporter pgbadger
- patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity
- python3 python3-psycopg2 python36-requests python3-etcd python3-consul
- python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography
pg_extensions:
- pg_repack${pg_version} pg_qualstats${pg_version} pg_stat_kcache${pg_version} wal2json${pg_version}
# - ogr_fdw${pg_version} mysql_fdw_${pg_version} redis_fdw_${pg_version} mongo_fdw${pg_version} hdfs_fdw_${pg_version}
# - count_distinct${pg_version} ddlx_${pg_version} geoip${pg_version} orafce${pg_version} # popular features
# - hypopg_${pg_version} ip4r${pg_version} jsquery_${pg_version} logerrors_${pg_version} periods_${pg_version} pg_auto_failover_${pg_version} pg_catcheck${pg_version}
# - pg_fkpart${pg_version} pg_jobmon${pg_version} pg_partman${pg_version} pg_prioritize_${pg_version} pg_track_settings${pg_version} pgaudit15_${pg_version}
# - pgcryptokey${pg_version} pgexportdoc${pg_version} pgimportdoc${pg_version} pgmemcache-${pg_version} pgmp${pg_version} pgq-${pg_version} pgquarrel pgrouting_${pg_version}
# - pguint${pg_version} pguri${pg_version} prefix${pg_version} safeupdate_${pg_version} semver${pg_version} table_version${pg_version} tdigest${pg_version}
#------------------------------------------------------------------------------
# POSTGRES PROVISION
#------------------------------------------------------------------------------
# - identity - #
# pg_cluster: # [REQUIRED] cluster name (cluster level, validated during pg_preflight)
# pg_seq: 0 # [REQUIRED] instance seq (instance level, validated during pg_preflight)
# pg_role: replica # [REQUIRED] service role (instance level, validated during pg_preflight)
# pg_shard: # [OPTIONAL] shard name (cluster level)
# pg_sindex: # [OPTIONAL] shard index (cluster level)
# - identity option -#
pg_hostname: false # overwrite node hostname with pg instance name
pg_nodename: true # overwrite consul nodename with pg instance name
# - retention - #
# pg_exists_action, available options: abort|clean|skip
# - abort: abort entire play's execution (default)
# - clean: remove existing cluster (dangerous)
# - skip: end current play for this host
# pg_exists: false # auxiliary flag variable (DO NOT SET THIS)
pg_exists_action: clean
pg_disable_purge: false # set to true to disable pg purge functionality for good (force pg_exists_action = abort)
# - storage - #
pg_data: /pg/data # postgres data directory
pg_fs_main: /export # data disk mount point /pg -> {{ pg_fs_main }}/postgres/{{ pg_instance }}
pg_fs_bkup: /var/backups # backup disk mount point /pg/* -> {{ pg_fs_bkup }}/postgres/{{ pg_instance }}/*
# - connection - #
pg_listen: '0.0.0.0' # postgres listen address, '0.0.0.0' by default (all ipv4 addr)
pg_port: 5432 # postgres port (5432 by default)
pg_localhost: /var/run/postgresql # localhost unix socket dir for connection
# pg_upstream: # [OPTIONAL] specify replication upstream (setting this on the primary turns the cluster into a standby cluster)
# - patroni - #
# patroni_mode, available options: default|pause|remove
# - default: default ha mode
# - pause: into maintenance mode
# - remove: remove patroni after bootstrap
patroni_mode: default # pause|default|remove
pg_namespace: /pg # top level key namespace in dcs
patroni_port: 8008 # default patroni port
patroni_watchdog_mode: automatic # watchdog mode: off|automatic|required
pg_conf: tiny.yml # user provided patroni config template path
# - flags - #
pg_backup: false # store base backup on this node
pg_delay: 0 # apply delay for offline|delayed instance
# - localization - #
pg_encoding: UTF8 # default to UTF8
pg_locale: C # default to C
pg_lc_collate: C # default to C
pg_lc_ctype: en_US.UTF8 # default to en_US.UTF8
# - pgbouncer - #
pgbouncer_port: 6432 # pgbouncer port (6432 by default)
pgbouncer_poolmode: transaction # pooling mode: (transaction pooling by default)
pgbouncer_max_db_conn: 100 # important! do not set this larger than postgres max conn or conn limit
#------------------------------------------------------------------------------
# POSTGRES TEMPLATE
#------------------------------------------------------------------------------
# - template - #
pg_init: pg-init # init script for cluster template
# - system roles - #
pg_replication_username: replicator # system replication user
pg_replication_password: DBUser.Replicator # system replication password
pg_monitor_username: dbuser_monitor # system monitor user
pg_monitor_password: DBUser.Monitor # system monitor password
pg_admin_username: dbuser_dba # system admin user
pg_admin_password: DBUser.DBA # system admin password
# - default roles - #
# check http://pigsty.cc/zh/docs/concepts/provision/acl/ for more details
pg_default_roles:
# common production readonly user
- name: dbrole_readonly # production read-only roles
login: false
comment: role for global readonly access
# common production read-write user
- name: dbrole_readwrite # production read-write roles
login: false
roles: [dbrole_readonly] # read-write includes read-only access
comment: role for global read-write access
# offline has the same privileges as readonly, but its hba access is limited to offline instances only
# it is intended for running slow queries, interactive queries and ETL tasks
- name: dbrole_offline
login: false
comment: role for restricted read-only access (offline instance)
# admin has the privileges to issue DDL changes
- name: dbrole_admin
login: false
bypassrls: true
comment: role for object creation
roles: [dbrole_readwrite,pg_monitor,pg_signal_backend]
# dbsu, name is designated by `pg_dbsu`. It's not recommended to set a password for dbsu
- name: postgres
superuser: true
comment: system superuser
# default replication user, name is designated by `pg_replication_username`, and password is set by `pg_replication_password`
- name: replicator
replication: true # for replication user
bypassrls: true # logical replication require bypassrls
roles: [pg_monitor, dbrole_readonly] # logical replication require select privileges
comment: system replicator
# default monitor user, name is designated by `pg_monitor_username`, and password is set by `pg_monitor_password`
- name: dbuser_monitor
connlimit: 16
comment: system monitor user
roles: [pg_monitor, dbrole_readonly]
parameters:
log_min_duration_statement: 1000
# default admin super user, name is designated by `pg_admin_username`, and password is set by `pg_admin_password`
- name: dbuser_dba
superuser: true
comment: system admin user
roles: [dbrole_admin]
# default stats user, for ETL and slow queries
- name: dbuser_stats
password: DBUser.Stats
comment: business offline user for offline queries and ETL
roles: [dbrole_offline]
# - privileges - #
# object created by dbsu and admin will have their privileges properly set
pg_default_privileges:
- GRANT USAGE ON SCHEMAS TO dbrole_readonly
- GRANT SELECT ON TABLES TO dbrole_readonly
- GRANT SELECT ON SEQUENCES TO dbrole_readonly
- GRANT EXECUTE ON FUNCTIONS TO dbrole_readonly
- GRANT USAGE ON SCHEMAS TO dbrole_offline
- GRANT SELECT ON TABLES TO dbrole_offline
- GRANT SELECT ON SEQUENCES TO dbrole_offline
- GRANT EXECUTE ON FUNCTIONS TO dbrole_offline
- GRANT INSERT, UPDATE, DELETE ON TABLES TO dbrole_readwrite
- GRANT USAGE, UPDATE ON SEQUENCES TO dbrole_readwrite
- GRANT TRUNCATE, REFERENCES, TRIGGER ON TABLES TO dbrole_admin
- GRANT CREATE ON SCHEMAS TO dbrole_admin
# - schemas - #
pg_default_schemas: [monitor] # default schemas to be created
# - extension - #
pg_default_extensions: # default extensions to be created
- { name: 'pg_stat_statements', schema: 'monitor' }
- { name: 'pgstattuple', schema: 'monitor' }
- { name: 'pg_qualstats', schema: 'monitor' }
- { name: 'pg_buffercache', schema: 'monitor' }
- { name: 'pageinspect', schema: 'monitor' }
- { name: 'pg_prewarm', schema: 'monitor' }
- { name: 'pg_visibility', schema: 'monitor' }
- { name: 'pg_freespacemap', schema: 'monitor' }
- { name: 'pg_repack', schema: 'monitor' }
- name: postgres_fdw
- name: file_fdw
- name: btree_gist
- name: btree_gin
- name: pg_trgm
- name: intagg
- name: intarray
# - hba - #
pg_offline_query: false # set to true to enable offline query on instance
pg_reload: true # reload postgres after hba changes
pg_hba_rules: # postgres host-based authentication rules
- title: allow meta node password access
role: common
rules:
- host all all 10.10.10.10/32 md5
- title: allow intranet admin password access
role: common
rules:
- host all +dbrole_admin 10.0.0.0/8 md5
- host all +dbrole_admin 172.16.0.0/12 md5
- host all +dbrole_admin 192.168.0.0/16 md5
- title: allow intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
- title: allow local read/write (local production user via pgbouncer)
role: common
rules:
- local all +dbrole_readonly md5
- host all +dbrole_readonly 127.0.0.1/32 md5
- title: allow offline query (ETL,SAGA,Interactive) on offline instance
role: offline
rules:
- host all +dbrole_offline 10.0.0.0/8 md5
- host all +dbrole_offline 172.16.0.0/12 md5
- host all +dbrole_offline 192.168.0.0/16 md5
pg_hba_rules_extra: [] # extra hba rules (for cluster/instance overwrite)
pgbouncer_hba_rules: # pgbouncer host-based authentication rules
- title: local password access
role: common
rules:
- local all all md5
- host all all 127.0.0.1/32 md5
- title: intranet password access
role: common
rules:
- host all all 10.0.0.0/8 md5
- host all all 172.16.0.0/12 md5
- host all all 192.168.0.0/16 md5
pgbouncer_hba_rules_extra: [] # extra pgbouncer hba rules (for cluster/instance overwrite)
# pg_users: [] # business users
# pg_databases: [] # business databases
#------------------------------------------------------------------------------
# MONITOR PROVISION
#------------------------------------------------------------------------------
# - install - #
exporter_install: none # none|yum|binary, none by default
exporter_repo_url: '' # if set, repo will be added to /etc/yum.repos.d/ before yum installation
# - collect - #
exporter_metrics_path: /metrics # default metric path for pg related exporter
# - node exporter - #
node_exporter_enabled: true # setup node_exporter on instance
node_exporter_port: 9100 # default port for node exporter
node_exporter_options: '--no-collector.softnet --collector.systemd --collector.ntp --collector.tcpstat --collector.processes'
# - pg exporter - #
pg_exporter_config: pg_exporter-demo.yaml # default config files for pg_exporter
pg_exporter_enabled: true # setup pg_exporter on instance
pg_exporter_port: 9630 # default port for pg exporter
pg_exporter_url: '' # optional, if not set, generate from reference parameters
# - pgbouncer exporter - #
pgbouncer_exporter_enabled: true # setup pgbouncer_exporter on instance (if you don't have pgbouncer, disable it)
pgbouncer_exporter_port: 9631 # default port for pgbouncer exporter
pgbouncer_exporter_url: '' # optional, if not set, generate from reference parameters
# - promtail - # # promtail is a beta feature which requires manual deployment
promtail_enabled: true # enable promtail logging collector?
promtail_clean: false # remove promtail status file? false by default
promtail_port: 9080 # default listen port for promtail
promtail_status_file: /tmp/promtail-status.yml
promtail_send_url: http://10.10.10.10:3100/loki/api/v1/push # loki url to receive logs
#------------------------------------------------------------------------------
# SERVICE PROVISION
#------------------------------------------------------------------------------
pg_weight: 100 # default load balance weight (instance level)
# - service - #
pg_services: # how to expose postgres service in cluster?
# primary service will route {ip|name}:5433 to primary pgbouncer (5433->6432 rw)
- name: primary # service name {{ pg_cluster }}-primary
src_ip: "*"
src_port: 5433
dst_port: pgbouncer # 5433 route to pgbouncer
check_url: /primary # primary health check, success when instance is primary
selector: "[]" # select all instance as primary service candidate
# replica service will route {ip|name}:5434 to replica pgbouncer (5434->6432 ro)
- name: replica # service name {{ pg_cluster }}-replica
src_ip: "*"
src_port: 5434
dst_port: pgbouncer
check_url: /read-only # read-only health check. (including primary)
selector: "[]" # select all instance as replica service candidate
selector_backup: "[? pg_role == `primary`]" # primary are used as backup server in replica service
# default service will route {ip|name}:5436 to primary postgres (5436->5432 primary)
- name: default # service's actual name is {{ pg_cluster }}-default
src_ip: "*" # service bind ip address, * for all, vip for cluster virtual ip address
src_port: 5436 # bind port, mandatory
dst_port: postgres # target port: postgres|pgbouncer|port_number , pgbouncer(6432) by default
check_method: http # health check method: only http is available for now
check_port: patroni # health check port: patroni|pg_exporter|port_number , patroni by default
check_url: /primary # health check url path, / as default
check_code: 200 # health check http code, 200 as default
selector: "[]" # instance selector
haproxy: # haproxy specific fields
maxconn: 3000 # default front-end connection
balance: roundrobin # load balance algorithm (roundrobin by default)
default_server_options: 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'
# offline service will route {ip|name}:5438 to offline postgres (5438->5432 offline)
- name: offline # service name {{ pg_cluster }}-offline
src_ip: "*"
src_port: 5438
dst_port: postgres
check_url: /replica # offline MUST be a replica
selector: "[? pg_role == `offline` || pg_offline_query ]" # instances with pg_role == 'offline' or instance marked with 'pg_offline_query == true'
selector_backup: "[? pg_role == `replica` && !pg_offline_query]" # replica are used as backup server in offline service
pg_services_extra: [] # extra services to be added
# - haproxy - #
haproxy_enabled: true # enable haproxy on every cluster member
haproxy_reload: true # reload haproxy after config
haproxy_admin_auth_enabled: false # enable authentication for haproxy admin?
haproxy_admin_username: admin # default haproxy admin username
haproxy_admin_password: admin # default haproxy admin password
haproxy_exporter_port: 9101 # default admin/exporter port
haproxy_client_timeout: 3h # client side connection timeout
haproxy_server_timeout: 3h # server side connection timeout
# - vip - #
vip_mode: none # none | l2 | l4
vip_reload: true # whether reload service after config
# vip_address: 127.0.0.1 # virtual ip address ip (l2 or l4)
# vip_cidrmask: 24 # virtual ip address cidr mask (l2 only)
# vip_interface: eth0 # virtual ip network interface (l2 only)
# - dns - # # NOT IMPLEMENTED
# dns_mode: vip # vip|all|selector: how to resolve cluster DNS?
# dns_selector: '[]' # if dns_mode == vip, filter instances been resolved
...
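The four default entries in pg_services each bind a distinct port on every cluster member. As a rough illustration, the sketch below maps a service name to its port and builds a connection URL from it. The host name pg-test, the database names, and the user dbuser_test are hypothetical placeholders and do not come from this configuration; the function names are illustrative only.

```shell
# Map a default service name to the port it binds (src_port in pg_services).
pg_service_port() {
    case "$1" in
        primary) echo 5433 ;;   # read-write, routed to pgbouncer on the primary
        replica) echo 5434 ;;   # read-only, routed to pgbouncer on replicas
        default) echo 5436 ;;   # direct postgres connection to the primary
        offline) echo 5438 ;;   # direct postgres connection to offline instances
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# Build a connection URL. Arguments: service, host, user, database.
# Host, user and database are hypothetical values supplied by the caller.
pg_service_url() {
    local port
    port="$(pg_service_port "$1")" || return 1
    echo "postgres://$3@$2:${port}/$4"
}

pg_service_url primary pg-test dbuser_test test
pg_service_url offline pg-test dbuser_stats test
```

The resulting URL can be passed to psql. Keeping the port lookup in one function means a changed src_port only needs to be updated in one place.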
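8.2 - Kernel Optimization
Kernel parameter tuning that Pigsty applies to the operating system
Pigsty uses tuned to adjust the operating system configuration. tuned is the tuning tool that ships with CentOS 7.
Pigsty Tuned Configuration
By default, Pigsty installs four tuned profiles on the operating system:
tuned-adm profile oltp # enable OLTP mode
tuned-adm profile olap # enable OLAP mode
tuned-adm profile crit # enable CRIT mode
tuned-adm profile tiny # enable TINY mode
Basic Tuned Operations
# To start tuned, run the following command as root:
systemctl start tuned
# To activate tuned on every boot, run:
systemctl enable tuned
# For other tuned controls, such as profile selection, use:
tuned-adm
# To list the installed profiles that are available (this command requires the tuned service to be running):
tuned-adm list
# To show the currently active profile, run:
tuned-adm active
# To select or activate a profile, run:
tuned-adm profile profile_name
# For example:
tuned-adm profile powersave
# To let tuned recommend the profile best suited to your system, without changing any existing profile and using the same logic that is applied during installation, run:
tuned-adm recommend
# To disable all tuning:
tuned-adm off
To list all available profiles and identify the currently active one, run:
tuned-adm list
To show only the currently active profile, run:
tuned-adm active
To switch to one of the available profiles, run:
tuned-adm profile profile_name
For example:
tuned-adm profile server-powersave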
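When the switch is scripted, it can help to confirm that the requested profile actually became active. A minimal sketch, assuming that tuned-adm active prints a line of the form "Current active profile: NAME"; the function names are illustrative and not part of Pigsty or tuned.

```shell
# Extract the profile name from `tuned-adm active` output read on stdin.
# Assumes a line of the form "Current active profile: <name>".
parse_active_profile() {
    sed -n 's/^Current active profile: *//p'
}

# Apply a profile, then verify it is the active one.
# Requires root privileges and a running tuned service.
apply_tuned_profile() {
    local want="$1" got
    tuned-adm profile "$want" || return 1
    got="$(tuned-adm active | parse_active_profile)"
    [ "$got" = "$want" ]
}

# Parse a captured line without touching the system.
echo "Current active profile: oltp" | parse_active_profile
```

On a node managed by Pigsty, apply_tuned_profile oltp would return a non-zero status if the profile could not be applied or a different profile remained active.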
OLTP Configuration
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to oltp mode
# Path : /etc/tuned/oltp/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL OLTP System
include=network-latency
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80
# vm.dirty_background_bytes=67108864 # 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_ratio=3 # latency-performance default
vm.dirty_ratio=10 # latency-performance default
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
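The two shared-memory settings in this profile use different units: the kernel counts kernel.shmall in pages and kernel.shmmax in bytes. The sketch below works through the half-of-physical-memory formulas quoted in the comments. It assumes param_shmall and param_shmmax are derived this way, which this page does not show. The page count is a hypothetical example.

```shell
# Derive shared-memory limits as half of physical memory.
# $1 = physical pages (getconf _PHYS_PAGES), $2 = page size in bytes (getconf PAGE_SIZE)
shm_all_pages() { echo $(( $1 / 2 )); }        # kernel.shmall, counted in pages
shm_max_bytes() { echo $(( $1 / 2 * $2 )); }   # kernel.shmmax, counted in bytes

# Hypothetical host: 970116 pages of 4096 bytes (about 3.7 GiB of RAM).
pages=970116
page_size=4096
echo "kernel.shmall = $(shm_all_pages "$pages")"
echo "kernel.shmmax = $(shm_max_bytes "$pages" "$page_size")"

# On a real Linux host, feed in the live values instead:
#   shm_all_pages "$(getconf _PHYS_PAGES)"
#   shm_max_bytes "$(getconf _PHYS_PAGES)" "$(getconf PAGE_SIZE)"
```

For this example the two results are 485058 pages and 1986797568 bytes, which describe the same amount of memory in the two units.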
OLAP Configuration
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-09-18
# Desc : Tune operating system to olap mode
# Path : /etc/tuned/olap/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL OLAP System
include=network-throughput
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
# vm.swappiness=10
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.dirty_background_ratio = 10 # throughput-performance default
vm.dirty_ratio=80 # throughput-performance default 40 -> 80
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
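The most visible difference between the OLTP and OLAP profiles is how much dirty page cache they tolerate before writeback starts (vm.dirty_background_ratio) and before writers are throttled (vm.dirty_ratio). The sketch below turns those percentages into approximate byte counts. The kernel applies the ratios to dirtyable memory rather than to total RAM, so the results are upper-bound estimates; the 64 GiB figure is a hypothetical example.

```shell
# Approximate a vm.dirty_* percentage as a byte count.
# $1 = dirtyable memory in bytes, $2 = ratio in percent
dirty_limit_bytes() { echo $(( $1 * $2 / 100 )); }

mem=$(( 64 * 1024 * 1024 * 1024 ))   # hypothetical: 64 GiB of dirtyable memory

# OLTP profile: dirty_background_ratio=3, dirty_ratio=10
echo "oltp background writeback starts near $(dirty_limit_bytes "$mem" 3) bytes"
echo "oltp writers are throttled near       $(dirty_limit_bytes "$mem" 10) bytes"

# OLAP profile: dirty_background_ratio=10, dirty_ratio=80
echo "olap background writeback starts near $(dirty_limit_bytes "$mem" 10) bytes"
echo "olap writers are throttled near       $(dirty_limit_bytes "$mem" 80) bytes"
```

The low OLTP thresholds keep flushes small and frequent, which favors latency, while the high OLAP thresholds let large sequential writes accumulate, which favors throughput.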
CRIT Configuration
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to crit mode
# Path : /etc/tuned/crit/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL CRIT System
include=network-latency
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 100%
vm.overcommit_memory=2
vm.overcommit_ratio=100
# 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_bytes=67108864
# vm.dirty_background_ratio=3 # latency-performance default
vm.dirty_ratio=6 # latency-performance default
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
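With vm.overcommit_memory=2 the kernel refuses allocations beyond a commit limit, and vm.overcommit_ratio sets how much of RAM counts toward that limit. The sketch below compares the ratio of 80 used by the OLTP and OLAP profiles with the ratio of 100 used by CRIT. It assumes no huge pages are reserved; the RAM and swap sizes are hypothetical.

```shell
# Commit limit under vm.overcommit_memory=2, ignoring huge pages:
#   CommitLimit = swap + ram * overcommit_ratio / 100
# $1 = RAM in MiB, $2 = swap in MiB, $3 = vm.overcommit_ratio
commit_limit_mib() { echo $(( $2 + $1 * $3 / 100 )); }

# Hypothetical host: 65536 MiB of RAM and 4096 MiB of swap.
echo "oltp/olap (ratio 80):  $(commit_limit_mib 65536 4096 80) MiB"
echo "crit      (ratio 100): $(commit_limit_mib 65536 4096 100) MiB"
```

On a live system the effective value is reported as CommitLimit in /proc/meminfo, which is a convenient way to check the estimate.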
TINY Configuration
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to tiny mode
# Path : /etc/tuned/tiny/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2019-2020 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL TINY System
# include=virtual-guest
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 40
# Filesystem I/O is usually much more efficient than swapping, so try to keep
# swapping low. It's usually safe to go even lower than this on systems with
# server-grade storage.
vm.swappiness = 30
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
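The three keepalive settings shared by all four profiles combine into a single figure: how long an idle connection to a dead peer survives before the kernel drops it. A small sketch of that arithmetic, comparing the profile values with the stock Linux defaults (7200, 9, 75); it only applies to sockets that have keepalive enabled.

```shell
# Worst-case seconds before an idle, dead TCP peer is detected:
#   tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl
# $1 = tcp_keepalive_time, $2 = tcp_keepalive_probes, $3 = tcp_keepalive_intvl
keepalive_detect_seconds() { echo $(( $1 + $2 * $3 )); }

echo "tuned profiles:  $(keepalive_detect_seconds 60 3 20)s"     # values used in the profiles
echo "kernel defaults: $(keepalive_detect_seconds 7200 9 75)s"   # stock Linux defaults
```

With the profile values, the first probe goes out after 60 seconds of idle time and the connection is dropped after three unanswered probes 20 seconds apart, so detection takes about two minutes instead of more than two hours.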
Database Kernel Tuning Reference
# Database kernel optimisation
fs.aio-max-nr = 1048576 # limit on concurrent outstanding async I/O requests, should not be less than 1M
fs.file-max = 16777216 # at most 16M open files
# kernel
kernel.shmmax = 1986797568 # max size of a single shared memory segment, in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
kernel.shmall = 485058 # total shared memory, in pages: $(expr $(getconf _PHYS_PAGES) / 2)
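kernel.shmmni = 16384 # max number of shared memory segments system-wide, 4096 -> 16384
kernel.msgmni = 32768 # number of message queues system-wide, affects how many agent processes can start; set to the memory size in MB
kernel.msgmnb = 65536 # max size of a message queue, in bytes
kernel.msgmax = 65536 # max size of a single message in a queue, in bytes
kernel.numa_balancing = 0 # disable NUMA balancing
kernel.sched_migration_cost_ns = 5000000 # within 5ms the scheduler still treats a process as cache hot
kernel.sem = 2048 134217728 2048 65536 # max 2048 semaphores per set, 134217728 semaphores system-wide, max 2048 operations per call, 65536 semaphore sets
# vm
vm.dirty_ratio = 80 # hard limit: above 80%, write requests block and flush to disk
vm.dirty_background_bytes = 268435456 # 256MB of dirty data wakes the background flusher
vm.dirty_expire_centisecs = 6000 # data dirtied more than 1 minute ago is considered due for flushing
vm.dirty_writeback_centisecs= 500 # the flusher runs every 5 seconds
vm.mmap_min_addr = 65536 # deny access to memory below 0x10000
vm.zone_reclaim_mode = 0 # disable NUMA zone reclaim
# vm swap
vm.swappiness = 0 # avoid swapping, though it can still occur at the high watermark
vm.overcommit_memory = 2 # allow only a bounded amount of overcommit
vm.overcommit_ratio = 50 # allowed overcommit: $((($mem - $swap) * 100 / $mem))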
# tcp memory
net.ipv4.tcp_rmem = 8192 65536 16777216 # tcp read buffer in bytes (min/default/max): 8K/64K/16M
net.ipv4.tcp_wmem = 8192 65536 16777216 # tcp write buffer in bytes (min/default/max): 8K/64K/16M
net.ipv4.tcp_mem = 131072 262144 16777216 # tcp memory usage in pages (4K pages): 512M/1G/64G
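net.core.rmem_default = 262144 # default receive buffer size: 256K
net.core.rmem_max = 4194304 # max receive buffer size: 4M
net.core.wmem_default = 262144 # default send buffer size: 256K
net.core.wmem_max = 4194304 # max send buffer size: 4M
# tcp keepalive
net.ipv4.tcp_keepalive_intvl = 20 # interval between probes when a probe is not acknowledged, default 75s -> 20s
net.ipv4.tcp_keepalive_probes = 3 # 3 * 20 = 1 minute of failed probes before the connection is dropped
net.ipv4.tcp_keepalive_time = 60 # start probing after 1 minute of idle time
# tcp port reuse
net.ipv4.tcp_tw_reuse = 1 # allow TIME_WAIT sockets to be reused for new TCP connections, default 0
net.ipv4.tcp_tw_recycle = 0 # fast recycling, deprecated
net.ipv4.tcp_fin_timeout = 5 # seconds to stay in the FIN-WAIT-2 state
net.ipv4.tcp_timestamps = 1
# tcp anti-flood
net.ipv4.tcp_syncookies = 1 # send cookies once the SYN_RECV queue is full, to guard against SYN floods
net.ipv4.tcp_synack_retries = 1 # number of SYN-ACK retransmissions for a half-open connection, 5 -> 1
net.ipv4.tcp_syn_retries = 1 # number of SYN packets sent before the kernel gives up on the connection
# tcp load-balancer
net.ipv4.ip_forward = 1 # IP forwarding
net.ipv4.ip_nonlocal_bind = 1 # allow binding to non-local addresses
net.netfilter.nf_conntrack_max = 1048576 # max number of tracked connections
net.ipv4.ip_local_port_range = 10000 65535 # local port range
net.ipv4.tcp_max_tw_buckets = 262144 # 256k TIME_WAIT sockets
net.core.somaxconn = 65535 # cap on the LISTEN queue length; a full queue triggers retransmission
net.ipv4.tcp_max_syn_backlog = 8192 # SYN queue size: 1024 -> 8192
net.core.netdev_max_backlog = 8192 # queue length allowed when the NIC receives packets faster than the kernel processes them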
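The TCP memory settings in this reference use two different units, which makes the raw numbers easy to misread: net.ipv4.tcp_mem is counted in pages, while net.ipv4.tcp_rmem and net.ipv4.tcp_wmem are counted in bytes. The sketch below converts the reference values, assuming a 4096-byte page size.

```shell
# Unit conversions for the TCP memory settings.
pages_to_mib() { echo $(( $1 * $2 / 1024 / 1024 )); }   # $1 = pages, $2 = page size in bytes
bytes_to_kib() { echo $(( $1 / 1024 )); }               # $1 = bytes

page_size=4096   # assumed; check with: getconf PAGE_SIZE

# net.ipv4.tcp_mem thresholds (low / pressure / high), in pages
for pages in 131072 262144 16777216; do
    echo "tcp_mem ${pages} pages = $(pages_to_mib "$pages" "$page_size") MiB"
done

# net.ipv4.tcp_rmem and tcp_wmem (min / default / max), in bytes
for bytes in 8192 65536 16777216; do
    echo "tcp_rmem ${bytes} bytes = $(bytes_to_kib "$bytes") KiB"
done
```

Under that assumption the tcp_mem thresholds come to 512 MiB, 1 GiB and 64 GiB, while the per-socket buffers range from 8 KiB to 16 MiB.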
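8.3 - Metrics List
List of monitoring metrics available in Pigsty
Below is the list of monitoring metrics currently available in Pigsty.
For the rules that define derived metrics, see the Derived Metrics section.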
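A list like the one that follows can be reproduced from any exporter endpoint, since exporters serve their metrics in the Prometheus text format. A minimal sketch, assuming that format; the host 10.10.10.11 and the sample values are hypothetical, while the port 9630 and the /metrics path come from the pg_exporter defaults in the configuration reference.

```shell
# Print the unique metric names found in Prometheus text-format output on stdin.
# Comment lines (# HELP / # TYPE) and blank lines are skipped; labels and values are dropped.
metric_names() {
    grep -v -e '^#' -e '^[[:space:]]*$' | sed 's/[{ ].*$//' | sort -u
}

# Self-contained demo on a captured sample (label values are made up).
metric_names <<'EOF'
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
haproxy_backend_status{proxy="pg-test-primary"} 1
haproxy_backend_status{proxy="pg-test-replica"} 1
EOF

# Against a live exporter (hypothetical host):
#   curl -s http://10.10.10.11:9630/metrics | metric_names
```

Each metric name is printed once regardless of how many label combinations it has, which matches the one-name-per-row layout of the list.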
Metric List
name |
go_gc_duration_seconds |
go_gc_duration_seconds_count |
go_gc_duration_seconds_sum |
go_goroutines |
go_info |
go_memstats_alloc_bytes |
go_memstats_alloc_bytes_total |
go_memstats_buck_hash_sys_bytes |
go_memstats_frees_total |
go_memstats_gc_cpu_fraction |
go_memstats_gc_sys_bytes |
go_memstats_heap_alloc_bytes |
go_memstats_heap_idle_bytes |
go_memstats_heap_inuse_bytes |
go_memstats_heap_objects |
go_memstats_heap_released_bytes |
go_memstats_heap_sys_bytes |
go_memstats_last_gc_time_seconds |
go_memstats_lookups_total |
go_memstats_mallocs_total |
go_memstats_mcache_inuse_bytes |
go_memstats_mcache_sys_bytes |
go_memstats_mspan_inuse_bytes |
go_memstats_mspan_sys_bytes |
go_memstats_next_gc_bytes |
go_memstats_other_sys_bytes |
go_memstats_stack_inuse_bytes |
go_memstats_stack_sys_bytes |
go_memstats_sys_bytes |
go_threads |
haproxy_backend_active_servers |
haproxy_backend_backup_servers |
haproxy_backend_bytes_in_total |
haproxy_backend_bytes_out_total |
haproxy_backend_check_last_change_seconds |
haproxy_backend_check_up_down_total |
haproxy_backend_client_aborts_total |
haproxy_backend_connect_time_average_seconds |
haproxy_backend_connection_attempts_total |
haproxy_backend_connection_errors_total |
haproxy_backend_connection_reuses_total |
haproxy_backend_current_queue |
haproxy_backend_current_sessions |
haproxy_backend_downtime_seconds_total |
haproxy_backend_failed_header_rewriting_total |
haproxy_backend_http_cache_hits_total |
haproxy_backend_http_cache_lookups_total |
haproxy_backend_http_comp_bytes_bypassed_total |
haproxy_backend_http_comp_bytes_in_total |
haproxy_backend_http_comp_bytes_out_total |
haproxy_backend_http_comp_responses_total |
haproxy_backend_http_requests_total |
haproxy_backend_http_responses_total |
haproxy_backend_internal_errors_total |
haproxy_backend_last_session_seconds |
haproxy_backend_limit_sessions |
haproxy_backend_loadbalanced_total |
haproxy_backend_max_connect_time_seconds |
haproxy_backend_max_queue |
haproxy_backend_max_queue_time_seconds |
haproxy_backend_max_response_time_seconds |
haproxy_backend_max_session_rate |
haproxy_backend_max_sessions |
haproxy_backend_max_total_time_seconds |
haproxy_backend_queue_time_average_seconds |
haproxy_backend_redispatch_warnings_total |
haproxy_backend_requests_denied_total |
haproxy_backend_response_errors_total |
haproxy_backend_response_time_average_seconds |
haproxy_backend_responses_denied_total |
haproxy_backend_retry_warnings_total |
haproxy_backend_server_aborts_total |
haproxy_backend_sessions_total |
haproxy_backend_status |
haproxy_backend_total_time_average_seconds |
haproxy_backend_weight |
haproxy_frontend_bytes_in_total |
haproxy_frontend_bytes_out_total |
haproxy_frontend_connections_rate_max |
haproxy_frontend_connections_total |
haproxy_frontend_current_sessions |
haproxy_frontend_denied_connections_total |
haproxy_frontend_denied_sessions_total |
haproxy_frontend_failed_header_rewriting_total |
haproxy_frontend_http_cache_hits_total |
haproxy_frontend_http_cache_lookups_total |
haproxy_frontend_http_comp_bytes_bypassed_total |
haproxy_frontend_http_comp_bytes_in_total |
haproxy_frontend_http_comp_bytes_out_total |
haproxy_frontend_http_comp_responses_total |
haproxy_frontend_http_requests_rate_max |
haproxy_frontend_http_requests_total |
haproxy_frontend_http_responses_total |
haproxy_frontend_intercepted_requests_total |
haproxy_frontend_internal_errors_total |
haproxy_frontend_limit_session_rate |
haproxy_frontend_limit_sessions |
haproxy_frontend_max_session_rate |
haproxy_frontend_max_sessions |
haproxy_frontend_request_errors_total |
haproxy_frontend_requests_denied_total |
haproxy_frontend_responses_denied_total |
haproxy_frontend_sessions_total |
haproxy_frontend_status |
haproxy_process_active_peers |
haproxy_process_busy_polling_enabled |
haproxy_process_connected_peers |
haproxy_process_connections_total |
haproxy_process_current_backend_ssl_key_rate |
haproxy_process_current_connection_rate |
haproxy_process_current_connections |
haproxy_process_current_frontend_ssl_key_rate |
haproxy_process_current_run_queue |
haproxy_process_current_session_rate |
haproxy_process_current_ssl_connections |
haproxy_process_current_ssl_rate |
haproxy_process_current_tasks |
haproxy_process_current_zlib_memory |
haproxy_process_dropped_logs_total |
haproxy_process_frontent_ssl_reuse |
haproxy_process_hard_max_connections |
haproxy_process_http_comp_bytes_in_total |
haproxy_process_http_comp_bytes_out_total |
haproxy_process_idle_time_percent |
haproxy_process_jobs |
haproxy_process_limit_connection_rate |
haproxy_process_limit_http_comp |
haproxy_process_limit_session_rate |
haproxy_process_limit_ssl_rate |
haproxy_process_listeners |
haproxy_process_max_backend_ssl_key_rate |
haproxy_process_max_connection_rate |
haproxy_process_max_connections |
haproxy_process_max_fds |
haproxy_process_max_frontend_ssl_key_rate |
haproxy_process_max_memory_bytes |
haproxy_process_max_pipes |
haproxy_process_max_session_rate |
haproxy_process_max_sockets |
haproxy_process_max_ssl_connections |
haproxy_process_max_ssl_rate |
haproxy_process_max_zlib_memory |
haproxy_process_nbproc |
haproxy_process_nbthread |
haproxy_process_pipes_free_total |
haproxy_process_pipes_used_total |
haproxy_process_pool_allocated_bytes |
haproxy_process_pool_failures_total |
haproxy_process_pool_used_bytes |
haproxy_process_relative_process_id |
haproxy_process_requests_total |
haproxy_process_ssl_cache_lookups_total |
haproxy_process_ssl_cache_misses_total |
haproxy_process_ssl_connections_total |
haproxy_process_start_time_seconds |
haproxy_process_stopping |
haproxy_process_unstoppable_jobs |
haproxy_server_bytes_in_total |
haproxy_server_bytes_out_total |
haproxy_server_check_code |
haproxy_server_check_duration_seconds |
haproxy_server_check_failures_total |
haproxy_server_check_last_change_seconds |
haproxy_server_check_status |
haproxy_server_check_up_down_total |
haproxy_server_client_aborts_total |
haproxy_server_connect_time_average_seconds |
haproxy_server_connection_attempts_total |
haproxy_server_connection_errors_total |
haproxy_server_connection_reuses_total |
haproxy_server_current_queue |
haproxy_server_current_sessions |
haproxy_server_current_throttle |
haproxy_server_downtime_seconds_total |
haproxy_server_failed_header_rewriting_total |
haproxy_server_internal_errors_total |
haproxy_server_last_session_seconds |
haproxy_server_limit_sessions |
haproxy_server_loadbalanced_total |
haproxy_server_max_connect_time_seconds |
haproxy_server_max_queue |
haproxy_server_max_queue_time_seconds |
haproxy_server_max_response_time_seconds |
haproxy_server_max_session_rate |
haproxy_server_max_sessions |
haproxy_server_max_total_time_seconds |
haproxy_server_queue_limit |
haproxy_server_queue_time_average_seconds |
haproxy_server_redispatch_warnings_total |
haproxy_server_response_errors_total |
haproxy_server_response_time_average_seconds |
haproxy_server_responses_denied_total |
haproxy_server_retry_warnings_total |
haproxy_server_server_aborts_total |
haproxy_server_server_idle_connections_current |
haproxy_server_server_idle_connections_limit |
haproxy_server_sessions_total |
haproxy_server_status |
haproxy_server_total_time_average_seconds |
haproxy_server_weight |
node:cls:cpu_count |
node:cls:cpu_mode |
node:cls:cpu_usage |
node:cls:cpu_usage_avg5m |
node:cls:disk_io_rate |
node:cls:disk_iops |
node:cls:disk_read_iops |
node:cls:disk_read_rate |
node:cls:disk_write_iops |
node:cls:disk_write_rate |
node:cls:mem_usage |
node:cls:network_io |
node:cls:network_rx |
node:cls:network_tx |
node:cls:ntp_offset_range |
node:cls:sched_timeslicesa |
node:cpu:cpu_mode |
node:cpu:cpu_usage |
node:cpu:cpu_usage_avg5m |
node:cpu:sched_timeslices |
node:dev:disk_io_rate |
node:dev:disk_iops |
node:dev:disk_read_iops |
node:dev:disk_read_rate |
node:dev:disk_read_rt |
node:dev:disk_read_time |
node:dev:disk_write_iops |
node:dev:disk_write_rate |
node:dev:disk_write_rt |
node:dev:disk_write_time |
node:dev:network_io_rate |
node:dev:network_rx |
node:dev:network_tx |
node:fs:avail_bytes |
node:fs:free_bytes |
node:fs:free_inode |
node:fs:inode_usage |
node:fs:size_bytes |
node:fs:space_deriv_1h |
node:fs:space_exhaust |
node:fs:space_usage |
node:fs:total_inode |
node:ins:cpu_count |
node:ins:cpu_mode |
node:ins:cpu_usage |
node:ins:cpu_usage_avg5m |
node:ins:ctx_switch |
node:ins:disk_io_rate |
node:ins:disk_iops |
node:ins:disk_read_iops |
node:ins:disk_read_rate |
node:ins:disk_write_iops |
node:ins:disk_write_rate |
node:ins:fd_usage |
node:ins:forks |
node:ins:intrrupt |
node:ins:mem_app |
node:ins:mem_free |
node:ins:mem_usage |
node:ins:network_io |
node:ins:network_rx |
node:ins:network_tx |
node:ins:pagefault |
node:ins:pagein |
node:ins:pageout |
node:ins:sched_timeslices |
node:ins:stdload1 |
node:ins:stdload15 |
node:ins:stdload5 |
node:ins:swap_usage |
node:ins:swapin |
node:ins:swapout |
node:ins:tcp_active_opens |
node:ins:tcp_dropped |
node:ins:tcp_insegs |
node:ins:tcp_outsegs |
node:ins:tcp_overflow |
node:ins:tcp_overflow_rate |
node:ins:tcp_passive_opens |
node:ins:tcp_retrans_rate |
node:ins:tcp_retranssegs |
node:ins:tcp_segs |
node:uptime |
node_arp_entries |
node_boot_time_seconds |
node_context_switches_total |
node_cooling_device_cur_state |
node_cooling_device_max_state |
node_cpu_guest_seconds_total |
node_cpu_seconds_total |
node_disk_io_now |
node_disk_io_time_seconds_total |
node_disk_io_time_weighted_seconds_total |
node_disk_read_bytes_total |
node_disk_read_time_seconds_total |
node_disk_reads_completed_total |
node_disk_reads_merged_total |
node_disk_write_time_seconds_total |
node_disk_writes_completed_total |
node_disk_writes_merged_total |
node_disk_written_bytes_total |
node_entropy_available_bits |
node_exporter_build_info |
node_filefd_allocated |
node_filefd_maximum |
node_filesystem_avail_bytes |
node_filesystem_device_error |
node_filesystem_files |
node_filesystem_files_free |
node_filesystem_free_bytes |
node_filesystem_readonly |
node_filesystem_size_bytes |
node_forks_total |
node_intr_total |
node_ipvs_connections_total |
node_ipvs_incoming_bytes_total |
node_ipvs_incoming_packets_total |
node_ipvs_outgoing_bytes_total |
node_ipvs_outgoing_packets_total |
node_load1 |
node_load15 |
node_load5 |
node_memory_Active_anon_bytes |
node_memory_Active_bytes |
node_memory_Active_file_bytes |
node_memory_AnonHugePages_bytes |
node_memory_AnonPages_bytes |
node_memory_Bounce_bytes |
node_memory_Buffers_bytes |
node_memory_Cached_bytes |
node_memory_CmaFree_bytes |
node_memory_CmaTotal_bytes |
node_memory_CommitLimit_bytes |
node_memory_Committed_AS_bytes |
node_memory_DirectMap2M_bytes |
node_memory_DirectMap4k_bytes |
node_memory_Dirty_bytes |
node_memory_HardwareCorrupted_bytes |
node_memory_HugePages_Free |
node_memory_HugePages_Rsvd |
node_memory_HugePages_Surp |
node_memory_HugePages_Total |
node_memory_Hugepagesize_bytes |
node_memory_Inactive_anon_bytes |
node_memory_Inactive_bytes |
node_memory_Inactive_file_bytes |
node_memory_KernelStack_bytes |
node_memory_Mapped_bytes |
node_memory_MemAvailable_bytes |
node_memory_MemFree_bytes |
node_memory_MemTotal_bytes |
node_memory_Mlocked_bytes |
node_memory_NFS_Unstable_bytes |
node_memory_PageTables_bytes |
node_memory_Percpu_bytes |
node_memory_SReclaimable_bytes |
node_memory_SUnreclaim_bytes |
node_memory_Shmem_bytes |
node_memory_Slab_bytes |
node_memory_SwapCached_bytes |
node_memory_SwapFree_bytes |
node_memory_SwapTotal_bytes |
node_memory_Unevictable_bytes |
node_memory_VmallocChunk_bytes |
node_memory_VmallocTotal_bytes |
node_memory_VmallocUsed_bytes |
node_memory_WritebackTmp_bytes |
node_memory_Writeback_bytes |
node_netstat_Icmp6_InErrors |
node_netstat_Icmp6_InMsgs |
node_netstat_Icmp6_OutMsgs |
node_netstat_Icmp_InErrors |
node_netstat_Icmp_InMsgs |
node_netstat_Icmp_OutMsgs |
node_netstat_Ip6_InOctets |
node_netstat_Ip6_OutOctets |
node_netstat_IpExt_InOctets |
node_netstat_IpExt_OutOctets |
node_netstat_Ip_Forwarding |
node_netstat_TcpExt_ListenDrops |
node_netstat_TcpExt_ListenOverflows |
node_netstat_TcpExt_SyncookiesFailed |
node_netstat_TcpExt_SyncookiesRecv |
node_netstat_TcpExt_SyncookiesSent |
node_netstat_TcpExt_TCPSynRetrans |
node_netstat_Tcp_ActiveOpens |
node_netstat_Tcp_CurrEstab |
node_netstat_Tcp_InErrs |
node_netstat_Tcp_InSegs |
node_netstat_Tcp_OutSegs |
node_netstat_Tcp_PassiveOpens |
node_netstat_Tcp_RetransSegs |
node_netstat_Udp6_InDatagrams |
node_netstat_Udp6_InErrors |
node_netstat_Udp6_NoPorts |
node_netstat_Udp6_OutDatagrams |
node_netstat_Udp6_RcvbufErrors |
node_netstat_Udp6_SndbufErrors |
node_netstat_UdpLite6_InErrors |
node_netstat_UdpLite_InErrors |
node_netstat_Udp_InDatagrams |
node_netstat_Udp_InErrors |
node_netstat_Udp_NoPorts |
node_netstat_Udp_OutDatagrams |
node_netstat_Udp_RcvbufErrors |
node_netstat_Udp_SndbufErrors |
node_network_address_assign_type |
node_network_carrier |
node_network_carrier_changes_total |
node_network_device_id |
node_network_dormant |
node_network_flags |
node_network_iface_id |
node_network_iface_link |
node_network_iface_link_mode |
node_network_info |
node_network_mtu_bytes |
node_network_net_dev_group |
node_network_protocol_type |
node_network_receive_bytes_total |
node_network_receive_compressed_total |
node_network_receive_drop_total |
node_network_receive_errs_total |
node_network_receive_fifo_total |
node_network_receive_frame_total |
node_network_receive_multicast_total |
node_network_receive_packets_total |
node_network_transmit_bytes_total |
node_network_transmit_carrier_total |
node_network_transmit_colls_total |
node_network_transmit_compressed_total |
node_network_transmit_drop_total |
node_network_transmit_errs_total |
node_network_transmit_fifo_total |
node_network_transmit_packets_total |
node_network_transmit_queue_length |
node_network_up |
node_nf_conntrack_entries |
node_nf_conntrack_entries_limit |
node_ntp_leap |
node_ntp_offset_seconds |
node_ntp_reference_timestamp_seconds |
node_ntp_root_delay_seconds |
node_ntp_root_dispersion_seconds |
node_ntp_rtt_seconds |
node_ntp_sanity |
node_ntp_stratum |
node_power_supply_capacity |
node_power_supply_cyclecount |
node_power_supply_energy_full |
node_power_supply_energy_full_design |
node_power_supply_energy_watthour |
node_power_supply_info |
node_power_supply_online |
node_power_supply_power_watt |
node_power_supply_present |
node_power_supply_voltage_min_design |
node_power_supply_voltage_volt |
node_processes_max_processes |
node_processes_max_threads |
node_processes_pids |
node_processes_state |
node_processes_threads |
node_procs_blocked |
node_procs_running |
node_schedstat_running_seconds_total |
node_schedstat_timeslices_total |
node_schedstat_waiting_seconds_total |
node_scrape_collector_duration_seconds |
node_scrape_collector_success |
node_sockstat_FRAG6_inuse |
node_sockstat_FRAG6_memory |
node_sockstat_FRAG_inuse |
node_sockstat_FRAG_memory |
node_sockstat_RAW6_inuse |
node_sockstat_RAW_inuse |
node_sockstat_TCP6_inuse |
node_sockstat_TCP_alloc |
node_sockstat_TCP_inuse |
node_sockstat_TCP_mem |
node_sockstat_TCP_mem_bytes |
node_sockstat_TCP_orphan |
node_sockstat_TCP_tw |
node_sockstat_UDP6_inuse |
node_sockstat_UDPLITE6_inuse |
node_sockstat_UDPLITE_inuse |
node_sockstat_UDP_inuse |
node_sockstat_UDP_mem |
node_sockstat_UDP_mem_bytes |
node_sockstat_sockets_used |
node_systemd_socket_accepted_connections_total |
node_systemd_socket_current_connections |
node_systemd_system_running |
node_systemd_timer_last_trigger_seconds |
node_systemd_unit_state |
node_systemd_units |
node_systemd_version |
node_tcp_connection_states |
node_textfile_scrape_error |
node_time_seconds |
node_timex_estimated_error_seconds |
node_timex_frequency_adjustment_ratio |
node_timex_loop_time_constant |
node_timex_maxerror_seconds |
node_timex_offset_seconds |
node_timex_pps_calibration_total |
node_timex_pps_error_total |
node_timex_pps_frequency_hertz |
node_timex_pps_jitter_seconds |
node_timex_pps_jitter_total |
node_timex_pps_shift_seconds |
node_timex_pps_stability_exceeded_total |
node_timex_pps_stability_hertz |
node_timex_status |
node_timex_sync_status |
node_timex_tai_offset_seconds |
node_timex_tick_seconds |
node_udp_queues |
node_uname_info |
node_vmstat_pgfault |
node_vmstat_pgmajfault |
node_vmstat_pgpgin |
node_vmstat_pgpgout |
node_vmstat_pswpin |
node_vmstat_pswpout |
node_xfs_allocation_btree_compares_total |
node_xfs_allocation_btree_lookups_total |
node_xfs_allocation_btree_records_deleted_total |
node_xfs_allocation_btree_records_inserted_total |
node_xfs_block_map_btree_compares_total |
node_xfs_block_map_btree_lookups_total |
node_xfs_block_map_btree_records_deleted_total |
node_xfs_block_map_btree_records_inserted_total |
node_xfs_block_mapping_extent_list_compares_total |
node_xfs_block_mapping_extent_list_deletions_total |
node_xfs_block_mapping_extent_list_insertions_total |
node_xfs_block_mapping_extent_list_lookups_total |
node_xfs_block_mapping_reads_total |
node_xfs_block_mapping_unmaps_total |
node_xfs_block_mapping_writes_total |
node_xfs_directory_operation_create_total |
node_xfs_directory_operation_getdents_total |
node_xfs_directory_operation_lookup_total |
node_xfs_directory_operation_remove_total |
node_xfs_extent_allocation_blocks_allocated_total |
node_xfs_extent_allocation_blocks_freed_total |
node_xfs_extent_allocation_extents_allocated_total |
node_xfs_extent_allocation_extents_freed_total |
node_xfs_read_calls_total |
node_xfs_vnode_active_total |
node_xfs_vnode_allocate_total |
node_xfs_vnode_get_total |
node_xfs_vnode_hold_total |
node_xfs_vnode_reclaim_total |
node_xfs_vnode_release_total |
node_xfs_vnode_remove_total |
node_xfs_write_calls_total |
pg:all:active_backends |
pg:all:age |
pg:all:backends |
pg:all:buf_alloc |
pg:all:buf_flush |
pg:all:commits |
pg:all:commits_realtime |
pg:all:ixact_backends |
pg:all:lag_bytes |
pg:all:lag_seconds |
pg:all:qps_realtime |
pg:all:rollbacks |
pg:all:rollbacks_realtime |
pg:all:sessions |
pg:all:tps_realtime |
pg:all:tup_deleted |
pg:all:tup_inserted |
pg:all:tup_modified |
pg:all:tup_selected |
pg:all:tup_touched |
pg:all:tup_updated |
pg:all:wal_rate |
pg:all:xacts |
pg:all:xacts_avg30m |
pg:all:xacts_mu |
pg:all:xacts_realtime |
pg:all:xacts_sigma |
pg:cls:active_backends |
pg:cls:age |
pg:cls:backends |
pg:cls:buf_alloc |
pg:cls:buf_flush |
pg:cls:ckpt_1h |
pg:cls:commits |
pg:cls:commits_realtime |
pg:cls:ixact_backends |
pg:cls:lag_bytes |
pg:cls:lag_seconds |
pg:cls:leader |
pg:cls:load0 |
pg:cls:load1 |
pg:cls:load15 |
pg:cls:load5 |
pg:cls:lock_count |
pg:cls:locks |
pg:cls:primarys |
pg:cls:qps_realtime |
pg:cls:replicas |
pg:cls:rlock |
pg:cls:rollbacks |
pg:cls:rollbacks_realtime |
pg:cls:saturation0 |
pg:cls:saturation1 |
pg:cls:saturation15 |
pg:cls:saturation5 |
pg:cls:sessions |
pg:cls:size |
pg:cls:synchronous |
pg:cls:temp_bytes |
pg:cls:temp_files |
pg:cls:timeline |
pg:cls:tps_realtime |
pg:cls:tup_deleted |
pg:cls:tup_inserted |
pg:cls:tup_modified |
pg:cls:tup_selected |
pg:cls:tup_touched |
pg:cls:tup_updated |
pg:cls:wal_rate |
pg:cls:wlock |
pg:cls:xacts |
pg:cls:xacts_avg30m |
pg:cls:xacts_mu |
pg:cls:xacts_realtime |
pg:cls:xacts_sigma |
pg:cls:xlock |
pg:db:age_deriv_1h |
pg:db:age_exhaust |
pg:db:backends |
pg:db:blks_access_1m |
pg:db:blks_hit_1m |
pg:db:blks_read_1m |
pg:db:buffer_hit_rate |
pg:db:commits |
pg:db:commits_realtime |
pg:db:io_time_usage |
pg:db:lock_count |
pg:db:locks |
pg:db:pool_current_conn |
pg:db:pool_disabled |
pg:db:pool_max_conn |
pg:db:pool_paused |
pg:db:pool_reserve_size |
pg:db:pool_size |
pg:db:qps_realtime |
pg:db:read_time_usage |
pg:db:rlock |
pg:db:rollbacks |
pg:db:rollbacks_realtime |
pg:db:sessions |
pg:db:temp_bytes |
pg:db:temp_files |
pg:db:tps_realtime |
pg:db:tup_deleted |
pg:db:tup_inserted |
pg:db:tup_modified |
pg:db:tup_selected |
pg:db:tup_touched |
pg:db:tup_updated |
pg:db:wlock |
pg:db:write_time_usage |
pg:db:xacts |
pg:db:xacts_avg30m |
pg:db:xacts_mu |
pg:db:xacts_realtime |
pg:db:xacts_sigma |
pg:db:xlock |
pg:ins:active_backends |
pg:ins:age |
pg:ins:backends |
pg:ins:buf_alloc |
pg:ins:buf_flush |
pg:ins:buf_flush_backend |
pg:ins:buf_flush_checkpoint |
pg:ins:checkpoint_lsn |
pg:ins:ckpt_req |
pg:ins:ckpt_timed |
pg:ins:commits |
pg:ins:commits_realtime |
pg:ins:free_clients |
pg:ins:free_servers |
pg:ins:hit_rate |
pg:ins:ixact_backends |
pg:ins:lag_bytes |
pg:ins:lag_seconds |
pg:ins:last_ckpt |
pg:ins:load0 |
pg:ins:load1 |
pg:ins:load15 |
pg:ins:load5 |
pg:ins:lock_count |
pg:ins:locks |
pg:ins:login_clients |
pg:ins:pool_databases |
pg:ins:pool_users |
pg:ins:pools |
pg:ins:qps_realtime |
pg:ins:query_rt |
pg:ins:query_rt_avg30m |
pg:ins:query_rt_mu |
pg:ins:query_rt_sigma |
pg:ins:query_time_rate15m |
pg:ins:query_time_rate1m |
pg:ins:query_time_rate5m |
pg:ins:recv_init_lsn |
pg:ins:recv_init_tli |
pg:ins:recv_last_lsn |
pg:ins:recv_last_tli |
pg:ins:redo_lsn |
pg:ins:rlock |
pg:ins:rollbacks |
pg:ins:rollbacks_realtime |
pg:ins:saturation0 |
pg:ins:saturation1 |
pg:ins:saturation15 |
pg:ins:saturation5 |
pg:ins:sessions |
pg:ins:slot_retained_bytes |
pg:ins:temp_bytes |
pg:ins:temp_files |
pg:ins:tps_realtime |
pg:ins:tup_deleted |
pg:ins:tup_inserted |
pg:ins:tup_modified |
pg:ins:tup_selected |
pg:ins:tup_touched |
pg:ins:tup_updated |
pg:ins:used_clients |
pg:ins:wal_rate |
pg:ins:wlock |
pg:ins:xact_rt |
pg:ins:xact_rt_avg30m |
pg:ins:xact_rt_mu |
pg:ins:xact_rt_sigma |
pg:ins:xact_time_rate15m |
pg:ins:xact_time_rate1m |
pg:ins:xact_time_rate5m |
pg:ins:xacts |
pg:ins:xacts_avg30m |
pg:ins:xacts_mu |
pg:ins:xacts_realtime |
pg:ins:xacts_sigma |
pg:ins:xlock |
pg:query:call |
pg:query:rt |
pg:svc:active_backends |
pg:svc:backends |
pg:svc:buf_alloc |
pg:svc:buf_flush |
pg:svc:commits |
pg:svc:commits_realtime |
pg:svc:ixact_backends |
pg:svc:load0 |
pg:svc:load1 |
pg:svc:load15 |
pg:svc:load5 |
pg:svc:lock_count |
pg:svc:locks |
pg:svc:qps_realtime |
pg:svc:query_rt |
pg:svc:query_rt_avg30m |
pg:svc:query_rt_mu |
pg:svc:query_rt_sigma |
pg:svc:rlock |
pg:svc:rollbacks |
pg:svc:rollbacks_realtime |
pg:svc:sessions |
pg:svc:temp_bytes |
pg:svc:temp_files |
pg:svc:tps_realtime |
pg:svc:tup_deleted |
pg:svc:tup_inserted |
pg:svc:tup_modified |
pg:svc:tup_selected |
pg:svc:tup_touched |
pg:svc:tup_updated |
pg:svc:wlock |
pg:svc:xact_rt |
pg:svc:xact_rt_avg30m |
pg:svc:xact_rt_mu |
pg:svc:xact_rt_sigma |
pg:svc:xacts |
pg:svc:xacts_avg30m |
pg:svc:xacts_mu |
pg:svc:xacts_realtime |
pg:svc:xacts_sigma |
pg:svc:xlock |
pg_activity_count |
pg_activity_max_conn_duration |
pg_activity_max_duration |
pg_activity_max_tx_duration |
pg_backend_count |
pg_backup_time |
pg_bgwriter_buffers_alloc |
pg_bgwriter_buffers_backend |
pg_bgwriter_buffers_backend_fsync |
pg_bgwriter_buffers_checkpoint |
pg_bgwriter_buffers_clean |
pg_bgwriter_checkpoint_sync_time |
pg_bgwriter_checkpoint_write_time |
pg_bgwriter_checkpoints_req |
pg_bgwriter_checkpoints_timed |
pg_bgwriter_maxwritten_clean |
pg_bgwriter_stats_reset |
pg_boot_time |
pg_checkpoint_checkpoint_lsn |
pg_checkpoint_elapse |
pg_checkpoint_full_page_writes |
pg_checkpoint_newest_commit_ts_xid |
pg_checkpoint_next_multi_offset |
pg_checkpoint_next_multixact_id |
pg_checkpoint_next_oid |
pg_checkpoint_next_xid |
pg_checkpoint_next_xid_epoch |
pg_checkpoint_oldest_active_xid |
pg_checkpoint_oldest_commit_ts_xid |
pg_checkpoint_oldest_multi_dbid |
pg_checkpoint_oldest_multi_xid |
pg_checkpoint_oldest_xid |
pg_checkpoint_oldest_xid_dbid |
pg_checkpoint_prev_tli |
pg_checkpoint_redo_lsn |
pg_checkpoint_time |
pg_checkpoint_tli |
pg_class_relage |
pg_class_relpages |
pg_class_relsize |
pg_class_reltuples |
pg_conf_reload_time |
pg_database_age |
pg_database_allow_conn |
pg_database_conn_limit |
pg_database_frozen_xid |
pg_database_is_template |
pg_db_blk_read_time |
pg_db_blk_write_time |
pg_db_blks_access |
pg_db_blks_hit |
pg_db_blks_read |
pg_db_checksum_failures |
pg_db_checksum_last_failure |
pg_db_confl_bufferpin |
pg_db_confl_deadlock |
pg_db_confl_lock |
pg_db_confl_snapshot |
pg_db_confl_tablespace |
pg_db_conflicts |
pg_db_deadlocks |
pg_db_numbackends |
pg_db_stats_reset |
pg_db_temp_bytes |
pg_db_temp_files |
pg_db_tup_deleted |
pg_db_tup_fetched |
pg_db_tup_inserted |
pg_db_tup_modified |
pg_db_tup_returned |
pg_db_tup_updated |
pg_db_xact_commit |
pg_db_xact_rollback |
pg_db_xact_total |
pg_downstream_count |
pg_exporter_last_scrape_time |
pg_exporter_query_cache_ttl |
pg_exporter_query_scrape_duration |
pg_exporter_query_scrape_error_count |
pg_exporter_query_scrape_hit_count |
pg_exporter_query_scrape_metric_count |
pg_exporter_query_scrape_total_count |
pg_exporter_scrape_duration |
pg_exporter_scrape_error_count |
pg_exporter_scrape_total_count |
pg_exporter_server_scrape_duration |
pg_exporter_server_scrape_total_count |
pg_exporter_server_scrape_total_seconds |
pg_exporter_up |
pg_exporter_uptime |
pg_flush_lsn |
pg_func_calls |
pg_func_self_time |
pg_func_total_time |
pg_in_recovery |
pg_index_bloat_ratio |
pg_index_bloat_size |
pg_index_idx_blks_hit |
pg_index_idx_blks_read |
pg_index_idx_scan |
pg_index_idx_tup_fetch |
pg_index_idx_tup_read |
pg_insert_lsn |
pg_is_in_backup |
pg_is_in_recovery |
pg_is_primary |
pg_is_replica |
pg_is_wal_replay_paused |
pg_lag |
pg_last_replay_time |
pg_lock_count |
pg_lsn |
pg_meta_info |
pg_query_blk_io_time |
pg_query_calls |
pg_query_max_time |
pg_query_mean_time |
pg_query_min_time |
pg_query_rows |
pg_query_stddev_time |
pg_query_total_time |
pg_query_wal_bytes |
pg_receive_lsn |
pg_replay_lsn |
pg_setting_block_size |
pg_setting_data_checksums |
pg_setting_max_connections |
pg_setting_max_locks_per_transaction |
pg_setting_max_prepared_transactions |
pg_setting_max_replication_slots |
pg_setting_max_wal_senders |
pg_setting_max_worker_processes |
pg_setting_wal_log_hints |
pg_shmem_allocated_size |
pg_shmem_offset |
pg_shmem_size |
pg_size_bytes |
pg_slru_blks_exists |
pg_slru_blks_hit |
pg_slru_blks_read |
pg_slru_blks_written |
pg_slru_blks_zeroed |
pg_slru_flushes |
pg_slru_stats_reset |
pg_slru_truncates |
pg_status |
pg_sync_standby_disabled |
pg_sync_standby_enabled |
pg_table_analyze_count |
pg_table_autoanalyze_count |
pg_table_autovacuum_count |
pg_table_bloat_ratio |
pg_table_bloat_size |
pg_table_heap_blks_hit |
pg_table_heap_blks_read |
pg_table_idx_blks_hit |
pg_table_idx_blks_read |
pg_table_idx_scan |
pg_table_idx_tup_fetch |
pg_table_last_analyze |
pg_table_last_autoanalyze |
pg_table_last_autovacuum |
pg_table_last_vacuum |
pg_table_n_dead_tup |
pg_table_n_live_tup |
pg_table_n_mod_since_analyze |
pg_table_n_tup_del |
pg_table_n_tup_hot_upd |
pg_table_n_tup_ins |
pg_table_n_tup_mod |
pg_table_n_tup_upd |
pg_table_seq_scan |
pg_table_seq_tup_read |
pg_table_size_bytes |
pg_table_size_indexsize |
pg_table_size_relsize |
pg_table_size_toastsize |
pg_table_tbl_scan |
pg_table_tidx_blks_hit |
pg_table_tidx_blks_read |
pg_table_toast_blks_hit |
pg_table_toast_blks_read |
pg_table_tup_read |
pg_table_vacuum_count |
pg_timeline |
pg_timestamp |
pg_up |
pg_uptime |
pg_version |
pg_write_lsn |
pg_xact_xmax |
pg_xact_xmin |
pg_xact_xnum |
pgbouncer_database_current_connections |
pgbouncer_database_disabled |
pgbouncer_database_max_connections |
pgbouncer_database_paused |
pgbouncer_database_pool_size |
pgbouncer_database_reserve_pool |
pgbouncer_exporter_last_scrape_time |
pgbouncer_exporter_query_cache_ttl |
pgbouncer_exporter_query_scrape_duration |
pgbouncer_exporter_query_scrape_error_count |
pgbouncer_exporter_query_scrape_hit_count |
pgbouncer_exporter_query_scrape_metric_count |
pgbouncer_exporter_query_scrape_total_count |
pgbouncer_exporter_scrape_duration |
pgbouncer_exporter_scrape_error_count |
pgbouncer_exporter_scrape_total_count |
pgbouncer_exporter_server_scrape_duration |
pgbouncer_exporter_server_scrape_total_count |
pgbouncer_exporter_server_scrape_total_seconds |
pgbouncer_exporter_up |
pgbouncer_exporter_uptime |
pgbouncer_in_recovery |
pgbouncer_list_items |
pgbouncer_pool_active_clients |
pgbouncer_pool_active_servers |
pgbouncer_pool_idle_servers |
pgbouncer_pool_login_servers |
pgbouncer_pool_maxwait |
pgbouncer_pool_maxwait_us |
pgbouncer_pool_tested_servers |
pgbouncer_pool_used_servers |
pgbouncer_pool_waiting_clients |
pgbouncer_stat_avg_query_count |
pgbouncer_stat_avg_query_time |
pgbouncer_stat_avg_recv |
pgbouncer_stat_avg_sent |
pgbouncer_stat_avg_wait_time |
pgbouncer_stat_avg_xact_count |
pgbouncer_stat_avg_xact_time |
pgbouncer_stat_total_query_count |
pgbouncer_stat_total_query_time |
pgbouncer_stat_total_received |
pgbouncer_stat_total_sent |
pgbouncer_stat_total_wait_time |
pgbouncer_stat_total_xact_count |
pgbouncer_stat_total_xact_time |
pgbouncer_up |
pgbouncer_version |
process_cpu_seconds_total |
process_max_fds |
process_open_fds |
process_resident_memory_bytes |
process_start_time_seconds |
process_virtual_memory_bytes |
process_virtual_memory_max_bytes |
promhttp_metric_handler_errors_total |
promhttp_metric_handler_requests_in_flight |
promhttp_metric_handler_requests_total |
scrape_duration_seconds |
scrape_samples_post_metric_relabeling |
scrape_samples_scraped |
scrape_series_added |
up |
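8.4 - Derived Metrics
Definitions of Pigsty's derived monitoring metrics
This section lists the recording rules that define all of Pigsty's derived metrics.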
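Many of the rules follow the same pattern: a raw metric is first turned into a per-core or per-device series, then aggregated to the instance level (`node:ins:*`), and finally to the cluster level (`node:cls:*`). The Python sketch below mirrors the arithmetic of the `node:cpu:cpu_usage`, `node:ins:cpu_count`, `node:ins:cpu_usage`, and `node:cls:cpu_usage` rules on a small set of made-up samples. The label names, cluster and node names, and idle ratios are assumptions chosen for illustration; this is not how Prometheus evaluates the rules internally.

```python
# Hypothetical idle ratios per core, keyed by (cls, ins, cpu).
# All values are made up for illustration, not taken from a real cluster.
idle = {
    ("pg-test", "node-1", "0"): 0.90,
    ("pg-test", "node-1", "1"): 0.70,
    ("pg-test", "node-2", "0"): 0.50,
}

# node:cpu:cpu_usage = 1 - idle ratio of each core
cpu_usage = {key: 1 - value for key, value in idle.items()}


def group(series, keep):
    """Group sample values by the label positions listed in `keep`."""
    groups = {}
    for key, value in series.items():
        groups.setdefault(tuple(key[i] for i in keep), []).append(value)
    return groups


# node:ins:cpu_count = count without (cpu)
# node:ins:cpu_usage = sum without (cpu) / node:ins:cpu_count
per_ins = group(cpu_usage, keep=(0, 1))
ins_cpu_count = {k: len(v) for k, v in per_ins.items()}
ins_cpu_usage = {k: sum(v) / len(v) for k, v in per_ins.items()}


def cls_cpu_usage(cls):
    """node:cls:cpu_usage = sum(usage * count) / sum(count) over instances."""
    keys = [k for k in ins_cpu_usage if k[0] == cls]
    weighted = sum(ins_cpu_usage[k] * ins_cpu_count[k] for k in keys)
    return weighted / sum(ins_cpu_count[k] for k in keys)


print(round(cls_cpu_usage("pg-test"), 4))
```

Because the cluster-level value is weighted by core count, the two-core node counts twice as much as the single-core node. A plain average of the two instance values would give a different result whenever nodes differ in size.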
Node aggregate metrics
---
- name: node-rules
rules:
#==============================================================#
# Aliveness #
#==============================================================#
# TODO: change this to your node exporter port
- record: node_exporter_up
expr: up{instance=~".*:9099"}
- record: node:uptime
expr: time() - node_boot_time_seconds{}
#==============================================================#
# CPU #
#==============================================================#
# cpu mode time ratio
- record: node:cpu:cpu_mode
expr: irate(node_cpu_seconds_total{}[1m])
- record: node:ins:cpu_mode
expr: sum without (cpu) (node:cpu:cpu_mode)
- record: node:cls:cpu_mode
expr: sum by (cls, mode) (node:ins:cpu_mode)
# cpu schedule time-slices
- record: node:cpu:sched_timeslices
expr: irate(node_schedstat_timeslices_total{}[1m])
- record: node:ins:sched_timeslices
expr: sum without (cpu) (node:cpu:sched_timeslices)
- record: node:cls:sched_timeslicesa
expr: sum by (cls) (node:ins:sched_timeslices)
# cpu count
- record: node:ins:cpu_count
expr: count without (cpu) (node:cpu:cpu_usage)
- record: node:cls:cpu_count
expr: sum by (cls) (node:ins:cpu_count)
# cpu usage
- record: node:cpu:cpu_usage
expr: 1 - sum without (mode) (node:cpu:cpu_mode{mode="idle"})
- record: node:ins:cpu_usage
expr: sum without (cpu) (node:cpu:cpu_usage) / node:ins:cpu_count
- record: node:cls:cpu_usage
expr: sum by (cls) (node:ins:cpu_usage * node:ins:cpu_count) / sum by (cls) (node:ins:cpu_count)
# cpu usage avg5m
- record: node:cpu:cpu_usage_avg5m
expr: avg_over_time(node:cpu:cpu_usage[5m])
- record: node:ins:cpu_usage_avg5m
expr: avg_over_time(node:ins:cpu_usage[5m])
- record: node:cls:cpu_usage_avg5m
expr: avg_over_time(node:cls:cpu_usage[5m])
#==============================================================#
# Memory #
#==============================================================#
# mem usage
- record: node:ins:mem_app
expr: node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes - node_memory_Slab_bytes - node_memory_PageTables_bytes - node_memory_SwapCached_bytes
- record: node:ins:mem_free
expr: node_memory_MemFree_bytes{} + node_memory_Cached_bytes{}
- record: node:ins:mem_usage
expr: node:ins:mem_app / node_memory_MemTotal_bytes
- record: node:cls:mem_usage
expr: sum by (cls) (node:ins:mem_app) / sum by (cls) (node_memory_MemTotal_bytes)
- record: node:ins:swap_usage
expr: 1 - node_memory_SwapFree_bytes{} / node_memory_SwapTotal_bytes{}
#==============================================================#
# Disk #
#==============================================================#
# disk read iops
- record: node:dev:disk_read_iops
expr: irate(node_disk_reads_completed_total{device=~"[a-zA-Z-_]+"}[1m])
- record: node:ins:disk_read_iops
expr: sum without (device) (node:dev:disk_read_iops)
- record: node:cls:disk_read_iops
expr: sum by (cls) (node:ins:disk_read_iops)
# disk write iops
- record: node:dev:disk_write_iops
expr: irate(node_disk_writes_completed_total{device=~"[a-zA-Z-_]+"}[1m])
- record: node:ins:disk_write_iops
expr: sum without (device) (node:dev:disk_write_iops)
- record: node:cls:disk_write_iops
expr: sum by (cls) (node:ins:disk_write_iops)
# disk iops
- record: node:dev:disk_iops
expr: node:dev:disk_read_iops + node:dev:disk_write_iops
- record: node:ins:disk_iops
expr: node:ins:disk_read_iops + node:ins:disk_write_iops
- record: node:cls:disk_iops
expr: node:cls:disk_read_iops + node:cls:disk_write_iops
# read bandwidth (rate1m)
- record: node:dev:disk_read_rate
expr: rate(node_disk_read_bytes_total{device=~"[a-zA-Z-_]+"}[1m])
- record: node:ins:disk_read_rate
expr: sum without (device) (node:dev:disk_read_rate)
- record: node:cls:disk_read_rate
expr: sum by (cls) (node:ins:disk_read_rate)
# write bandwidth (rate1m)
- record: node:dev:disk_write_rate
expr: rate(node_disk_written_bytes_total{device=~"[a-zA-Z-_]+"}[1m])
- record: node:ins:disk_write_rate
expr: sum without (device) (node:dev:disk_write_rate)
- record: node:cls:disk_write_rate
expr: sum by (cls) (node:ins:disk_write_rate)
# io bandwidth (rate1m)
- record: node:dev:disk_io_rate
expr: node:dev:disk_read_rate + node:dev:disk_write_rate
- record: node:ins:disk_io_rate
expr: node:ins:disk_read_rate + node:ins:disk_write_rate
- record: node:cls:disk_io_rate
expr: node:cls:disk_read_rate + node:cls:disk_write_rate
# read/write total time
- record: node:dev:disk_read_time
expr: rate(node_disk_read_time_seconds_total{device=~"[a-zA-Z-_]+"}[1m])
- record: node:dev:disk_write_time
expr: rate(node_disk_write_time_seconds_total{device=~"[a-zA-Z-_]+"}[1m])
# read/write response time
- record: node:dev:disk_read_rt
expr: node:dev:disk_read_time / node:dev:disk_read_iops
- record: node:dev:disk_write_rt
expr: node:dev:disk_write_time / node:dev:disk_write_iops
- record: node:dev:disk_rt
expr: (node:dev:disk_read_time + node:dev:disk_write_time) / node:dev:disk_iops
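# Worked example for the response time records (illustrative numbers, not measurements):
# response time = I/O time accumulated per second / I/O operations completed per second
#   disk_read_time = 0.05  (seconds of read time per second)
#   disk_read_iops = 100   (reads per second)
#   disk_read_rt   = 0.05 / 100 = 0.0005 s, i.e. 0.5 ms per read
# Caveat: the time records use rate() while the write iops record uses irate(),
# so the ratio mixes a 1m average with an instantaneous rate and is best read as an estimate.
# On an idle device both sides are 0 and the division yields NaN rather than 0.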
#==============================================================#
# Network #
#==============================================================#
# transmit bandwidth (out)
- record: node:dev:network_tx
expr: irate(node_network_transmit_bytes_total{}[1m])
- record: node:ins:network_tx
expr: sum without (device) (node:dev:network_tx{device!~"lo|bond.*"})
- record: node:cls:network_tx
expr: sum by (cls) (node:ins:network_tx)
# receive bandwidth (in)
- record: node:dev:network_rx
expr: irate(node_network_receive_bytes_total{}[1m])
- record: node:ins:network_rx
expr: sum without (device) (node:dev:network_rx{device!~"lo|bond.*"})
- record: node:cls:network_rx
expr: sum by (cls) (node:ins:network_rx)
# io bandwidth
- record: node:dev:network_io_rate
expr: node:dev:network_tx + node:dev:network_rx
- record: node:ins:network_io
expr: node:ins:network_tx + node:ins:network_rx
- record: node:cls:network_io
expr: node:cls:network_tx + node:cls:network_rx
#==============================================================#
# Schedule #
#==============================================================#
# normalized load
- record: node:ins:stdload1
expr: node_load1 / node:ins:cpu_count
- record: node:ins:stdload5
expr: node_load5 / node:ins:cpu_count
- record: node:ins:stdload15
expr: node_load15 / node:ins:cpu_count
# process
- record: node:ins:forks
expr: irate(node_forks_total[1m])
# interrupt & context switch
- record: node:ins:intrrupt
expr: irate(node_intr_total[1m])
- record: node:ins:ctx_switch
expr: irate(node_context_switches_total{}[1m])
#==============================================================#
# VM #
#==============================================================#
- record: node:ins:pagefault
expr: irate(node_vmstat_pgfault[1m])
- record: node:ins:pagein
expr: irate(node_vmstat_pgpgin[1m])
- record: node:ins:pageout
expr: irate(node_vmstat_pgpgout[1m])
- record: node:ins:swapin
expr: irate(node_vmstat_pswpin[1m])
- record: node:ins:swapout
expr: irate(node_vmstat_pswpout[1m])
#==============================================================#
# FS #
#==============================================================#
# filesystem space usage
- record: node:fs:free_bytes
expr: max without(device, fstype) (node_filesystem_free_bytes{fstype!~"(n|root|tmp)fs.*"})
- record: node:fs:avail_bytes
expr: max without(device, fstype) (node_filesystem_avail_bytes{fstype!~"(n|root|tmp)fs.*"})
- record: node:fs:size_bytes
expr: max without(device, fstype) (node_filesystem_size_bytes{fstype!~"(n|root|tmp)fs.*"})
- record: node:fs:space_usage
expr: 1 - (node:fs:avail_bytes{} / node:fs:size_bytes{})
- record: node:fs:free_inode
expr: max without(device, fstype) (node_filesystem_files_free{fstype!~"(n|root|tmp)fs.*"})
- record: node:fs:total_inode
expr: max without(device, fstype) (node_filesystem_files{fstype!~"(n|root|tmp)fs.*"})
# space delta and prediction
- record: node:fs:space_deriv_1h
expr: 0 - deriv(node_filesystem_avail_bytes{}[1h])
- record: node:fs:space_exhaust
expr: (node_filesystem_avail_bytes{} / node:fs:space_deriv_1h{}) > 0
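# How node:fs:space_exhaust can be read (illustrative numbers, not measurements):
# deriv() gives the per-second slope of available bytes; negating it gives bytes consumed per second.
#   avail          = 100 GiB = 107374182400 bytes
#   space_deriv_1h = 1 MiB/s = 1048576 bytes/s
#   space_exhaust  = 107374182400 / 1048576 = 102400 s, about 28.4 hours until the filesystem fills
# The "> 0" filter drops filesystems whose available space is growing (negative consumption).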
# fs inode usage
- record: node:fs:inode_usage
expr: 1 - (node:fs:free_inode / node:fs:total_inode)
# file descriptor usage
- record: node:ins:fd_usage
expr: node_filefd_allocated / node_filefd_maximum
#==============================================================#
# TCP #
#==============================================================#
# tcp segments (rate1m)
- record: node:ins:tcp_insegs
expr: rate(node_netstat_Tcp_InSegs{}[1m])
- record: node:ins:tcp_outsegs
expr: rate(node_netstat_Tcp_OutSegs{}[1m])
- record: node:ins:tcp_retranssegs
expr: rate(node_netstat_Tcp_RetransSegs{}[1m])
- record: node:ins:tcp_segs
expr: node:ins:tcp_insegs + node:ins:tcp_outsegs
# retransmit
- record: node:ins:tcp_retrans_rate
expr: node:ins:tcp_retranssegs / node:ins:tcp_outsegs
# overflow
- record: node:ins:tcp_overflow_rate
expr: rate(node_netstat_TcpExt_ListenOverflows[1m])
#==============================================================#
# Netstat #
#==============================================================#
# tcp open (rate1m)
- record: node:ins:tcp_passive_opens
expr: rate(node_netstat_Tcp_PassiveOpens[1m])
- record: node:ins:tcp_active_opens
expr: rate(node_netstat_Tcp_ActiveOpens[1m])
# tcp close
- record: node:ins:tcp_attempt_fails
expr: rate(node_netstat_Tcp_AttemptFails[1m])
- record: node:ins:tcp_estab_resets
expr: rate(node_netstat_Tcp_EstabResets[1m])
# tcp drop
- record: node:ins:tcp_overflow
expr: rate(node_netstat_TcpExt_ListenOverflows[1m])
- record: node:ins:tcp_dropped
expr: rate(node_netstat_TcpExt_ListenDrops[1m])
#==============================================================#
# NTP #
#==============================================================#
- record: node:cls:ntp_offset_range
expr: max by (cls)(node_ntp_offset_seconds) - min by (cls)(node_ntp_offset_seconds)
...
Aggregated metrics for database and connection pool
---
#==============================================================#
# File : pgsql.yml
# Ctime : 2020-04-22
# Mtime : 2020-12-03
# Desc : Record and alert rules for postgres
# Path : /etc/prometheus/rules/pgsql.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
groups:
################################################################
# PgSQL Rules #
################################################################
- name: pgsql-rules
rules:
#==============================================================#
# Aliveness #
#==============================================================#
# TODO: change these to your pg_exporter & pgbouncer_exporter port
- record: pg_exporter_up
expr: up{instance=~".*:9185"}
- record: pgbouncer_exporter_up
expr: up{instance=~".*:9127"}
#==============================================================#
# Identity #
#==============================================================#
- record: pg_is_primary
expr: 1 - pg_in_recovery
- record: pg_is_replica
expr: pg_in_recovery
- record: pg_status
expr: (pg_up{} * 2) + (1 - pg_in_recovery{})
# encoded: 0:replica[DOWN] 1:primary[DOWN] 2:replica 3:primary
#==============================================================#
# Age #
#==============================================================#
# age
- record: pg:ins:age
expr: max without (datname) (pg_database_age{datname!~"template[0-9]"})
- record: pg:cls:age
expr: max by (cls) (pg:ins:age)
- record: pg:all:age
expr: max(pg:cls:age)
# age derivative and prediction
- record: pg:db:age_deriv_1h
expr: deriv(pg_database_age{}[1h])
- record: pg:db:age_exhaust
expr: (2147483648 - pg_database_age{}) / pg:db:age_deriv_1h
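# How pg:db:age_exhaust can be read (illustrative numbers, not measurements):
# 2147483648 is 2^31, the transaction ID wraparound horizon, so the numerator is the remaining XID budget.
#   pg_database_age = 1147483648
#   age_deriv_1h    = 1000 (XIDs consumed per second)
#   age_exhaust     = (2147483648 - 1147483648) / 1000 = 1000000 s, about 11.6 days
# The value turns negative while age is falling (for example after a freeze), since the slope is negative.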
#==============================================================#
# Sessions #
#==============================================================#
# session count (by state)
- record: pg:db:sessions
expr: pg_activity_count
- record: pg:ins:sessions
expr: sum without (datname) (pg:db:sessions)
- record: pg:svc:sessions
expr: sum by (cls, role, state) (pg:ins:sessions)
- record: pg:cls:sessions
expr: sum by (cls, state) (pg:ins:sessions)
- record: pg:all:sessions
expr: sum by (state) (pg:cls:sessions)
# backends
- record: pg:db:backends
expr: pg_db_numbackends
- record: pg:ins:backends
expr: sum without (datname) (pg_db_numbackends)
- record: pg:svc:backends
expr: sum by (cls, role) (pg:ins:backends)
- record: pg:cls:backends
expr: sum by (cls) (pg:ins:backends)
- record: pg:all:backends
expr: sum(pg:cls:backends)
# active backends
- record: pg:ins:active_backends
expr: pg:ins:sessions{state="active"}
- record: pg:svc:active_backends
expr: sum by (cls, role) (pg:ins:active_backends)
- record: pg:cls:active_backends
expr: sum by (cls) (pg:ins:active_backends)
- record: pg:all:active_backends
expr: sum(pg:cls:active_backends)
# idle in xact backends (including abort)
- record: pg:ins:ixact_backends
expr: pg:ins:sessions{state=~"idle in.*"}
- record: pg:svc:ixact_backends
expr: sum by (cls, role) (pg:ins:ixact_backends)
- record: pg:cls:ixact_backends
expr: sum by (cls) (pg:ins:ixact_backends)
- record: pg:all:ixact_backends
expr: sum(pg:cls:ixact_backends)
#==============================================================#
# Servers (Pgbouncer) #
#==============================================================#
# active servers
- record: pg:pool:active_servers
expr: pgbouncer_pool_active_servers{datname!="pgbouncer"}
- record: pg:db:active_servers
expr: sum without(user) (pg:pool:active_servers)
- record: pg:ins:active_servers
expr: sum without(user, datname) (pg:pool:active_servers)
- record: pg:svc:active_servers
expr: sum by (cls, role) (pg:ins:active_servers)
- record: pg:cls:active_servers
expr: sum by (cls) (pg:ins:active_servers)
- record: pg:all:active_servers
expr: sum(pg:cls:active_servers)
# idle servers
- record: pg:pool:idle_servers
expr: pgbouncer_pool_idle_servers{datname!="pgbouncer"}
- record: pg:db:idle_servers
expr: sum without(user) (pg:pool:idle_servers)
- record: pg:ins:idle_servers
expr: sum without(user, datname) (pg:pool:idle_servers)
- record: pg:svc:idle_servers
expr: sum by (cls, role) (pg:ins:idle_servers)
- record: pg:cls:idle_servers
expr: sum by (cls) (pg:ins:idle_servers)
- record: pg:all:idle_servers
expr: sum(pg:cls:idle_servers)
# used servers
- record: pg:pool:used_servers
expr: pgbouncer_pool_used_servers{datname!="pgbouncer"}
- record: pg:db:used_servers
expr: sum without(user) (pg:pool:used_servers)
- record: pg:ins:used_servers
expr: sum without(user, datname) (pg:pool:used_servers)
- record: pg:svc:used_servers
expr: sum by (cls, role) (pg:ins:used_servers)
- record: pg:cls:used_servers
expr: sum by (cls) (pg:ins:used_servers)
- record: pg:all:used_servers
expr: sum(pg:cls:used_servers)
# tested servers
- record: pg:pool:tested_servers
expr: pgbouncer_pool_tested_servers{datname!="pgbouncer"}
- record: pg:db:tested_servers
expr: sum without(user) (pg:pool:tested_servers)
- record: pg:ins:tested_servers
expr: sum without(user, datname) (pg:pool:tested_servers)
- record: pg:svc:tested_servers
expr: sum by (cls, role) (pg:ins:tested_servers)
- record: pg:cls:tested_servers
expr: sum by (cls) (pg:ins:tested_servers)
- record: pg:all:tested_servers
expr: sum(pg:cls:tested_servers)
# login servers
- record: pg:pool:login_servers
expr: pgbouncer_pool_login_servers{datname!="pgbouncer"}
- record: pg:db:login_servers
expr: sum without(user) (pg:pool:login_servers)
- record: pg:ins:login_servers
expr: sum without(user, datname) (pg:pool:login_servers)
- record: pg:svc:login_servers
expr: sum by (cls, role) (pg:ins:login_servers)
- record: pg:cls:login_servers
expr: sum by (cls) (pg:ins:login_servers)
- record: pg:all:login_servers
expr: sum(pg:cls:login_servers)
#==============================================================#
# Clients (Pgbouncer) #
#==============================================================#
# active clients
- record: pg:pool:active_clients
expr: pgbouncer_pool_active_clients{datname!="pgbouncer"}
- record: pg:db:active_clients
expr: sum without(user) (pg:pool:active_clients)
- record: pg:ins:active_clients
expr: sum without(user, datname) (pg:pool:active_clients)
- record: pg:svc:active_clients
expr: sum by (cls, role) (pg:ins:active_clients)
- record: pg:cls:active_clients
expr: sum by (cls) (pg:ins:active_clients)
- record: pg:all:active_clients
expr: sum(pg:cls:active_clients)
# waiting clients
- record: pg:pool:waiting_clients
expr: pgbouncer_pool_waiting_clients{datname!="pgbouncer"}
- record: pg:db:waiting_clients
expr: sum without(user) (pg:pool:waiting_clients)
- record: pg:ins:waiting_clients
expr: sum without(user, datname) (pg:pool:waiting_clients)
- record: pg:svc:waiting_clients
expr: sum by (cls, role) (pg:ins:waiting_clients)
- record: pg:cls:waiting_clients
expr: sum by (cls) (pg:ins:waiting_clients)
- record: pg:all:waiting_clients
expr: sum(pg:cls:waiting_clients)
#==============================================================#
# Transactions #
#==============================================================#
# commits (realtime)
- record: pg:db:commits_realtime
expr: irate(pg_db_xact_commit{}[1m])
- record: pg:ins:commits_realtime
expr: sum without (datname) (pg:db:commits_realtime)
- record: pg:svc:commits_realtime
expr: sum by (cls, role) (pg:ins:commits_realtime)
- record: pg:cls:commits_realtime
expr: sum by (cls) (pg:ins:commits_realtime)
- record: pg:all:commits_realtime
expr: sum(pg:cls:commits_realtime)
# commits (rate1m)
- record: pg:db:commits
expr: rate(pg_db_xact_commit{}[1m])
- record: pg:ins:commits
expr: sum without (datname) (pg:db:commits)
- record: pg:svc:commits
expr: sum by (cls, role) (pg:ins:commits)
- record: pg:cls:commits
expr: sum by (cls) (pg:ins:commits)
- record: pg:all:commits
expr: sum(pg:cls:commits)
# rollbacks realtime
- record: pg:db:rollbacks_realtime
expr: irate(pg_db_xact_rollback{}[1m])
- record: pg:ins:rollbacks_realtime
expr: sum without (datname) (pg:db:rollbacks_realtime)
- record: pg:svc:rollbacks_realtime
expr: sum by (cls, role) (pg:ins:rollbacks_realtime)
- record: pg:cls:rollbacks_realtime
expr: sum by (cls) (pg:ins:rollbacks_realtime)
- record: pg:all:rollbacks_realtime
expr: sum(pg:cls:rollbacks_realtime)
# rollbacks
- record: pg:db:rollbacks
expr: rate(pg_db_xact_rollback{}[1m])
- record: pg:ins:rollbacks
expr: sum without (datname) (pg:db:rollbacks)
- record: pg:svc:rollbacks
expr: sum by (cls, role) (pg:ins:rollbacks)
- record: pg:cls:rollbacks
expr: sum by (cls) (pg:ins:rollbacks)
- record: pg:all:rollbacks
expr: sum(pg:cls:rollbacks)
# xacts (realtime); as written this counts commits only, rollbacks are recorded separately
- record: pg:db:xacts_realtime
expr: irate(pg_db_xact_commit{}[1m])
- record: pg:ins:xacts_realtime
expr: sum without (datname) (pg:db:xacts_realtime)
- record: pg:svc:xacts_realtime
expr: sum by (cls, role) (pg:ins:xacts_realtime)
- record: pg:cls:xacts_realtime
expr: sum by (cls) (pg:ins:xacts_realtime)
- record: pg:all:xacts_realtime
expr: sum(pg:cls:xacts_realtime)
# xacts (rate1m); as written this counts commits only, rollbacks are recorded separately
- record: pg:db:xacts
expr: rate(pg_db_xact_commit{}[1m])
- record: pg:ins:xacts
expr: sum without (datname) (pg:db:xacts)
- record: pg:svc:xacts
expr: sum by (cls, role) (pg:ins:xacts)
- record: pg:cls:xacts
expr: sum by (cls) (pg:ins:xacts)
- record: pg:all:xacts
expr: sum(pg:cls:xacts)
# xacts avg30m
- record: pg:db:xacts_avg30m
expr: avg_over_time(pg:db:xacts[30m])
- record: pg:ins:xacts_avg30m
expr: avg_over_time(pg:ins:xacts[30m])
- record: pg:svc:xacts_avg30m
expr: avg_over_time(pg:svc:xacts[30m])
- record: pg:cls:xacts_avg30m
expr: avg_over_time(pg:cls:xacts[30m])
- record: pg:all:xacts_avg30m
expr: avg_over_time(pg:all:xacts[30m])
# xacts µ
- record: pg:db:xacts_mu
expr: avg_over_time(pg:db:xacts_avg30m[30m])
- record: pg:ins:xacts_mu
expr: avg_over_time(pg:ins:xacts_avg30m[30m])
- record: pg:svc:xacts_mu
expr: avg_over_time(pg:svc:xacts_avg30m[30m])
- record: pg:cls:xacts_mu
expr: avg_over_time(pg:cls:xacts_avg30m[30m])
- record: pg:all:xacts_mu
expr: avg_over_time(pg:all:xacts_avg30m[30m])
# xacts σ: sigma
- record: pg:db:xacts_sigma
expr: stddev_over_time(pg:db:xacts[30m])
- record: pg:ins:xacts_sigma
expr: stddev_over_time(pg:ins:xacts[30m])
- record: pg:svc:xacts_sigma
expr: stddev_over_time(pg:svc:xacts[30m])
- record: pg:cls:xacts_sigma
expr: stddev_over_time(pg:cls:xacts[30m])
- record: pg:all:xacts_sigma
expr: stddev_over_time(pg:all:xacts[30m])
#==============================================================#
# TPS (Pgbouncer) #
#==============================================================#
# TPS realtime (irate1m)
- record: pg:db:tps_realtime
expr: irate(pgbouncer_stat_total_xact_count{}[1m])
- record: pg:ins:tps_realtime
expr: sum without(datname) (pg:db:tps_realtime{})
- record: pg:svc:tps_realtime
expr: sum by(cls, role) (pg:ins:tps_realtime{})
- record: pg:cls:tps_realtime
expr: sum by(cls) (pg:ins:tps_realtime{})
- record: pg:all:tps_realtime
expr: sum(pg:cls:tps_realtime{})
# TPS (rate1m)
- record: pg:db:tps
expr: pgbouncer_stat_avg_xact_count{datname!="pgbouncer"}
- record: pg:ins:tps
expr: sum without(datname) (pg:db:tps)
- record: pg:svc:tps
expr: sum by (cls, role) (pg:ins:tps)
- record: pg:cls:tps
expr: sum by(cls) (pg:ins:tps)
- record: pg:all:tps
expr: sum(pg:cls:tps)
# tps : avg30m
- record: pg:db:tps_avg30m
expr: avg_over_time(pg:db:tps[30m])
- record: pg:ins:tps_avg30m
expr: avg_over_time(pg:ins:tps[30m])
- record: pg:svc:tps_avg30m
expr: avg_over_time(pg:svc:tps[30m])
- record: pg:cls:tps_avg30m
expr: avg_over_time(pg:cls:tps[30m])
- record: pg:all:tps_avg30m
expr: avg_over_time(pg:all:tps[30m])
# tps µ
- record: pg:db:tps_mu
expr: avg_over_time(pg:db:tps_avg30m[30m])
- record: pg:ins:tps_mu
expr: avg_over_time(pg:ins:tps_avg30m[30m])
- record: pg:svc:tps_mu
expr: avg_over_time(pg:svc:tps_avg30m[30m])
- record: pg:cls:tps_mu
expr: avg_over_time(pg:cls:tps_avg30m[30m])
- record: pg:all:tps_mu
expr: avg_over_time(pg:all:tps_avg30m[30m])
# tps σ
- record: pg:db:tps_sigma
expr: stddev_over_time(pg:db:tps[30m])
- record: pg:ins:tps_sigma
expr: stddev_over_time(pg:ins:tps[30m])
- record: pg:svc:tps_sigma
expr: stddev_over_time(pg:svc:tps[30m])
- record: pg:cls:tps_sigma
expr: stddev_over_time(pg:cls:tps[30m])
- record: pg:all:tps_sigma
expr: stddev_over_time(pg:all:tps[30m])
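# Sketch (an assumption for illustration, not one of the shipped rules):
# the mu and sigma records can act as a baseline and a band for a deviation check, e.g.
#   abs(pg:ins:tps - pg:ins:tps_mu) > 3 * pg:ins:tps_sigma
# which would select instances whose TPS lies more than three standard deviations
# away from the smoothed 30m average. The factor 3 is illustrative and would need tuning.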
# xact rt (rate1m)
- record: pg:db:xact_rt
expr: pgbouncer_stat_avg_xact_time{datname!="pgbouncer"} / 1000000
- record: pg:ins:xact_rt
expr: sum without(datname) (rate(pgbouncer_stat_total_xact_time[1m])) / sum without(datname) (rate(pgbouncer_stat_total_xact_count[1m])) / 1000000
- record: pg:svc:xact_rt
expr: sum by (cls, role) (rate(pgbouncer_stat_total_xact_time[1m])) / sum by (cls, role) (rate(pgbouncer_stat_total_xact_count[1m])) / 1000000
# xact_rt avg30m
- record: pg:db:xact_rt_avg30m
expr: avg_over_time(pg:db:xact_rt[30m])
- record: pg:ins:xact_rt_avg30m
expr: avg_over_time(pg:ins:xact_rt[30m])
- record: pg:svc:xact_rt_avg30m
expr: avg_over_time(pg:svc:xact_rt[30m])
# xact_rt µ
- record: pg:db:xact_rt_mu
expr: avg_over_time(pg:db:xact_rt_avg30m[30m])
- record: pg:ins:xact_rt_mu
expr: avg_over_time(pg:ins:xact_rt_avg30m[30m])
- record: pg:svc:xact_rt_mu
expr: avg_over_time(pg:svc:xact_rt_avg30m[30m])
# xact_rt σ: stddev30m
- record: pg:db:xact_rt_sigma
expr: stddev_over_time(pg:db:xact_rt[30m])
- record: pg:ins:xact_rt_sigma
expr: stddev_over_time(pg:ins:xact_rt[30m])
- record: pg:svc:xact_rt_sigma
expr: stddev_over_time(pg:svc:xact_rt[30m])
#==============================================================#
# QPS (Pgbouncer) #
#==============================================================#
# QPS realtime (irate1m)
- record: pg:db:qps_realtime
expr: irate(pgbouncer_stat_total_query_count{}[1m])
- record: pg:ins:qps_realtime
expr: sum without(datname) (pg:db:qps_realtime{})
- record: pg:svc:qps_realtime
expr: sum by(cls, role) (pg:ins:qps_realtime{})
- record: pg:cls:qps_realtime
expr: sum by(cls) (pg:ins:qps_realtime{})
- record: pg:all:qps_realtime
expr: sum(pg:cls:qps_realtime{})
# qps (rate1m)
- record: pg:db:qps
expr: pgbouncer_stat_avg_query_count{datname!="pgbouncer"}
- record: pg:ins:qps
expr: sum without(datname) (pg:db:qps)
- record: pg:svc:qps
expr: sum by (cls, role) (pg:ins:qps)
- record: pg:cls:qps
expr: sum by(cls) (pg:ins:qps)
- record: pg:all:qps
expr: sum(pg:cls:qps)
# qps avg30m
- record: pg:db:qps_avg30m
expr: avg_over_time(pg:db:qps[30m])
- record: pg:ins:qps_avg30m
expr: avg_over_time(pg:ins:qps[30m])
- record: pg:svc:qps_avg30m
expr: avg_over_time(pg:svc:qps[30m])
- record: pg:cls:qps_avg30m
expr: avg_over_time(pg:cls:qps[30m])
- record: pg:all:qps_avg30m
expr: avg_over_time(pg:all:qps[30m])
# qps µ
- record: pg:db:qps_mu
expr: avg_over_time(pg:db:qps_avg30m[30m])
- record: pg:ins:qps_mu
expr: avg_over_time(pg:ins:qps_avg30m[30m])
- record: pg:svc:qps_mu
expr: avg_over_time(pg:svc:qps_avg30m[30m])
- record: pg:cls:qps_mu
expr: avg_over_time(pg:cls:qps_avg30m[30m])
- record: pg:all:qps_mu
expr: avg_over_time(pg:all:qps_avg30m[30m])
# qps σ: stddev30m qps
- record: pg:db:qps_sigma
expr: stddev_over_time(pg:db:qps[30m])
- record: pg:ins:qps_sigma
expr: stddev_over_time(pg:ins:qps[30m])
- record: pg:svc:qps_sigma
expr: stddev_over_time(pg:svc:qps[30m])
- record: pg:cls:qps_sigma
expr: stddev_over_time(pg:cls:qps[30m])
- record: pg:all:qps_sigma
expr: stddev_over_time(pg:all:qps[30m])
# query rt (1m avg)
- record: pg:db:query_rt
expr: pgbouncer_stat_avg_query_time{datname!="pgbouncer"} / 1000000
- record: pg:ins:query_rt
expr: sum without(datname) (rate(pgbouncer_stat_total_query_time[1m])) / sum without(datname) (rate(pgbouncer_stat_total_query_count[1m])) / 1000000
- record: pg:svc:query_rt
expr: sum by (cls, role) (rate(pgbouncer_stat_total_query_time[1m])) / sum by (cls, role) (rate(pgbouncer_stat_total_query_count[1m])) / 1000000
# query_rt avg30m
- record: pg:db:query_rt_avg30m
expr: avg_over_time(pg:db:query_rt[30m])
- record: pg:ins:query_rt_avg30m
expr: avg_over_time(pg:ins:query_rt[30m])
- record: pg:svc:query_rt_avg30m
expr: avg_over_time(pg:svc:query_rt[30m])
# query_rt µ
- record: pg:db:query_rt_mu
expr: avg_over_time(pg:db:query_rt_avg30m[30m])
- record: pg:ins:query_rt_mu
expr: avg_over_time(pg:ins:query_rt_avg30m[30m])
- record: pg:svc:query_rt_mu
expr: avg_over_time(pg:svc:query_rt_avg30m[30m])
# query_rt σ: stddev30m
- record: pg:db:query_rt_sigma
expr: stddev_over_time(pg:db:query_rt[30m])
- record: pg:ins:query_rt_sigma
expr: stddev_over_time(pg:ins:query_rt[30m])
- record: pg:svc:query_rt_sigma
expr: stddev_over_time(pg:svc:query_rt[30m])
#==============================================================#
# PG Load #
#==============================================================#
# seconds spent in transactions per second (rate over 1m / 5m / 15m)
- record: pg:ins:xact_time_rate1m
expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[1m])) / 1000000
- record: pg:ins:xact_time_rate5m
expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[5m])) / 1000000
- record: pg:ins:xact_time_rate15m
expr: sum without (datname) (rate(pgbouncer_stat_total_xact_time{}[15m])) / 1000000
# seconds spent on queries per second (rate over 1m / 5m / 15m)
- record: pg:ins:query_time_rate1m
expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[1m])) / 1000000
- record: pg:ins:query_time_rate5m
expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[5m])) / 1000000
- record: pg:ins:query_time_rate15m
expr: sum without (datname) (rate(pgbouncer_stat_total_query_time{}[15m])) / 1000000
# instance level load
- record: pg:ins:load0
expr: sum without (datname) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (ip) group_left() node:ins:cpu_count / 1000000
- record: pg:ins:load1
expr: pg:ins:xact_time_rate1m / on (ip) group_left() node:ins:cpu_count
- record: pg:ins:load5
expr: pg:ins:xact_time_rate5m / on (ip) group_left() node:ins:cpu_count
- record: pg:ins:load15
expr: pg:ins:xact_time_rate15m / on (ip) group_left() node:ins:cpu_count
# service level load
- record: pg:svc:load0
expr: sum by (svc, cls, role) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (svc) group_left() sum by (svc) (node:ins:cpu_count{}) / 1000000
- record: pg:svc:load1
expr: sum by (svc, cls, role) (pg:ins:xact_time_rate1m) / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})
- record: pg:svc:load5
expr: sum by (svc, cls, role) (pg:ins:xact_time_rate5m) / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})
- record: pg:svc:load15
expr: sum by (svc, cls, role) (pg:ins:xact_time_rate15m) / on (svc) group_left() sum by (svc) (node:ins:cpu_count{})
# cluster level load
- record: pg:cls:load0
expr: sum by (cls) (irate(pgbouncer_stat_total_xact_time{}[1m])) / on (cls) node:cls:cpu_count{} / 1000000
- record: pg:cls:load1
expr: sum by (cls) (pg:ins:xact_time_rate1m) / on (cls) node:cls:cpu_count
- record: pg:cls:load5
expr: sum by (cls) (pg:ins:xact_time_rate5m) / on (cls) node:cls:cpu_count
- record: pg:cls:load15
expr: sum by (cls) (pg:ins:xact_time_rate15m) / on (cls) node:cls:cpu_count
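# Reading PG load (illustrative numbers, not measurements):
# pgbouncer_stat_total_xact_time is a counter in microseconds, hence the division by 1000000.
#   xact_time_rate1m = 2.0  (seconds of transaction time per wall-clock second)
#   cpu_count        = 4
#   load1            = 2.0 / 4 = 0.5
# A value near 1 would suggest transactions occupy roughly one second per core per second.
# Transaction time also includes time spent idle in transaction, so treat this as an approximation of CPU demand.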
#==============================================================#
# PG Saturation #
#==============================================================#
# max value of pg_load and cpu_usage
# instance level saturation
- record: pg:ins:saturation0
expr: pg:ins:load0 > node:ins:cpu_usage or node:ins:cpu_usage
- record: pg:ins:saturation1
expr: pg:ins:load1 > node:ins:cpu_usage or node:ins:cpu_usage
- record: pg:ins:saturation5
expr: pg:ins:load5 > node:ins:cpu_usage or node:ins:cpu_usage
- record: pg:ins:saturation15
expr: pg:ins:load15 > node:ins:cpu_usage or node:ins:cpu_usage
# cluster level saturation
- record: pg:cls:saturation0
expr: pg:cls:load0 > node:cls:cpu_usage or node:cls:cpu_usage
- record: pg:cls:saturation1
expr: pg:cls:load1 > node:cls:cpu_usage or node:cls:cpu_usage
- record: pg:cls:saturation5
expr: pg:cls:load5 > node:cls:cpu_usage or node:cls:cpu_usage
- record: pg:cls:saturation15
expr: pg:cls:load15 > node:cls:cpu_usage or node:cls:cpu_usage
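# Note on the "a > b or b" idiom used above (illustrative numbers, not measurements):
# the comparison keeps the load sample only where it exceeds cpu_usage,
# and "or" then fills in cpu_usage for every label set the comparison dropped,
# so the result behaves like max(load, cpu_usage) per series.
#   load1 = 0.8, cpu_usage = 0.3  ->  saturation1 = 0.8
#   load1 = 0.1, cpu_usage = 0.3  ->  saturation1 = 0.3
# This assumes both sides carry identical label sets; if they do not,
# the comparison matches nothing and the record falls back to cpu_usage alone.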
#==============================================================#
# CRUD #
#==============================================================#
# rows touched
- record: pg:db:tup_touched
expr: irate(pg_db_tup_fetched{}[1m])
- record: pg:ins:tup_touched
expr: sum without(datname) (pg:db:tup_touched)
- record: pg:svc:tup_touched
expr: sum by (cls, role) (pg:ins:tup_touched)
- record: pg:cls:tup_touched
expr: sum by (cls) (pg:ins:tup_touched)
- record: pg:all:tup_touched
expr: sum(pg:cls:tup_touched)
# selected
- record: pg:db:tup_selected
expr: irate(pg_db_tup_returned{}[1m])
- record: pg:ins:tup_selected
expr: sum without(datname) (pg:db:tup_selected)
- record: pg:svc:tup_selected
expr: sum by (cls, role) (pg:ins:tup_selected)
- record: pg:cls:tup_selected
expr: sum by (cls) (pg:ins:tup_selected)
- record: pg:all:tup_selected
expr: sum(pg:cls:tup_selected)
# inserted
- record: pg:db:tup_inserted
expr: irate(pg_db_tup_inserted{}[1m])
- record: pg:ins:tup_inserted
expr: sum without(datname) (pg:db:tup_inserted)
- record: pg:svc:tup_inserted
expr: sum by (cls, role) (pg:ins:tup_inserted)
- record: pg:cls:tup_inserted
expr: sum by (cls) (pg:ins:tup_inserted{role="primary"})
- record: pg:all:tup_inserted
expr: sum(pg:cls:tup_inserted)
# updated
- record: pg:db:tup_updated
expr: irate(pg_db_tup_updated{}[1m])
- record: pg:ins:tup_updated
expr: sum without(datname) (pg:db:tup_updated)
- record: pg:svc:tup_updated
expr: sum by (cls, role) (pg:ins:tup_updated)
- record: pg:cls:tup_updated
expr: sum by (cls) (pg:ins:tup_updated{role="primary"})
- record: pg:all:tup_updated
expr: sum(pg:cls:tup_updated)
# deleted
- record: pg:db:tup_deleted
expr: irate(pg_db_tup_deleted{}[1m])
- record: pg:ins:tup_deleted
expr: sum without(datname) (pg:db:tup_deleted)
- record: pg:svc:tup_deleted
expr: sum by (cls, role) (pg:ins:tup_deleted)
- record: pg:cls:tup_deleted
expr: sum by (cls) (pg:ins:tup_deleted{role="primary"})
- record: pg:all:tup_deleted
expr: sum(pg:cls:tup_deleted)
# modified
- record: pg:db:tup_modified
expr: irate(pg_db_tup_modified{}[1m])
- record: pg:ins:tup_modified
expr: sum without(datname) (pg:db:tup_modified)
- record: pg:svc:tup_modified
expr: sum by (cls, role) (pg:ins:tup_modified)
- record: pg:cls:tup_modified
expr: sum by (cls) (pg:ins:tup_modified{role="primary"})
- record: pg:all:tup_modified
expr: sum(pg:cls:tup_modified)
#==============================================================#
# Object Access #
#==============================================================#
# table access
- record: pg:table:idx_scan
expr: rate(pg_table_idx_scan{}[1m])
- record: pg:table:seq_scan
expr: rate(pg_table_seq_scan{}[1m])
- record: pg:table:qps_realtime
expr: irate(pg_table_idx_scan{}[1m])
# index access
- record: pg:index:idx_scan
expr: rate(pg_index_idx_scan{}[1m])
- record: pg:index:qps_realtime
expr: irate(pg_index_idx_scan{}[1m])
# func access
- record: pg:func:call
expr: rate(pg_func_calls{}[1m])
- record: pg:func:rt
expr: rate(pg_func_total_time{}[1m]) / pg:func:call
# query access
- record: pg:query:call
expr: rate(pg_query_calls{}[1m])
- record: pg:query:rt
expr: rate(pg_query_total_time{}[1m]) / pg:query:call / 1000
#==============================================================#
# Blocks IO #
#==============================================================#
# blocks read/hit/access in 1min
- record: pg:db:blks_read_1m
expr: increase(pg_db_blks_read{}[1m])
- record: pg:db:blks_hit_1m
expr: increase(pg_db_blks_hit{}[1m])
- record: pg:db:blks_access_1m
expr: increase(pg_db_blks_access{}[1m])
# buffer hit rate (1m)
- record: pg:db:buffer_hit_rate
expr: pg:db:blks_hit_1m / pg:db:blks_access_1m
- record: pg:ins:hit_rate
expr: sum without(datname) (pg:db:blks_hit_1m) / sum without(datname) (pg:db:blks_access_1m)
# read/write time usage
- record: pg:db:read_time_usage
expr: rate(pg_db_blk_read_time[1m])
- record: pg:db:write_time_usage
expr: rate(pg_db_blk_write_time[1m])
- record: pg:db:io_time_usage
expr: pg:db:read_time_usage + pg:db:write_time_usage
#==============================================================#
# Traffic IO (Pgbouncer) #
#==============================================================#
# transmit bandwidth (sent, out)
- record: pg:db:tx
expr: irate(pgbouncer_stat_total_sent{datname!="pgbouncer"}[1m])
- record: pg:ins:tx
expr: sum without (user, datname) (pg:db:tx)
- record: pg:svc:tx
expr: sum by (cls, role) (pg:ins:tx)
- record: pg:cls:tx
expr: sum by (cls) (pg:ins:tx)
- record: pg:all:tx
expr: sum(pg:cls:tx)
# receive bandwidth (received, in)
- record: pg:db:rx
expr: irate(pgbouncer_stat_total_received{datname!="pgbouncer"}[1m])
- record: pg:ins:rx
expr: sum without (datname) (pg:db:rx)
- record: pg:svc:rx
expr: sum by (cls, role) (pg:ins:rx)
- record: pg:cls:rx
expr: sum by (cls) (pg:ins:rx)
- record: pg:all:rx
expr: sum(pg:cls:rx)
#==============================================================#
# Lock #
#==============================================================#
# lock count by mode
- record: pg:db:locks
expr: pg_lock_count
- record: pg:ins:locks
expr: sum without(datname) (pg:db:locks)
- record: pg:svc:locks
expr: sum by (cls, role, mode) (pg:ins:locks)
- record: pg:cls:locks
expr: sum by (cls, mode) (pg:ins:locks)
# total lock count
- record: pg:db:lock_count
expr: sum without (mode) (pg_lock_count{})
- record: pg:ins:lock_count
expr: sum without(datname) (pg:db:lock_count)
- record: pg:svc:lock_count
expr: sum by (cls, role) (pg:ins:lock_count)
- record: pg:cls:lock_count
expr: sum by (cls) (pg:ins:lock_count)
# read category lock
- record: pg:db:rlock
expr: sum without (mode) (pg_lock_count{mode="AccessShareLock"})
- record: pg:ins:rlock
expr: sum without(datname) (pg:db:rlock)
- record: pg:svc:rlock
expr: sum by (cls, role) (pg:ins:rlock)
- record: pg:cls:rlock
expr: sum by (cls) (pg:ins:rlock)
# write category lock (insert|update|delete)
- record: pg:db:wlock
expr: sum without (mode) (pg_lock_count{mode=~"RowShareLock|RowExclusiveLock"})
- record: pg:ins:wlock
expr: sum without(datname) (pg:db:wlock)
- record: pg:svc:wlock
expr: sum by (cls, role) (pg:ins:wlock)
- record: pg:cls:wlock
expr: sum by (cls) (pg:ins:wlock)
# exclusive category lock
- record: pg:db:xlock
expr: sum without (mode) (pg_lock_count{mode=~"AccessExclusiveLock|ExclusiveLock|ShareRowExclusiveLock|ShareLock|ShareUpdateExclusiveLock"})
- record: pg:ins:xlock
expr: sum without(datname) (pg:db:xlock)
- record: pg:svc:xlock
expr: sum by (cls, role) (pg:ins:xlock)
- record: pg:cls:xlock
expr: sum by (cls) (pg:ins:xlock)
#==============================================================#
# Temp #
#==============================================================#
# temp files and bytes
- record: pg:db:temp_bytes
expr: rate(pg_db_temp_bytes{}[1m])
- record: pg:ins:temp_bytes
expr: sum without(datname) (pg:db:temp_bytes)
- record: pg:svc:temp_bytes
expr: sum by (cls, role) (pg:ins:temp_bytes)
- record: pg:cls:temp_bytes
expr: sum by (cls) (pg:ins:temp_bytes)
# temp file count in last 1m
- record: pg:db:temp_files
expr: increase(pg_db_temp_files{}[1m])
- record: pg:ins:temp_files
expr: sum without(datname) (pg:db:temp_files)
- record: pg:svc:temp_files
expr: sum by (cls, role) (pg:ins:temp_files)
- record: pg:cls:temp_files
expr: sum by (cls) (pg:ins:temp_files)
#==============================================================#
# Size #
#==============================================================#
# database size
- record: pg:ins:db_size
expr: pg_size_database
- record: pg:cls:db_size
expr: sum by (cls) (pg:ins:db_size)
# wal size
- record: pg:ins:wal_size
expr: pg_size_wal
- record: pg:cls:wal_size
expr: sum by (cls) (pg:ins:wal_size)
# log size
- record: pg:ins:log_size
expr: pg_size_log
- record: pg:cls:log_size
expr: sum by (cls) (pg:ins:log_size)
#==============================================================#
# Checkpoint #
#==============================================================#
# checkpoint stats
- record: pg:ins:last_ckpt
expr: pg_checkpoint_elapse
- record: pg:ins:ckpt_timed
expr: increase(pg_bgwriter_checkpoints_timed{}[30s])
- record: pg:ins:ckpt_req
expr: increase(pg_bgwriter_checkpoints_req{}[30s])
- record: pg:cls:ckpt_1h
expr: increase(pg:ins:ckpt_timed[1h]) + increase(pg:ins:ckpt_req[1h])
# buffer flush & alloc
- record: pg:ins:buf_flush_backend
expr: irate(pg_bgwriter_buffers_backend{}[1m]) * 8192
- record: pg:ins:buf_flush_checkpoint
expr: irate(pg_bgwriter_buffers_checkpoint{}[1m]) * 8192
- record: pg:ins:buf_flush
expr: pg:ins:buf_flush_backend + pg:ins:buf_flush_checkpoint
- record: pg:svc:buf_flush
expr: sum by (cls, role) (pg:ins:buf_flush)
- record: pg:cls:buf_flush
expr: sum by (cls) (pg:ins:buf_flush)
- record: pg:all:buf_flush
expr: sum(pg:cls:buf_flush)
- record: pg:ins:buf_alloc
expr: irate(pg_bgwriter_buffers_alloc{}[1m]) * 8192
- record: pg:svc:buf_alloc
expr: sum by (cls, role) (pg:ins:buf_alloc)
- record: pg:cls:buf_alloc
expr: sum by (cls) (pg:ins:buf_alloc)
- record: pg:all:buf_alloc
expr: sum(pg:cls:buf_alloc)
#==============================================================#
# LSN #
#==============================================================#
# timeline & LSN
- record: pg_timeline
expr: pg_checkpoint_tli
- record: pg:ins:redo_lsn
expr: pg_checkpoint_redo_lsn
- record: pg:ins:checkpoint_lsn
expr: pg_checkpoint_checkpoint_lsn
# wal rate
- record: pg:ins:wal_rate
expr: rate(pg_lsn[1m])
- record: pg:cls:wal_rate
expr: max by (cls) (pg:ins:wal_rate{role="primary"})
- record: pg:all:wal_rate
expr: sum(pg:cls:wal_rate)
#==============================================================#
# Replication #
#==============================================================#
# lag time from replica's view
- record: pg:ins:lag_seconds
expr: pg_lag
- record: pg:cls:lag_seconds
expr: max by (cls) (pg:ins:lag_seconds)
- record: pg:all:lag_seconds
expr: max(pg:cls:lag_seconds)
# sync status
- record: pg:ins:sync_status # application_name must set to replica ins name
expr: max by (ins, svc, cls) (label_replace(pg_replication_sync_status, "ins", "$1", "application_name", "(.+)"))
# lag of self (application_name must set to standby ins name)
- record: pg:ins:lag_bytes
expr: max by (ins, svc, cls, role) (label_replace(pg_replication_lsn{} - pg_replication_replay_lsn{}, "ins", "$1", "application_name", "(.+)"))
- record: pg:cls:lag_bytes
expr: max by (cls) (pg:ins:lag_bytes)
- record: pg:all:lag_bytes
expr: max(pg:cls:lag_bytes)
# replication slot retained bytes
- record: pg:ins:slot_retained_bytes
expr: pg_slot_retained_bytes
# replica walreceiver
- record: pg:ins:recv_init_lsn
expr: pg_walreceiver_init_lsn
- record: pg:ins:recv_last_lsn
expr: pg_walreceiver_last_lsn
- record: pg:ins:recv_init_tli
expr: pg_walreceiver_init_tli
- record: pg:ins:recv_last_tli
expr: pg_walreceiver_last_tli
#==============================================================#
# Cluster Level Metrics
#==============================================================#
# cluster member count
- record: pg:cls:leader
expr: count by (cls, ins) (max by (cls, ins) (pg_status{}) == 3)
- record: pg:cls:size
expr: count by (cls) (max by (cls, ins) (pg_up{}))
- record: pg:cls:timeline
expr: max by (cls) (pg_checkpoint_tli{})
- record: pg:cls:primarys
expr: count by (cls) (max by (cls, ins) (pg_in_recovery{}) == 0)
- record: pg:cls:replicas
expr: count by (cls) (max by (cls, ins) (pg_in_recovery{}) == 1)
- record: pg:cls:synchronous
expr: max by (cls) (pg_sync_standby_enabled) > bool 0
- record: pg:cls:bridging_instances
expr: count by (cls, role, ins, ip) (pg_replication_lsn{state="streaming", role!="primary"} > 0)
- record: pg:cls:bridging
expr: count by (cls) (pg:cls:bridging_instances)
- record: pg:cls:cascading
expr: count by (cls) (pg_replication_lsn{state="streaming", role!="primary"})
#==============================================================#
# Pgbouncer List #
#==============================================================#
# object list
- record: pg:ins:pools
expr: pgbouncer_list_items{list="pools"}
- record: pg:ins:pool_databases
expr: pgbouncer_list_items{list="databases"}
- record: pg:ins:pool_users
expr: pgbouncer_list_items{list="users"}
- record: pg:ins:login_clients
expr: pgbouncer_list_items{list="login_clients"}
- record: pg:ins:free_clients
expr: pgbouncer_list_items{list="free_clients"}
- record: pg:ins:used_clients
expr: pgbouncer_list_items{list="used_clients"}
- record: pg:ins:free_servers
expr: pgbouncer_list_items{list="free_servers"}
#==============================================================#
# DBConfig (Pgbouncer) #
#==============================================================#
- record: pg:db:pool_max_conn
expr: pgbouncer_database_pool_size{datname!="pgbouncer"} + pgbouncer_database_reserve_pool{datname!="pgbouncer"}
- record: pg:db:pool_size
expr: pgbouncer_database_pool_size{datname!="pgbouncer"}
- record: pg:db:pool_reserve_size
expr: pgbouncer_database_reserve_pool{datname!="pgbouncer"}
- record: pg:db:pool_current_conn
expr: pgbouncer_database_current_connections{datname!="pgbouncer"}
- record: pg:db:pool_paused
expr: pgbouncer_database_paused{datname!="pgbouncer"}
- record: pg:db:pool_disabled
expr: pgbouncer_database_disabled{datname!="pgbouncer"}
#==============================================================#
# Waiting (Pgbouncer) #
#==============================================================#
# average wait time
- record: pg:db:wait_rt
expr: pgbouncer_stat_avg_wait_time{datname!="pgbouncer"} / 1000000
# max wait time among all clients
- record: pg:pool:maxwait
expr: pgbouncer_pool_maxwait{datname!="pgbouncer"} + pgbouncer_pool_maxwait_us{datname!="pgbouncer"} / 1000000
- record: pg:db:maxwait
expr: max without(user) (pg:pool:maxwait)
- record: pg:ins:maxwait
expr: max without(user, datname) (pg:db:maxwait)
- record: pg:svc:maxwait
expr: max by (cls, role) (pg:ins:maxwait)
- record: pg:cls:maxwait
expr: max by (cls) (pg:ins:maxwait)
- record: pg:all:maxwait
expr: max(pg:cls:maxwait)
...
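The recording rules above are named `pg:<level>:<metric>`, and each level is derived from the one below it: `db` series are summed `without(datname)` into `ins`, then `sum by (cls, role)` gives `svc` and `sum by (cls)` gives `cls`. The sketch below imitates the last two steps with `awk` on a few made-up samples. The cluster and instance names and the values are hypothetical, and the functions only mirror what the PromQL aggregation does; they are not part of Pigsty.

```shell
# Hypothetical instance-level samples: "<cls> <ins> <role> <value>"
samples='pg-test pg-test-1 primary 10
pg-test pg-test-2 replica 4
pg-test pg-test-3 replica 6
pg-meta pg-meta-1 primary 3'

# Rough equivalent of: sum by (cls) (pg:ins:lock_count)
sum_by_cls() {
  awk '{ total[$1] += $4 } END { for (c in total) print c, total[c] }' | LC_ALL=C sort
}

# Rough equivalent of: sum by (cls, role) (pg:ins:lock_count)
sum_by_cls_role() {
  awk '{ total[$1 " " $3] += $4 } END { for (k in total) print k, total[k] }' | LC_ALL=C sort
}

echo "cluster level:"
printf '%s\n' "$samples" | sum_by_cls
echo "service level:"
printf '%s\n' "$samples" | sum_by_cls_role
```

Labels that are not listed in `by (...)` are dropped, which is why the cluster-level result no longer distinguishes primary from replica.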
8.5 - Alert Rules
Definitions of Pigsty alert rules
Prometheus Alert Rules
Node Alert Rules
################################################################
# Node Alert #
################################################################
- name: node-alert
rules:
# node exporter down for 1m triggers a P1 alert
- alert: NODE_EXPORTER_DOWN
expr: up{instance=~"^.*:(9100)$"} == 0
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node Exporter Down: {{ $labels.ins }} {{ $value }}"
description: |
up[instance={{ $labels.instance }}] = {{ $value }} == 0
http://g.pigsty/d/node?var-ip={{ $labels.instance }}&from=now-5m&to=now&refresh=10s
#==============================================================#
# CPU & Load #
#==============================================================#
# node avg CPU usage > 90% for 1m
- alert: NODE_CPU_HIGH
expr: node:ins:cpu_usage > 0.90
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node CPU High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:ins:cpu_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=28&fullscreen&var-ip={{ $labels.ip }}
# node load5 > 100%
- alert: NODE_LOAD_HIGH
expr: node:ins:stdload5 > 1
for: 3m
labels:
severity: P2
annotations:
summary: "P2 Node Load High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:ins:stdload5[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 100%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=37&fullscreen&var-ip={{ $labels.ip }}
#==============================================================#
# Disk & Filesystem #
#==============================================================#
# main fs readonly triggers an immediate P0 alert
- alert: NODE_FS_READONLY
expr: node_filesystem_readonly{fstype!~"(n|root|tmp)fs.*"} == 1
labels:
severity: P0
annotations:
summary: "P0 Node Filesystem Readonly: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node_filesystem_readonly{ins={{ $labels.ins }}, ip={{ $labels.ip }},fstype!~"(n|root|tmp)fs.*"} == 1
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}
# main fs usage > 90% for 1m triggers P1 alert
- alert: NODE_FS_SPACE_FULL
expr: node:fs:space_usage > 0.90
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node Filesystem Space Full: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:fs:space_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}
# main fs inode usage > 90% for 1m triggers P1 alert
- alert: NODE_FS_INODE_FULL
expr: node:fs:inode_usage > 0.90
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node Filesystem iNode Full: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:fs:inode_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=110&fullscreen&var-ip={{ $labels.ip }}
# fd usage > 90% for 1m triggers P1 alert
- alert: NODE_FD_FULL
expr: node:fs:fd_usage > 0.90
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node File Descriptor Full: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:fs:fd_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 90%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=58&fullscreen&var-ip={{ $labels.ip }}
# ssd read latency > 32ms for 3m (except long-read)
- alert: NODE_READ_LATENCY_HIGH
expr: node:dev:disk_read_rt < 10000 and node:dev:disk_read_rt > 0.032
for: 3m
labels:
severity: P2
annotations:
summary: "P2 Node Read Latency High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:dev:disk_read_rt[ins={{ $labels.ins }}, ip={{ $labels.ip }}, device={{ $labels.device }}] = {{ $value }} > 32ms
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=29&fullscreen&var-ip={{ $labels.ip }}
# ssd write latency > 16ms for 3m
- alert: NODE_WRITE_LATENCY_HIGH
expr: node:dev:disk_write_rt < 10000 and node:dev:disk_write_rt > 0.016
for: 3m
labels:
severity: P2
annotations:
summary: "P2 Node Write Latency High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:dev:disk_write_rt[ins={{ $labels.ins }}, ip={{ $labels.ip }}, device={{ $labels.device }}] = {{ $value }} > 16ms
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=29&fullscreen&var-ip={{ $labels.ip }}
#==============================================================#
# Memory #
#==============================================================#
# shared memory usage > 80% for 1m triggers a P1 alert
- alert: NODE_MEM_HIGH
expr: node:ins:mem_usage > 0.80
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node Mem High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:ins:mem_usage[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 80%
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=40&fullscreen&var-ip={{ $labels.ip }}
#==============================================================#
# Network & TCP #
#==============================================================#
# node tcp listen overflow > 2 for 3m
- alert: NODE_TCP_LISTEN_OVERFLOW
expr: node:ins:tcp_overflow_rate > 2
for: 3m
labels:
severity: P1
annotations:
summary: "P1 Node TCP Listen Overflow: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:ins:tcp_overflow_rate[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 2
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=55&fullscreen&var-ip={{ $labels.ip }}
# node tcp retrans > 32 per sec for 3m
- alert: NODE_TCP_RETRANS_HIGH
expr: node:ins:tcp_retranssegs > 32
for: 3m
labels:
severity: P2
annotations:
summary: "P2 Node TCP Retrans High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node:ins:tcp_retranssegs[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 32
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=52&fullscreen&var-ip={{ $labels.ip }}
# node tcp conn > 32768 for 3m
- alert: NODE_TCP_CONN_HIGH
expr: node_netstat_Tcp_CurrEstab > 32768
for: 3m
labels:
severity: P2
annotations:
summary: "P2 Node TCP Connection High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
node_netstat_Tcp_CurrEstab[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 32768
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=54&fullscreen&var-ip={{ $labels.ip }}
#==============================================================#
# Misc #
#==============================================================#
# node ntp offset > 1s for 1m
- alert: NODE_NTP_OFFSET_HIGH
expr: abs(node_ntp_offset_seconds) > 1
for: 1m
labels:
severity: P1
annotations:
summary: "P1 Node NTP Offset High: {{ $labels.ins }} {{ $labels.ip }}"
description: |
abs(node_ntp_offset_seconds)[ins={{ $labels.ins }}, ip={{ $labels.ip }}] = {{ $value }} > 1s
http://g.pigsty/d/node?&from=now-10m&to=now&viewPanel=70&fullscreen&var-ip={{ $labels.ip }}
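Most rules above combine a threshold in `expr` with a `for:` duration. In Prometheus, an alert whose expression is true is first pending, and it only fires once the expression has stayed true for the whole `for:` duration; a single false evaluation resets the timer. The sketch below replays that behaviour on a list of samples. The function name and the 15-second evaluation interval are assumptions for illustration, not values taken from the Pigsty configuration.

```shell
# alert_states THRESHOLD FOR_SECONDS STEP_SECONDS
# Reads one sample per line on stdin and prints the alert state at each evaluation.
alert_states() {
  awk -v thr="$1" -v hold="$2" -v step="$3" '
    BEGIN { since = -1 }
    {
      t = (NR - 1) * step
      if ($1 + 0 > thr + 0) {
        if (since < 0) since = t          # condition just became true
        state = (t - since >= hold) ? "firing" : "pending"
      } else {
        since = -1                        # any false evaluation resets the timer
        state = "inactive"
      }
      print t "s", $1, state
    }'
}

# NODE_CPU_HIGH style rule: usage > 0.90 with for: 1m, evaluated every 15s (assumed)
printf '%s\n' 0.95 0.95 0.95 0.95 0.95 | alert_states 0.90 60 15
```

With a `for:` of 0, which corresponds to rules such as NODE_FS_READONLY that omit the clause, the same function reports `firing` on the first evaluation where the condition holds.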
Database and Connection Pool Alert Rules
---
################################################################
# PgSQL Alert #
################################################################
- name: pgsql-alert
rules:
#==============================================================#
# Error / Aliveness #
#==============================================================#
# cluster shrink triggers a P1 alert (warn: auto heal in 5min)
- alert: PGSQL_CLUSTER_SHRINK
expr: delta(pg:cls:size{}[5m]) < 0
for: 15s
labels:
severity: P1
annotations:
summary: 'delta(pg:cls:size{cls={{ $labels.cls }}}[5m]) = {{ $value | printf "%.0f" }} < 0'
description: |
http://g.pigsty/d/pg-cluster?from=now-10m&to=now&var-cls={{ $labels.cls }}
# postgres down for 15s triggers a P0 alert
- alert: PGSQL_DOWN
expr: pg_up{} == 0
labels:
severity: P0
annotations:
summary: "[P0] PGSQL_DOWN: {{ $labels.ins }} {{ $value }}"
description: |
pg_up[ins={{ $labels.ins }}] = {{ $value }} == 0
http://g.pigsty/d/pg-instance?from=now-10m&to=now&var-ins={{ $labels.ins }}
# pgbouncer down for 15s triggers a P0 alert
- alert: PGBOUNCER_DOWN
expr: pgbouncer_up{} == 0
labels:
severity: P0
annotations:
summary: "P0 Pgbouncer Down: {{ $labels.ins }} {{ $value }}"
description: |
pgbouncer_up[ins={{ $labels.ins }}] = {{ $value }} == 0
http://g.pigsty/d/pg-pgbouncer?from=now-10m&to=now&var-ins={{ $labels.ins }}
# pg/pgbouncer exporter down for 1m triggers a P1 alert
- alert: PGSQL_EXPORTER_DOWN
expr: up{instance=~"^.*:(9630|9631)$"} == 0
for: 1m
labels:
severity: P1
annotations:
summary: "P1 PG/PGB Exporter Down: {{ $labels.ins }} {{ $labels.instance }} {{ $value }}"
description: |
up[instance={{ $labels.instance }}] = {{ $value }} == 0
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=262&fullscreen&var-ins={{ $labels.ins }}
#==============================================================#
# Latency #
#==============================================================#
# replication break for 1m triggers a P1 alert (warn: heal in 5m)
- alert: PGSQL_REPLICATION_BREAK
expr: delta(pg_downstream_count{state="streaming"}[5m]) < 0
for: 1m
labels:
severity: P1
annotations:
summary: "P1 PG Replication Break: {{ $labels.ins }} {{ $value }}"
description: |
pg_downstream_count_delta[ins={{ $labels.ins }}] = {{ $value }} < 0
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=180&fullscreen&var-ins={{ $labels.ins }}
# replication lag greater than 8 seconds for 3m triggers a P1 alert
- alert: PGSQL_REPLICATION_LAG
expr: pg_replication_replay_lag{application_name!='pg_receivewal'} > 8
for: 3m
labels:
severity: P1
annotations:
summary: "P1 PG Replication Lagged: {{ $labels.ins }} {{ $value }}"
description: |
pg_replication_replay_lag[ins={{ $labels.ins }}] = {{ $value }} > 8s
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=384&fullscreen&var-ins={{ $labels.ins }}
# pg avg response time > 16ms
- alert: PGSQL_QUERY_RT_HIGH
expr: pg:ins:query_rt > 0.016
for: 1m
labels:
severity: P1
annotations:
summary: "P1 PG Query Response Time High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:query_rt[ins={{ $labels.ins }}] = {{ $value }} > 16ms
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=137&fullscreen&var-ins={{ $labels.ins }}
#==============================================================#
# Saturation #
#==============================================================#
# pg load1 higher than 70% for 3m triggers a P1 alert
- alert: PGSQL_LOAD_HIGH
expr: pg:ins:load1{} > 0.70
for: 3m
labels:
severity: P1
annotations:
summary: "P1 PG Load High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:load1[ins={{ $labels.ins }}] = {{ $value }} > 70%
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=210&fullscreen&var-ins={{ $labels.ins }}
# pg active backend more than 2 times of available cpu cores for 3m triggers a P1 alert
- alert: PGSQL_BACKEND_HIGH
expr: pg:ins:active_backends / on(ins) node:ins:cpu_count > 2
for: 3m
labels:
severity: P1
annotations:
summary: "P1 PG Backend High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:active_backends/node:ins:cpu_count[ins={{ $labels.ins }}] = {{ $value }} > 2
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=150&fullscreen&var-ins={{ $labels.ins }}
# more than 1 idle-in-transaction backend for 3m triggers a P2 alert
- alert: PGSQL_IDLE_XACT_BACKEND_HIGH
expr: pg:ins:ixact_backends > 1
for: 3m
labels:
severity: P2
annotations:
summary: "P2 PG Idle In Transaction Backend High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:ixact_backends[ins={{ $labels.ins }}] = {{ $value }} > 1
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=161&fullscreen&var-ins={{ $labels.ins }}
# more than 2 waiting clients for 3m triggers a P1 alert
- alert: PGSQL_CLIENT_QUEUING
expr: pg:ins:waiting_clients > 2
for: 3m
labels:
severity: P1
annotations:
summary: "P1 PG Client Queuing: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:waiting_clients[ins={{ $labels.ins }}] = {{ $value }} > 2
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=159&fullscreen&var-ins={{ $labels.ins }}
# age wrap around (near half) triggers a P1 alert
- alert: PGSQL_AGE_HIGH
expr: pg:ins:age > 1000000000
for: 3m
labels:
severity: P1
annotations:
summary: "P1 PG Age High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:age[ins={{ $labels.ins }}] = {{ $value }} > 1000000000
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=172&fullscreen&var-ins={{ $labels.ins }}
#==============================================================#
# Traffic #
#==============================================================#
# more than 30k TPS lasts for 3m triggers a P1 (pgbouncer bottleneck)
- alert: PGSQL_TPS_HIGH
expr: pg:ins:xacts > 30000
for: 3m
labels:
severity: P1
annotations:
summary: "P1 Postgres TPS High: {{ $labels.ins }} {{ $value }}"
description: |
pg:ins:xacts[ins={{ $labels.ins }}] = {{ $value }} > 30000
http://g.pigsty/d/pg-instance?from=now-10m&to=now&viewPanel=125&fullscreen&var-ins={{ $labels.ins }}
...
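The PGSQL_AGE_HIGH comment describes its threshold as "near half" of wraparound. PostgreSQL transaction IDs are 32-bit, and an XID can be at most 2^31 transactions old before wraparound protection must intervene. The arithmetic below shows where the 1,000,000,000 threshold sits relative to that horizon. It assumes `pg:ins:age` reports an age in transactions, which the rule name suggests but this section does not define.

```shell
# Wraparound horizon for 32-bit transaction IDs: 2^31 = 2147483648
xid_horizon=$(( 1 << 31 ))
threshold=1000000000                # value used by PGSQL_AGE_HIGH

# Integer percentage of the horizon already consumed when the alert condition becomes true
pct=$(( threshold * 100 / xid_horizon ))
echo "PGSQL_AGE_HIGH threshold is ${pct}% of the wraparound horizon (${xid_horizon})"
```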
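8.6 - Standard Output
Detailed steps and output of the sandbox initialization playbook
The Makefile shortcut commands used to bring up the sandbox locally, along with their output.
Command Overview
# download the project source code
cd /tmp && git clone git@github.com:Vonng/pigsty.git && cd pigsty
make up # bring up the vagrant virtual machines
make ssh # configure ssh access to the virtual machines [one-time; not needed on later starts]
sudo make dns # write the Pigsty static DNS records [sudo asks for a password; optional; one-time]
make download # download the latest offline package [optional; speeds up initialization considerably]
make upload # upload the offline package to the meta node
make init # initialize Pigsty
make mon-view # open the Pigsty monitoring home page (default user and password: admin:admin)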
clone
Clone the repository and enter the project directory. All subsequent operations run from the project root (/tmp/pigsty in this example).
cd /tmp && git clone git@github.com:Vonng/pigsty.git && cd pigsty
clean
Remove any existing sandbox remnants, if present.
$ make clean
cd vagrant && vagrant destroy -f --parallel; exit 0
==> vagrant: A new version of Vagrant is available: 2.2.14 (installed version: 2.2.13)!
==> vagrant: To upgrade visit: https://www.vagrantup.com/downloads.html
==> node-3: Forcing shutdown of VM...
==> node-3: Destroying VM and associated drives...
==> node-2: Forcing shutdown of VM...
==> node-2: Destroying VM and associated drives...
==> node-1: Forcing shutdown of VM...
==> node-1: Destroying VM and associated drives...
==> meta: Forcing shutdown of VM...
==> meta: Destroying VM and associated drives...
up
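Running make up invokes vagrant up, which uses VirtualBox to create four virtual machines as defined in the Vagrantfile.
Note that the first time vagrant up runs, it automatically downloads the CentOS/7 box image from the official site. If your network connection is poor (for example, without a proxy), this may take quite a long time. You can also create the virtual machines yourself and deploy Pigsty by following the Deployment chapter (not recommended).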
$ make up
cd vagrant && vagrant up
Bringing machine 'meta' up with 'virtualbox' provider...
Bringing machine 'node-1' up with 'virtualbox' provider...
Bringing machine 'node-2' up with 'virtualbox' provider...
Bringing machine 'node-3' up with 'virtualbox' provider...
==> meta: Cloning VM...
==> meta: Matching MAC address for NAT networking...
==> meta: Setting the name of the VM: vagrant_meta_1614587906789_29514
==> meta: Clearing any previously set network interfaces...
==> meta: Preparing network interfaces based on configuration...
meta: Adapter 1: nat
meta: Adapter 2: hostonly
==> meta: Forwarding ports...
meta: 22 (guest) => 2222 (host) (adapter 1)
==> meta: Running 'pre-boot' VM customizations...
==> meta: Booting VM...
==> meta: Waiting for machine to boot. This may take a few minutes...
meta: SSH address: 127.0.0.1:2222
meta: SSH username: vagrant
meta: SSH auth method: private key
==> meta: Machine booted and ready!
==> meta: Checking for guest additions in VM...
meta: No guest additions were detected on the base box for this VM! Guest
meta: additions are required for forwarded ports, shared folders, host only
meta: networking, and more. If SSH fails on this machine, please install
meta: the guest additions and repackage the box to continue.
meta:
meta: This is not an error message; everything may continue to work properly,
meta: in which case you may ignore this message.
==> meta: Setting hostname...
==> meta: Configuring and enabling network interfaces...
==> meta: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> meta: Running provisioner: shell...
meta: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1jv6obp.sh
meta: [INFO] write ssh config to /home/vagrant/.ssh
==> node-1: Cloning VM...
==> node-1: Matching MAC address for NAT networking...
==> node-1: Setting the name of the VM: vagrant_node-1_1614587930603_84690
==> node-1: Fixed port collision for 22 => 2222. Now on port 2200.
==> node-1: Clearing any previously set network interfaces...
==> node-1: Preparing network interfaces based on configuration...
node-1: Adapter 1: nat
node-1: Adapter 2: hostonly
==> node-1: Forwarding ports...
node-1: 22 (guest) => 2200 (host) (adapter 1)
==> node-1: Running 'pre-boot' VM customizations...
==> node-1: Booting VM...
==> node-1: Waiting for machine to boot. This may take a few minutes...
node-1: SSH address: 127.0.0.1:2200
node-1: SSH username: vagrant
node-1: SSH auth method: private key
==> node-1: Machine booted and ready!
==> node-1: Checking for guest additions in VM...
node-1: No guest additions were detected on the base box for this VM! Guest
node-1: additions are required for forwarded ports, shared folders, host only
node-1: networking, and more. If SSH fails on this machine, please install
node-1: the guest additions and repackage the box to continue.
node-1:
node-1: This is not an error message; everything may continue to work properly,
node-1: in which case you may ignore this message.
==> node-1: Setting hostname...
==> node-1: Configuring and enabling network interfaces...
==> node-1: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-1: Running provisioner: shell...
node-1: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-5w83e1.sh
node-1: [INFO] write ssh config to /home/vagrant/.ssh
==> node-2: Cloning VM...
==> node-2: Matching MAC address for NAT networking...
==> node-2: Setting the name of the VM: vagrant_node-2_1614587953786_32441
==> node-2: Fixed port collision for 22 => 2222. Now on port 2201.
==> node-2: Clearing any previously set network interfaces...
==> node-2: Preparing network interfaces based on configuration...
node-2: Adapter 1: nat
node-2: Adapter 2: hostonly
==> node-2: Forwarding ports...
node-2: 22 (guest) => 2201 (host) (adapter 1)
==> node-2: Running 'pre-boot' VM customizations...
==> node-2: Booting VM...
==> node-2: Waiting for machine to boot. This may take a few minutes...
node-2: SSH address: 127.0.0.1:2201
node-2: SSH username: vagrant
node-2: SSH auth method: private key
==> node-2: Machine booted and ready!
==> node-2: Checking for guest additions in VM...
node-2: No guest additions were detected on the base box for this VM! Guest
node-2: additions are required for forwarded ports, shared folders, host only
node-2: networking, and more. If SSH fails on this machine, please install
node-2: the guest additions and repackage the box to continue.
node-2:
node-2: This is not an error message; everything may continue to work properly,
node-2: in which case you may ignore this message.
==> node-2: Setting hostname...
==> node-2: Configuring and enabling network interfaces...
==> node-2: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-2: Running provisioner: shell...
node-2: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1xljcde.sh
node-2: [INFO] write ssh config to /home/vagrant/.ssh
==> node-3: Cloning VM...
==> node-3: Matching MAC address for NAT networking...
==> node-3: Setting the name of the VM: vagrant_node-3_1614587977533_52921
==> node-3: Fixed port collision for 22 => 2222. Now on port 2202.
==> node-3: Clearing any previously set network interfaces...
==> node-3: Preparing network interfaces based on configuration...
node-3: Adapter 1: nat
node-3: Adapter 2: hostonly
==> node-3: Forwarding ports...
node-3: 22 (guest) => 2202 (host) (adapter 1)
==> node-3: Running 'pre-boot' VM customizations...
==> node-3: Booting VM...
==> node-3: Waiting for machine to boot. This may take a few minutes...
node-3: SSH address: 127.0.0.1:2202
node-3: SSH username: vagrant
node-3: SSH auth method: private key
==> node-3: Machine booted and ready!
==> node-3: Checking for guest additions in VM...
node-3: No guest additions were detected on the base box for this VM! Guest
node-3: additions are required for forwarded ports, shared folders, host only
node-3: networking, and more. If SSH fails on this machine, please install
node-3: the guest additions and repackage the box to continue.
node-3:
node-3: This is not an error message; everything may continue to work properly,
node-3: in which case you may ignore this message.
==> node-3: Setting hostname...
==> node-3: Configuring and enabling network interfaces...
==> node-3: Rsyncing folder: /Volumes/Data/pigsty/vagrant/ => /vagrant
==> node-3: Running provisioner: shell...
node-3: Running: /var/folders/_5/_0mbf4292pl9y4xgy0kn2r1h0000gn/T/vagrant-shell20210301-60046-1cykx8o.sh
node-3: [INFO] write ssh config to /home/vagrant/.ssh
ssh
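Newly created virtual machines use vagrant as the default user, so passwordless SSH access from the host to the virtual machines must be configured. Running make ssh invokes vagrant's ssh-config command and writes the SSH configuration of the Pigsty virtual machine nodes to ~/.ssh/pigsty_config.
This command usually needs to run only once, when the sandbox is first started. Virtual machines that are brought up again later usually keep the same SSH configuration.
Only after it completes can you connect to the sandbox virtual machines through SSH aliases such as ssh node-1.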
$ make ssh
cd vagrant && vagrant ssh-config > ~/.ssh/pigsty_config 2>/dev/null; true
if ! grep --quiet "pigsty_config" ~/.ssh/config ; then (echo 'Include ~/.ssh/pigsty_config' && cat ~/.ssh/config) > ~/.ssh/config.tmp; mv ~/.ssh/config.tmp ~/.ssh/config && chmod 0600 ~/.ssh/config; fi
if ! grep --quiet "StrictHostKeyChecking=no" ~/.ssh/config ; then (echo 'StrictHostKeyChecking=no' && cat ~/.ssh/config) > ~/.ssh/config.tmp; mv ~/.ssh/config.tmp ~/.ssh/config && chmod 0600 ~/.ssh/config; fi
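The two guarded commands in the make ssh output share one pattern: check whether the config already mentions a marker, and prepend a line only if it does not, so running the target repeatedly leaves a single copy. The sketch below isolates that pattern in a function and runs it against a temporary file rather than the real ~/.ssh/config. The function name `ensure_first_line` is hypothetical and is not part of the Pigsty Makefile.

```shell
# ensure_first_line FILE LINE
# Prepend LINE to FILE unless some line of FILE already contains it.
ensure_first_line() {
  local file="$1" line="$2"
  touch "$file"
  if ! grep -qF -- "$line" "$file"; then
    { printf '%s\n' "$line"; cat "$file"; } > "$file.tmp"
    mv "$file.tmp" "$file"
    chmod 0600 "$file"
  fi
}

cfg="$(mktemp)"                                          # stand-in for ~/.ssh/config
ensure_first_line "$cfg" 'Include ~/.ssh/pigsty_config'
ensure_first_line "$cfg" 'Include ~/.ssh/pigsty_config'  # second call changes nothing
cat "$cfg"
```

The line is prepended, not appended, so that the Include directive sits above any existing Host blocks and therefore applies globally.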
dns
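This command writes the static DNS records of the Pigsty sandbox virtual machines to /etc/hosts. It usually needs to run only once, when the sandbox is first started.
Only after it completes can you access web UIs such as http://g.pigsty by domain name from a local browser.
Note that the dns command must run with sudo and prompts for a password, because modifying /etc/hosts requires elevated privileges.
$ sudo make dns
Password: #<enter your password here>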
if ! grep --quiet "pigsty dns records" /etc/hosts ; then cat files/dns >> /etc/hosts; fi
download
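Download the latest Pigsty offline installation package from the CDN to the local machine. The package is about 1 GB and takes roughly one minute to download.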
$ make download
curl http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz -o files/pkg.tgz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1067M 100 1067M 0 0 15.2M 0 0:01:10 0:01:10 --:--:-- 29.0M
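Pigsty is a complex software system. To keep the system stable, Pigsty downloads all dependent packages from the Internet during initialization and builds a local Yum repository.
The dependencies total about 1 GB, and download speed depends on your network. Although Pigsty uses mirror sites wherever possible to speed up downloads, a few packages may still be blocked by firewalls and download very slowly. You can set a download proxy with the proxy_env configuration option to complete the first download, or download the pre-built offline installation package directly.
The latest offline installation package is available at:
Github Release: https://github.com/Vonng/pigsty/releases
CDN Download: http://pigsty-1304147732.cos.accelerate.myqcloud.com/pkg.tgz
You can also download it manually and place it at files/pkg.tgz.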
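Since the offline package may come from make download or from a manual copy, it can help to check that a non-empty file is actually at files/pkg.tgz before uploading. A minimal sketch follows; the function name `pkg_status` and the messages are illustrative and do not exist in the Pigsty Makefile.

```shell
# pkg_status PATH: report whether a non-empty offline package exists at PATH
pkg_status() {
  if [ -s "$1" ]; then
    echo "present"
  else
    echo "missing"
  fi
}

pkg="files/pkg.tgz"     # location named in the text above
case "$(pkg_status "$pkg")" in
  present) echo "offline package found at $pkg; 'make upload' can proceed" ;;
  missing) echo "no offline package at $pkg; run 'make download' or copy the file there manually" ;;
esac
```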
upload
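Upload the downloaded offline installation package to the meta node and extract it there to speed up the subsequent initialization.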
$ make upload
ssh -t meta "sudo rm -rf /tmp/pkg.tgz"
Connection to 127.0.0.1 closed.
scp -r files/pkg.tgz meta:/tmp/pkg.tgz
pkg.tgz 100% 1068MB 53.4MB/s 00:19
ssh -t meta "sudo mkdir -p /www/pigsty/; sudo rm -rf /www/pigsty/*; sudo tar -xf /tmp/pkg.tgz --strip-component=1 -C /www/pigsty/"
Connection to 127.0.0.1 closed.
init
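Once the steps above are complete, running make init invokes ansible to initialize the Pigsty system.
$ make init
./sandbox.yml # fast initialization: provision the meta node and the database nodes in parallel
sandbox.yml is an initialization playbook prepared specifically for the local sandbox. By initializing the meta node and the database nodes at the same time, it saves about half the time. For production environments, use infra.yml and pgsql.yml to initialize the meta node and the ordinary nodes in sequence.
If the offline installation package has already been uploaded to the meta node, initialization is fairly fast and takes about 5 to 10 minutes in total, depending on machine specs.
If the offline package is not present, Pigsty downloads about 1 GB of data from the Internet during initialization, which may take 20 minutes or longer depending on network conditions.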
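The difference between the sandbox and production paths can be written down as a small dry-run helper that only prints which playbooks would run and in what order. The function `init_plan` and its mode names are hypothetical, and invoking infra.yml and pgsql.yml directly as executables is an assumption based on how sandbox.yml is invoked in the output here.

```shell
# init_plan MODE: print the playbooks to run, one per line, in execution order
init_plan() {
  case "$1" in
    sandbox)
      echo "./sandbox.yml"      # meta node and database nodes provisioned together
      ;;
    production)
      echo "./infra.yml"        # meta node first
      echo "./pgsql.yml"        # then the ordinary database nodes
      ;;
    *)
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Dry run: show the plan without executing anything
init_plan production
```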
$ make init
./sandbox.yml # interleave sandbox provisioning
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
PLAY [Init local repo] ***********************************************************************************************************************************************************************************
TASK [repo : Create local repo directory] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [repo : Backup & remove existing repos] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [repo : Add required upstream repos] ****************************************************************************************************************************************************************
[WARNING]: Using a variable for a task's 'args' is unsafe in some situations (see https://docs.ansible.com/ansible/devel/reference_appendices/faq.html#argsplat-unsafe)
changed: [10.10.10.10] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
changed: [10.10.10.10] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
changed: [10.10.10.10] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
changed: [10.10.10.10] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
changed: [10.10.10.10] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
changed: [10.10.10.10] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
changed: [10.10.10.10] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
changed: [10.10.10.10] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
changed: [10.10.10.10] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
changed: [10.10.10.10] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
TASK [repo : Check repo pkgs cache exists] ***************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [repo : Set fact whether repo_exists] ***************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [repo : Move upstream repo to backup] ***************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [repo : Add local file system repos] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [repo : Remake yum cache if not exists] *************************************************************************************************************************************************************
[WARNING]: Consider using the yum module rather than running 'yum'. If you need to use command because yum is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]
TASK [repo : Install repo bootstrap packages] ************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['yum-utils', 'createrepo', 'ansible', 'nginx', 'wget'])
TASK [repo : Render repo nginx server files] *************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'src': 'index.html.j2', 'dest': '/www/index.html'})
changed: [10.10.10.10] => (item={'src': 'default.conf.j2', 'dest': '/etc/nginx/conf.d/default.conf'})
changed: [10.10.10.10] => (item={'src': 'local.repo.j2', 'dest': '/www/pigsty.repo'})
changed: [10.10.10.10] => (item={'src': 'nginx.conf.j2', 'dest': '/etc/nginx/nginx.conf'})
TASK [repo : Disable selinux for repo server] ************************************************************************************************************************************************************
[WARNING]: SELinux state temporarily changed from 'enforcing' to 'permissive'. State change will take effect next reboot.
changed: [10.10.10.10]
TASK [repo : Launch repo nginx server] *******************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [repo : Waits repo server online] *******************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [repo : Download web url packages] ******************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=https://github.com/Vonng/pg_exporter/releases/download/v0.3.2/pg_exporter-0.3.2-1.el7.x86_64.rpm)
skipping: [10.10.10.10] => (item=https://github.com/cybertec-postgresql/vip-manager/releases/download/v0.6/vip-manager_0.6-1_amd64.rpm)
skipping: [10.10.10.10] => (item=http://guichaz.free.fr/polysh/files/polysh-0.4-1.noarch.rpm)
TASK [repo : Download repo packages] *********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=epel-release nginx wget yum-utils yum createrepo)
skipping: [10.10.10.10] => (item=ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned)
skipping: [10.10.10.10] => (item=readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates)
skipping: [10.10.10.10] => (item=numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet)
skipping: [10.10.10.10] => (item=grafana prometheus2 pushgateway alertmanager)
skipping: [10.10.10.10] => (item=node_exporter postgres_exporter nginx_exporter blackbox_exporter)
skipping: [10.10.10.10] => (item=consul consul_exporter consul-template etcd)
skipping: [10.10.10.10] => (item=ansible python python-pip python-psycopg2 audit)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.10] => (item=haproxy keepalived dnsmasq)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity)
skipping: [10.10.10.10] => (item=pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel)
skipping: [10.10.10.10] => (item=barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail)
skipping: [10.10.10.10] => (item=postgresql13* postgis31* citus_13 timescaledb_13)
skipping: [10.10.10.10] => (item=pg_repack13 pg_squeeze13)
skipping: [10.10.10.10] => (item=pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13)
skipping: [10.10.10.10] => (item=plr13 plsh13 plpgsql_check_13 plproxy13 plr13 plsh13 plpgsql_check_13 pldebugger13)
skipping: [10.10.10.10] => (item=hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13)
skipping: [10.10.10.10] => (item=wal2json13 count_distinct13 ddlx_13 geoip13 orafce13)
skipping: [10.10.10.10] => (item=rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13)
skipping: [10.10.10.10] => (item=pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13)
skipping: [10.10.10.10] => (item=pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13)
skipping: [10.10.10.10] => (item=pguint13 pguri13 prefix13 safeupdate_13 semver13 table_version13 tdigest13)
TASK [repo : Download repo pkg deps] *********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=epel-release nginx wget yum-utils yum createrepo)
skipping: [10.10.10.10] => (item=ntp chrony uuid lz4 nc pv jq vim-enhanced make patch bash lsof wget unzip git tuned)
skipping: [10.10.10.10] => (item=readline zlib openssl libyaml libxml2 libxslt perl-ExtUtils-Embed ca-certificates)
skipping: [10.10.10.10] => (item=numactl grubby sysstat dstat iotop bind-utils net-tools tcpdump socat ipvsadm telnet)
skipping: [10.10.10.10] => (item=grafana prometheus2 pushgateway alertmanager)
skipping: [10.10.10.10] => (item=node_exporter postgres_exporter nginx_exporter blackbox_exporter)
skipping: [10.10.10.10] => (item=consul consul_exporter consul-template etcd)
skipping: [10.10.10.10] => (item=ansible python python-pip python-psycopg2 audit)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.10] => (item=haproxy keepalived dnsmasq)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pg_cli pgbadger pg_activity)
skipping: [10.10.10.10] => (item=pgcenter boxinfo check_postgres emaj pgbconsole pg_bloat_check pgquarrel)
skipping: [10.10.10.10] => (item=barman barman-cli pgloader pgFormatter pitrery pspg pgxnclient PyGreSQL pgadmin4 tail_n_mail)
skipping: [10.10.10.10] => (item=postgresql13* postgis31* citus_13 timescaledb_13)
skipping: [10.10.10.10] => (item=pg_repack13 pg_squeeze13)
skipping: [10.10.10.10] => (item=pg_qualstats13 pg_stat_kcache13 system_stats_13 bgw_replstatus13)
skipping: [10.10.10.10] => (item=plr13 plsh13 plpgsql_check_13 plproxy13 plr13 plsh13 plpgsql_check_13 pldebugger13)
skipping: [10.10.10.10] => (item=hdfs_fdw_13 mongo_fdw13 mysql_fdw_13 ogr_fdw13 redis_fdw_13 pgbouncer_fdw13)
skipping: [10.10.10.10] => (item=wal2json13 count_distinct13 ddlx_13 geoip13 orafce13)
skipping: [10.10.10.10] => (item=rum_13 hypopg_13 ip4r13 jsquery_13 logerrors_13 periods_13 pg_auto_failover_13 pg_catcheck13)
skipping: [10.10.10.10] => (item=pg_fkpart13 pg_jobmon13 pg_partman13 pg_prioritize_13 pg_track_settings13 pgaudit15_13)
skipping: [10.10.10.10] => (item=pgcryptokey13 pgexportdoc13 pgimportdoc13 pgmemcache-13 pgmp13 pgq-13)
skipping: [10.10.10.10] => (item=pguint13 pguri13 prefix13 safeupdate_13 semver13 table_version13 tdigest13)
TASK [repo : Create local repo index] ********************************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [repo : Copy bootstrap scripts] *********************************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [repo : Mark repo cache as valid] *******************************************************************************************************************************************************************
skipping: [10.10.10.10]
PLAY [Provision Node] ************************************************************************************************************************************************************************************
TASK [node : Update node hostname] ***********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Add new hostname to /etc/hosts] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Write static dns records] *******************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.11] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.13] => (item=10.10.10.10 yum.pigsty)
changed: [10.10.10.12] => (item=10.10.10.10 yum.pigsty)
TASK [node : Get old nameservers] ************************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Truncate resolv file] ***********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Write resolv options] ***********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.12] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.10] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.13] => (item=options single-request-reopen timeout:1 rotate)
changed: [10.10.10.11] => (item=domain service.consul)
changed: [10.10.10.12] => (item=domain service.consul)
changed: [10.10.10.13] => (item=domain service.consul)
changed: [10.10.10.10] => (item=domain service.consul)
TASK [node : Add new nameservers] ************************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=10.10.10.10)
changed: [10.10.10.12] => (item=10.10.10.10)
changed: [10.10.10.10] => (item=10.10.10.10)
changed: [10.10.10.13] => (item=10.10.10.10)
TASK [node : Append old nameservers] *********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=10.0.2.3)
changed: [10.10.10.12] => (item=10.0.2.3)
changed: [10.10.10.10] => (item=10.0.2.3)
changed: [10.10.10.13] => (item=10.0.2.3)
TASK [node : Node configure disable firewall] ************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [node : Node disable selinux by default] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
[WARNING]: SELinux state change will take effect next reboot
ok: [10.10.10.10]
TASK [node : Backup existing repos] **********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Install upstream repo] **********************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.10] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.12] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'base', 'description': 'CentOS-$releasever - Base - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/os/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/os/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/os/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.11] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.13] => (item={'name': 'updates', 'description': 'CentOS-$releasever - Updates - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/updates/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/updates/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'extras', 'description': 'CentOS-$releasever - Extras - Aliyun Mirror', 'baseurl': ['http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/', 'http://mirrors.aliyuncs.com/centos/$releasever/extras/$basearch/', 'http://mirrors.cloud.aliyuncs.com/centos/$releasever/extras/$basearch/'], 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.12] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.13] => (item={'name': 'epel', 'description': 'CentOS $releasever - EPEL - Aliyun Mirror', 'baseurl': 'http://mirrors.aliyun.com/epel/$releasever/$basearch', 'gpgcheck': False, 'failovermethod': 'priority'})
skipping: [10.10.10.13] => (item={'name': 'grafana', 'description': 'Grafana - TsingHua Mirror', 'gpgcheck': False, 'baseurl': 'https://mirrors.tuna.tsinghua.edu.cn/grafana/yum/rpm'})
skipping: [10.10.10.10] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.12] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.10] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.11] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.12] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.11] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.12] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'prometheus', 'description': 'Prometheus and exporters', 'gpgcheck': False, 'baseurl': 'https://packagecloud.io/prometheus-rpm/release/el/$releasever/$basearch'})
skipping: [10.10.10.11] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.12] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.13] => (item={'name': 'pgdg-common', 'description': 'PostgreSQL common RPMs for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/common/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.10] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.11] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.10] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'pgdg13', 'description': 'PostgreSQL 13 for RHEL/CentOS $releasever - $basearch', 'gpgcheck': False, 'baseurl': 'http://mirrors.zju.edu.cn/postgresql/repos/yum/13/redhat/rhel-$releasever-$basearch'})
skipping: [10.10.10.11] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'centos-sclo', 'description': 'CentOS-$releasever - SCLo', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-sclo'})
skipping: [10.10.10.11] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.12] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'centos-sclo-rh', 'description': 'CentOS-$releasever - SCLo rh', 'gpgcheck': False, 'mirrorlist': 'http://mirrorlist.centos.org?arch=$basearch&release=7&repo=sclo-rh'})
skipping: [10.10.10.12] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'nginx', 'description': 'Nginx Official Yum Repo', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'http://nginx.org/packages/centos/$releasever/$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'haproxy', 'description': 'Copr repo for haproxy', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/roidelapluie/haproxy/epel-$releasever-$basearch/'})
skipping: [10.10.10.13] => (item={'name': 'harbottle', 'description': 'Copr repo for main owned by harbottle', 'skip_if_unavailable': True, 'gpgcheck': False, 'baseurl': 'https://download.copr.fedorainfracloud.org/results/harbottle/main/epel-$releasever-$basearch/'})
TASK [node : Install local repo] *************************************************************************************************************************************************************************
changed: [10.10.10.13] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.12] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.11] => (item=http://yum.pigsty/pigsty.repo)
changed: [10.10.10.10] => (item=http://yum.pigsty/pigsty.repo)
TASK [node : Install node basic packages] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])
TASK [node : Install node extra packages] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])
TASK [node : Install meta specific packages] *************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=[])
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])
TASK [node : Install node basic packages] ****************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.13] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.11] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
changed: [10.10.10.12] => (item=['wget,yum-utils,ntp,chrony,tuned,uuid,lz4,vim-minimal,make,patch,bash,lsof,wget,unzip,git,readline,zlib,openssl', 'numactl,grubby,sysstat,dstat,iotop,bind-utils,net-tools,tcpdump,socat,ipvsadm,telnet,tuned,pv,jq', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography', 'node_exporter,consul,consul-template,etcd,haproxy,keepalived,vip-manager'])
TASK [node : Install node extra packages] ****************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.12] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.13] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
changed: [10.10.10.10] => (item=['patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity'])
TASK [node : Install meta specific packages] *************************************************************************************************************************************************************
skipping: [10.10.10.11] => (item=[])
skipping: [10.10.10.12] => (item=[])
skipping: [10.10.10.13] => (item=[])
changed: [10.10.10.10] => (item=['grafana,prometheus2,alertmanager,nginx_exporter,blackbox_exporter,pushgateway', 'dnsmasq,nginx,ansible,pgbadger,polysh'])
TASK [node : Node configure disable numa] ****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Node configure disable swap] ****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Node configure unmount swap] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=swap)
skipping: [10.10.10.10] => (item=none)
skipping: [10.10.10.11] => (item=swap)
skipping: [10.10.10.11] => (item=none)
skipping: [10.10.10.12] => (item=swap)
skipping: [10.10.10.12] => (item=none)
skipping: [10.10.10.13] => (item=swap)
skipping: [10.10.10.13] => (item=none)
TASK [node : Node setup static network] ******************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Node configure disable firewall] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [node : Node configure disk prefetch] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Enable linux kernel modules] ****************************************************************************************************************************************************************
changed: [10.10.10.13] => (item=softdog)
changed: [10.10.10.12] => (item=softdog)
changed: [10.10.10.11] => (item=softdog)
changed: [10.10.10.10] => (item=softdog)
changed: [10.10.10.13] => (item=br_netfilter)
changed: [10.10.10.12] => (item=br_netfilter)
changed: [10.10.10.11] => (item=br_netfilter)
changed: [10.10.10.10] => (item=br_netfilter)
changed: [10.10.10.12] => (item=ip_vs)
changed: [10.10.10.13] => (item=ip_vs)
changed: [10.10.10.11] => (item=ip_vs)
changed: [10.10.10.10] => (item=ip_vs)
changed: [10.10.10.13] => (item=ip_vs_rr)
changed: [10.10.10.12] => (item=ip_vs_rr)
changed: [10.10.10.11] => (item=ip_vs_rr)
changed: [10.10.10.10] => (item=ip_vs_rr)
ok: [10.10.10.13] => (item=ip_vs_rr)
ok: [10.10.10.12] => (item=ip_vs_rr)
ok: [10.10.10.11] => (item=ip_vs_rr)
ok: [10.10.10.10] => (item=ip_vs_rr)
changed: [10.10.10.13] => (item=ip_vs_wrr)
changed: [10.10.10.12] => (item=ip_vs_wrr)
changed: [10.10.10.11] => (item=ip_vs_wrr)
changed: [10.10.10.10] => (item=ip_vs_wrr)
changed: [10.10.10.13] => (item=ip_vs_sh)
changed: [10.10.10.12] => (item=ip_vs_sh)
changed: [10.10.10.11] => (item=ip_vs_sh)
changed: [10.10.10.10] => (item=ip_vs_sh)
changed: [10.10.10.13] => (item=nf_conntrack_ipv4)
changed: [10.10.10.12] => (item=nf_conntrack_ipv4)
changed: [10.10.10.11] => (item=nf_conntrack_ipv4)
changed: [10.10.10.10] => (item=nf_conntrack_ipv4)
TASK [node : Enable kernel module on reboot] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]
TASK [node : Get config parameter page count] ************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Get config parameter page size] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Tune shmmax and shmall via mem] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [node : Create tuned profiles] **********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=oltp)
changed: [10.10.10.12] => (item=oltp)
changed: [10.10.10.10] => (item=oltp)
changed: [10.10.10.13] => (item=oltp)
changed: [10.10.10.11] => (item=olap)
changed: [10.10.10.12] => (item=olap)
changed: [10.10.10.13] => (item=olap)
changed: [10.10.10.10] => (item=olap)
changed: [10.10.10.11] => (item=crit)
changed: [10.10.10.12] => (item=crit)
changed: [10.10.10.13] => (item=crit)
changed: [10.10.10.10] => (item=crit)
changed: [10.10.10.11] => (item=tiny)
changed: [10.10.10.12] => (item=tiny)
changed: [10.10.10.13] => (item=tiny)
changed: [10.10.10.10] => (item=tiny)
TASK [node : Render tuned profiles] **********************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=oltp)
changed: [10.10.10.12] => (item=oltp)
changed: [10.10.10.13] => (item=oltp)
changed: [10.10.10.10] => (item=oltp)
changed: [10.10.10.12] => (item=olap)
changed: [10.10.10.11] => (item=olap)
changed: [10.10.10.13] => (item=olap)
changed: [10.10.10.10] => (item=olap)
changed: [10.10.10.12] => (item=crit)
changed: [10.10.10.11] => (item=crit)
changed: [10.10.10.13] => (item=crit)
changed: [10.10.10.10] => (item=crit)
changed: [10.10.10.11] => (item=tiny)
changed: [10.10.10.12] => (item=tiny)
changed: [10.10.10.13] => (item=tiny)
changed: [10.10.10.10] => (item=tiny)
TASK [node : Active tuned profile] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
TASK [node : Change additional sysctl params] ************************************************************************************************************************************************************
changed: [10.10.10.13] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.12] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.11] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
changed: [10.10.10.10] => (item={'key': 'net.bridge.bridge-nf-call-iptables', 'value': 1})
TASK [node : Copy default user bash profile] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Setup node default pam ulimits] *************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Create os user group admin] *****************************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Create os user admin] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
TASK [node : Grant admin group nopass sudo] **************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Add no host checking to ssh config] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Add admin ssh no host checking] *************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [node : Fetch all admin public keys] ****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [node : Exchange all admin ssh keys] ****************************************************************************************************************************************************************
changed: [10.10.10.10 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.10'])
changed: [10.10.10.13 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.10'])
changed: [10.10.10.12 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.10'])
changed: [10.10.10.11 -> meta] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.10'])
changed: [10.10.10.10 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.11'])
changed: [10.10.10.12 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.11'])
changed: [10.10.10.13 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.11'])
changed: [10.10.10.11 -> node-1] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.11'])
changed: [10.10.10.10 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.12'])
changed: [10.10.10.13 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.12'])
changed: [10.10.10.12 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.12'])
changed: [10.10.10.11 -> node-2] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.12'])
changed: [10.10.10.10 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDfXbkp7ATV3rIzcpCwxcwpumIjnjldzDp9qfu65d4W5gSNumN/wvOORnG17rB2y/msyjstu1C42v2V60yho/XjPNIqqPWPtM/bc6MHNeNJJxvEEtDsY530z3n37QTcVI1kg3zRqnzm8HDKEE+BAll+iyXjzTFoGHc39syDRF8r5sZpG0qiNY2QaqEnByASsoHM4RQ3Jw2D2SbA78wFBz1zqsdz5VympAcc9wcfuUqhwk0ExL+AtrPNUeyEXwgRr1Br6JXVHjT6EHLsZburTD7uT94Jqzixd3LXRwsmuCrPIssASrYvfnWVQ29MxhiZqrmLcwp4ImjQetcZE2EgfzEp ansible-generated on meta', '10.10.10.13'])
changed: [10.10.10.13 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDbkD6WQhs9KAv9HTYtZ+q2Nfxqhj72YbP16m0mTrEOS2evd4MWDBhVgAE6qK4gvAhVBdEdNaHc3f2W/wDpKvvbvCbwy+HZldUCTVUe1W3sycm1ZwP7m9Xr7Rg0Dd1Nom87CWsqmlmN6afPYyvJV3wCl4ZuqrAMQ5oCrR4D1B8yZBL7rj55JpzggnNJYv7+ueIeUYoPzE6mu32k9wPxEa2qXcdVelgL7dwjTAt1nsNukWAufuAI1nZcJahsNjj1B2XEEwgA1mHUzDPpemn5alCNeCb+Hdb0Y12No/Wo2Gcn3b5vh9pOamLCm3CGrrsAXZ2B8tQPGFObhGkSOB6pddkT ansible-generated on node-3', '10.10.10.13'])
changed: [10.10.10.11 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC2TJItJzBUEZ452k7ADL6mIQsGk7gb4AUqvN0pAHwR06pVv1XUmpCI5Wb0RUOoNFwmSBVTUXoXCnK7SB44ftpzD29cpxw3tlLEphYeY1wfrd2lblhpn2KxzBhyJZ27lK2qcZk7Ik20pZDhQZRuZuhb6HufYn7FGOutB8kgQChrcpqr9zRhjZOe4Y8tLR2lmEAVrp6ZsS04rjiBJ65TDCWCNSnin8DVbM1EerJ6Pvxy1cOY+B00EYMHlMni/3orzcrlnZqpkR/NRpgs9+lo+DZ4SCuEtIEOzpPzcm/O4oLhxSnTMJKTFwcc+bgmE0t1LMxvIKOQTwhIX+KoBE/syxh9 ansible-generated on node-1', '10.10.10.13'])
changed: [10.10.10.12 -> node-3] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC3IAopnkVwQ779/Hk5MceAVZbhb/y3YaUu7ZROI87TaY/XK5WKJjplfNlLBC2vXGNkYMirbW+Qmmz/XIsyL7qvKmQfcMGP3ILD4FtMMlJMWLwBTIw5ORxvoZGxaWfw0bcZSIw5rv9rBA4UJR9JfZhpUkBMj7cq8jNDyIrLpoJ+hlnJa5G5zyiMWBqe7VKOoiBo7d2WBIauhRgHY3G79H9pVxJti6JJOeQ1tsUI5UtOMCRO+dbmsuRWruac4jWOj864RG/EjFveWEfCTagMFakqaxPTgF3RHAwPVBjbMm3+2lBiVNd2Zt2g/2gPdkEbIE+xXXP/f5kh21gXFea4ENsV ansible-generated on node-2', '10.10.10.13'])
TASK [node : Install public keys] ************************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.10] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.12] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
changed: [10.10.10.13] => (item=ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQC7IMAMNavYtWwzAJajKqwdn3ar5BhvcwCnBTxxEkXhGlCO2vfgosSAQMEflfgvkiI5nM1HIFQ8KINlx1XLO7SdL5KdInG5LIJjAFh0pujS4kNCT9a5IGvSq1BrzGqhbEcwWYdju1ZPYBcJm/MG+JD0dYCh8vfrYB/cYMD0SOmNkQ== vagrant@pigsty.com)
TASK [node : Install ntp package] ************************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.10]
ok: [10.10.10.13]
TASK [node : Install chrony package] *********************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
ok: [10.10.10.10]
TASK [node : Setup default node timezone] ****************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
TASK [node : Copy the ntp.conf file] *********************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Copy the chrony.conf template] **************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [node : Launch ntpd service] ************************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [node : Launch chronyd service] *********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
PLAY [Init meta service] *********************************************************************************************************************************************************************************
TASK [ca : Create local ca directory] ********************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [ca : Copy ca cert from local files] ****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=ca.key)
skipping: [10.10.10.10] => (item=ca.crt)
TASK [ca : Check ca key cert exists] *********************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [ca : Create self-signed CA key-cert] ***************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nameserver : Make sure dnsmasq package installed] **************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nameserver : Copy dnsmasq /etc/dnsmasq.d/config] ***************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nameserver : Add dynamic dns records to meta] ******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=10.10.10.2 pg-meta)
changed: [10.10.10.10] => (item=10.10.10.3 pg-test)
changed: [10.10.10.10] => (item=10.10.10.10 meta-1)
changed: [10.10.10.10] => (item=10.10.10.11 node-1)
changed: [10.10.10.10] => (item=10.10.10.12 node-2)
changed: [10.10.10.10] => (item=10.10.10.13 node-3)
changed: [10.10.10.10] => (item=10.10.10.10 pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 y.pigsty yum.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 c.pigsty consul.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 g.pigsty grafana.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 p.pigsty prometheus.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 a.pigsty alertmanager.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 n.pigsty ntp.pigsty)
changed: [10.10.10.10] => (item=10.10.10.10 h.pigsty haproxy.pigsty)
TASK [nameserver : Launch meta dnsmasq service] **********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nameserver : Wait for meta dnsmasq online] *********************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nameserver : Register consul dnsmasq service] ******************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nameserver : Reload consul] ************************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Make sure nginx installed] *****************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Create local html directory] ***************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Create nginx config directory] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Update default nginx index page] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Copy nginx default config] *****************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Copy nginx upstream conf] ******************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'name': 'home', 'host': 'pigsty', 'url': '127.0.0.1:3000'})
changed: [10.10.10.10] => (item={'name': 'consul', 'host': 'c.pigsty', 'url': '127.0.0.1:8500'})
changed: [10.10.10.10] => (item={'name': 'grafana', 'host': 'g.pigsty', 'url': '127.0.0.1:3000'})
changed: [10.10.10.10] => (item={'name': 'prometheus', 'host': 'p.pigsty', 'url': '127.0.0.1:9090'})
changed: [10.10.10.10] => (item={'name': 'alertmanager', 'host': 'a.pigsty', 'url': '127.0.0.1:9093'})
changed: [10.10.10.10] => (item={'name': 'haproxy', 'host': 'h.pigsty', 'url': '127.0.0.1:9091'})
TASK [nginx : Templating /etc/nginx/haproxy.conf] ********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Render haproxy upstream in cluster mode] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)
TASK [nginx : Render haproxy location in cluster mode] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)
TASK [nginx : Templating haproxy cluster index] **********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)
TASK [nginx : Templating haproxy cluster index] **********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
ok: [10.10.10.10] => (item=pg-test)
TASK [nginx : Restart meta nginx service] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Wait for nginx service online] *************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Make sure nginx exporter installed] ********************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Config nginx_exporter options] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Restart nginx_exporter service] ************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Wait for nginx exporter online] ************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [nginx : Register cosnul nginx service] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Register consul nginx-exporter service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [nginx : Reload consul] *****************************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Install prometheus and alertmanager] **************************************************************************************************************************************************
ok: [10.10.10.10] => (item=prometheus2)
ok: [10.10.10.10] => (item=alertmanager)
TASK [prometheus : Wipe out prometheus config dir] *******************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Wipe out existing prometheus data] ****************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [prometheus : Create postgres directory structure] **************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/prometheus)
changed: [10.10.10.10] => (item=/etc/prometheus/bin)
changed: [10.10.10.10] => (item=/etc/prometheus/rules)
changed: [10.10.10.10] => (item=/etc/prometheus/targets)
changed: [10.10.10.10] => (item=/export/prometheus/data)
TASK [prometheus : Copy prometheus bin scripts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Copy prometheus rules scripts] ********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Copy altermanager config] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Render prometheus config] *************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Config /etc/prometheus opts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Launch prometheus service] ************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Launch alertmanager service] **********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Wait for prometheus online] ***********************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [prometheus : Wait for alertmanager online] *********************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [prometheus : Render prometheus targets in cluster mode] ********************************************************************************************************************************************
changed: [10.10.10.10] => (item=pg-meta)
changed: [10.10.10.10] => (item=pg-test)
TASK [prometheus : Reload prometheus service] ************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Copy prometheus service definition] ***************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Copy alertmanager service definition] *************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [prometheus : Reload consul to register prometheus] *************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Make sure grafana is installed] **********************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [grafana : Check grafana plugin cache exists] *******************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [grafana : Provision grafana plugins via cache] *****************************************************************************************************************************************************
[WARNING]: Consider using the file module with state=absent rather than running 'rm'. If you need to use command because file is insufficient you can add 'warn: false' to this command task or set
'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]
TASK [grafana : Download grafana plugins from web] *******************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=redis-datasource)
skipping: [10.10.10.10] => (item=simpod-json-datasource)
skipping: [10.10.10.10] => (item=fifemon-graphql-datasource)
skipping: [10.10.10.10] => (item=sbueringer-consul-datasource)
skipping: [10.10.10.10] => (item=camptocamp-prometheus-alertmanager-datasource)
skipping: [10.10.10.10] => (item=ryantxu-ajax-panel)
skipping: [10.10.10.10] => (item=marcusolsson-hourly-heatmap-panel)
skipping: [10.10.10.10] => (item=michaeldmoore-multistat-panel)
skipping: [10.10.10.10] => (item=marcusolsson-treemap-panel)
skipping: [10.10.10.10] => (item=pr0ps-trackmap-panel)
skipping: [10.10.10.10] => (item=dalvany-image-panel)
skipping: [10.10.10.10] => (item=magnesium-wordcloud-panel)
skipping: [10.10.10.10] => (item=cloudspout-button-panel)
skipping: [10.10.10.10] => (item=speakyourcode-button-panel)
skipping: [10.10.10.10] => (item=jdbranham-diagram-panel)
skipping: [10.10.10.10] => (item=grafana-piechart-panel)
skipping: [10.10.10.10] => (item=snuids-radar-panel)
skipping: [10.10.10.10] => (item=digrich-bubblechart-panel)
TASK [grafana : Download grafana plugins from web] *******************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=https://github.com/Vonng/grafana-echarts)
TASK [grafana : Create grafana plugins cache] ************************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [grafana : Copy /etc/grafana/grafana.ini] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Remove grafana provision dir] ************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Copy provisioning content] ***************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Copy pigsty dashboards] ******************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Copy pigsty icon image] ******************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Replace grafana icon with pigsty] ********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Launch grafana service] ******************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Wait for grafana online] *****************************************************************************************************************************************************************
ok: [10.10.10.10]
TASK [grafana : Update grafana default preferences] ******************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Register consul grafana service] *********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [grafana : Reload consul] ***************************************************************************************************************************************************************************
changed: [10.10.10.10]
PLAY [Init dcs] ******************************************************************************************************************************************************************************************
TASK [consul : Check for existing consul] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [consul : Consul exists flag fact set] **************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [consul : Abort due to consul exists] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Clean existing consul instance] ***********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Stop any running consul instance] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [consul : Remove existing consul dir] ***************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/consul.d)
changed: [10.10.10.11] => (item=/etc/consul.d)
changed: [10.10.10.12] => (item=/etc/consul.d)
changed: [10.10.10.13] => (item=/etc/consul.d)
changed: [10.10.10.10] => (item=/var/lib/consul)
changed: [10.10.10.11] => (item=/var/lib/consul)
changed: [10.10.10.12] => (item=/var/lib/consul)
changed: [10.10.10.13] => (item=/var/lib/consul)
TASK [consul : Recreate consul dir] **********************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/consul.d)
changed: [10.10.10.11] => (item=/etc/consul.d)
changed: [10.10.10.12] => (item=/etc/consul.d)
changed: [10.10.10.13] => (item=/etc/consul.d)
changed: [10.10.10.10] => (item=/var/lib/consul)
changed: [10.10.10.11] => (item=/var/lib/consul)
changed: [10.10.10.13] => (item=/var/lib/consul)
changed: [10.10.10.12] => (item=/var/lib/consul)
TASK [consul : Make sure consul is installed] ************************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [consul : Make sure consul dir exists] **************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [consul : Get dcs server node names] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Get dcs node name from var] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Get dcs node name from var] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [consul : Fetch hostname as dcs node name] **********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Get dcs name from hostname] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Copy /etc/consul.d/consul.json] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [consul : Copy consul agent service] ****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [consul : Get dcs bootstrap expect quroum] **********************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Copy consul server service unit] **********************************************************************************************************************************************************
changed: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Launch consul server service] *************************************************************************************************************************************************************
changed: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Wait for consul server online] ************************************************************************************************************************************************************
ok: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [consul : Launch consul agent service] **************************************************************************************************************************************************************
skipping: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [consul : Wait for consul agent online] *************************************************************************************************************************************************************
skipping: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
PLAY [Init database cluster] *****************************************************************************************************************************************************************************
TASK [postgres : Create os group postgres] ***************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Make sure dcs group exists] *************************************************************************************************************************************************************
ok: [10.10.10.10] => (item=consul)
ok: [10.10.10.11] => (item=consul)
ok: [10.10.10.12] => (item=consul)
ok: [10.10.10.13] => (item=consul)
ok: [10.10.10.11] => (item=etcd)
ok: [10.10.10.10] => (item=etcd)
ok: [10.10.10.12] => (item=etcd)
ok: [10.10.10.13] => (item=etcd)
TASK [postgres : Create dbsu postgres] *******************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Grant dbsu nopass sudo] *****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Grant dbsu all sudo] ********************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Grant dbsu limited sudo] ****************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Add dbsu ssh no host checking] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Fetch dbsu public keys] *****************************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Exchange dbsu ssh keys] *****************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.10'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.11'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.12'])
skipping: [10.10.10.10] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8ahlH3Yo0nTb1hhd7SGTF1sCwnjEVA/yGra2ktQcZ/i8S/2tfumVomxtnNTeOZqNeQygVUbRgIH77lABXrXwBOimw+J0EmoekPsW7q/NCT5EJgqfoDe5vWBpyhrCe1ixCxESlP2GfpaJYGqeMW2G8HiFU6ieDZcfGcFn1q9JBjtrrV851Htw+Ik/fed93ipGgWzzZnu4NOjz7tpmrsmE3/1J/RvPQdRT7Pjuy2pLn+oCjMkQHJezvUKruVTVwxjObaWO7WFlvQCy2dRez1GBxEK80LRbsZfmgkfIQPzmqHOaacqNBAHe+OeYlBh3fMMbpALzJHnhgJSW5GpdRwiUJ ansible-generated on meta', '10.10.10.13'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.10'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.11'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.12'])
skipping: [10.10.10.11] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCIr/IW4qyd4Ls8dztCJyYHt354iPFbhLAUUiEK9R3A5W8UOSiJK/WVwlxMazH8QUaMWHuQAlTtW66kW1DDU+fsJ4xGxrNjEnwUbmWfj3BBnoANJQHYOid8iLJwWZuykvz0EIdGMDVpUpIx/qqm3/ZlC+cD0iukXQyEyAw3Qgts/Twqr5IJGeQOFy9Z4rmqSXtz/8tS0YOHCHVC5GGsUpD5+GLqhwPd64xCbWnvpYY61IX45Hzf+zO80xGqPeQLqF9HULs5wi2i6plKrSRl76VWCq9T7QMQMKJJSLUabnrXrKm+sr21LImgpSxSbqbBVVNUVS+adQvvylWb6yaFWov ansible-generated on node-1', '10.10.10.13'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.10'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.11'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.12'])
skipping: [10.10.10.12] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQChMymmlyxGn7PnUvAUvh968/gxTnwGZhhMhIc2+aiuA0QP/D8CSmKfzRYoMVP6/nm3cJsYXM28wzWZ1X/sLp33rYYxbwWpj5n8oBalzqKmSzK0HI5CePKAlWlEeLRDxvKpZYhZwXmro5Ov9lfp63kNHU84nAP7BPBOlufFyydn50bUwP1xKEsG1BC9Xqd4XqB5+eRLjkQDuC743bgxFc3FM8fij1/MuvxtG3HvL6DgEvCo3Lx4qkiVO3akR6Lo3bQEkf76Gq94cFbecAAnYZzdkPHR5LqJiIGS0DYj0yZQXrdN+DtjpyIBfZzi+TFdcVW1Agy1IUQ7Lrt29HJw+/sD ansible-generated on node-2', '10.10.10.13'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.10'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.11'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.12'])
skipping: [10.10.10.13] => (item=['ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCo9KBPH2DVYQrM/WZ4CO4Ipvr+5L6FhqWBr1A6C0Ms+qi77aKHwFEIbrxKqj7wZFbHWoTPt/cbWkXhZgnkfDBR81/wBImnFz0QfuL0tNDN0/YP/4cePo5bQERGcnBI6vkjmXMyGGpRQobNRj71fX/Wt5WMw6dM+d4XjfgUKHIJxEKnz8HYnkiwWm5Flc9EHKTWN+87vZ9B6cdi7gxLQu8LL3x+4e2ArRoz9u5yZIajUTvexqD2IIReqsFt+QObpinLaTc/g7Q+w/no1hAZERS3pImx9l0GF6Ktdp/HMHH1vk2cwnyogrk+OLw1WccI1YkBes/xdzBFTWOwUX3w/vBt ansible-generated on node-3', '10.10.10.13'])
TASK [postgres : Install offical pgdg yum repo] **********************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.10] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.11] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.11] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.10] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.11] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.12] => (item=postgresql${pg_version}*)
skipping: [10.10.10.10] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.11] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.12] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.11] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.12] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.13] => (item=postgresql${pg_version}*)
skipping: [10.10.10.11] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.12] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.13] => (item=postgis31_${pg_version}*)
skipping: [10.10.10.12] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.13] => (item=pgbouncer patroni pg_exporter pgbadger)
skipping: [10.10.10.12] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
skipping: [10.10.10.13] => (item=patroni patroni-consul patroni-etcd pgbouncer pgbadger pg_activity)
skipping: [10.10.10.13] => (item=python3 python3-psycopg2 python36-requests python3-etcd python3-consul)
skipping: [10.10.10.13] => (item=python36-urllib3 python36-idna python36-pyOpenSSL python36-cryptography)
TASK [postgres : Install pg packages] ********************************************************************************************************************************************************************
changed: [10.10.10.10] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.11] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.13] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
changed: [10.10.10.12] => (item=['postgresql13*', 'postgis31_13*', 'pgbouncer,patroni,pg_exporter,pgbadger', 'patroni,patroni-consul,patroni-etcd,pgbouncer,pgbadger,pg_activity', 'python3,python3-psycopg2,python36-requests,python3-etcd,python3-consul', 'python36-urllib3,python36-idna,python36-pyOpenSSL,python36-cryptography'])
TASK [postgres : Install pg extensions] ******************************************************************************************************************************************************************
changed: [10.10.10.11] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.10] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.13] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
changed: [10.10.10.12] => (item=['pg_repack13,pg_qualstats13,pg_stat_kcache13,wal2json13'])
TASK [postgres : Link /usr/pgsql to current version] *****************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Add pg bin dir to profile path] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Fix directory ownership] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Remove default postgres service] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Check necessary variables exists] *******************************************************************************************************************************************************
ok: [10.10.10.10] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [postgres : Fetch variables via pg_cluster] *********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Set cluster basic facts for hosts] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Assert cluster primary singleton] *******************************************************************************************************************************************************
ok: [10.10.10.10] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.11] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.12] => {
"changed": false,
"msg": "All assertions passed"
}
ok: [10.10.10.13] => {
"changed": false,
"msg": "All assertions passed"
}
TASK [postgres : Setup cluster primary ip address] *******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Setup repl upstream for primary] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Setup repl upstream for replicas] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Debug print instance summary] ***********************************************************************************************************************************************************
ok: [10.10.10.10] => {
"msg": "cluster=pg-meta service=pg-meta-primary instance=pg-meta-1 replication=[primary:itself]->10.10.10.10"
}
ok: [10.10.10.11] => {
"msg": "cluster=pg-test service=pg-test-primary instance=pg-test-1 replication=[primary:itself]->10.10.10.11"
}
ok: [10.10.10.12] => {
"msg": "cluster=pg-test service=pg-test-replica instance=pg-test-2 replication=[primary:itself]->10.10.10.12"
}
ok: [10.10.10.13] => {
"msg": "cluster=pg-test service=pg-test-offline instance=pg-test-3 replication=[primary:itself]->10.10.10.13"
}
TASK [postgres : Check for existing postgres instance] ***************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Set fact whether pg port is open] *******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Abort due to existing postgres instance] ************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Clean existing postgres instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Shutdown existing postgres service] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Remove registerd consul service] ********************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]
TASK [postgres : Remove postgres metadata in consul] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]
TASK [postgres : Remove existing postgres data] **********************************************************************************************************************************************************
ok: [10.10.10.10] => (item=/pg)
ok: [10.10.10.11] => (item=/pg)
ok: [10.10.10.12] => (item=/pg)
ok: [10.10.10.13] => (item=/pg)
ok: [10.10.10.10] => (item=/export/postgres)
ok: [10.10.10.11] => (item=/export/postgres)
ok: [10.10.10.12] => (item=/export/postgres)
ok: [10.10.10.13] => (item=/export/postgres)
ok: [10.10.10.10] => (item=/var/backups/postgres)
ok: [10.10.10.11] => (item=/var/backups/postgres)
ok: [10.10.10.12] => (item=/var/backups/postgres)
ok: [10.10.10.13] => (item=/var/backups/postgres)
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
TASK [postgres : Make sure main and backup dir exists] ***************************************************************************************************************************************************
changed: [10.10.10.11] => (item=/export)
changed: [10.10.10.12] => (item=/export)
changed: [10.10.10.13] => (item=/export)
changed: [10.10.10.10] => (item=/export)
changed: [10.10.10.11] => (item=/var/backups)
changed: [10.10.10.12] => (item=/var/backups)
changed: [10.10.10.13] => (item=/var/backups)
changed: [10.10.10.10] => (item=/var/backups)
TASK [postgres : Create postgres directory structure] ****************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/export/postgres)
changed: [10.10.10.11] => (item=/export/postgres)
changed: [10.10.10.12] => (item=/export/postgres)
changed: [10.10.10.13] => (item=/export/postgres)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/bin)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/bin)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/log)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/log)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/tmp)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/tmp)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/conf)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/conf)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/data)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/data)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/meta)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/meta)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/stat)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/stat)
changed: [10.10.10.10] => (item=/export/postgres/pg-meta-13/change)
changed: [10.10.10.12] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.11] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.13] => (item=/export/postgres/pg-test-13/change)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/postgres)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/postgres)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/arcwal)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/arcwal)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/backup)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/backup)
changed: [10.10.10.10] => (item=/var/backups/postgres/pg-meta-13/remote)
changed: [10.10.10.12] => (item=/var/backups/postgres/pg-test-13/remote)
changed: [10.10.10.11] => (item=/var/backups/postgres/pg-test-13/remote)
changed: [10.10.10.13] => (item=/var/backups/postgres/pg-test-13/remote)
TASK [postgres : Create pgbouncer directory structure] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
TASK [postgres : Create links from pgbkup to pgroot] *****************************************************************************************************************************************************
changed: [10.10.10.10] => (item=arcwal)
changed: [10.10.10.11] => (item=arcwal)
changed: [10.10.10.12] => (item=arcwal)
changed: [10.10.10.13] => (item=arcwal)
changed: [10.10.10.10] => (item=backup)
changed: [10.10.10.11] => (item=backup)
changed: [10.10.10.12] => (item=backup)
changed: [10.10.10.13] => (item=backup)
changed: [10.10.10.10] => (item=remote)
changed: [10.10.10.11] => (item=remote)
changed: [10.10.10.12] => (item=remote)
changed: [10.10.10.13] => (item=remote)
TASK [postgres : Create links from current cluster] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Copy pg_cluster to /pg/meta/cluster] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Copy pg_version to /pg/meta/version] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Copy pg_instance to /pg/meta/instance] **************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Copy pg_seq to /pg/meta/sequence] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Copy pg_role to /pg/meta/role] **********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Copy postgres scripts to /pg/bin/] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Copy alias profile to /etc/profile.d] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Copy psqlrc to postgres home] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Setup hostname to pg instance name] *****************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Copy consul node-meta definition] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Restart consul to load new node-meta] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Get config parameter page count] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Get config parameter page size] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [postgres : Tune shared buffer and work mem] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Hanlde small size mem occasion] *********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Calculate postgres mem params] **********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : create patroni config dir] **************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : use predefined patroni template] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Render default /pg/conf/patroni.yml] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Link /pg/conf/patroni to /pg/bin/] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Link /pg/bin/patroni.yml to /etc/patroni/] **********************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Config patroni watchdog support] ********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Copy patroni systemd service file] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : create patroni systemd drop-in dir] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Copy postgres systemd service file] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Drop-In consul dependency for patroni] **************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Render default initdb scripts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : Launch patroni on primary instance] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Wait for patroni primary online] ********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
ok: [10.10.10.10]
ok: [10.10.10.11]
TASK [postgres : Wait for postgres primary online] *******************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
ok: [10.10.10.10]
ok: [10.10.10.11]
TASK [postgres : Check primary postgres service ready] ***************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
[WARNING]: Module remote_tmp /var/lib/pgsql/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Check replication connectivity to primary] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Render init roles sql] ******************************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Render init template sql] ***************************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Render default pg-init scripts] *********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.10]
TASK [postgres : Execute initialization scripts] *********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Check primary instance ready] ***********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Add dbsu password to pgpass if exists] **************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Add system user to pgpass] **************************************************************************************************************************************************************
changed: [10.10.10.10] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.11] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.12] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.13] => (item={'username': 'replicator', 'password': 'DBUser.Replicator'})
changed: [10.10.10.11] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.10] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.13] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.12] => (item={'username': 'dbuser_monitor', 'password': 'DBUser.Monitor'})
changed: [10.10.10.13] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.12] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.10] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
changed: [10.10.10.11] => (item={'username': 'dbuser_admin', 'password': 'DBUser.Admin'})
TASK [postgres : Check replication connectivity to primary] **********************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Launch patroni on replica instances] ****************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Wait for patroni replica online] ********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Wait for postgres replica online] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Check replica postgres service ready] ***************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Render hba rules] ***********************************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
TASK [postgres : Reload hba rules] ***********************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Pause patroni] **************************************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.10]
TASK [postgres : Stop patroni on replica instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Stop patroni on primary instance] *******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Launch raw postgres on primary] *********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Launch raw postgres on primary] *********************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Wait for postgres online] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Check pgbouncer is installed] ***********************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [postgres : Stop existing pgbouncer service] ********************************************************************************************************************************************************
ok: [10.10.10.11]
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Remove existing pgbouncer dirs] *********************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
TASK [postgres : Recreate dirs with owner postgres] ******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=/etc/pgbouncer)
changed: [10.10.10.11] => (item=/etc/pgbouncer)
changed: [10.10.10.12] => (item=/etc/pgbouncer)
changed: [10.10.10.13] => (item=/etc/pgbouncer)
changed: [10.10.10.10] => (item=/var/log/pgbouncer)
changed: [10.10.10.12] => (item=/var/log/pgbouncer)
changed: [10.10.10.11] => (item=/var/log/pgbouncer)
changed: [10.10.10.13] => (item=/var/log/pgbouncer)
changed: [10.10.10.10] => (item=/var/run/pgbouncer)
changed: [10.10.10.12] => (item=/var/run/pgbouncer)
changed: [10.10.10.11] => (item=/var/run/pgbouncer)
changed: [10.10.10.13] => (item=/var/run/pgbouncer)
TASK [postgres : Copy /etc/pgbouncer/pgbouncer.ini] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : Copy /etc/pgbouncer/pgb_hba.conf] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Touch userlist and database list] *******************************************************************************************************************************************************
changed: [10.10.10.10] => (item=database.txt)
changed: [10.10.10.11] => (item=database.txt)
changed: [10.10.10.12] => (item=database.txt)
changed: [10.10.10.13] => (item=database.txt)
changed: [10.10.10.10] => (item=userlist.txt)
changed: [10.10.10.11] => (item=userlist.txt)
changed: [10.10.10.12] => (item=userlist.txt)
changed: [10.10.10.13] => (item=userlist.txt)
TASK [postgres : Add default users to pgbouncer] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [postgres : Copy pgbouncer systemd service] *********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : Launch pgbouncer pool service] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : Wait for pgbouncer service online] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [postgres : Check pgbouncer service is ready] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : include_tasks] **************************************************************************************************************************************************************************
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.10 => (item={'name': 'dbuser_meta', 'password': 'DBUser.Meta', 'login': True, 'superuser': False, 'createdb': False, 'createrole': False, 'inherit': True, 'replication': False, 'bypassrls': False, 'connlimit': -1, 'expire_at': '2030-12-31', 'expire_in': 365, 'roles': ['dbrole_readwrite'], 'pgbouncer': True, 'parameters': {'search_path': 'public'}, 'comment': 'test user'})
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.10 => (item={'name': 'dbuser_vonng2', 'password': 'DBUser.Vonng', 'roles': ['dbrole_offline'], 'expire_in': 365, 'pgbouncer': False, 'comment': 'example personal user for interactive queries'})
included: /private/tmp/pigsty/roles/postgres/tasks/createuser.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13 => (item={'name': 'test', 'password': 'test', 'roles': ['dbrole_readwrite'], 'pgbouncer': True, 'comment': 'default test user for production usage'})
TASK [postgres : Render user dbuser_meta creation sql] ***************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Execute user dbuser_meta creation sql on primary] ***************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Render user dbuser_vonng2 creation sql] *************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Execute user dbuser_vonng2 creation sql on primary] *************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [postgres : Render user test creation sql] **********************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Execute user test creation sql on primary] **********************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Add user to pgbouncer] ******************************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [postgres : include_tasks] **************************************************************************************************************************************************************************
included: /private/tmp/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.10 => (item={'name': 'meta', 'allowconn': True, 'revokeconn': False, 'connlimit': -1, 'extensions': [{'name': 'postgis', 'schema': 'public'}], 'parameters': {'enable_partitionwise_join': True}, 'pgbouncer': True, 'comment': 'pigsty meta database'})
included: /private/tmp/pigsty/roles/postgres/tasks/createdb.yml for 10.10.10.11, 10.10.10.12, 10.10.10.13 => (item={'name': 'test'})
TASK [postgres : debug] **********************************************************************************************************************************************************************************
ok: [10.10.10.10] => {
    "msg": {
        "allowconn": true,
        "comment": "pigsty meta database",
        "connlimit": -1,
        "extensions": [
            {
                "name": "postgis",
                "schema": "public"
            }
        ],
        "name": "meta",
        "parameters": {
            "enable_partitionwise_join": true
        },
        "pgbouncer": true,
        "revokeconn": false
    }
}
TASK [postgres : Render database meta creation sql] ******************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Render database meta baseline sql] ******************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [postgres : Execute database meta creation command] *************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Execute database meta creation sql] *****************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : Execute database meta creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.10]
TASK [postgres : Add pgbouncer busniess database] ********************************************************************************************************************************************************
changed: [10.10.10.10]
TASK [postgres : debug] **********************************************************************************************************************************************************************************
ok: [10.10.10.11] => {
    "msg": {
        "name": "test"
    }
}
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Render database test creation sql] ******************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Render database test baseline sql] ******************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Execute database test creation command] *************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Execute database test creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.12]
skipping: [10.10.10.13]
changed: [10.10.10.11]
TASK [postgres : Execute database test creation sql] *****************************************************************************************************************************************************
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [postgres : Add pgbouncer busniess database] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [postgres : Reload pgbouncer to add db and users] ***************************************************************************************************************************************************
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]
TASK [postgres : Copy pg service definition to consul] ***************************************************************************************************************************************************
changed: [10.10.10.10] => (item=postgres)
changed: [10.10.10.11] => (item=postgres)
changed: [10.10.10.12] => (item=postgres)
changed: [10.10.10.13] => (item=postgres)
changed: [10.10.10.10] => (item=pgbouncer)
changed: [10.10.10.11] => (item=pgbouncer)
changed: [10.10.10.12] => (item=pgbouncer)
changed: [10.10.10.13] => (item=pgbouncer)
changed: [10.10.10.10] => (item=patroni)
changed: [10.10.10.11] => (item=patroni)
changed: [10.10.10.12] => (item=patroni)
changed: [10.10.10.13] => (item=patroni)
TASK [postgres : Reload postgres consul service] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [postgres : Render grafana datasource definition] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [postgres : Register datasource to grafana] *********************************************************************************************************************************************************
[WARNING]: Consider using the get_url or uri module rather than running 'curl'. If you need to use command because get_url or uri is insufficient you can add 'warn: false' to this command task or set
'command_warnings=False' in ansible.cfg to get rid of this message.
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [monitor : Install exporter yum repo] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [monitor : Install node_exporter and pg_exporter] ***************************************************************************************************************************************************
skipping: [10.10.10.10] => (item=node_exporter)
skipping: [10.10.10.10] => (item=pg_exporter)
skipping: [10.10.10.11] => (item=node_exporter)
skipping: [10.10.10.11] => (item=pg_exporter)
skipping: [10.10.10.12] => (item=node_exporter)
skipping: [10.10.10.12] => (item=pg_exporter)
skipping: [10.10.10.13] => (item=node_exporter)
skipping: [10.10.10.13] => (item=pg_exporter)
TASK [monitor : Copy node_exporter binary] ***************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [monitor : Copy pg_exporter binary] *****************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [monitor : Create /etc/pg_exporter conf dir] ********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [monitor : Copy default pg_exporter.yaml] ***********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [monitor : Config /etc/default/pg_exporter] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [monitor : Config pg_exporter service unit] *********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [monitor : Launch pg_exporter systemd service] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [monitor : Wait for pg_exporter service online] *****************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.11]
ok: [10.10.10.13]
TASK [monitor : Register pg-exporter consul service] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [monitor : Reload pg-exporter consul service] *******************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.10]
TASK [monitor : Config pgbouncer_exporter opts] **********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [monitor : Config pgbouncer_exporter service] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [monitor : Launch pgbouncer_exporter service] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
TASK [monitor : Wait for pgbouncer_exporter online] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [monitor : Register pgb-exporter consul service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
TASK [monitor : Reload pgb-exporter consul service] ******************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [monitor : Copy node_exporter systemd service] ******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [monitor : Config default node_exporter options] ****************************************************************************************************************************************************
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [monitor : Launch node_exporter service unit] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
TASK [monitor : Wait for node_exporter online] ***********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [monitor : Register node-exporter service to consul] ************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [monitor : Reload node-exporter consul service] *****************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [service : Make sure haproxy is installed] **********************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [service : Create haproxy directory] ****************************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.13]
ok: [10.10.10.11]
TASK [service : Copy haproxy systemd service file] *******************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
TASK [service : Fetch postgres cluster memberships] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.11]
ok: [10.10.10.12]
ok: [10.10.10.13]
TASK [service : Templating /etc/haproxy/haproxy.cfg] *****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.11]
changed: [10.10.10.12]
TASK [service : Launch haproxy load balancer service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.13]
changed: [10.10.10.11]
TASK [service : Wait for haproxy load balancer online] ***************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.12]
ok: [10.10.10.11]
ok: [10.10.10.13]
TASK [service : Reload haproxy load balancer service] ****************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]
TASK [service : Copy haproxy exporter definition] ********************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [service : Copy haproxy service definition] *********************************************************************************************************************************************************
changed: [10.10.10.12] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.10] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.11] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.13] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
changed: [10.10.10.10] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.12] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.13] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.11] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
changed: [10.10.10.10] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.12] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.11] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.13] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
changed: [10.10.10.10] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.12] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.11] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
changed: [10.10.10.13] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
TASK [service : Reload haproxy consul service] ***********************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [service : Make sure vip-manager is installed] ******************************************************************************************************************************************************
ok: [10.10.10.10]
ok: [10.10.10.13]
ok: [10.10.10.11]
ok: [10.10.10.12]
TASK [service : Copy vip-manager systemd service file] ***************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.12]
changed: [10.10.10.11]
changed: [10.10.10.13]
TASK [service : create vip-manager systemd drop-in dir] **************************************************************************************************************************************************
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.10]
changed: [10.10.10.13]
TASK [service : create vip-manager systemd drop-in file] *************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.13]
changed: [10.10.10.12]
changed: [10.10.10.11]
TASK [service : Templating /etc/default/vip-manager.yml] *************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.12]
changed: [10.10.10.13]
TASK [service : Launch vip-manager] **********************************************************************************************************************************************************************
changed: [10.10.10.10]
changed: [10.10.10.11]
changed: [10.10.10.13]
changed: [10.10.10.12]
TASK [service : Fetch postgres cluster memberships] ******************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
TASK [service : Render L4 VIP configs] *******************************************************************************************************************************************************************
skipping: [10.10.10.10] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.10] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.10] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.11] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.10] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.11] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.11] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.12] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.11] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.12] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.13] => (item={'name': 'primary', 'src_ip': '*', 'src_port': 5433, 'dst_port': 'pgbouncer', 'check_url': '/primary', 'selector': '[]'})
skipping: [10.10.10.12] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.13] => (item={'name': 'replica', 'src_ip': '*', 'src_port': 5434, 'dst_port': 'pgbouncer', 'check_url': '/read-only', 'selector': '[]', 'selector_backup': '[? pg_role == `primary`]'})
skipping: [10.10.10.12] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
skipping: [10.10.10.13] => (item={'name': 'default', 'src_ip': '*', 'src_port': 5436, 'dst_port': 'postgres', 'check_method': 'http', 'check_port': 'patroni', 'check_url': '/primary', 'check_code': 200, 'selector': '[]', 'haproxy': {'maxconn': 3000, 'balance': 'roundrobin', 'default_server_options': 'inter 3s fastinter 1s downinter 5s rise 3 fall 3 on-marked-down shutdown-sessions slowstart 30s maxconn 3000 maxqueue 128 weight 100'}})
skipping: [10.10.10.13] => (item={'name': 'offline', 'src_ip': '*', 'src_port': 5438, 'dst_port': 'postgres', 'check_url': '/replica', 'selector': '[? pg_role == `offline` || pg_offline_query ]', 'selector_backup': '[? pg_role == `replica` && !pg_offline_query]'})
TASK [service : include_tasks] ***************************************************************************************************************************************************************************
skipping: [10.10.10.10]
skipping: [10.10.10.11]
skipping: [10.10.10.12]
skipping: [10.10.10.13]
PLAY RECAP ***********************************************************************************************************************************************************************************************
10.10.10.10 : ok=264 changed=205 unreachable=0 failed=0 skipped=62 rescued=0 ignored=0
10.10.10.11 : ok=182 changed=146 unreachable=0 failed=0 skipped=55 rescued=0 ignored=0
10.10.10.12 : ok=171 changed=135 unreachable=0 failed=0 skipped=66 rescued=0 ignored=0
10.10.10.13 : ok=171 changed=135 unreachable=0 failed=0 skipped=66 rescued=0 ignored=0
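It is strongly recommended to run make cache after the first initialization completes. This command packs the downloaded software into an offline cache package and places it at files/pkg.tgz. The next time you create a new pigsty environment, the offline package can be reused directly as long as the host operating system is the same, which saves a great deal of download time.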
mon-view
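Once initialization is complete, you can open http://pigsty in a browser to reach the monitoring system home page. The default username and password are both admin.
If DNS is not configured, or the default IP addresses are not used, you can also go directly to http://meta_ip_address:3000 to reach the monitoring system home page.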
$ make mon-view
open -n 'http://g.pigsty/'
8.7 - PG Exporter
PG Exporter Reference
Exporter
https://github.com/Vonng/pg_exporter
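pg_exporter is developed entirely in-house and collects metrics from postgres and pgbouncer:
- Supports PostgreSQL 9.4 to 13 and Pgbouncer 1.8+
- Almost all metrics are defined as SQL in configuration files, so they are fully customizable; hot reload is supported
- Metric collectors can be scheduled in a Kubernetes-like manner (for example, run only on replicas, only on nodes started with a given tag, or only on instances with a specific extension installed)
- A flexible metric caching policy and automatic timeout cancellation minimize the performance impact of monitoring on the database
- Provides health checks, readiness probes, and primary/replica role checks, which can be used for traffic routing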
PG Exporter
Prometheus exporter for PostgreSQL metrics. Gives you complete insight on your favourite elephant!
Latest binaries & rpms can be found on release page. Supported pg version: PostgreSQL 9.4+ & Pgbouncer 1.8+. Default collectors definition is compatible with PostgreSQL 10,11,12,13.
Latest pg_exporter version: 0.3.1
Features
- Support both Postgres & Pgbouncer
- Flexible: Almost all metrics are defined in customizable configuration files in SQL style.
- Fine-grained execution control (Tags Filter, Facts Filter, Version Filter, Timeout, Cache, etc…)
- Dynamic Planning: users can provide multiple branches of a metric query. Only the queries that match the server version, facts, and tags are actually installed.
- Configurable caching policy & query timeout
- Rich metrics about pg_exporter itself.
- Auto-discovery of multiple databases in the same cluster (multiple database scrape TBD)
- Tested and verified in real world production environment for years (200+ Nodes)
- Metrics overwhelming! Gives you complete insight on your favourite elephant!
- (Pgbouncer mode is enabled when target dbname is pgbouncer)
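Performance
In extreme scenarios (hundreds of thousands of tables and tens of thousands of query types), a single scrape can take up to several seconds.
Fortunately, every metric collector can be switched off individually, and pg_exporter allows an active timeout to be configured per collector (100ms by default), after which the query is cancelled.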
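The caching policy listed among the features can be pictured with a small sketch. This is not pg_exporter's actual implementation: the class name CachedCollector, its ttl parameter, and the injectable clock are all hypothetical, and only illustrate how a result can be reused for a while so that repeated scrapes do not hit the database every time.

```python
import time
from typing import Callable, Optional, Tuple


class CachedCollector:
    """Hypothetical wrapper: reuse the last query result while it is still fresh."""

    def __init__(self, query: Callable[[], dict], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.query = query    # function that runs the (possibly expensive) metric query
        self.ttl = ttl        # number of seconds a cached result stays valid
        self.clock = clock    # injectable clock, which makes the class easy to test
        self._cached: Optional[Tuple[float, dict]] = None

    def collect(self) -> dict:
        now = self.clock()
        if self._cached is not None and now - self._cached[0] < self.ttl:
            return self._cached[1]      # fresh enough: skip the database
        result = self.query()           # stale or empty cache: run the query
        self._cached = (now, result)
        return result
```

With a ttl longer than the scrape interval, an expensive collector runs at most once per ttl no matter how often the exporter is scraped.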
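Self-monitoring
The exporter exposes metrics about the monitoring component itself, including:
- Whether the exporter is alive, its uptime, and the number of times it is scraped per minute
- The duration of each monitoring query, and the number of metrics and errors it produces.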
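Prometheus configuration
A Prometheus scrape interval of 10 to 15 seconds is recommended, together with an appropriate timeout.
For demos or special cases a finer interval can be configured (for example 2 or 5 seconds).
A single Prometheus node can monitor several hundred instances, roughly several million time series (Dell R740, 64 cores / 400GB memory / 3TB PCI-E SSD).
Larger clusters can scale through Prometheus cascading, federation, or sharding. For example, deploy one Prometheus per database cluster and use an upper-level Prometheus to scrape them and compute derived metrics.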
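To get a feel for what these numbers mean for ingestion load, here is a rough back-of-the-envelope sketch. The instance count and series-per-instance figures are assumptions chosen only to land in the "several hundred instances, several million series" range mentioned above; they are not measurements.

```python
def samples_per_second(instances: int, series_per_instance: int,
                       scrape_interval_s: float) -> float:
    """Each active series yields one sample per scrape interval."""
    total_series = instances * series_per_instance
    return total_series / scrape_interval_s


# Assumed figures for illustration: 300 instances with 10,000 series each.
print(samples_per_second(300, 10_000, 10))  # 300000.0 samples per second
```

Halving the scrape interval doubles the ingestion rate, which is why the finer 2 or 5 second intervals are best kept to demos and small environments.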
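8.8 - Prometheus Service Discovery
How Prometheus performs service discovery through static files
When prometheus_sd_method == 'static' (the static file service discovery mode), Prometheus discovers its targets from static files. By default the target definitions are all yml files in the /etc/prometheus/targets/ directory.
Centralized configuration
When prometheus_sd_target is set to batch mode, Pigsty manages Prometheus monitoring targets through a centralized configuration.
All monitoring targets are defined in a single configuration file: /etc/prometheus/targets/all.yml.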
#==============================================================#
# File : targets/all.yml
# Ctime : 2021-02-18
# Mtime : 2021-02-18
# Atime : 2021-03-01 16:46
# Note : Managed by Ansible
# Desc : Prometheus Static Monitoring Targets Definition
# Path : /etc/prometheus/targets/all.yml
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# static monitor targets, batch version
#======> pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]
#======> pg-test-1 [primary]
- labels: {cls: pg-test, ins: pg-test-1, ip: 10.10.10.11, role: primary, svc: pg-test-primary}
  targets: [10.10.10.11:9630, 10.10.10.11:9100, 10.10.10.11:9631, 10.10.10.11:9101]
#======> pg-test-2 [replica]
- labels: {cls: pg-test, ins: pg-test-2, ip: 10.10.10.12, role: replica, svc: pg-test-replica}
  targets: [10.10.10.12:9630, 10.10.10.12:9100, 10.10.10.12:9631, 10.10.10.12:9101]
#======> pg-test-3 [replica]
- labels: {cls: pg-test, ins: pg-test-3, ip: 10.10.10.13, role: replica, svc: pg-test-replica}
  targets: [10.10.10.13:9630, 10.10.10.13:9100, 10.10.10.13:9631, 10.10.10.13:9101]
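The entries in all.yml follow a regular pattern, which the following sketch reproduces. It is an illustration only: Pigsty renders this file with Ansible, and the function name target_entry and the constant EXPORTER_PORTS are hypothetical. The label names and the four ports are copied from the example above.

```python
from typing import Dict, List, Union

# Exporter ports as they appear in the all.yml example.
EXPORTER_PORTS = [9630, 9100, 9631, 9101]


def target_entry(cls: str, seq: int, ip: str,
                 role: str) -> Dict[str, Union[Dict[str, str], List[str]]]:
    """Build one static target entry for a single database instance."""
    return {
        "labels": {
            "cls": cls,                 # cluster name
            "ins": f"{cls}-{seq}",      # instance name: cluster name plus sequence number
            "ip": ip,
            "role": role,
            "svc": f"{cls}-{role}",     # service name: cluster name plus role
        },
        "targets": [f"{ip}:{port}" for port in EXPORTER_PORTS],
    }


entry = target_entry("pg-test", 2, "10.10.10.12", "replica")
print(entry["labels"]["ins"], entry["targets"])
```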
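Separate configuration
When prometheus_sd_target is set to single mode, Pigsty manages Prometheus monitoring targets through per-instance configuration files.
Each monitored instance has its own dedicated configuration file: /etc/prometheus/targets/{{ pg_instance }}.yml.
Taking the pg-meta-1 instance as an example, its configuration file is located at /etc/prometheus/targets/pg-meta-1.yml, with the following content: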
# pg-meta-1 [primary]
- labels: {cls: pg-meta, ins: pg-meta-1, ip: 10.10.10.10, role: primary, svc: pg-meta-primary}
  targets: [10.10.10.10:9630, 10.10.10.10:9100, 10.10.10.10:9631, 10.10.10.10:9101]
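The difference between the two modes comes down to which file holds an instance's targets. A minimal sketch, assuming the default target directory named in the text; the function name target_file is hypothetical:

```python
import posixpath

TARGET_DIR = "/etc/prometheus/targets"  # default directory named in the text


def target_file(sd_target: str, pg_instance: str) -> str:
    """Return the file that holds the targets of one instance."""
    if sd_target == "batch":
        # centralized: every instance lives in the same file
        return posixpath.join(TARGET_DIR, "all.yml")
    if sd_target == "single":
        # separate: one file per instance
        return posixpath.join(TARGET_DIR, f"{pg_instance}.yml")
    raise ValueError(f"unknown prometheus_sd_target: {sd_target!r}")


print(target_file("single", "pg-meta-1"))  # /etc/prometheus/targets/pg-meta-1.yml
```

In single mode, adding or removing one instance touches only that instance's file, whereas batch mode rewrites the shared all.yml.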
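8.9 - Tuned Templates
Several predefined Tuned templates
8.9.1 - OLTP
Tuned OLTP template
The Tuned OLTP template mainly optimizes for latency. It targets a Dell R740 node with 64 cores / 400GB memory and PCI-E SSD. You can adjust it to match your actual hardware.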
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to oltp mode
# Path : /etc/tuned/oltp/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL OLTP System
include=network-latency
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80
# vm.dirty_background_bytes=67108864 # 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_ratio=3 # latency-performance default
vm.dirty_ratio=10 # latency-performance default
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
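The two getconf formulas in the template comments can be written out as a short sketch. In the Linux kernel, kernel.shmall is counted in pages and kernel.shmmax in bytes, so the sketch assigns the page-based formula to shmall and the byte-based formula to shmmax. Whether Pigsty fills param_shmall and param_shmmax in exactly this way is an assumption; the function names below are hypothetical.

```python
import os
from typing import Dict


def half_memory_shm(phys_pages: int, page_size: int) -> Dict[str, int]:
    """Size shared memory limits to half of physical memory."""
    shmall = phys_pages // 2       # pages:  expr $(getconf _PHYS_PAGES) / 2
    shmmax = shmall * page_size    # bytes:  expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE)
    return {"kernel.shmall": shmall, "kernel.shmmax": shmmax}


def local_shm() -> Dict[str, int]:
    """Same calculation using the values of the current Linux host."""
    return half_memory_shm(os.sysconf("SC_PHYS_PAGES"), os.sysconf("SC_PAGE_SIZE"))


# Example with made-up inputs: 1000 pages of 4096 bytes.
print(half_memory_shm(1000, 4096))
```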
8.9.2 - TINY
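Tuned TINY template
The Tuned TINY template mainly optimizes for very low-spec virtual machines.
The typical target is a 1-core / 1GB virtual machine node. You can adjust it to match your actual hardware.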
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to tiny mode
# Path : /etc/tuned/tiny/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL TINY System
# include=virtual-guest
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 40
# Filesystem I/O is usually much more efficient than swapping, so try to keep
# swapping low. It's usually safe to go even lower than this on systems with
# server-grade storage.
vm.swappiness = 30
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
8.9.3 - OLAP
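Tuned OLAP template, optimized for highly parallel, long-running, high-throughput instances
The Tuned OLAP template mainly optimizes for throughput and computational parallelism.
It targets a Dell R740 node with 64 cores / 400GB memory and PCI-E SSD. You can adjust it to match your actual hardware.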
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-09-18
# Desc : Tune operating system to olap mode
# Path : /etc/tuned/olap/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL OLAP System
include=network-throughput
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
# vm.swappiness=10
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 80%
vm.overcommit_memory=2
vm.overcommit_ratio=80
vm.dirty_background_ratio = 10 # throughput-performance default
vm.dirty_ratio=80 # throughput-performance default 40 -> 80
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
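The kernel.sem line in the template packs four values in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI, and its comment states that the system-wide semaphore limit is the per-set limit multiplied by the number of sets. The sketch below rebuilds the line from the three independent values; the function name kernel_sem is hypothetical.

```python
def kernel_sem(semmsl: int, semopm: int, semmni: int) -> str:
    """Format kernel.sem as 'SEMMSL SEMMNS SEMOPM SEMMNI'.

    semmsl: max semaphores per set
    semopm: max operations per semop() call
    semmni: max number of semaphore sets
    """
    semmns = semmsl * semmni  # system-wide max = per-set max x number of sets
    return f"{semmsl} {semmns} {semopm} {semmni}"


print(kernel_sem(2048, 2048, 65536))  # 2048 134217728 2048 65536
```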
8.9.4 - CRIT
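Tuned CRIT template, optimized for financial scenarios and other cases where data loss or corruption is unacceptable.
The Tuned CRIT template mainly optimizes for RPO, keeping the amount of dirty data in memory as small as possible.
It targets a Dell R740 node with 64 cores / 400GB memory and PCI-E SSD. You can adjust it to match your actual hardware.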
# tuned configuration
#==============================================================#
# File : tuned.conf
# Mtime : 2020-06-29
# Desc : Tune operating system to crit mode
# Path : /etc/tuned/crit/tuned.conf
# Author : Vonng(fengruohang@outlook.com)
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
[main]
summary=Optimize for PostgreSQL CRIT System
include=network-latency
[cpu]
force_latency=1
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[vm]
# disable transparent hugepages
transparent_hugepages=never
[sysctl]
#-------------------------------------------------------------#
# KERNEL #
#-------------------------------------------------------------#
# disable numa balancing
kernel.numa_balancing=0
# total shmem size in pages: $(expr $(getconf _PHYS_PAGES) / 2)
{% if param_shmall is defined and param_shmall != '' %}
kernel.shmall = {{ param_shmall }}
{% endif %}
# max shmem segment size in bytes: $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
{% if param_shmmax is defined and param_shmmax != '' %}
kernel.shmmax = {{ param_shmmax }}
{% endif %}
# total shmem segs 4096 -> 8192
kernel.shmmni=8192
# total msg queue number, set to mem size in MB
kernel.msgmni=32768
# max length of message queue
kernel.msgmnb=65536
# max size of message
kernel.msgmax=65536
kernel.pid_max=131072
# max(Sem in Set)=2048, max(Sem)=max(Sem in Set) x max(SemSet) , max(Sem per Ops)=2048, max(SemSet)=65536
kernel.sem=2048 134217728 2048 65536
# do not sched postgres process in group
kernel.sched_autogroup_enabled = 0
# total time the scheduler will consider a migrated process cache hot and, thus, less likely to be remigrated
# default = 0.5ms (500000ns), update to 5ms , depending on your typical query (e.g < 1ms)
kernel.sched_migration_cost_ns=5000000
#-------------------------------------------------------------#
# VM #
#-------------------------------------------------------------#
# try not using swap
vm.swappiness=0
# disable when most mem are for file cache
vm.zone_reclaim_mode=0
# overcommit threshold = 100%
vm.overcommit_memory=2
vm.overcommit_ratio=100
# 64MB mem (2xRAID cache) wake the bgwriter
vm.dirty_background_bytes=67108864
# vm.dirty_background_ratio=3 # latency-performance default
vm.dirty_ratio=6 # latency-performance default
# deny access on 0x00000 - 0x10000
vm.mmap_min_addr=65536
#-------------------------------------------------------------#
# Filesystem #
#-------------------------------------------------------------#
# max open files: 382589 -> 167772160
fs.file-max=167772160
# max concurrent unfinished async io, should be larger than 1M. 65536->1M
fs.aio-max-nr=1048576
#-------------------------------------------------------------#
# Network #
#-------------------------------------------------------------#
# max connection in listen queue (triggers retrans if full)
net.core.somaxconn=65535
net.core.netdev_max_backlog=8192
# tcp receive/transmit buffer default = 256KiB
net.core.rmem_default=262144
net.core.wmem_default=262144
# receive/transmit buffer limit = 4MiB
net.core.rmem_max=4194304
net.core.wmem_max=4194304
# ip options
net.ipv4.ip_forward=1
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_local_port_range=32768 65000
# tcp options
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_tw_recycle=0
net.ipv4.tcp_syncookies=0
net.ipv4.tcp_synack_retries=1
net.ipv4.tcp_syn_retries=1
# tcp read/write buffer
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.udp_mem="3145728 4194304 16777216"
# tcp probe fail interval: 75s -> 20s
net.ipv4.tcp_keepalive_intvl=20
# tcp break after 3 * 20s = 1m
net.ipv4.tcp_keepalive_probes=3
# probe period = 1 min
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_fin_timeout=5
net.ipv4.tcp_max_tw_buckets=262144
net.ipv4.tcp_max_syn_backlog=8192
net.ipv4.neigh.default.gc_thresh1=80000
net.ipv4.neigh.default.gc_thresh2=90000
net.ipv4.neigh.default.gc_thresh3=100000
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# max connection tracking number
net.netfilter.nf_conntrack_max=1048576
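The OLTP, OLAP, and CRIT templates differ most visibly in vm.dirty_ratio (10, 80, and 6 respectively). The sketch below turns those percentages into rough byte figures for the 400GB reference machine, treated here as 400 GiB. It is only an upper bound: the kernel applies the ratio to available memory (free plus reclaimable pages), not to total RAM, so real limits are lower. The function name is hypothetical.

```python
GIB = 1024 ** 3


def dirty_limit_bytes(total_mem_bytes: int, ratio_percent: int) -> int:
    """Upper bound on dirty page cache, taking the ratio against total RAM."""
    return total_mem_bytes * ratio_percent // 100


# vm.dirty_ratio values taken from the three templates above.
for name, ratio in [("oltp", 10), ("olap", 80), ("crit", 6)]:
    limit = dirty_limit_bytes(400 * GIB, ratio)
    print(f"{name}: at most {limit // GIB} GiB of dirty data")
```

The CRIT template additionally starts background writeback at a fixed 64MB (vm.dirty_background_bytes=67108864) instead of a percentage, which keeps the amount of unwritten data small regardless of memory size.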
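8.10 - Patroni Templates
The four Patroni templates predefined by Pigsty
Pigsty uses Patroni to manage and initialize Postgres database clusters.
Pigsty relies on Patroni for the bulk of the provisioning work. Even if the user chooses the Patroni-less mode, the database cluster is still brought up by Patroni, and the Patroni component is removed once creation is complete.
Most PostgreSQL cluster customization can be done through the Patroni configuration file. For details of the configuration file format, see the official Patroni documentation.
Predefined templates
Pigsty provides four predefined initialization templates. An initialization template is the definition file used to initialize a database cluster, located by default in roles/postgres/templates/. They are:
- oltp.yml: OLTP template, the default configuration, optimized for latency and performance on production hardware.
- olap.yml: OLAP template, raises parallelism and optimizes for throughput and long-running queries.
- crit.yml: core business template, based on the OLTP template and optimized for RPO, security, and data integrity; enables synchronous replication and data checksums.
- tiny.yml: micro database template, optimized for low-resource scenarios such as demo database clusters running in virtual machines.
Use the pg_conf parameter to specify the path of the template to use. For a predefined template, the template file name alone is sufficient.
If you use a customized Patroni configuration template, you should normally also use a matching node tuning template for the machine node.
For more detailed configuration information, see PG provisioning.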
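The rule for pg_conf can be sketched as follows. How Pigsty itself resolves the parameter is not shown in this document, so the function resolve_pg_conf and its fallback behavior (returning any other value unchanged as a custom path) are assumptions that simply mirror the sentence above.

```python
import posixpath

TEMPLATE_DIR = "roles/postgres/templates"  # default location named in the text
PREDEFINED = {"oltp.yml", "olap.yml", "crit.yml", "tiny.yml"}


def resolve_pg_conf(pg_conf: str) -> str:
    """Map a pg_conf value to the template file it refers to."""
    if pg_conf in PREDEFINED:
        # a bare predefined file name refers to the bundled template
        return posixpath.join(TEMPLATE_DIR, pg_conf)
    # assumption: anything else is treated as a path to a custom template
    return pg_conf


print(resolve_pg_conf("crit.yml"))  # roles/postgres/templates/crit.yml
```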
8.10.1 - OLTP
Patroni OLTP template
The Patroni OLTP template mainly optimizes for latency. It targets a Dell R740 node with 64 cores / 400GB memory and PCI-E SSD. You can adjust it to match your actual hardware.
#!/usr/bin/env patroni
#==============================================================#
# File : patroni.yml
# Ctime : 2020-04-08
# Mtime : 2020-12-22
# Desc : patroni cluster definition for {{ pg_cluster }} (oltp)
# Path : /pg/bin/patroni.yml
# Real Path : /pg/conf/{{ pg_instance }}.yml
# Link : /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note : Transactional Database Cluster Template
# Doc : https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# OLTP database are optimized for performance, rt latency
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB
---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/ # namespace
scope: {{ pg_cluster }} # cluster name
name: {{ pg_instance }} # instance name
#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
level: INFO # NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
dir: /pg/log/ # default log file: /pg/log/patroni.log
file_size: 100000000 # 100MB log triggers a log rotate
# format: '%(asctime)s %(levelname)s: %(message)s'
#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
host: 127.0.0.1:8500
consistency: default # default|consistent|stale
register_service: true
service_check_interval: 15s
service_tags:
- {{ pg_cluster }}
#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
listen: 0.0.0.0:{{ patroni_port }}
connect_address: {{ inventory_hostname }}:{{ patroni_port }}
authentication:
verify_client: none # none|optional|required
username: {{ pg_monitor_username }}
password: '{{ pg_monitor_password }}'
#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
optional:
insecure: true
# cacert: '/path/to/ca/cert'
# certfile: '/path/to/cert/file'
# keyfile: '/path/to/key/file'
#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
nofailover: false
clonefrom: true
noloadbalance: false
nosync: false
{% if pg_upstream is defined %}
replicatefrom: {{ pg_upstream }} # clone from another replica rather than primary
{% endif %}
#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
mode: {{ patroni_watchdog_mode }}
device: /dev/watchdog
# safety_margin: 10s
#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:
#----------------------------------------------------------------------------
# bootstrap method
#----------------------------------------------------------------------------
method: initdb
# add custom bootstrap method here
# default bootstrap method: initdb
initdb:
- locale: C
- encoding: UTF8
# - data-checksums # enable data-checksum
#----------------------------------------------------------------------------
# bootstrap users
#---------------------------------------------------------------------------
# additional users which need to be created after initializing new cluster
# replication user and monitor user are required
users:
{{ pg_replication_username }}:
password: '{{ pg_replication_password }}'
{{ pg_monitor_username }}:
password: '{{ pg_monitor_password }}'
{{ pg_admin_username }}:
password: '{{ pg_admin_password }}'
# bootstrap hba, allow local and intranet password access & replication
# will be overwritten later
pg_hba:
- local all postgres ident
- local all all md5
- host all all 0.0.0.0/0 md5
- local replication postgres ident
- local replication all md5
- host replication all 0.0.0.0/0 md5
#----------------------------------------------------------------------------
# template
#---------------------------------------------------------------------------
# post_init: /pg/bin/pg-init
#----------------------------------------------------------------------------
# bootstrap config
#---------------------------------------------------------------------------
# this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
# it will NOT take any effect after cluster bootstrap
dcs:
{% if pg_role == 'primary' and pg_upstream is defined %}
#----------------------------------------------------------------------------
# standby cluster definition
#---------------------------------------------------------------------------
standby_cluster:
host: {{ pg_upstream }}
port: {{ pg_port }}
# primary_slot_name: patroni # must be create manually on upstream server, if specified
create_replica_methods:
- basebackup
{% endif %}
#----------------------------------------------------------------------------
# important parameters
#---------------------------------------------------------------------------
# constraint: ttl >= loop_wait + retry_timeout * 2
# the number of seconds the loop will sleep. Default value: 10
# this is patroni check loop interval
loop_wait: 10
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
# config this according to your network condition to avoid false-positive failover
ttl: 30
# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: 10
# the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
# Max RTO: 2 loop wait + master_start_timeout
master_start_timeout: 10
# important: candidate will not be promoted if replication lag is higher than this
# maximum RPO: 1MB
maximum_lag_on_failover: 1048576
# The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
master_stop_timeout: 30
# turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
# set to true for RPO mode
synchronous_mode: false
# prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
synchronous_mode_strict: false
#----------------------------------------------------------------------------
# postgres parameters
#---------------------------------------------------------------------------
postgresql:
use_slots: true
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
parameters:
#----------------------------------------------------------------------
# IMPORTANT PARAMETERS
#----------------------------------------------------------------------
max_connections: 400 # 100 -> 400
superuser_reserved_connections: 10 # reserve 10 connection for su
max_locks_per_transaction: 128 # 64 -> 128
max_prepared_transactions: 0 # 0 disable 2PC
track_commit_timestamp: on # enabled xact timestamp
max_worker_processes: 8 # default 8, set to cpu core
wal_level: logical # logical
wal_log_hints: on # wal log hints to support rewind
max_wal_senders: 16 # 10 -> 16
max_replication_slots: 16 # 10 -> 16
wal_keep_size: 100GB # keep at least 100GB WAL
password_encryption: md5 # use traditional md5 auth
#----------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#----------------------------------------------------------------------
# memory: shared_buffers and maintenance_work_mem will be dynamically set
shared_buffers: {{ pg_shared_buffers }}
maintenance_work_mem: {{ pg_maintenance_work_mem }}
work_mem: 32MB # 4MB -> 32MB
huge_pages: try # try huge pages
temp_file_limit: 100GB # 0 -> 100GB
vacuum_cost_delay: 2ms # wait 2ms per 10000 cost
vacuum_cost_limit: 10000 # 10000 cost each round
bgwriter_delay: 10ms # check dirty page every 10ms
bgwriter_lru_maxpages: 800 # 100 -> 800
bgwriter_lru_multiplier: 5.0 # 2.0 -> 5.0 more cushion buffer
#----------------------------------------------------------------------
# WAL
#----------------------------------------------------------------------
wal_buffers: 16MB # max to 16MB
wal_writer_delay: 20ms # wait period
wal_writer_flush_after: 1MB # max allowed data loss
min_wal_size: 100GB # at least 100GB WAL
max_wal_size: 400GB # at most 400GB WAL
commit_delay: 20 # 0 -> 20 (unit: microseconds)
commit_siblings: 10 # 5 -> 10
checkpoint_timeout: 60min # checkpoint 5min -> 1h
checkpoint_completion_target: 0.95 # 0.5 -> 0.95
archive_mode: on
archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'
#----------------------------------------------------------------------
# REPLICATION
#----------------------------------------------------------------------
# synchronous_standby_names: ''
vacuum_defer_cleanup_age: 50000 # 0->50000 last 50000 xact changes will not be vacuumed
promote_trigger_file: promote.signal # default promote trigger file path
max_standby_archive_delay: 10min # max delay before canceling queries when reading WAL from archive;
max_standby_streaming_delay: 3min # max delay before canceling queries when reading streaming WAL;
wal_receiver_status_interval: 1s # send replies at least this often
hot_standby_feedback: on # send info from standby to prevent query conflicts
wal_receiver_timeout: 60s # time that receiver waits for
max_logical_replication_workers: 8 # 4 -> 8
max_sync_workers_per_subscription: 8 # 4 -> 8
#----------------------------------------------------------------------
# QUERY TUNING
#----------------------------------------------------------------------
# planner
# enable_partitionwise_join: on
random_page_cost: 1.1 # 4 for HDD, 1.1 for SSD
effective_cache_size: 320GB # max mem - shared buffer
default_statistics_target: 1000 # stat bucket 100 -> 1000
#----------------------------------------------------------------------
# REPORTING AND LOGGING
#----------------------------------------------------------------------
log_destination: csvlog # use standard csv log
logging_collector: on # enable csvlog
log_directory: log # default log dir: /pg/data/log
# log_filename: 'postgresql-%a.log' # weekly auto-recycle
log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
log_checkpoints: on # log checkpoint info
log_lock_waits: on # log lock wait info
log_replication_commands: on # log replication info
log_statement: ddl # log ddl change
log_min_duration_statement: 100 # log slow query (>100ms)
#----------------------------------------------------------------------
# STATISTICS
#----------------------------------------------------------------------
track_io_timing: on # collect io statistics
track_functions: all # track all functions (none|pl|all)
track_activity_query_size: 8192 # max query length in pg_stat_activity
#----------------------------------------------------------------------
# AUTOVACUUM
#----------------------------------------------------------------------
log_autovacuum_min_duration: 1s # log autovacuum activity take more than 1s
autovacuum_max_workers: 3 # default autovacuum worker 3
autovacuum_naptime: 1min # default autovacuum naptime 1min
autovacuum_vacuum_scale_factor: 0.08 # fraction of table size before vacuum 20% -> 8%
autovacuum_analyze_scale_factor: 0.04 # fraction of table size before analyze 10% -> 4%
autovacuum_vacuum_cost_delay: -1 # default vacuum cost delay: same as vacuum_cost_delay
autovacuum_vacuum_cost_limit: -1 # default vacuum cost limit: same as vacuum_cost_limit
autovacuum_freeze_max_age: 100000000 # age > 100 million triggers forced vacuum
#----------------------------------------------------------------------
# CLIENT
#----------------------------------------------------------------------
deadlock_timeout: 50ms # 50ms for deadlock
idle_in_transaction_session_timeout: 10min # 10min timeout for idle in transaction
#----------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#----------------------------------------------------------------------
# extensions
shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'
# auto_explain
auto_explain.log_min_duration: 1s # auto explain query slower than 1s
auto_explain.log_analyze: true # explain analyze
auto_explain.log_verbose: true # explain verbose
auto_explain.log_timing: true # explain timing
auto_explain.log_nested_statements: true
# pg_stat_statements
pg_stat_statements.max: 10000 # 5000 -> 10000 queries
pg_stat_statements.track: all # track all statements (all|top|none)
pg_stat_statements.track_utility: off # do not track query other than CRUD
pg_stat_statements.track_planning: off # do not track planning metrics
#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:
#----------------------------------------------------------------------------
# how to connect to postgres
#----------------------------------------------------------------------------
bin_dir: {{ pg_bin_dir }}
data_dir: {{ pg_data }}
config_dir: {{ pg_data }}
pgpass: {{ pg_dbsu_home }}/.pgpass
listen: {{ pg_listen }}:{{ pg_port }}
connect_address: {{ inventory_hostname }}:{{ pg_port }}
use_unix_socket: true # default: /var/run/postgresql, /tmp
#----------------------------------------------------------------------------
# who to connect to postgres
#----------------------------------------------------------------------------
authentication:
superuser:
username: {{ pg_dbsu }}
replication:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
rewind:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
#----------------------------------------------------------------------------
# how to react to database operations
#----------------------------------------------------------------------------
# event callback script log: /pg/log/callback.log
callbacks:
on_start: /pg/bin/pg-failover-callback
on_stop: /pg/bin/pg-failover-callback
on_reload: /pg/bin/pg-failover-callback
on_restart: /pg/bin/pg-failover-callback
on_role_change: /pg/bin/pg-failover-callback
# rewind policy: data checksum should be enabled before using rewind
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: false
#----------------------------------------------------------------------------
# how to create replica
#----------------------------------------------------------------------------
# create replica method: default pg_basebackup
create_replica_methods:
- basebackup
basebackup:
- max-rate: '1000M'
- checkpoint: fast
- status-interval: 1s
- verbose
- progress
#----------------------------------------------------------------------------
# ad hoc parameters (overwrite with default)
#----------------------------------------------------------------------------
# parameters:
#----------------------------------------------------------------------------
# host based authentication, overwrite default pg_hba.conf
#----------------------------------------------------------------------------
# pg_hba:
# - local all postgres ident
# - local all all md5
# - host all all 0.0.0.0/0 md5
# - local replication postgres ident
# - local replication all md5
# - host replication all 0.0.0.0/0 md5
...
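The `archive_command` in the template above packs several steps into one shell line: it files each WAL segment under a per-day directory, compresses it with lz4, and, when invoked during the 12:00 minute, deletes the previous day's directory. The Python sketch below models only the path logic so that the behavior is easier to reason about. The function name `plan_archive` and its return shape are illustrative assumptions, not part of Pigsty or Patroni.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def plan_archive(now: datetime, wal_file: str,
                 wal_dir: str = "/pg/arcwal") -> Tuple[Optional[str], str]:
    """Model the path logic of the template's archive_command.

    Returns (directory_to_purge_or_None, compressed_target_path).
    Hypothetical helper for illustration; it touches no files.
    """
    today = now.strftime("%Y%m%d")
    yesterday = (now - timedelta(days=1)).strftime("%Y%m%d")
    # The shell test `[[ $(date +%H%M) == 1200 ]]` holds only during the 12:00 minute.
    purge = f"{wal_dir}/{yesterday}" if now.strftime("%H%M") == "1200" else None
    # Mirrors `lz4 -q -z %p > ${wal_dir}/<today>/%f.lz4`.
    target = f"{wal_dir}/{today}/{wal_file}.lz4"
    return purge, target


if __name__ == "__main__":
    seg = "000000010000000000000001"
    print(plan_archive(datetime(2021, 1, 2, 12, 0, 30), seg))
    print(plan_archive(datetime(2021, 1, 2, 12, 1, 0), seg))
```

One consequence visible in the sketch: the previous day's directory is purged only if a WAL segment happens to be archived during the 12:00 minute, so retention depends on write activity at that time.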
8.10.2 - TINY
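Patroni TINY template
The Patroni TINY template is optimized for very low-spec virtual machines.
The typical target is a 1-core / 1 GB virtual machine node. Adjust it to match your actual hardware.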
#!/usr/bin/env patroni
#==============================================================#
# File : patroni.yml
# Ctime : 2020-04-08
# Mtime : 2020-12-22
# Desc : patroni cluster definition for {{ pg_cluster }} (tiny)
# Path : /pg/bin/patroni.yml
# Real Path : /pg/conf/{{ pg_instance }}.yml
# Link : /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note : Tiny Database Cluster Template
# Doc : https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# TINY databases are optimized for low-resource situations (e.g. 1 Core, 1 GB)
# typical spec: 1 Core | 1-4 GB RAM | Normal SSD 10x GB
---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/ # namespace
scope: {{ pg_cluster }} # cluster name
name: {{ pg_instance }} # instance name
#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
level: INFO # NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
dir: /pg/log/ # default log file: /pg/log/patroni.log
file_size: 100000000 # 100MB log triggers a log rotate
# format: '%(asctime)s %(levelname)s: %(message)s'
#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
host: 127.0.0.1:8500
consistency: default # default|consistent|stale
register_service: true
service_check_interval: 15s
service_tags:
- {{ pg_cluster }}
#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
listen: 0.0.0.0:{{ patroni_port }}
connect_address: {{ inventory_hostname }}:{{ patroni_port }}
authentication:
verify_client: none # none|optional|required
username: {{ pg_monitor_username }}
password: '{{ pg_monitor_password }}'
#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
optional:
insecure: true
# cacert: '/path/to/ca/cert'
# certfile: '/path/to/cert/file'
# keyfile: '/path/to/key/file'
#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
nofailover: false
clonefrom: true
noloadbalance: false
nosync: false
{% if pg_upstream is defined %}
replicatefrom: {{ pg_upstream }} # clone from another replica rather than primary
{% endif %}
#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
mode: {{ patroni_watchdog_mode }}
device: /dev/watchdog
# safety_margin: 10s
#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:
#----------------------------------------------------------------------------
# bootstrap method
#----------------------------------------------------------------------------
method: initdb
# add custom bootstrap method here
# default bootstrap method: initdb
initdb:
- locale: C
- encoding: UTF8
- data-checksums # enable data-checksum
#----------------------------------------------------------------------------
# bootstrap users
#---------------------------------------------------------------------------
# additional users which need to be created after initializing new cluster
# replication user and monitor user are required
users:
{{ pg_replication_username }}:
password: '{{ pg_replication_password }}'
{{ pg_monitor_username }}:
password: '{{ pg_monitor_password }}'
# bootstrap hba, allow local and intranet password access & replication
# will be overwritten later
pg_hba:
- local all postgres ident
- local all all md5
- host all all 0.0.0.0/0 md5
- local replication postgres ident
- local replication all md5
- host replication all 0.0.0.0/0 md5
#----------------------------------------------------------------------------
# customization
#---------------------------------------------------------------------------
# post_init: /pg/bin/pg-init
#----------------------------------------------------------------------------
# bootstrap config
#---------------------------------------------------------------------------
# this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
# it will NOT take any effect after cluster bootstrap
dcs:
{% if pg_role == 'primary' and pg_upstream is defined %}
#----------------------------------------------------------------------------
# standby cluster definition
#---------------------------------------------------------------------------
standby_cluster:
host: {{ pg_upstream }}
port: {{ pg_port }}
# primary_slot_name: patroni # must be created manually on the upstream server, if specified
create_replica_methods:
- basebackup
{% endif %}
#----------------------------------------------------------------------------
# important parameters
#---------------------------------------------------------------------------
# constraint: ttl >= loop_wait + retry_timeout * 2
# the number of seconds the loop will sleep. Default value: 10
# this is patroni check loop interval
loop_wait: 10
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
# config this according to your network condition to avoid false-positive failover
ttl: 30
# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: 10
# the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
# Max RTO: 2 loop wait + master_start_timeout
master_start_timeout: 10
# important: candidate will not be promoted if replication lag is higher than this
# maximum RPO: 1MB
maximum_lag_on_failover: 1048576
# The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
master_stop_timeout: 30
# turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
# set to true for RPO mode
synchronous_mode: false
# prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
synchronous_mode_strict: false
#----------------------------------------------------------------------------
# postgres parameters
#---------------------------------------------------------------------------
postgresql:
use_slots: true
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
parameters:
#----------------------------------------------------------------------
# IMPORTANT PARAMETERS
#----------------------------------------------------------------------
max_connections: 50 # default 100 -> 50
superuser_reserved_connections: 10 # reserve 10 connection for su
max_locks_per_transaction: 64 # default 64
max_prepared_transactions: 0 # 0 disable 2PC
track_commit_timestamp: on # enabled xact timestamp
max_worker_processes: 1 # default 8 -> 1 (set to cpu core)
wal_level: logical # logical
wal_log_hints: on # wal log hints to support rewind
max_wal_senders: 10 # default 10
max_replication_slots: 10 # default 10
wal_keep_size: 1GB # keep at least 1GB WAL
password_encryption: md5 # use traditional md5 auth
#----------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#----------------------------------------------------------------------
# memory: shared_buffers and maintenance_work_mem will be dynamically set
shared_buffers: {{ pg_shared_buffers }}
maintenance_work_mem: {{ pg_maintenance_work_mem }}
work_mem: 4MB # default 4MB
huge_pages: try # try huge pages
temp_file_limit: 40GB # 0 -> 40GB (according to your disk)
vacuum_cost_delay: 5ms # wait 5ms per 10000 cost
vacuum_cost_limit: 10000 # 10000 cost each round
bgwriter_delay: 10ms # check dirty page every 10ms
bgwriter_lru_maxpages: 800 # 100 -> 800
bgwriter_lru_multiplier: 5.0 # 2.0 -> 5.0 more cushion buffer
#----------------------------------------------------------------------
# WAL
#----------------------------------------------------------------------
wal_buffers: 16MB # max to 16MB
wal_writer_delay: 20ms # wait period
wal_writer_flush_after: 1MB # max allowed data loss
min_wal_size: 100GB # at least 100GB WAL
max_wal_size: 400GB # at most 400GB WAL
commit_delay: 20 # 0 -> 20 (microseconds), short delay to allow group commit
commit_siblings: 10 # 5 -> 10
checkpoint_timeout: 15min # checkpoint 5min -> 15min
checkpoint_completion_target: 0.80 # 0.5 -> 0.8
archive_mode: on
archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'
#----------------------------------------------------------------------
# REPLICATION
#----------------------------------------------------------------------
# synchronous_standby_names: ''
vacuum_defer_cleanup_age: 50000 # 0->50000 last 50000 xact changes will not be vacuumed
promote_trigger_file: promote.signal # default promote trigger file path
max_standby_archive_delay: 10min # max delay before canceling queries when reading WAL from archive;
max_standby_streaming_delay: 3min # max delay before canceling queries when reading streaming WAL;
wal_receiver_status_interval: 1s # send replies at least this often
hot_standby_feedback: on # send info from standby to prevent query conflicts
wal_receiver_timeout: 60s # time that receiver waits for
max_logical_replication_workers: 8 # 4 -> 8 (set according to your cpu core)
max_sync_workers_per_subscription: 8 # 4 -> 8
#----------------------------------------------------------------------
# QUERY TUNING
#----------------------------------------------------------------------
# planner
# enable_partitionwise_join: on
random_page_cost: 1.1 # 4 for HDD, 1.1 for SSD
effective_cache_size: 2GB # max mem - shared buffer
default_statistics_target: 200 # stat bucket 100 -> 200
#----------------------------------------------------------------------
# REPORTING AND LOGGING
#----------------------------------------------------------------------
log_destination: csvlog # use standard csv log
logging_collector: on # enable csvlog
log_directory: log # default log dir: /pg/data/log
# log_filename: 'postgresql-%a.log' # weekly auto-recycle
log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
log_checkpoints: on # log checkpoint info
log_lock_waits: on # log lock wait info
log_replication_commands: on # log replication info
log_statement: ddl # log ddl change
log_min_duration_statement: 100 # log slow query (>100ms)
#----------------------------------------------------------------------
# STATISTICS
#----------------------------------------------------------------------
track_io_timing: on # collect io statistics
track_functions: all # track all functions (none|pl|all)
track_activity_query_size: 8192 # max query length in pg_stat_activity
#----------------------------------------------------------------------
# AUTOVACUUM
#----------------------------------------------------------------------
log_autovacuum_min_duration: 1s # log autovacuum activity take more than 1s
autovacuum_max_workers: 1 # default autovacuum worker 3 -> 1
autovacuum_naptime: 1min # default autovacuum naptime 1min
autovacuum_vacuum_scale_factor: 0.08 # fraction of table size before vacuum 20% -> 8%
autovacuum_analyze_scale_factor: 0.04 # fraction of table size before analyze 10% -> 4%
autovacuum_vacuum_cost_delay: -1 # default vacuum cost delay: same as vacuum_cost_delay
autovacuum_vacuum_cost_limit: -1 # default vacuum cost limit: same as vacuum_cost_limit
autovacuum_freeze_max_age: 100000000 # age > 100 million triggers forced vacuum
#----------------------------------------------------------------------
# CLIENT
#----------------------------------------------------------------------
deadlock_timeout: 50ms # 50ms for deadlock
idle_in_transaction_session_timeout: 10min # 10min timeout for idle in transaction
#----------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#----------------------------------------------------------------------
# extensions
shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'
# auto_explain
auto_explain.log_min_duration: 1s # auto explain query slower than 1s
auto_explain.log_analyze: true # explain analyze
auto_explain.log_verbose: true # explain verbose
auto_explain.log_timing: true # explain timing
auto_explain.log_nested_statements: true
# pg_stat_statements
pg_stat_statements.max: 3000 # 5000 -> 3000 queries
pg_stat_statements.track: all # track all statements (all|top|none)
pg_stat_statements.track_utility: off # do not track query other than CRUD
pg_stat_statements.track_planning: off # do not track planning metrics
#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:
#----------------------------------------------------------------------------
# how to connect to postgres
#----------------------------------------------------------------------------
bin_dir: {{ pg_bin_dir }}
data_dir: {{ pg_data }}
config_dir: {{ pg_data }}
pgpass: {{ pg_dbsu_home }}/.pgpass
listen: {{ pg_listen }}:{{ pg_port }}
connect_address: {{ inventory_hostname }}:{{ pg_port }}
use_unix_socket: true # default: /var/run/postgresql, /tmp
#----------------------------------------------------------------------------
# who to connect to postgres
#----------------------------------------------------------------------------
authentication:
superuser:
username: {{ pg_dbsu }}
replication:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
rewind:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
#----------------------------------------------------------------------------
# how to react to database operations
#----------------------------------------------------------------------------
# event callback script log: /pg/log/callback.log
callbacks:
on_start: /pg/bin/pg-failover-callback
on_stop: /pg/bin/pg-failover-callback
on_reload: /pg/bin/pg-failover-callback
on_restart: /pg/bin/pg-failover-callback
on_role_change: /pg/bin/pg-failover-callback
# rewind policy: data checksum should be enabled before using rewind
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: false
#----------------------------------------------------------------------------
# how to create replica
#----------------------------------------------------------------------------
# create replica method: default pg_basebackup
create_replica_methods:
- basebackup
basebackup:
- max-rate: '1000M'
- checkpoint: fast
- status-interval: 1s
- verbose
- progress
#----------------------------------------------------------------------------
# ad hoc parameters (overwrite with default)
#----------------------------------------------------------------------------
# parameters:
#----------------------------------------------------------------------------
# host based authentication, overwrite default pg_hba.conf
#----------------------------------------------------------------------------
# pg_hba:
# - local all postgres ident
# - local all all md5
# - host all all 0.0.0.0/0 md5
# - local replication postgres ident
# - local replication all md5
# - host replication all 0.0.0.0/0 md5
...
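The failover-related comments in the template above state a constraint between `ttl`, `loop_wait`, and `retry_timeout`, a worst-case recovery time ("2 loop wait + master_start_timeout"), and a data-loss bound expressed through `maximum_lag_on_failover`. A minimal Python sketch that checks those relationships for the values used in this template follows. The function names are hypothetical, and the formulas are taken from the template comments rather than from Patroni's code.

```python
def ttl_is_valid(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Constraint from the template comment: ttl >= loop_wait + retry_timeout * 2."""
    return ttl >= loop_wait + 2 * retry_timeout


def max_rto_seconds(loop_wait: int, master_start_timeout: int) -> int:
    """Worst-case recovery time per the template comment: 2 loop waits + master_start_timeout."""
    return 2 * loop_wait + master_start_timeout


def lag_in_mib(maximum_lag_on_failover: int) -> float:
    """maximum_lag_on_failover is given in bytes; convert it to MiB."""
    return maximum_lag_on_failover / (1024 * 1024)


if __name__ == "__main__":
    # Values copied from the TINY template above.
    print(ttl_is_valid(ttl=30, loop_wait=10, retry_timeout=10))
    print(max_rto_seconds(loop_wait=10, master_start_timeout=10))
    print(lag_in_mib(1048576))
```

With the template's values the constraint holds exactly at its boundary (30 = 10 + 2 × 10), so raising `loop_wait` or `retry_timeout` without also raising `ttl` would violate it.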
8.10.3 - OLAP
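Patroni OLAP template, optimized for instances with high parallelism, long-running queries, and high throughput
The Patroni OLAP template is optimized mainly for throughput and computational parallelism.
The target machine is a Dell R740 node with 64 cores, 400 GB of memory, and PCI-E SSD storage. Adjust it to match your actual hardware.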
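The OLAP template in this section marks three parallelism settings with "SET THIS ACCORDING TO YOUR CPU CORES" and sets each of them to 64 for the reference machine. As a rough sketch of adjusting them to another host, the hypothetical helper below maps a core count onto those three parameters. The one-to-one mapping mirrors the template's values and is an assumption, not a tuning rule published by Pigsty.

```python
import os
from typing import Dict, Optional


def olap_parallel_settings(cores: Optional[int] = None) -> Dict[str, int]:
    """Scale the OLAP template's core-bound parameters to a given core count.

    Hypothetical helper: the template uses 64 on a 64-core host, so each
    parameter is simply set equal to the number of cores.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return {
        "max_worker_processes": cores,
        "max_parallel_workers": cores,
        "max_parallel_workers_per_gather": cores,
    }


if __name__ == "__main__":
    for name, value in olap_parallel_settings(64).items():
        print(f"{name}: {value}")
```

Calling `olap_parallel_settings()` without an argument uses the core count of the machine running the script, which is only meaningful when the script runs on the database host itself.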
#!/usr/bin/env patroni
#==============================================================#
# File : patroni.yml
# Ctime : 2020-04-08
# Mtime : 2020-12-22
# Desc : patroni cluster definition for {{ pg_cluster }} (olap)
# Path : /pg/bin/patroni.yml
# Real Path : /pg/conf/{{ pg_instance }}.yml
# Link : /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note : Analysis Database Cluster Template
# Doc : https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# OLAP databases are optimized for throughput
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB
---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/ # namespace
scope: {{ pg_cluster }} # cluster name
name: {{ pg_instance }} # instance name
#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
level: INFO # NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
dir: /pg/log/ # default log file: /pg/log/patroni.log
file_size: 100000000 # 100MB log triggers a log rotate
# format: '%(asctime)s %(levelname)s: %(message)s'
#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
host: 127.0.0.1:8500
consistency: default # default|consistent|stale
register_service: true
service_check_interval: 15s
service_tags:
- {{ pg_cluster }}
#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
listen: 0.0.0.0:{{ patroni_port }}
connect_address: {{ inventory_hostname }}:{{ patroni_port }}
authentication:
verify_client: none # none|optional|required
username: {{ pg_monitor_username }}
password: '{{ pg_monitor_password }}'
#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
optional:
insecure: true
# cacert: '/path/to/ca/cert'
# certfile: '/path/to/cert/file'
# keyfile: '/path/to/key/file'
#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
nofailover: false
clonefrom: true
noloadbalance: false
nosync: false
{% if pg_upstream is defined %}
replicatefrom: {{ pg_upstream }} # clone from another replica rather than primary
{% endif %}
#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
mode: {{ patroni_watchdog_mode }}
device: /dev/watchdog
# safety_margin: 10s
#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:
#----------------------------------------------------------------------------
# bootstrap method
#----------------------------------------------------------------------------
method: initdb
# add custom bootstrap method here
# default bootstrap method: initdb
initdb:
- locale: C
- encoding: UTF8
# - data-checksums # enable data-checksum
#----------------------------------------------------------------------------
# bootstrap users
#---------------------------------------------------------------------------
# additional users which need to be created after initializing new cluster
# replication user and monitor user are required
users:
{{ pg_replication_username }}:
password: '{{ pg_replication_password }}'
{{ pg_monitor_username }}:
password: '{{ pg_monitor_password }}'
{{ pg_admin_username }}:
password: '{{ pg_admin_password }}'
# bootstrap hba, allow local and intranet password access & replication
# will be overwritten later
pg_hba:
- local all postgres ident
- local all all md5
- host all all 0.0.0.0/0 md5
- local replication postgres ident
- local replication all md5
- host replication all 0.0.0.0/0 md5
#----------------------------------------------------------------------------
# template
#---------------------------------------------------------------------------
# post_init: /pg/bin/pg-init
#----------------------------------------------------------------------------
# bootstrap config
#---------------------------------------------------------------------------
# this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
# it will NOT take any effect after cluster bootstrap
dcs:
{% if pg_role == 'primary' and pg_upstream is defined %}
#----------------------------------------------------------------------------
# standby cluster definition
#---------------------------------------------------------------------------
standby_cluster:
host: {{ pg_upstream }}
port: {{ pg_port }}
# primary_slot_name: patroni # must be created manually on the upstream server, if specified
create_replica_methods:
- basebackup
{% endif %}
#----------------------------------------------------------------------------
# important parameters
#---------------------------------------------------------------------------
# constraint: ttl >= loop_wait + retry_timeout * 2
# the number of seconds the loop will sleep. Default value: 10
# this is patroni check loop interval
loop_wait: 10
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
# config this according to your network condition to avoid false-positive failover
ttl: 30
# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: 10
# the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
# Max RTO: 2 loop wait + master_start_timeout
master_start_timeout: 10
# important: candidate will not be promoted if replication lag is higher than this
# maximum RPO: 16MB (analysis tolerate more data loss)
maximum_lag_on_failover: 16777216
# The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
master_stop_timeout: 30
# turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
# set to true for RPO mode
synchronous_mode: false
# prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
synchronous_mode_strict: false
#----------------------------------------------------------------------------
# postgres parameters
#---------------------------------------------------------------------------
postgresql:
use_slots: true
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
parameters:
#----------------------------------------------------------------------
# IMPORTANT PARAMETERS
#----------------------------------------------------------------------
max_connections: 400 # 100 -> 400
superuser_reserved_connections: 10 # reserve 10 connection for su
max_locks_per_transaction: 256 # 64 -> 256 (analysis)
max_prepared_transactions: 0 # 0 disable 2PC
track_commit_timestamp: on # enabled xact timestamp
max_worker_processes: 64 # default 8 -> 64, SET THIS ACCORDING TO YOUR CPU CORES
wal_level: logical # logical
wal_log_hints: on # wal log hints to support rewind
max_wal_senders: 16 # 10 -> 16
max_replication_slots: 16 # 10 -> 16
wal_keep_size: 100GB # keep at least 100GB WAL
password_encryption: md5 # use traditional md5 auth
#----------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#----------------------------------------------------------------------
# memory: shared_buffers and maintenance_work_mem will be dynamically set
shared_buffers: {{ pg_shared_buffers }}
maintenance_work_mem: {{ pg_maintenance_work_mem }}
work_mem: 128MB # 4MB -> 128MB (analysis)
huge_pages: try # try huge pages
temp_file_limit: 500GB # 0 -> 500GB (analysis)
vacuum_cost_delay: 2ms # wait 2ms per 10000 cost
vacuum_cost_limit: 10000 # 10000 cost each round
bgwriter_delay: 10ms # check dirty page every 10ms
bgwriter_lru_maxpages: 1600 # 100 -> 1600 (analysis)
bgwriter_lru_multiplier: 5.0 # 2.0 -> 5.0 more cushion buffer
max_parallel_workers: 64 # SET THIS ACCORDING TO YOUR CPU CORES
max_parallel_workers_per_gather: 64 # SET THIS ACCORDING TO YOUR CPU CORES
max_parallel_maintenance_workers: 4 # 2 -> 4
#----------------------------------------------------------------------
# WAL
#----------------------------------------------------------------------
wal_buffers: 16MB # max to 16MB
wal_writer_delay: 20ms # wait period
wal_writer_flush_after: 16MB # max allowed data loss (analysis)
min_wal_size: 100GB # at least 100GB WAL
max_wal_size: 400GB # at most 400GB WAL
commit_delay: 20 # 0 -> 20 (unit is microseconds), delay WAL flush to favor group commit
commit_siblings: 10 # 5 -> 10
checkpoint_timeout: 60min # checkpoint 5min -> 1h
checkpoint_completion_target: 0.95 # 0.5 -> 0.95
archive_mode: on
archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'
#----------------------------------------------------------------------
# REPLICATION
#----------------------------------------------------------------------
# synchronous_standby_names: ''
vacuum_defer_cleanup_age: 0 # 0 (default)
promote_trigger_file: promote.signal # default promote trigger file path
max_standby_archive_delay: 10min # max delay before canceling queries when reading WAL from archive;
max_standby_streaming_delay: 3min # max delay before canceling queries when reading streaming WAL;
wal_receiver_status_interval: 1s # send replies at least this often
hot_standby_feedback: on # send info from standby to prevent query conflicts
wal_receiver_timeout: 60s # time that receiver waits for communication from primary
max_logical_replication_workers: 8 # 4 -> 8
max_sync_workers_per_subscription: 8 # 4 -> 8
#----------------------------------------------------------------------
# QUERY TUNING
#----------------------------------------------------------------------
# planner
enable_partitionwise_join: on # enable on analysis
random_page_cost: 1.1 # 4 for HDD, 1.1 for SSD
effective_cache_size: 320GB # max mem - shared buffer
default_statistics_target: 1000 # stat bucket 100 -> 1000
jit: on # default on
jit_above_cost: 100000 # default jit threshold
#----------------------------------------------------------------------
# REPORTING AND LOGGING
#----------------------------------------------------------------------
log_destination: csvlog # use standard csv log
logging_collector: on # enable csvlog
log_directory: log # default log dir: /pg/data/log
# log_filename: 'postgresql-%a.log' # weekly auto-recycle
log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
log_checkpoints: on # log checkpoint info
log_lock_waits: on # log lock wait info
log_replication_commands: on # log replication info
log_statement: ddl # log ddl change
log_min_duration_statement: 1000 # log slow query (>1s)
#----------------------------------------------------------------------
# STATISTICS
#----------------------------------------------------------------------
track_io_timing: on # collect io statistics
track_functions: all # track all functions (none|pl|all)
track_activity_query_size: 8192 # max query length in pg_stat_activity
#----------------------------------------------------------------------
# AUTOVACUUM
#----------------------------------------------------------------------
log_autovacuum_min_duration: 1s # log autovacuum activity take more than 1s
autovacuum_max_workers: 3 # default autovacuum worker 3
autovacuum_naptime: 1min # default autovacuum naptime 1min
autovacuum_vacuum_scale_factor: 0.08 # fraction of table size before vacuum 20% -> 8%
autovacuum_analyze_scale_factor: 0.04 # fraction of table size before analyze 10% -> 4%
autovacuum_vacuum_cost_delay: -1 # default vacuum cost delay: same as vacuum_cost_delay
autovacuum_vacuum_cost_limit: -1 # default vacuum cost limit: same as vacuum_cost_limit
autovacuum_freeze_max_age: 100000000 # age > 100 million triggers forced vacuum
#----------------------------------------------------------------------
# CLIENT
#----------------------------------------------------------------------
deadlock_timeout: 50ms # 50ms for deadlock
idle_in_transaction_session_timeout: 0 # Disable idle in xact timeout in analysis database
#----------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#----------------------------------------------------------------------
# extensions
shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'
# auto_explain
auto_explain.log_min_duration: 1s # auto explain query slower than 1s
auto_explain.log_analyze: true # explain analyze
auto_explain.log_verbose: true # explain verbose
auto_explain.log_timing: true # explain timing
auto_explain.log_nested_statements: true
# pg_stat_statements
pg_stat_statements.max: 10000 # 5000 -> 10000 queries
pg_stat_statements.track: all # track all statements (all|top|none)
pg_stat_statements.track_utility: off # do not track query other than CRUD
pg_stat_statements.track_planning: off # do not track planning metrics
#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:
#----------------------------------------------------------------------------
# how to connect to postgres
#----------------------------------------------------------------------------
bin_dir: {{ pg_bin_dir }}
data_dir: {{ pg_data }}
config_dir: {{ pg_data }}
pgpass: {{ pg_dbsu_home }}/.pgpass
listen: {{ pg_listen }}:{{ pg_port }}
connect_address: {{ inventory_hostname }}:{{ pg_port }}
use_unix_socket: true # default: /var/run/postgresql, /tmp
#----------------------------------------------------------------------------
# credentials used to connect to postgres
#----------------------------------------------------------------------------
authentication:
superuser:
username: {{ pg_dbsu }}
replication:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
rewind:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
#----------------------------------------------------------------------------
# how to react to database operations
#----------------------------------------------------------------------------
# event callback script log: /pg/log/callback.log
callbacks:
on_start: /pg/bin/pg-failover-callback
on_stop: /pg/bin/pg-failover-callback
on_reload: /pg/bin/pg-failover-callback
on_restart: /pg/bin/pg-failover-callback
on_role_change: /pg/bin/pg-failover-callback
# rewind policy: data checksum should be enabled before using rewind
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: false
#----------------------------------------------------------------------------
# how to create replica
#----------------------------------------------------------------------------
# create replica method: default pg_basebackup
create_replica_methods:
- basebackup
basebackup:
- max-rate: '1000M'
- checkpoint: fast
- status-interval: 1s
- verbose
- progress
#----------------------------------------------------------------------------
# ad hoc parameters (overwrite with default)
#----------------------------------------------------------------------------
# parameters:
#----------------------------------------------------------------------------
# host based authentication, overwrite default pg_hba.conf
#----------------------------------------------------------------------------
# pg_hba:
# - local all postgres ident
# - local all all md5
# - host all all 0.0.0.0/0 md5
# - local replication postgres ident
# - local replication all md5
# - host replication all 0.0.0.0/0 md5
...
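To make the autovacuum scale factors in the template above concrete: PostgreSQL vacuums a table once its dead tuples exceed the base threshold plus the scale factor times the table's row count. The sketch below is illustrative only. It assumes the PostgreSQL default `autovacuum_vacuum_threshold` of 50 (the template does not set it), and the function name `autovacuum_trigger` is hypothetical.

```python
def autovacuum_trigger(reltuples: int, scale_factor: float, base_threshold: int = 50) -> int:
    """Return the dead-tuple count above which autovacuum vacuums a table.

    Follows the PostgreSQL rule:
        threshold = autovacuum_vacuum_threshold
                    + autovacuum_vacuum_scale_factor * reltuples
    base_threshold=50 is the PostgreSQL default, assumed here because
    the template leaves autovacuum_vacuum_threshold unset.
    """
    return int(base_threshold + scale_factor * reltuples)


if __name__ == "__main__":
    rows = 100_000_000  # hypothetical table with 100 million rows
    print("default (0.20):", autovacuum_trigger(rows, 0.20))
    print("template (0.08):", autovacuum_trigger(rows, 0.08))
```

On a large table, lowering the factor from 0.20 to 0.08 means vacuum starts after far fewer dead tuples have accumulated, which is the intent of the `20% -> 8%` comment in the template.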
8.10.4 - CRIT
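Patroni CRIT template, optimized for financial scenarios and other workloads where data loss or corruption is unacceptable.
The Patroni CRIT template is optimized primarily for RPO: it uses synchronous replication to ensure that no data is lost when a failure occurs.
This template targets Dell R740 nodes with 64 cores, 400GB of RAM, and PCI-E SSDs. Adjust it to match your actual hardware.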
#!/usr/bin/env patroni
#==============================================================#
# File : patroni.yml
# Ctime : 2020-04-08
# Mtime : 2020-12-22
# Desc : patroni cluster definition for {{ pg_cluster }} (crit)
# Path : /pg/bin/patroni.yml
# Real Path : /pg/conf/{{ pg_instance }}.yml
# Link : /pg/bin/patroni.yml -> /pg/conf/{{ pg_instance}}.yml
# Note : Critical Database Cluster Template
# Doc : https://patroni.readthedocs.io/en/latest/SETTINGS.html
# Copyright (C) 2018-2021 Ruohang Feng
#==============================================================#
# CRIT database are optimized for security, integrity, RPO
# typical spec: 64 Core | 400 GB RAM | PCI-E SSD xTB
---
#------------------------------------------------------------------------------
# identity
#------------------------------------------------------------------------------
namespace: {{ pg_namespace }}/ # namespace
scope: {{ pg_cluster }} # cluster name
name: {{ pg_instance }} # instance name
#------------------------------------------------------------------------------
# log
#------------------------------------------------------------------------------
log:
level: INFO # NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
dir: /pg/log/ # default log file: /pg/log/patroni.log
file_size: 100000000 # 100MB log triggers a log rotate
# format: '%(asctime)s %(levelname)s: %(message)s'
#------------------------------------------------------------------------------
# dcs
#------------------------------------------------------------------------------
consul:
host: 127.0.0.1:8500
consistency: default # default|consistent|stale
register_service: true
service_check_interval: 15s
service_tags:
- {{ pg_cluster }}
#------------------------------------------------------------------------------
# api
#------------------------------------------------------------------------------
# how to expose patroni service
# listen on all ipv4, connect via public ip, use same credential as dbuser_monitor
restapi:
listen: 0.0.0.0:{{ patroni_port }}
connect_address: {{ inventory_hostname }}:{{ patroni_port }}
authentication:
verify_client: none # none|optional|required
username: {{ pg_monitor_username }}
password: '{{ pg_monitor_password }}'
#------------------------------------------------------------------------------
# ctl
#------------------------------------------------------------------------------
ctl:
optional:
insecure: true
# cacert: '/path/to/ca/cert'
# certfile: '/path/to/cert/file'
# keyfile: '/path/to/key/file'
#------------------------------------------------------------------------------
# tags
#------------------------------------------------------------------------------
tags:
nofailover: false
clonefrom: true
noloadbalance: false
nosync: false
{% if pg_upstream is defined %}
replicatefrom: {{ pg_upstream }} # clone from another replica rather than primary
{% endif %}
#------------------------------------------------------------------------------
# watchdog
#------------------------------------------------------------------------------
# available mode: off|automatic|required
watchdog:
mode: {{ patroni_watchdog_mode }}
device: /dev/watchdog
# safety_margin: 10s
#------------------------------------------------------------------------------
# bootstrap
#------------------------------------------------------------------------------
bootstrap:
#----------------------------------------------------------------------------
# bootstrap method
#----------------------------------------------------------------------------
method: initdb
# add custom bootstrap method here
# default bootstrap method: initdb
initdb:
- locale: C
- encoding: UTF8
# - data-checksums # enable data-checksum
#----------------------------------------------------------------------------
# bootstrap users
#---------------------------------------------------------------------------
# additional users which need to be created after initializing new cluster
# replication user and monitor user are required
users:
{{ pg_replication_username }}:
password: '{{ pg_replication_password }}'
{{ pg_monitor_username }}:
password: '{{ pg_monitor_password }}'
{{ pg_admin_username }}:
password: '{{ pg_admin_password }}'
# bootstrap hba, allow local and intranet password access & replication
# will be overwritten later
pg_hba:
- local all postgres ident
- local all all md5
- host all all 0.0.0.0/0 md5
- local replication postgres ident
- local replication all md5
- host replication all 0.0.0.0/0 md5
#----------------------------------------------------------------------------
# template
#---------------------------------------------------------------------------
# post_init: /pg/bin/pg-init
#----------------------------------------------------------------------------
# bootstrap config
#---------------------------------------------------------------------------
# this section will be written to /{{ pg_namespace }}/{{ pg_cluster }}/config
# it will NOT take any effect after cluster bootstrap
dcs:
{% if pg_role == 'primary' and pg_upstream is defined %}
#----------------------------------------------------------------------------
# standby cluster definition
#---------------------------------------------------------------------------
standby_cluster:
host: {{ pg_upstream }}
port: {{ pg_port }}
# primary_slot_name: patroni # must be created manually on upstream server, if specified
create_replica_methods:
- basebackup
{% endif %}
#----------------------------------------------------------------------------
# important parameters
#---------------------------------------------------------------------------
# constraint: ttl >= loop_wait + retry_timeout * 2
# the number of seconds the loop will sleep. Default value: 10
# this is patroni check loop interval
loop_wait: 10
# the TTL to acquire the leader lock (in seconds). Think of it as the length of time before initiation of the automatic failover process. Default value: 30
# config this according to your network condition to avoid false-positive failover
ttl: 30
# timeout for DCS and PostgreSQL operation retries (in seconds). DCS or network issues shorter than this will not cause Patroni to demote the leader. Default value: 10
retry_timeout: 10
# the amount of time a master is allowed to recover from failures before failover is triggered (in seconds)
# Max RTO: 2 loop wait + master_start_timeout
master_start_timeout: 120 # more patient on critical database
# important: candidate will not be promoted if replication lag is higher than this
# maximum RPO: 0 for critical database
maximum_lag_on_failover: 1
# The number of seconds Patroni is allowed to wait when stopping Postgres and effective only when synchronous_mode is enabled
master_stop_timeout: 10 # more patient on critical database
# turns on synchronous replication mode. In this mode a replica will be chosen as synchronous and only the latest leader and synchronous replica are able to participate in leader election
# set to true for RPO mode
synchronous_mode: true # use sync replication on critical database
# prevents disabling synchronous replication if no synchronous replicas are available, blocking all client writes to the master
synchronous_mode_strict: false
#----------------------------------------------------------------------------
# postgres parameters
#---------------------------------------------------------------------------
postgresql:
use_slots: true
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
parameters:
#----------------------------------------------------------------------
# IMPORTANT PARAMETERS
#----------------------------------------------------------------------
max_connections: 400 # 100 -> 400
superuser_reserved_connections: 10 # reserve 10 connection for su
max_locks_per_transaction: 128 # 64 -> 128
max_prepared_transactions: 0 # 0 disable 2PC
track_commit_timestamp: on # enabled xact timestamp
max_worker_processes: 8 # default 8, set to cpu core
wal_level: logical # logical
wal_log_hints: on # wal log hints to support rewind
max_wal_senders: 16 # 10 -> 16
max_replication_slots: 16 # 10 -> 16
wal_keep_size: 100GB # keep at least 100GB WAL
password_encryption: md5 # use traditional md5 auth
#----------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#----------------------------------------------------------------------
# memory: shared_buffers and maintenance_work_mem will be dynamically set
shared_buffers: {{ pg_shared_buffers }}
maintenance_work_mem: {{ pg_maintenance_work_mem }}
work_mem: 32MB # 4MB -> 32MB
huge_pages: try # try huge pages
temp_file_limit: 100GB # 0 -> 100GB
vacuum_cost_delay: 2ms # wait 2ms per 10000 cost
vacuum_cost_limit: 10000 # 10000 cost each round
bgwriter_delay: 10ms # check dirty page every 10ms
bgwriter_lru_maxpages: 800 # 100 -> 800
bgwriter_lru_multiplier: 5.0 # 2.0 -> 5.0 more cushion buffer
#----------------------------------------------------------------------
# WAL
#----------------------------------------------------------------------
wal_buffers: 16MB # max to 16MB
wal_writer_delay: 20ms # wait period
wal_writer_flush_after: 1MB # max allowed data loss
min_wal_size: 100GB # at least 100GB WAL
max_wal_size: 400GB # at most 400GB WAL
commit_delay: 20 # 0 -> 20 (unit is microseconds), delay WAL flush to favor group commit
commit_siblings: 10 # 5 -> 10
checkpoint_timeout: 60min # checkpoint 5min -> 1h
checkpoint_completion_target: 0.95 # 0.5 -> 0.95
archive_mode: on
archive_command: 'wal_dir=/pg/arcwal; [[ $(date +%H%M) == 1200 ]] && rm -rf ${wal_dir}/$(date -d"yesterday" +%Y%m%d); /bin/mkdir -p ${wal_dir}/$(date +%Y%m%d) && /usr/bin/lz4 -q -z %p > ${wal_dir}/$(date +%Y%m%d)/%f.lz4'
#----------------------------------------------------------------------
# REPLICATION
#----------------------------------------------------------------------
# synchronous_standby_names: ''
vacuum_defer_cleanup_age: 50000 # 0->50000 last 50000 xact changes will not be vacuumed
promote_trigger_file: promote.signal # default promote trigger file path
max_standby_archive_delay: 10min # max delay before canceling queries when reading WAL from archive;
max_standby_streaming_delay: 3min # max delay before canceling queries when reading streaming WAL;
wal_receiver_status_interval: 1s # send replies at least this often
hot_standby_feedback: on # send info from standby to prevent query conflicts
wal_receiver_timeout: 60s # time that receiver waits for communication from primary
max_logical_replication_workers: 8 # 4 -> 8
max_sync_workers_per_subscription: 8 # 4 -> 8
#----------------------------------------------------------------------
# QUERY TUNING
#----------------------------------------------------------------------
# planner
# enable_partitionwise_join: on
random_page_cost: 1.1 # 4 for HDD, 1.1 for SSD
effective_cache_size: 320GB # max mem - shared buffer
default_statistics_target: 1000 # stat bucket 100 -> 1000
#----------------------------------------------------------------------
# REPORTING AND LOGGING
#----------------------------------------------------------------------
log_destination: csvlog # use standard csv log
logging_collector: on # enable csvlog
log_directory: log # default log dir: /pg/data/log
# log_filename: 'postgresql-%a.log' # weekly auto-recycle
log_filename: 'postgresql-%Y-%m-%d.log' # YYYY-MM-DD full log retention
log_checkpoints: on # log checkpoint info
log_lock_waits: on # log lock wait info
log_replication_commands: on # log replication info
log_statement: ddl # log ddl change
log_min_duration_statement: 100 # log slow query (>100ms)
#----------------------------------------------------------------------
# STATISTICS
#----------------------------------------------------------------------
track_io_timing: on # collect io statistics
track_functions: all # track all functions (none|pl|all)
track_activity_query_size: 32768 # show full query on critical database
#----------------------------------------------------------------------
# AUTOVACUUM
#----------------------------------------------------------------------
log_autovacuum_min_duration: 1s # log autovacuum activity take more than 1s
autovacuum_max_workers: 3 # default autovacuum worker 3
autovacuum_naptime: 1min # default autovacuum naptime 1min
autovacuum_vacuum_scale_factor: 0.08 # fraction of table size before vacuum 20% -> 8%
autovacuum_analyze_scale_factor: 0.04 # fraction of table size before analyze 10% -> 4%
autovacuum_vacuum_cost_delay: -1 # default vacuum cost delay: same as vacuum_cost_delay
autovacuum_vacuum_cost_limit: -1 # default vacuum cost limit: same as vacuum_cost_limit
autovacuum_freeze_max_age: 100000000 # age > 100 million triggers forced vacuum
#----------------------------------------------------------------------
# CLIENT
#----------------------------------------------------------------------
deadlock_timeout: 50ms # 50ms for deadlock
idle_in_transaction_session_timeout: 1min # 1min timeout for idle in transaction in critical database
#----------------------------------------------------------------------
# CUSTOMIZED OPTIONS
#----------------------------------------------------------------------
# extensions
shared_preload_libraries: '{{ pg_shared_libraries | default("pg_stat_statements, auto_explain") }}'
# auto_explain
auto_explain.log_min_duration: 1s # auto explain query slower than 1s
auto_explain.log_analyze: true # explain analyze
auto_explain.log_verbose: true # explain verbose
auto_explain.log_timing: true # explain timing
auto_explain.log_nested_statements: true
# pg_stat_statements
pg_stat_statements.max: 10000 # 5000 -> 10000 queries
pg_stat_statements.track: all # track all statements (all|top|none)
pg_stat_statements.track_utility: on # TRACK all queries on critical database
pg_stat_statements.track_planning: off # do not track planning metrics
#------------------------------------------------------------------------------
# postgres
#------------------------------------------------------------------------------
postgresql:
#----------------------------------------------------------------------------
# how to connect to postgres
#----------------------------------------------------------------------------
bin_dir: {{ pg_bin_dir }}
data_dir: {{ pg_data }}
config_dir: {{ pg_data }}
pgpass: {{ pg_dbsu_home }}/.pgpass
listen: {{ pg_listen }}:{{ pg_port }}
connect_address: {{ inventory_hostname }}:{{ pg_port }}
use_unix_socket: true # default: /var/run/postgresql, /tmp
#----------------------------------------------------------------------------
# credentials used to connect to postgres
#----------------------------------------------------------------------------
authentication:
superuser:
username: {{ pg_dbsu }}
replication:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
rewind:
username: {{ pg_replication_username }}
password: '{{ pg_replication_password }}'
#----------------------------------------------------------------------------
# how to react to database operations
#----------------------------------------------------------------------------
# event callback script log: /pg/log/callback.log
callbacks:
on_start: /pg/bin/pg-failover-callback
on_stop: /pg/bin/pg-failover-callback
on_reload: /pg/bin/pg-failover-callback
on_restart: /pg/bin/pg-failover-callback
on_role_change: /pg/bin/pg-failover-callback
# rewind policy: data checksum should be enabled before using rewind
use_pg_rewind: true
remove_data_directory_on_rewind_failure: true
remove_data_directory_on_diverged_timelines: false
#----------------------------------------------------------------------------
# how to create replica
#----------------------------------------------------------------------------
# create replica method: default pg_basebackup
create_replica_methods:
- basebackup
basebackup:
- max-rate: '1000M'
- checkpoint: fast
- status-interval: 1s
- verbose
- progress
#----------------------------------------------------------------------------
# ad hoc parameters (overwrite with default)
#----------------------------------------------------------------------------
# parameters:
#----------------------------------------------------------------------------
# host based authentication, overwrite default pg_hba.conf
#----------------------------------------------------------------------------
# pg_hba:
# - local all postgres ident
# - local all all md5
# - host all all 0.0.0.0/0 md5
# - local replication postgres ident
# - local replication all md5
# - host replication all 0.0.0.0/0 md5
...
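The timing comments in the CRIT template above state two rules: `ttl` must be at least `loop_wait + retry_timeout * 2`, and the maximum RTO is two loop waits plus `master_start_timeout`. The sketch below simply encodes those two rules with the template's values as defaults. The class `PatroniTiming` and its methods are hypothetical helpers for illustration, not part of Patroni's API.

```python
from dataclasses import dataclass


@dataclass
class PatroniTiming:
    """Timing settings copied from the CRIT template (all in seconds)."""
    loop_wait: int = 10
    ttl: int = 30
    retry_timeout: int = 10
    master_start_timeout: int = 120

    def ttl_is_valid(self) -> bool:
        # Template rule: ttl >= loop_wait + retry_timeout * 2
        return self.ttl >= self.loop_wait + 2 * self.retry_timeout

    def max_rto(self) -> int:
        # Template rule: Max RTO = 2 loop waits + master_start_timeout
        return 2 * self.loop_wait + self.master_start_timeout


if __name__ == "__main__":
    crit = PatroniTiming()
    print("ttl constraint satisfied:", crit.ttl_is_valid())
    print("max RTO (seconds):", crit.max_rto())
```

A check like this could be run before rendering the template, so that a change to `loop_wait` or `retry_timeout` that breaks the `ttl` constraint is caught early.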
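9 - Professional Support
Need professional support? Look here!
Pigsty is an open-source system; PRs and issues are welcome.
But time is precious, comrades: if I spent every day handling all kinds of tricky problems, I would have no time left to write bugs!
Professional Support
Pigsty also offers optional professional support, including the following extended content and services: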
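- Management console
- Complete monitoring system with roughly 3,000 or more monitoring metrics
- Security hardening
- Additional monitoring dashboards that provide richer cluster monitoring information
- Production-grade deployment, operations, and management solutions
- Meta-database construction and a global data dictionary
- Log collection system with aggregation of log summaries
- Backup/recovery: an end-to-end solution including concurrent backup, delayed backup, and backup verification
- Deployment assistance, system integration, hooking into monitoring and alerting infrastructure, or onboarding existing databases
- Failure diagnosis service
- Q&A, consulting, and training
- Other customization needs
For details, contact @Vonng (rh@vonng.com)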
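9.1 - Comparison with Similar Products
A side-by-side comparison with other PostgreSQL monitoring systems
Overview
Below are monitoring systems related to PostgreSQL.
These are some candidate competitors, but none of them measured up, so I had to step up and build one myself.
Side-by-Side Comparison
This is a side-by-side comparison of metric counts. Only database-related metrics are counted; machine-level metrics such as CPU and disk are excluded.
The figures for open-source, commercial, and cloud-vendor PG monitoring systems are tallied from their public code or documentation. This is one person's view and may look like self-promotion, so corrections are welcome. At least in order of magnitude, the chart is not far off; see the reference links at the end for details.
Some may ask: having many metrics looks impressive, but what is the practical value? Admittedly, a handful of key metrics is enough for failure alerting. But thorough metric coverage further improves our insight into and control over the database, and that can never be too high: the more, the better.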
Competitors
PGWatch
PG Analyze
PGDash
PGMonitor
AWS RDS
Azure RDS
Aliyun RDS
Reference Links
pgwatch
pgmonitor
datadog
pgDash
ClusterControl
pganalyze
Aliyun RDS
AWS RDS
Azure RDS
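9.2 - Why Open Source
Why did Pigsty choose to be open source?
Pigsty was developed in the hope of filling a gap in the PostgreSQL open-source ecosystem.
Pigsty is built on open-source components, so it was decided to give back to the community in the open-source way.
Can this thing make money? Of course it can, so optional professional support is also offered for well-heeled customers!
The professional edition will have more dashboards and metrics, a more polished UI, and richer features. But the open-source edition is itself more than enough for production use.
So why open source? Advertising aside, it really does come down to ideals.
Open source works like this: it runs on love, passion, and dedication.
Take PostgreSQL, the world's most advanced open-source relational database: it is given to everyone for free. That is real idealism.
I make my living from PG too. I cannot write PG itself, but I can write a monitoring system worthy of PostgreSQL, the world's best open-source relational database.
Pigsty builds on the open-source ecosystem and gives back to the open-source community. I hope Pigsty helps you along the way and makes using PG a better and more enjoyable experience!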
9.3 - Groups
Questions and discussion
Overview
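9.4 - Roadmap
Next steps planned for the Pigsty project
Release Plan
The current Pigsty version is v0.8.0 and is still in Beta. However, the provisioning solution is feature-frozen, and its API will no longer change.
The next version, v0.9.0, will make a final overall revision of the monitoring system's metrics, rules, and visualization, and enter RC status (May 2021).
v1.0 will enter GA status in mid-2021 (June 2021).
After v1.0, no new features will be added to the provisioning solution; the focus will be on developing and optimizing monitoring metrics and dashboards.
Long-Term Plan
Build Pigsty into a complete PostgreSQL private cloud platform, including a complete:
Development will also focus on Pigsty professional edition features, including: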