ストリーミングレプリケーション(つづき)
【補】構築
-
じっさいにdockerでmaster-standby構成を(雑に)組んだ
- ストリーミングレプリケーション環境 (docker)
- HBAはザル
-
PG12での変更点
- キータ
recovery.conf
がない-
かわりに
standby.signal
ファイルを作る必要があるtouch
でいい
/docker:267138f94e60:/var/lib/postgresql/data $ ls -lA standby.signal
-rw------- 1 postgres postgres 0 2020-02-10 04:00 standby.signal
-
pg_basebackup
でmasterからベースバックアップを持ってくるときの-R
オプションは続投standby.signal
ファイルが生成される
Usage:
pg_basebackup [OPTION]...
Options controlling the output:
...
-R, --write-recovery-conf
write configuration for replication
- もう
recovery.conf
はないのに--write-recovery-conf
… -
-R
オプションつきでpg_basebackup
を呼び出すと、postgres.auto.conf
にprimary_conninfo
が自動生成される- プライマリ接続先情報
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=replication_user passfile=''/var/lib/postgresql/.pgpass'' host=master port=5432 sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any'
- これがあると
postgresql.conf
で何書いても効かないので注意する
レプリケーションの状況確認
- ストリーミングレプリケーションを安全に運用するために
サーバーログの確認
-
master
- 同期レプリケーションだと出てくるやつ
master_1 | 2020-02-22 16:41:04.809 UTC [97] LOG: standby "sby" is now a synchronous standby with priority 1
- standby
standby1_1 | 2020-02-22 16:41:04.709 UTC [21] LOG: started streaming WAL from primary at 0/3000000 on timeline 1
プロセスの確認
- master
~ $ docker-compose exec master ps faxww | grep postgres | grep wal
91 ? Ss 0:00 postgres: walwriter
97 ? Ss 0:00 postgres: walsender replication_user 172.20.0.4(53194) streaming 0/301E060
- standby
~ $ docker-compose exec standby1 ps faxww | grep postgres | grep wal
21 ? Ss 0:00 \_ postgres: walreceiver streaming 0/301E060
レプリケーション遅延の確認
- プロセスの存在は確認できた
- ちゃんと動いているか確認する
- masterにて:
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 97
usesysid | 16384
usename | replication_user
application_name | walreceiver
client_addr | 172.22.0.3
client_hostname |
client_port | 32834
backend_start | 2020-02-10 15:09:34.524869+00
backend_xmin |
state | streaming
sent_lsn | 0/3013310
write_lsn | 0/3013310
flush_lsn | 0/3013310
replay_lsn | 0/3013310
write_lag |
flush_lag |
replay_lag |
sync_priority | 1
sync_state | sync
reply_time | 2020-02-10 15:16:16.326192+00
- スタンバイ
walreceiver
がプライマリからデータを同期的に受け取る準備ができている
application_name | walreceiver
sync_state | sync
-
レプリケーション遅延
-
「同期」といっても、プライマリとスタンバイの状態が常に同じとは限らない
- masterの
synchronous_commit
設定がremote_apply
でない限りは遅延しうる
- masterの
-
- 遅延具合
sent_lsn | 0/3013310
write_lsn | 0/3013310
flush_lsn | 0/3013310
replay_lsn | 0/3013310
sent_lsn
: プライマリがスタンバイに送出したLSNwrite_lsn
: スタンバイが受け取ってバッファに書き込んだLSNflush_lsn
: スタンバイがディスクに書き込んだLSNreplay_lsn
: スタンバイがWALを適用したLSN- 今回は全部同じなので完全に同期できている
レプリケーションの管理
- プライマリが死んだら?
- 自動的に立ち直って欲しいところだが…
- PostgreSQL12時点では、手を加えてスタンバイを昇格してあげる必要あり
recovery.confのtrigger_fileを用いる方法
- standbyの設定
postgres=# SELECT name,setting,context FROM pg_settings WHERE name='promote_trigger_file';
name | setting | context
----------------------+---------+---------
promote_trigger_file | | sighup
(1 row)
- 指定のパスにファイルがあるのを検出したらmasterに昇格する
- ためす
postgresql.conf
promote_trigger_file = 'trigger.file'
- reload必要
- レプリケーション状態でmasterを停止してみる
standby1_1 | 2020-02-10 15:36:49.623 UTC [21] LOG: replication terminated by primary server
standby1_1 | 2020-02-10 15:36:49.623 UTC [21] DETAIL: End of WAL reached on timeline 1 at 0/3013388.
standby1_1 | 2020-02-10 15:36:49.623 UTC [21] FATAL: could not send end-of-streaming message to primary: no COPY in progress
standby1_1 | 2020-02-10 15:36:49.623 UTC [17] LOG: invalid record length at 0/3013388: wanted 24, got 0
master_1 | 2020-02-10 15:36:49.624 UTC [1] LOG: database system is shut down
standby1_1 | 2020-02-10 15:36:49.625 UTC [34] FATAL: could not connect to the primary server: could not connect to server: Connection refused
standby1_1 | Is the server running on host "master" (172.22.0.2) and accepting
standby1_1 | TCP/IP connections on port 5432?
streaming_replication_master_1 exited with code 0
standby1_1 | 2020-02-10 15:36:54.636 UTC [35] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:36:59.641 UTC [36] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
- standbyサーバーで所定のファイルを生成
postgres@a1ed78157801:~/data$ touch trigger.file
-
ほどなくmasterに昇格する
- 若干のタイムラグあり
standby1_1 | 2020-02-10 15:37:04.649 UTC [37] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:37:09.653 UTC [38] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:37:14.653 UTC [17] LOG: promote trigger file found: trigger.file
standby1_1 | 2020-02-10 15:37:14.653 UTC [17] LOG: redo done at 0/3013310
standby1_1 | 2020-02-10 15:37:14.653 UTC [17] LOG: last completed transaction was at log time 2020-02-10 15:10:05.098076+00
standby1_1 | 2020-02-10 15:37:14.661 UTC [17] LOG: selected new timeline ID: 2
standby1_1 | 2020-02-10 15:37:14.774 UTC [17] LOG: archive recovery complete
standby1_1 | 2020-02-10 15:37:14.788 UTC [16] LOG: database system is ready to accept connections
pg_ctl promoteを用いる方法
standbyサーバーにログインしてコマンド実行
postgres@3a3af865b441:/$ pg_ctl promote
waiting for server to promote.... done
server promoted
- ログ
master_1 | 2020-02-10 15:50:07.545 UTC [1] LOG: database system is shut down
standby1_1 | 2020-02-10 15:50:07.547 UTC [22] FATAL: could not connect to the primary server: could not connect to server: Connection refused
standby1_1 | Is the server running on host "master" (172.23.0.2) and accepting
standby1_1 | TCP/IP connections on port 5432?
streaming_replication_master_1 exited with code 0
standby1_1 | 2020-02-10 15:50:12.557 UTC [23] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:50:17.563 UTC [24] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:50:22.566 UTC [25] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:50:27.572 UTC [31] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:50:28.906 UTC [16] LOG: received promote request
standby1_1 | 2020-02-10 15:50:28.906 UTC [16] LOG: redo done at 0/3000060
standby1_1 | 2020-02-10 15:50:28.909 UTC [16] LOG: selected new timeline ID: 2
standby1_1 | 2020-02-10 15:50:29.017 UTC [16] LOG: archive recovery complete
standby1_1 | 2020-02-10 15:50:29.028 UTC [15] LOG: database system is ready to accept connections
- こちらはシグナル送信を起点に即昇格
【補】pg_promote
関数呼び出し
- 公式
- PostgreSQL12で追加されたやつ
- サーバーにログインしなくていいのが嬉しい
- クライアントからSQL文発行
postgres@7054e6e2b74f:/$ psql -h standby1 -c "SELECT pg_promote();"
pg_promote
------------
t
(1 row)
- ログ
master_1 | 2020-02-10 15:52:42.567 UTC [1] LOG: database system is shut down
standby1_1 | 2020-02-10 15:52:42.568 UTC [23] FATAL: could not connect to the primary server: could not connect to server: Connection refused
standby1_1 | Is the server running on host "master" (172.24.0.2) and accepting
standby1_1 | TCP/IP connections on port 5432?
streaming_replication_master_1 exited with code 0
standby1_1 | 2020-02-10 15:52:47.578 UTC [24] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:52:52.584 UTC [25] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:52:57.588 UTC [26] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:53:02.592 UTC [27] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:53:07.598 UTC [28] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:53:12.604 UTC [29] FATAL: could not connect to the primary server: could not translate host name "master" to address: Name or service not known
standby1_1 | 2020-02-10 15:53:15.994 UTC [30] LOG: connection received: host=172.24.0.5 port=60854
standby1_1 | 2020-02-10 15:53:15.994 UTC [30] LOG: connection authorized: user=postgres database=postgres application_name=psql
standby1_1 | 2020-02-10 15:53:15.998 UTC [17] LOG: received promote request
standby1_1 | 2020-02-10 15:53:15.998 UTC [17] LOG: redo done at 0/3000060
standby1_1 | 2020-02-10 15:53:16.028 UTC [17] LOG: selected new timeline ID: 2
standby1_1 | 2020-02-10 15:53:16.139 UTC [17] LOG: archive recovery complete
standby1_1 | 2020-02-10 15:53:16.155 UTC [16] LOG: database system is ready to accept connections
standby1_1 | 2020-02-10 15:53:16.201 UTC [30] LOG: disconnection: session time: 0:00:00.207 user=postgres database=postgres host=172.24.0.5 port=60854
共通
- 昇格したstandbyのタイムラインIDは1つ繰り上がる
昇格前
standby1_1 | 2020-02-10 15:36:49.623 UTC [21] DETAIL: End of WAL reached on timeline 1 at 0/3013388.
昇格時
standby1_1 | 2020-02-10 15:37:14.661 UTC [17] LOG: selected new timeline ID: 2
設定手順の整理
- docker環境(再掲)
- ストリーミングレプリケーション環境 (docker)
Column: 循環するレプリケーション
- 全部スタンバイにできちゃう
- 設定をしくじるとプライマリがない状態になっちゃう。気をつけよう