死活監視
- データベースサービスを「使える」こと
サーバの死活監視
-
ping
- ネットワークインタフェースの正常動作
~ $ ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.025 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.025/0.025/0.025/0.000 ms
~/sandbox/git/sample $
~ $ echo $?
0
- 疎通していないと
1
PostgreSQLの死活監視(プロセスの確認)
psコマンドを用いる方法
ps faxww | grep postgres
1 ? Ss 0:00 postgres
57 ? Ss 0:00 postgres: checkpointer
58 ? Ss 0:00 postgres: background writer
59 ? Ss 0:00 postgres: walwriter
60 ? Ss 0:00 postgres: autovacuum launcher
61 ? Ss 0:00 postgres: stats collector
62 ? Ss 0:00 postgres: logical replication launcher
-
psのオプションの意味
a
: 全ユーザのx
: ttyのないものも (バックグラウンドプロセス)f
(forest): ツリー状に表示ww
: unlimited width
Column: プロセス確認の落とし穴
-
偽陽性
- PostgreSQL以外の、
postgres
が名前に含まれるプロセス
- PostgreSQL以外の、
-
偽陰性
-
マスタサーバプロセスが
postmaster
で起動することがある- systemdの起動スクリプトで起動した場合など?
-
pg_isreadyコマンドを用いる方法
- Unixドメインソケット
pg_isready
/var/run/postgresql:5432 - accepting connections
- TCP/IP
postgres@31982ca0e241:/$ pg_isready -h localhost
localhost:5432 - accepting connections
- 正常動作: exitコード
0
postgres@31982ca0e241:/$ echo $?
0
- 起動していない場合: exitコード
2
postgres@31982ca0e241:/$ pg_isready -h localhost -p 5433
localhost:5433 - no response
postgres@31982ca0e241:/$ echo $?
2
pg_isready
コマンドエラー: exitコード3
postgres@31982ca0e241:/$ pg_isready -H localhost
/usr/lib/postgresql/12/bin/pg_isready: invalid option -- 'H'
Try "pg_isready --help" for more information.
postgres@31982ca0e241:/$ echo $?
3
- 権限なし、起動中などでアクセスできない(rejectされる)場合はexitコード
1
PostgreSQLの死活監視(SQLの実行確認)
- 監視したいのは「データベースがちゃんと使える」こと
- SQLを実行できないと意味がない
- 最低限のSQLを実行してみる例
postgres@31982ca0e241:/$ psql postgres -c "SELECT 1"
postgres@31982ca0e241:/$ psql postgres -c "SELECT now()"
?column?
----------
1
(1 row)
now
-------------------------------
2020-02-04 11:53:54.324172+00
(1 row)
正常動作の監視
- 「想定する性能で使える」こと
- リソース面・負荷面を見る
サーバの正常動作の監視
-
リソースの利用状況確認
- vmstatで大まかに
- netstat, iostat, sarで詳細に
vmstatコマンド
postgres@31982ca0e241:/$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
3 0 0 25038680 291032 555204 0 0 1 6 11 31 0 0 100 0 0
netstatコマンド
-
apt-get install net-tools
で入れる- 古い
postgres@31982ca0e241:/$ netstat -atonp
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name Timer
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 1/postgres off (0.00/0/0)
tcp 0 0 172.17.0.2:57312 151.101.108.204:80 TIME_WAIT - timewait (49.30/0/0)
tcp 0 0 172.17.0.2:55900 204.145.124.244:80 TIME_WAIT - timewait (3.34/0/0)
tcp 0 0 172.17.0.2:36482 151.101.228.204:80 TIME_WAIT - timewait (3.19/0/0)
tcp 0 0 172.17.0.2:36480 151.101.228.204:80 TIME_WAIT - timewait (3.19/0/0)
tcp6 0 0 :::5432 :::* LISTEN 1/postgres off (0.00/0/0)
-
netstatのオプションの意味
-
-a
,--all
LISTEN
とそうでないもの両方表示
-
-t
,--tcp
-
TCPのみ表示
- Unixドメインソケット非表示
-
-
-o
,--timers
- Include information related to networking timers.
- timerカラム
-
-n
,--numeric
- ホスト名、ポート名等を解決
-
-p
,--program
- PIDとプログラム名表示
-
- Stateが
TIMEOUT
になっているものはないか - Recv-Q, Send-Qが想定以上の値になっていないか
Recv-Q
Established: The count of bytes not copied by the user program connected to this socket. Listening: Since
Kernel 2.6.18 this column contains the current syn backlog.
Send-Q
Established: The count of bytes not acknowledged by the remote host. Listening: Since Kernel 2.6.18 this
column contains the maximum size of the syn backlog.
【補】ssコマンド
- netstatの後発
apt-get install iproute2
で入れる
postgres@31982ca0e241:/$ ss -atonp
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 128 0.0.0.0:5432 0.0.0.0:* users:(("postgres",pid=1,fd=3))
LISTEN 0 128 [::]:5432 [::]:* users:(("postgres",pid=1,fd=4))
- timer情報どこ…?
iostatコマンド
postgres@31982ca0e241:/$ iostat
Linux 4.19.84-microsoft-standard (31982ca0e241) 02/04/2020 _x86_64_ (6 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.07 0.00 0.14 0.00 0.00 99.79
Device tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 0.07 0.00 35.32 165 4195612
sdb 0.74 6.07 2.04 721209 242452
sarコマンド
- いろいろ監視できる
-u
: CPUの状況確認
postgres@31982ca0e241:/$ sar -u 1
Linux 4.19.84-microsoft-standard (31982ca0e241) 02/04/2020 _x86_64_ (6 CPU)
12:25:45 PM CPU %user %nice %system %iowait %steal %idle
12:25:46 PM all 0.17 0.00 0.00 0.00 0.00 99.83
12:25:47 PM all 0.17 0.00 0.17 0.00 0.00 99.67
12:25:48 PM all 0.17 0.00 0.00 0.00 0.00 99.83
12:25:49 PM all 0.17 0.00 0.17 0.00 0.00 99.67
12:25:50 PM all 0.33 0.00 0.00 0.00 0.00 99.67
12:25:51 PM all 0.17 0.00 0.00 0.00 0.00 99.83
12:25:52 PM all 0.17 0.00 0.17 0.00 0.00 99.67
12:25:53 PM all 0.17 0.00 0.00 0.00 0.00 99.83
12:25:54 PM all 0.33 0.00 0.00 0.00 0.00 99.67
...
Average: all 0.20 0.00 0.06 0.00 0.00 99.74
-
-d
: I/Oの状況確認- iostatよりも詳細
postgres@31982ca0e241:/$ sar -d 1
Linux 4.19.84-microsoft-standard (31982ca0e241) 02/04/2020 _x86_64_ (6 CPU)
12:26:05 PM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:26:06 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:06 PM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:06 PM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:26:07 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:07 PM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:07 PM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:26:08 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:08 PM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:08 PM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:26:09 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:09 PM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:09 PM DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
12:26:10 PM dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:26:10 PM dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
...
Average: DEV tps rkB/s wkB/s areq-sz aqu-sz await svctm %util
Average: dev8-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: dev8-16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
PostgreSQlの正常動作の監視
- 負荷状況確認
- 統計情報収集が有効になっていること
postgres=# SELECT name,setting,context FROM pg_settings WHERE name LIKE 'track%';
name | setting | context
---------------------------+---------+------------
track_activities | on | superuser
track_activity_query_size | 1024 | postmaster
track_commit_timestamp | off | postmaster
track_counts | on | superuser
track_functions | none | superuser
track_io_timing | off | superuser
(6 rows)
-
track_activities
とtrack_counts
- いずれもスーパーユーザの
SET
文で設定可能
- いずれもスーパーユーザの
pgstatdatabaseビュー
postgres=# TABLE pg_stat_database;
datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | checksum_failures | checksum_last_failure | blk_read_time | blk_write_time | stats_reset
-------+-----------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+-------------------+-----------------------+---------------+----------------+-------------------------------
0 | | 0 | 0 | 0 | 17 | 1289 | 455 | 274 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 | 2020-02-04 11:33:00.045219+00
13408 | postgres | 1 | 131 | 0 | 118 | 5921 | 72740 | 1430 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 | 2020-02-04 11:33:00.045198+00
1 | template1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 |
13407 | template0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | | 0 | 0 |
(4 rows)
blks_read
: 共有バッファ以外からのreadblks_hit
: 共有バッファからのread(キャッシュヒット)- キャッシュヒット率を算出できる
SELECT datname, (100.0 * blks_hit / (blks_hit + blks_read))
FROM pg_stat_database
WHERE datname='postgres';
datname | ?column?
----------+---------------------
postgres | 97.7080291970802920
(1 row)
pgstatuser_tablesビュー
postgres=# TABLE pg_stat_user_tables;
-[ RECORD 1 ]-------+-------
relid | 16385
schemaname | public
relname | tbl
seq_scan | 3
seq_tup_read | 9
idx_scan |
idx_tup_fetch |
n_tup_ins | 3
n_tup_upd | 3
n_tup_del | 0
n_tup_hot_upd | 3
n_live_tup | 3
n_dead_tup | 3
n_mod_since_analyze | 6
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0
-
n_tup_hot_upd
: HOT更新の回数-
HOT: Heap Only Tuple
- VACUUMが不要になる最適化
-
pgstatiousertablesビュー / pgstatiouserindexesビュー
postgres=# TABLE pg_statio_user_tables;
-[ RECORD 1 ]---+-------
relid | 16385
schemaname | public
relname | tbl
heap_blks_read | 5
heap_blks_hit | 49
idx_blks_read |
idx_blks_hit |
toast_blks_read |
toast_blks_hit |
tidx_blks_read |
tidx_blks_hit |
- テーブル/インデックス単位でキャッシュヒット率を算出できる
SELECT (100.0 * heap_blks_hit / (heap_blks_read + heap_blks_hit))
FROM pg_statio_user_tables;
-[ RECORD 1 ]-----------------
?column? | 90.7407407407407407
pgstatactivityビュー
postgres=# TABLE pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+----------+------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------+------------------------------
| | 60 | | | | | | | 2020-02-04 11:26:18.825327+00 | | | | Activity | AutoVacuumMain | | | | | autovacuum launcher
| | 62 | 10 | postgres | | | | | 2020-02-04 11:26:18.825748+00 | | | | Activity | LogicalLauncherMain | | | | | logical replication launcher
13408 | postgres | 1079 | 10 | postgres | psql | | | -1 | 2020-02-04 12:31:27.905789+00 | 2020-02-04 12:51:45.796916+00 | 2020-02-04 12:51:45.796916+00 | 2020-02-04 12:51:45.796917+00 | | | active | | 493 | TABLE pg_stat_activity; | client backend
| | 58 | | | | | | | 2020-02-04 11:26:18.824986+00 | | | | Activity | BgWriterHibernate | | | | | background writer
| | 57 | | | | | | | 2020-02-04 11:26:18.824728+00 | | | | Activity | CheckpointerMain | | | | | checkpointer
| | 59 | | | | | | | 2020-02-04 11:26:18.825119+00 | | | | Activity | WalWriterMain | | | | | walwriter
(6 rows)
-
動作中のプロセスの情報
- バックエンドプロセス
- PostgreSQL 10からはバックグラウンドプロセスの情報も
pg_locksビュー
- ロック状況を確認できる
postgres=# TABLE pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+------------------+---------+----------
relation | 13408 | 12143 | | | | | | | | 4/65 | 1126 | AccessShareLock | t | t
relation | 13408 | 16385 | | | | | | | | 4/65 | 1126 | RowExclusiveLock | t | t
virtualxid | | | | | 4/65 | | | | | 4/65 | 1126 | ExclusiveLock | t | t
relation | 13408 | 16385 | | | | | | | | 3/410 | 1079 | RowExclusiveLock | t | t
virtualxid | | | | | 3/410 | | | | | 3/410 | 1079 | ExclusiveLock | t | t
transactionid | | | | | | 495 | | | | 3/410 | 1079 | ShareLock | f | f
transactionid | | | | | | 496 | | | | 3/410 | 1079 | ExclusiveLock | t | f
transactionid | | | | | | 495 | | | | 4/65 | 1126 | ExclusiveLock | t | f
tuple | 13408 | 16385 | 0 | 10 | | | | | | 3/410 | 1079 | ExclusiveLock | t | f
(9 rows)
pg_stat_activity
ビュー等と結合して必要な情報を補う
postgres=# TABLE pg_stat_activity;
datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query | backend_type
-------+----------+------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+---------------------------+------------------------------
| | 60 | | | | | | | 2020-02-04 11:26:18.825327+00 | | | | Activity | AutoVacuumMain | | | | | autovacuum launcher
| | 62 | 10 | postgres | | | | | 2020-02-04 11:26:18.825748+00 | | | | Activity | LogicalLauncherMain | | | | | logical replication launcher
13408 | postgres | 1079 | 10 | postgres | psql | | | -1 | 2020-02-04 12:31:27.905789+00 | 2020-02-04 13:04:22.986041+00 | 2020-02-04 13:04:22.986041+00 | 2020-02-04 13:04:22.986043+00 | Lock | transactionid | active | 496 | 495 | update tbl set col=col+1; | client backend
13408 | postgres | 1126 | 10 | postgres | psql | | | -1 | 2020-02-04 13:02:31.511045+00 | 2020-02-04 13:04:18.73882+00 | 2020-02-04 13:06:23.754336+00 | 2020-02-04 13:06:23.754338+00 | | | active | 495 | 495 | TABLE pg_stat_activity; | client backend
| | 58 | | | | | | | 2020-02-04 11:26:18.824986+00 | | | | Activity | BgWriterHibernate | | | | | background writer
| | 57 | | | | | | | 2020-02-04 11:26:18.824728+00 | | | | Activity | CheckpointerMain | | | | | checkpointer
| | 59 | | | | | | | 2020-02-04 11:26:18.825119+00 | | | | Activity | WalWriterMain | | | | | walwriter
(7 rows)
- PostgreSQL 9.6以降では
pg_blocking_pids(<ブロックされているプロセスのPID>)
で、ブロックしているプロセスを確認できる
postgres=# SELECT pg_blocking_pids(1079);
pg_blocking_pids
------------------
{1126}
(1 row)
- PID1126が排他ロックでPID1079をブロックしている