内部構造から学ぶPostgreSQL 設計・運用計画の鉄則 ch12 死活監視と正常動作の監視

OSS-DBPostgreSQL勉強メモ

出典: 


死活監視

  • データベースサービスを「使える」こと

サーバの死活監視

  • ping

    • ネットワークインタフェースの正常動作
~ $ ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.025 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.025/0.025/0.025/0.000 ms
~/sandbox/git/sample $ 
~ $ echo $?
0
  • 疎通していないと1

PostgreSQLの死活監視(プロセスの確認)

psコマンドを用いる方法

ps faxww | grep postgres
    1 ?        Ss     0:00 postgres
   57 ?        Ss     0:00 postgres: checkpointer   
   58 ?        Ss     0:00 postgres: background writer   
   59 ?        Ss     0:00 postgres: walwriter   
   60 ?        Ss     0:00 postgres: autovacuum launcher   
   61 ?        Ss     0:00 postgres: stats collector   
   62 ?        Ss     0:00 postgres: logical replication launcher   
  • psのオプションの意味

    • a: 全ユーザの
    • x: ttyのないものも (バックグラウンドプロセス)
    • f (forest): ツリー状に表示
    • ww: unlimited width

Column: プロセス確認の落とし穴

  • 偽陽性

    • PostgreSQL以外の、postgresが名前に含まれるプロセス
  • 偽陰性

    • マスタサーバプロセスがpostmasterで起動することがある

      • systemdの起動スクリプトで起動した場合など?

pg_isreadyコマンドを用いる方法

  • Unixドメインソケット
pg_isready
/var/run/postgresql:5432 - accepting connections
  • TCP/IP
postgres@31982ca0e241:/$ pg_isready -h localhost
localhost:5432 - accepting connections
  • 正常動作: exitコード0
postgres@31982ca0e241:/$ echo $?
0
  • 起動していない場合: exitコード2
postgres@31982ca0e241:/$ pg_isready -h localhost -p 5433
localhost:5433 - no response
postgres@31982ca0e241:/$ echo $?
2
  • pg_isreadyコマンドエラー: exitコード3
postgres@31982ca0e241:/$ pg_isready -H localhost
/usr/lib/postgresql/12/bin/pg_isready: invalid option -- 'H'
Try "pg_isready --help" for more information.
postgres@31982ca0e241:/$ echo $?
3
  • 権限なし、起動中などでアクセスできない(rejectされる)場合はexitコード1

PostgreSQLの死活監視(SQLの実行確認)

  • 監視したいのは「データベースがちゃんと使える」こと
  • SQLを実行できないと意味がない
  • 最低限のSQLを実行してみる例
postgres@31982ca0e241:/$ psql postgres -c "SELECT 1"
postgres@31982ca0e241:/$ psql postgres -c "SELECT now()"
 ?column? 
----------
        1
(1 row)
              now              
-------------------------------
 2020-02-04 11:53:54.324172+00
(1 row)

正常動作の監視

  • 「想定する性能で使える」こと
  • リソース面・負荷面を見る

サーバの正常動作の監視

  • リソースの利用状況確認

    • vmstatで大まかに
    • netstat, iostat, sarで詳細に

vmstatコマンド

postgres@31982ca0e241:/$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 25038680 291032 555204    0    0     1     6   11   31  0  0 100  0  0

netstatコマンド

  • apt-get install net-toolsで入れる

    • 古い
postgres@31982ca0e241:/$ netstat -atonp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name     Timer
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      1/postgres           off (0.00/0/0)
tcp        0      0 172.17.0.2:57312        151.101.108.204:80      TIME_WAIT   -                    timewait (49.30/0/0)
tcp        0      0 172.17.0.2:55900        204.145.124.244:80      TIME_WAIT   -                    timewait (3.34/0/0)
tcp        0      0 172.17.0.2:36482        151.101.228.204:80      TIME_WAIT   -                    timewait (3.19/0/0)
tcp        0      0 172.17.0.2:36480        151.101.228.204:80      TIME_WAIT   -                    timewait (3.19/0/0)
tcp6       0      0 :::5432                 :::*                    LISTEN      1/postgres           off (0.00/0/0)
  • netstatのオプションの意味

    • -a, --all

      • LISTENとそうでないもの両方表示
    • -t, --tcp

      • TCPのみ表示

        • Unixドメインソケット非表示
    • -o, --timers

      • Include information related to networking timers.
      • timerカラム
    • -n, --numeric

      • ホスト名、ポート名等を解決
    • -p, --program

      • PIDとプログラム名表示
  • StateがTIMEOUTになっているものはないか
  • Recv-Q, Send-Qが想定以上の値になっていないか
   Recv-Q
       Established: The count of bytes not copied by the user program connected to this socket.  Listening: Since
       Kernel 2.6.18 this column contains the current syn backlog.

   Send-Q
       Established: The count of bytes not acknowledged by the remote host.  Listening: Since Kernel 2.6.18  this
       column contains the maximum size of the syn backlog.

【補】ssコマンド

  • netstatの後発
  • apt-get install iproute2で入れる
postgres@31982ca0e241:/$ ss -atonp
State     Recv-Q    Send-Q       Local Address:Port       Peer Address:Port                                      
LISTEN    0         128                0.0.0.0:5432            0.0.0.0:*       users:(("postgres",pid=1,fd=3))   
LISTEN    0         128                   [::]:5432               [::]:*       users:(("postgres",pid=1,fd=4))   
  • timer情報どこ…?

iostatコマンド

postgres@31982ca0e241:/$ iostat
Linux 4.19.84-microsoft-standard (31982ca0e241) 	02/04/2020 	_x86_64_	(6 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    0.14    0.00    0.00   99.79

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.07         0.00        35.32        165    4195612
sdb               0.74         6.07         2.04     721209     242452

sarコマンド

  • いろいろ監視できる
  • -u: CPUの状況確認
postgres@31982ca0e241:/$ sar -u 1
Linux 4.19.84-microsoft-standard (31982ca0e241) 	02/04/2020 	_x86_64_	(6 CPU)

12:25:45 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
12:25:46 PM     all      0.17      0.00      0.00      0.00      0.00     99.83
12:25:47 PM     all      0.17      0.00      0.17      0.00      0.00     99.67
12:25:48 PM     all      0.17      0.00      0.00      0.00      0.00     99.83
12:25:49 PM     all      0.17      0.00      0.17      0.00      0.00     99.67
12:25:50 PM     all      0.33      0.00      0.00      0.00      0.00     99.67
12:25:51 PM     all      0.17      0.00      0.00      0.00      0.00     99.83
12:25:52 PM     all      0.17      0.00      0.17      0.00      0.00     99.67
12:25:53 PM     all      0.17      0.00      0.00      0.00      0.00     99.83
12:25:54 PM     all      0.33      0.00      0.00      0.00      0.00     99.67
...


Average:        all      0.20      0.00      0.06      0.00      0.00     99.74
  • -d: I/Oの状況確認

    • iostatよりも詳細
postgres@31982ca0e241:/$ sar -d 1
Linux 4.19.84-microsoft-standard (31982ca0e241) 	02/04/2020 	_x86_64_	(6 CPU)

12:26:05 PM       DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
12:26:06 PM    dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:26:06 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:26:06 PM       DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
12:26:07 PM    dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:26:07 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:26:07 PM       DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
12:26:08 PM    dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:26:08 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:26:08 PM       DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
12:26:09 PM    dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:26:09 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

12:26:09 PM       DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
12:26:10 PM    dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
12:26:10 PM   dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
...



Average:          DEV       tps     rkB/s     wkB/s   areq-sz    aqu-sz     await     svctm     %util
Average:       dev8-0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      dev8-16      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

PostgreSQlの正常動作の監視

  • 負荷状況確認
  • 統計情報収集が有効になっていること
postgres=# SELECT name,setting,context FROM pg_settings WHERE name LIKE 'track%';
           name            | setting |  context   
---------------------------+---------+------------
 track_activities          | on      | superuser
 track_activity_query_size | 1024    | postmaster
 track_commit_timestamp    | off     | postmaster
 track_counts              | on      | superuser
 track_functions           | none    | superuser
 track_io_timing           | off     | superuser
(6 rows)
  • track_activitiestrack_counts

    • いずれもスーパーユーザのSET文で設定可能

pgstatdatabaseビュー

postgres=# TABLE pg_stat_database;

 datid |  datname  | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | checksum_failures | checksum_last_failure | blk_read_time | blk_write_time |          stats_reset          
-------+-----------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+-------------------+-----------------------+---------------+----------------+-------------------------------
     0 |           |           0 |           0 |             0 |        17 |     1289 |          455 |         274 |            1 |           0 |           0 |         0 |          0 |          0 |         0 |                   |                       |             0 |              0 | 2020-02-04 11:33:00.045219+00
 13408 | postgres  |           1 |         131 |             0 |       118 |     5921 |        72740 |        1430 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |                   |                       |             0 |              0 | 2020-02-04 11:33:00.045198+00
     1 | template1 |           0 |           0 |             0 |         0 |        0 |            0 |           0 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |                   |                       |             0 |              0 | 
 13407 | template0 |           0 |           0 |             0 |         0 |        0 |            0 |           0 |            0 |           0 |           0 |         0 |          0 |          0 |         0 |                   |                       |             0 |              0 | 
(4 rows)
  • blks_read: 共有バッファ以外からのread
  • blks_hit: 共有バッファからのread(キャッシュヒット)
  • キャッシュヒット率を算出できる
SELECT datname, (100.0 * blks_hit / (blks_hit + blks_read)) 
  FROM pg_stat_database
 WHERE datname='postgres';
 datname  |      ?column?       
----------+---------------------
 postgres | 97.7080291970802920
(1 row)

pgstatuser_tablesビュー

postgres=# TABLE pg_stat_user_tables;

-[ RECORD 1 ]-------+-------
relid               | 16385
schemaname          | public
relname             | tbl
seq_scan            | 3
seq_tup_read        | 9
idx_scan            | 
idx_tup_fetch       | 
n_tup_ins           | 3
n_tup_upd           | 3
n_tup_del           | 0
n_tup_hot_upd       | 3
n_live_tup          | 3
n_dead_tup          | 3
n_mod_since_analyze | 6
last_vacuum         | 
last_autovacuum     | 
last_analyze        | 
last_autoanalyze    | 
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0
  • n_tup_hot_upd: HOT更新の回数

    • HOT: Heap Only Tuple

      • VACUUMが不要になる最適化

pgstatiousertablesビュー / pgstatiouserindexesビュー

postgres=# TABLE pg_statio_user_tables;

-[ RECORD 1 ]---+-------
relid           | 16385
schemaname      | public
relname         | tbl
heap_blks_read  | 5
heap_blks_hit   | 49
idx_blks_read   | 
idx_blks_hit    | 
toast_blks_read | 
toast_blks_hit  | 
tidx_blks_read  | 
tidx_blks_hit   | 
  • テーブル/インデックス単位でキャッシュヒット率を算出できる
SELECT (100.0 * heap_blks_hit / (heap_blks_read + heap_blks_hit))
  FROM pg_statio_user_tables;
-[ RECORD 1 ]-----------------
?column? | 90.7407407407407407

pgstatactivityビュー

postgres=# TABLE pg_stat_activity;

 datid | datname  | pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |          query          |         backend_type         
-------+----------+------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------+------------------------------
       |          |   60 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.825327+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                         | autovacuum launcher
       |          |   62 |       10 | postgres |                  |             |                 |             | 2020-02-04 11:26:18.825748+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                         | logical replication launcher
 13408 | postgres | 1079 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-04 12:31:27.905789+00 | 2020-02-04 12:51:45.796916+00 | 2020-02-04 12:51:45.796916+00 | 2020-02-04 12:51:45.796917+00 |                 |                     | active |             |          493 | TABLE pg_stat_activity; | client backend
       |          |   58 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.824986+00 |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |                         | background writer
       |          |   57 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.824728+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                         | checkpointer
       |          |   59 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.825119+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                         | walwriter
(6 rows)
  • 動作中のプロセスの情報

    • バックエンドプロセス
    • PostgreSQL 10からはバックグラウンドプロセスの情報も

pg_locksビュー

  • ロック状況を確認できる
postgres=# TABLE pg_locks;

   locktype    | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid  |       mode       | granted | fastpath 
---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+------+------------------+---------+----------
 relation      |    13408 |    12143 |      |       |            |               |         |       |          | 4/65               | 1126 | AccessShareLock  | t       | t
 relation      |    13408 |    16385 |      |       |            |               |         |       |          | 4/65               | 1126 | RowExclusiveLock | t       | t
 virtualxid    |          |          |      |       | 4/65       |               |         |       |          | 4/65               | 1126 | ExclusiveLock    | t       | t
 relation      |    13408 |    16385 |      |       |            |               |         |       |          | 3/410              | 1079 | RowExclusiveLock | t       | t
 virtualxid    |          |          |      |       | 3/410      |               |         |       |          | 3/410              | 1079 | ExclusiveLock    | t       | t
 transactionid |          |          |      |       |            |           495 |         |       |          | 3/410              | 1079 | ShareLock        | f       | f
 transactionid |          |          |      |       |            |           496 |         |       |          | 3/410              | 1079 | ExclusiveLock    | t       | f
 transactionid |          |          |      |       |            |           495 |         |       |          | 4/65               | 1126 | ExclusiveLock    | t       | f
 tuple         |    13408 |    16385 |    0 |    10 |            |               |         |       |          | 3/410              | 1079 | ExclusiveLock    | t       | f
(9 rows)
  • pg_stat_activityビュー等と結合して必要な情報を補う
postgres=# TABLE pg_stat_activity;

 datid | datname  | pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |           query           |         backend_type         
-------+----------+------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+---------------------------+------------------------------
       |          |   60 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.825327+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                           | autovacuum launcher
       |          |   62 |       10 | postgres |                  |             |                 |             | 2020-02-04 11:26:18.825748+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                           | logical replication launcher
 13408 | postgres | 1079 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-04 12:31:27.905789+00 | 2020-02-04 13:04:22.986041+00 | 2020-02-04 13:04:22.986041+00 | 2020-02-04 13:04:22.986043+00 | Lock            | transactionid       | active |         496 |          495 | update tbl set col=col+1; | client backend
 13408 | postgres | 1126 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-04 13:02:31.511045+00 | 2020-02-04 13:04:18.73882+00  | 2020-02-04 13:06:23.754336+00 | 2020-02-04 13:06:23.754338+00 |                 |                     | active |         495 |          495 | TABLE pg_stat_activity;   | client backend
       |          |   58 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.824986+00 |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |                           | background writer
       |          |   57 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.824728+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                           | checkpointer
       |          |   59 |          |          |                  |             |                 |             | 2020-02-04 11:26:18.825119+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                           | walwriter
(7 rows)
  • PostgreSQL 9.6以降ではpg_blocking_pids(<ブロックされているプロセスのPID>)で、ブロックしているプロセスを確認できる
postgres=# SELECT pg_blocking_pids(1079);

 pg_blocking_pids 
------------------
 {1126}
(1 row)
  • PID1126が排他ロックでPID1079をブロックしている