内部構造から学ぶPostgreSQL 設計・運用計画の鉄則 ch8 監視計画

OSS-DBPostgreSQL勉強メモ

出典: 


監視とは

  • 公式
  • 計画する内容:
  • 監視項目

    • OS
    • PostgreSQLサーバプロセス
  • 監視間隔

    • SLA等に照らして必要十分な間隔で計測する
    • 監視内容により監視間隔も変わってくる

      • 死活は数秒とか
      • 性能は数分とか
  • 閾値

    • 想定、実績、閾値をもって、「異常」の定義をする

監視項目の選定

  • 最終目的は「データベースが健全に動作しているか」の監視
  • そのためにブレークダウン

    • サーバに問題が起きていないか
    • PostgreSQLに問題が起きていないか

サーバに問題が起きていないか?を監視する

  • Unixの監視ツールをつかう
  • sar

    • CPU,メモリ,I/O,ネットワークぜんぶ
    • sysstatデーモンが動いていて/var/log/sysstat/sa<日>にログを書き出すやつ
    • sarクライアントで取り出したりする
  • iostat

    • I/O
    • CPU
Linux 4.19.84-microsoft-standard (DESKTOP-2PJLLS0) 	02/02/20 	_x86_64_	(6 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.41    0.00    0.14    0.00    0.00   99.44

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.56         0.01       283.69        165    4195632
sdb               3.89        49.66        43.73     734501     646704
  • vmstat

    • virtual memory
    • CPUも
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 24867800 136884 835760    0    0     8    54   16   43  0  0 99  0  0
  • top

    • 各プロセスのCPU、メモリ
  • netstat

    • ネットワーク

      • TCP/IPとかUnixドメインソケット通信とか
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 localhost:33154         localhost:6010          ESTABLISHED
tcp        0      0 localhost:6010          localhost:33144         ESTABLISHED
...
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  3      [ ]         STREAM     CONNECTED     1564     
unix  3      [ ]         STREAM     CONNECTED     1563     
...

PostgreSQLに問題が起きていないか?を監視する

必要なプロセスは正常に動作しているか

  • psコマンド
  • pg_stat_activityビュー

    • 公式
    • 1行あたり1サーバプロセス

      • 実行中のクエリ情報とか
postgres=# TABLE pg_stat_activity;
 datid | datname  | pid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |          query          |         backend_type         
-------+----------+-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+-------------------------+------------------------------
       |          |  60 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707822+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                         | autovacuum launcher
       |          |  62 |       10 | postgres |                  |             |                 |             | 2020-02-02 05:55:26.708161+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                         | logical replication launcher
 13408 | postgres | 674 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-02 07:52:49.013246+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898073+00 | 2020-02-02 07:53:08.898074+00 |                 |                     | active |             |          489 | TABLE pg_stat_activity; | client backend
       |          |  58 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707524+00 |                               |                               |                               | Activity        | BgWriterHibernate   |        |             |              |                         | background writer
       |          |  57 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707363+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                         | checkpointer
       |          |  59 |          |          |                  |             |                 |             | 2020-02-02 05:55:26.707688+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                         | walwriter
(6 rows)
  • pg_stat_archiverビュー

    • 公式
    • WALアーカイバの活動監視
postgres=# TABLE pg_stat_archiver;
 archived_count | last_archived_wal | last_archived_time | failed_count | last_failed_wal | last_failed_time |          stats_reset          
----------------+-------------------+--------------------+--------------+-----------------+------------------+-------------------------------
              0 |                   |                    |            0 |                 |                  | 2020-02-02 05:55:26.500914+00
(1 row)
  • pg_stat_progress_vacuumビュー

    • 公式
    • 1行あたりVACUUMを実行しているバックエンド1プロセス

      • autovacuumワーカ含む
postgres=# TABLE pg_stat_progress_vacuum;
 pid | datid | datname | relid | phase | heap_blks_total | heap_blks_scanned | heap_blks_vacuumed | index_vacuum_count | max_dead_tuples | num_dead_tuples 
-----+-------+---------+-------+-------+-----------------+-------------------+--------------------+--------------------+-----------------+-----------------
(0 rows)
  • C-S接続確認

ディスク容量に問題はないか

  • df, duコマンド
  • pg_database_size(<database_name>)関数
postgres=# SELECT pg_database_size('postgres');
 pg_database_size 
------------------
          8135535
(1 row)
  • pg_total_relation_size(<relation_size>)関数
postgres=# CREATE TABLE tbl (col int);
CREATE TABLE

postgres=# INSERT INTO tbl values (1);
INSERT 0 1

postgres=# SELECT pg_total_relation_size('tbl');
 pg_total_relation_size 
------------------------
                   8192
(1 row)
  • 1ページ(8kB)に収まっている
  • テーブル、インデックスの容量やスキャン履歴も監視対象
postgres=# CREATE INDEX idx ON tbl USING btree (col);
CREATE INDEX
  • pg_stat_user_tablesビュー
postgres=# TABLE pg_stat_user_tables;
 relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count 
-------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 16386 | public     | tbl     |        1 |            2 |        0 |             0 |         2 |         0 |         0 |             0 |          2 |          0 |                   2 |             |                 |              |                  |            0 |                0 |             0 |                 0
(1 row)
  • pg_stat_user_indexesビュー
postgres=# TABLE pg_stat_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch 
-------+------------+------------+---------+--------------+----------+--------------+---------------
 16386 |      16389 | public     | tbl     | idx          |        0 |            0 |             0
(1 row)

想定通りの性能を出せているか

  • log_min_duration_statementでスロークエリをロギングできる
#log_min_duration_statement = -1	# -1 is disabled, 0 logs all statements
					# and their durations, > 0 logs only
					# statements running at least this number
					# of milliseconds
  • 当該設定項目について
postgres=# SELECT name,setting,context,unit,boot_val FROM pg_settings WHERE name='log_min_duration_statement';
            name            | setting |  context  | unit | boot_val 
----------------------------+---------+-----------+------+----------
 log_min_duration_statement | -1      | superuser | ms   | -1
(1 row)
  • スーパーユーザのSET文で設定変更可能
  • ミリ秒単位
  • -1で無効
  • 0以上ならば、設定値ミリ秒よりも長時間かかったクエリをロギング
postgres=# SET log_min_duration_statement=0;
SET


postgres=# SELECT * FROM tbl;
 col 
-----
   1
   2
(2 rows)
  • stderr(デフォルトのログ出力先)を見てみる
2020-02-02 08:20:34.384 UTC [674] LOG:  duration: 0.041 ms  statement: SET log_min_duration_statement=0;
2020-02-02 08:20:44.917 UTC [674] LOG:  duration: 0.101 ms  statement: SELECT * FROM tbl;
  • log_min_duration_statementの設定を変更したSET文自体のログも出るよう

PostgreSQLのログに異常を知らせる通知はないか

  • デフォルトで最低限なので適宜設定する

PostgreSQLログの設定

  • 実運用ではデータベースに触れることが許されないことがある

    • システムカタログを参照できない
  • のでログが大事

PostgreSQLログをどこに出力するか

postgresql.conf

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

...

# - Where to Log -

#log_destination = 'stderr'		# Valid values are combinations of
					# stderr, csvlog, syslog, and eventlog,
					# depending on platform.  csvlog
					# requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off		# Enable capturing of stderr and csvlog
					# into log files. Required to be on for
					# csvlogs.
					# (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'			# directory where log files are written,
					# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'	# log file name pattern,
					# can include strftime() escapes
#log_file_mode = 0600			# creation mode for log files,
					# begin with 0 to use octal notation
#log_truncate_on_rotation = off		# If on, an existing log file with the
					# same name as the new log file will be
					# truncated rather than appended to.
					# But such truncation only occurs on
					# time-driven rotation, not on restarts
					# or size-driven rotation.  Default is
					# off, meaning append to existing files
					# in all cases.
#log_rotation_age = 1d			# Automatic rotation of logfiles will
					# happen after that time.  0 disables.
#log_rotation_size = 10MB		# Automatic rotation of logfiles will
					# happen after that much log output.
					# 0 disables.

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
# (change requires restart)
#event_source = 'PostgreSQL'

PostgreSQLログをいつ出力するか

postgresql.conf

#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#client_min_messages = notice		# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   log
					#   notice
					#   warning
					#   error

...

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

...

# - When to Log -

#log_min_messages = warning		# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   info
					#   notice
					#   warning
					#   error
					#   log
					#   fatal
					#   panic

#log_min_error_statement = error	# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   info
					#   notice
					#   warning
					#   error
					#   log
					#   fatal
					#   panic (effectively off)

#log_min_duration_statement = -1	# -1 is disabled, 0 logs all statements
					# and their durations, > 0 logs only
					# statements running at least this number
					# of milliseconds

#log_transaction_sample_rate = 0.0	# Fraction of transactions whose statements
					# are logged regardless of their duration. 1.0 logs all
					# statements from all transactions, 0.0 never logs.

PostgreSQLログに何を出力するか

postgresql.conf

#------------------------------------------------------------------------------
# AUTOVACUUM
#------------------------------------------------------------------------------

...

#log_autovacuum_min_duration = -1	# -1 disables, 0 logs all actions and
					# their durations, > 0 logs only
					# actions running at least this number
					# of milliseconds.

...

#------------------------------------------------------------------------------
# REPORTING AND LOGGING
#------------------------------------------------------------------------------

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default		# terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '		# special values:
					#   %a = application name
					#   %u = user name
					#   %d = database name
					#   %r = remote host and port
					#   %h = remote host
					#   %p = process ID
					#   %t = timestamp without milliseconds
					#   %m = timestamp with milliseconds
					#   %n = timestamp with milliseconds (as a Unix epoch)
					#   %i = command tag
					#   %e = SQL state
					#   %c = session ID
					#   %l = session line number
					#   %s = session start timestamp
					#   %v = virtual transaction ID
					#   %x = transaction ID (0 if none)
					#   %q = stop here in non-session
					#        processes
					#   %% = '%'
					# e.g. '<%u%%%d> '
#log_lock_waits = off			# log lock waits >= deadlock_timeout
#log_statement = 'none'			# none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1			# log temporary files equal or larger
					# than the specified size in kilobytes;
					# -1 disables, 0 logs all temp files
log_timezone = 'Etc/UTC'
  • ログ出力内容例
2020-02-02 08:20:44.917 UTC [674] LOG:  duration: 0.101 ms  statement: SELECT * FROM tbl;
  • prefixはこの部分:
2020-02-02 08:20:44.917 UTC [674]
  • デフォルトは'%m [%p] '

    • %m: millisec単位のタイムスタンプ (TZ)
    • %p: PID

PostgreSQLログをどのように保持するか

  • ログローテート

    • syslog使用時

      • syslogのログローテート機能を使う
    • syslog不使用時

      • PostgreSQLのログローテート機能を使う

このへん

#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'	# log file name pattern,
					# can include strftime() escapes
...
#log_truncate_on_rotation = off		# If on, an existing log file with the
					# same name as the new log file will be
					# truncated rather than appended to.
					# But such truncation only occurs on
					# time-driven rotation, not on restarts
					# or size-driven rotation.  Default is
					# off, meaning append to existing files
					# in all cases.
#log_rotation_age = 1d			# Automatic rotation of logfiles will
					# happen after that time.  0 disables.
#log_rotation_size = 10MB		# Automatic rotation of logfiles will
					# happen after that much log output.
					# 0 disables.
  • 間隔、もしくはログファイルサイズ、もしくはその両方でログローテートできる

異常時の判断基準

  • 何をもって異常とするか

    • 想定と実績と閾値で判断する
    • そのための閾値を決定する