PostgreSQL 高度技術者育成テキスト ch3 (運用管理) (2/2)

OSS-DBPostgreSQL勉強メモ

[asin:B00P4WD4QG:detail]


メンテナンス

  • VACUUM/自動VACUUM

    • 不要領域の再利用
    • トランザクションID周回問題の防止
  • ANALYZE
  • REINDEX

VACUUM

  • FULLとそうでないのは別用途

    • FULL

      • 更新/削除された行を物理的に切り詰める

        • ページサイズ小さくなる
    • FULLでないの

      • 更新/削除された行の再利用

        • ページサイズ変わらない
      • トランザクションID周回の回避
  • 作りたてのテーブルに1000000行挿入
CREATE TABLE tbl (col int);
INSERT INTO tbl VALUES (generate_series(1,1000000));
SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl';
-[ RECORD 1 ]----
relname   | tbl
reltuples | 1e+06
relpages  | 4425
  • pgstatuple
SELECT * FROM pgstattuple('tbl');
-[ RECORD 1 ]------+---------
table_len          | 36249600
tuple_count        | 1000000
tuple_len          | 28000000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 125700
free_percent       | 0.35
  • 半分削除して再度挿入
DELETE FROM tbl WHERE col%2=0;
INSERT INTO tbl VALUES(generate_series(2,1000000,2));

ANALYZE tbl;
SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl';
-[ RECORD 1 ]----
relname   | tbl
reltuples | 1e+06
relpages  | 6638
  • pgstattuple
SELECT * FROM pgstattuple('tbl');
-[ RECORD 1 ]------+---------
table_len          | 54378496
tuple_count        | 1000000
tuple_len          | 28000000
tuple_percent      | 51.49
dead_tuple_count   | 500000
dead_tuple_len     | 14000000
dead_tuple_percent | 25.75
free_space         | 192632
free_percent       | 0.35
  • vacuumしてみる
VACUUM VERBOSE tbl;
INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 38 nonremovable row versions in 1 out of 6638 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 523
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
ANALYZE tbl;
SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl';
  • FULLでないので物理的なページサイズは変わらず
-[ RECORD 1 ]----
relname   | tbl
reltuples | 1e+06
relpages  | 6638
  • dead tupleは回収された
SELECT * FROM pgstattuple('tbl');
-[ RECORD 1 ]------+---------
table_len          | 54378496
tuple_count        | 1000000
tuple_len          | 28000000
tuple_percent      | 51.49
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 16192632
free_percent       | 29.78
  • VACUUM FULL
VACUUM FULL VERBOSE tbl;
INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 1000000 nonremovable row versions in 6638 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.17 s, system: 0.04 s, elapsed: 0.37 s.
VACUUM
SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl';
  • 物理的なページ数も減少
-[ RECORD 1 ]----
relname   | tbl
reltuples | 1e+06
relpages  | 4425
SELECT * FROM pgstattuple('tbl');
-[ RECORD 1 ]------+---------
table_len          | 36249600
tuple_count        | 1000000
tuple_len          | 28000000
tuple_percent      | 77.24
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 125700
free_percent       | 0.35

トランザクションID周回問題

  • 公式/定期的なvacuum
  • PostgreSQLは追記型のアーキテクチャでMVCC (Multi-Version Concurency Control)を実現している
  • versionの比較はトランザクションIDで行われる

XIDs are compared using modulo-2^32 arithmetic.

  • トランザクションIDは32ビット整数なので、20億が新、20億が旧となる
  • ので、少なくとも20億トランザクションごとにfreezing処理を行わないと、タプルの可視性がおかしくなる

    • 「過去のトランザクション」で挿入され、それまで可視だったタプルが、「未来のトランザクション」判定となり、軒並み見えなくなる
  • freezing処理はVACUUMで行われる

    • 手動
    • autovacuum
  • freezingの昔話

    • ~9.4まではFrozenTransactionId(2)を「どのトランザクションIDよりも古いトランザクションID」扱いとし、この値で行のxminを上書きしていた
    • 新しいバージョンでは、単にフラグを立てる

      • 元のxmin値は調査用(forensic use)に保存される
  • 各データベースの、最古のXIDから現在のXIDまでのトランザクション数
SELECT datname,age(datfrozenxid) FROM pg_database;
  datname  | age 
-----------+-----
 postgres  | 132
 template1 | 132
 template0 | 132
(3 rows)
  • テーブル別の、最古のXIDから現在のXIDまでのトランザクション数
SELECT relname,age(relfrozenxid) FROM pg_class where relkind='r';
         relname         | age 
-------------------------+-----
 pg_statistic            | 132
 pg_type                 | 132
 tbl                     |  88
...
(71 rows)
  • 公式/pg_class
  • relkind = 'r'は「通常のテーブル」
  • VERBOSE FREEZEでfreezing処理を行う
VACUUM FREEZE VERBOSE;
...
INFO:  aggressively vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 176 nonremovable row versions in 1 out of 4425 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 612
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 4424 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
...
  • ageが0になる
SELECT datname,age(datfrozenxid) FROM pg_database;
  datname  | age 
-----------+-----
 postgres  |   0
 template1 | 132
 template0 | 132
(3 rows)
SELECT relname,age(relfrozenxid) FROM pg_class WHERE relkind='r';
         relname         | age 
-------------------------+-----
 pg_statistic            |   0
 pg_type                 |   0
 tbl                     |   0
...

遅延vacuum

postgresql.conf

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0			# 0-100 milliseconds (0 disables)
#vacuum_cost_page_hit = 1		# 0-10000 credits
#vacuum_cost_page_miss = 10		# 0-10000 credits
#vacuum_cost_page_dirty = 20		# 0-10000 credits
#vacuum_cost_limit = 200		# 1-10000 credits
  • 他のデータベース活動のI/Oとvacuumとが競合するのを避ける機能
  • vacuumのI/O操作をコストとして積算し、閾値を超えたらvacuumを一時中断(nap)

autovacuum

  • 前回VACUUM後のUPDATE/DELETE回数が閾値を超えたらVACUUMが自動実行される

    • cf. AUTOANALYZEはINSERTも

VACUUM/autovacuumの動作確認

DELETE FROM tbl WHERE col%2=0;
INSERT INTO tbl VALUES (generate_series(2,1000000,2));

TABLE pg_stat_user_tables;
-[ RECORD 1 ]-------+------------------------------
relid               | 16458
schemaname          | public
relname             | tbl
seq_scan            | 6
seq_tup_read        | 6500000
idx_scan            | 
idx_tup_fetch       | 
n_tup_ins           | 1500000
n_tup_upd           | 0
n_tup_del           | 500000
n_tup_hot_upd       | 0
n_live_tup          | 1000000
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-28 10:43:07.031521+00
last_autovacuum     | 2020-02-28 10:10:47.467538+00
last_analyze        | 2020-02-28 10:43:19.134225+00
last_autoanalyze    | 2020-02-28 10:10:47.596796+00
vacuum_count        | 6
autovacuum_count    | 1
analyze_count       | 5
autoanalyze_count   | 2
VACUUM VERBOSE tbl;
INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 38 nonremovable row versions in 1 out of 6638 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 658
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 4424 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
TABLE pg_stat_user_tables;
-[ RECORD 1 ]-------+------------------------------
relid               | 16458
schemaname          | public
relname             | tbl
seq_scan            | 7
seq_tup_read        | 7500000
idx_scan            | 
idx_tup_fetch       | 
n_tup_ins           | 2000000
n_tup_upd           | 0
n_tup_del           | 1000000
n_tup_hot_upd       | 0
n_live_tup          | 999887
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 2020-02-28 10:56:19.6923+00
last_autovacuum     | 2020-02-28 10:55:19.445485+00
last_analyze        | 2020-02-28 10:43:19.134225+00
last_autoanalyze    | 2020-02-28 10:55:19.784912+00
vacuum_count        | 7
autovacuum_count    | 2
analyze_count       | 5
autoanalyze_count   | 3
  • n_dead_tupが0に
  • last_vacuumが更新された

ANALYZE

autoanalyze

  • 前回ANALYZE後のINSERT/UPDATE/DELETE回数が閾値を超えたらANALYZEが自動実行される

analyzeの動作確認

  • VACUUM同様、pg_stat_user_tablesで確認できる

    • last_analyze
    • last_autoanalyze
  • 統計情報の内容はpg_statsビューで確認できる
SELECT * FROM pg_stats WHERE tablename='tbl';
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | col
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {19,10102,20969,30202,40396,50128,59774,68357,77885,86540,96720,105907,117040,127045,137472,148207,157800,167979,177740,187833,197898,207900,216911,226639,236399,246664,256203,266022,275753,285657,295739,305197,315949,326191,337116,347317,357285,367682,377767,387847,397098,407678,417632,426761,436724,446741,456940,466834,476455,486075,496231,506432,516392,525813,535829,545462,555056,565962,575183,585422,594819,605429,616488,625835,635298,645977,655527,665398,675739,685726,695043,704075,714334,723537,733644,743675,754400,764724,774897,785568,796077,806288,816085,827060,837054,847974,858128,867579,878894,888263,898976,909578,920909,931088,940888,950540,960383,969776,979597,989158,999997}
correlation            | 0.50307155
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
  • 半分重複させてみる
DELETE FROM tbl where col%2=1;
INSERT INTO tbl VALUES(generate_series(2,1000000,2));
ANALYZE VERBOSE tbl;
SELECT * FROM pg_stats WHERE tablename='tbl';
  • n_distinct-0.483089くらいになった

    • すべて同じ値だと1になる
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | col
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -0.483089
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {26,9890,19898,29156,40100,49796,60944,71946,81258,91518,102058,111782,121298,131442,141590,151562,161292,170440,180748,190446,201294,212782,223116,232946,242624,253124,262706,271792,282286,293080,303706,314286,324250,334808,345646,355838,367076,377294,387430,397838,407538,417268,427288,437266,447684,458166,467658,478280,487960,497870,507468,516416,527334,537138,546636,556092,566000,575662,584946,596124,605754,616234,625962,635204,644802,653914,664466,674872,685876,696172,706478,715658,725776,735222,744456,754098,763182,773792,783348,792702,802528,811422,821902,831464,841780,850256,860542,870374,879638,889338,899712,909398,919792,928878,938944,949642,960718,971236,980278,990056,999966}
correlation            | 0.50362015
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
  • さらに半分nullにしてみる
UPDATE tbl SET col=NULL WHERE col%4 = 0;
UPDATE 500000
ANALYZE VERBOSE tbl;
SELECT * FROM  pg_stats WHERE tablename='tbl';
  • null_fracが0.5くらいになった
-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | col
inherited              | f
null_frac              | 0.49976668
avg_width              | 4
n_distinct             | -0.250741
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {54,11650,21694,32594,42594,51594,60842,71170,80226,90234,99242,108918,118618,127514,138226,148014,157494,167914,179554,187762,197670,207254,216886,228398,239698,249158,257918,268682,278310,289482,299546,309826,320706,330278,339650,349810,359334,368354,377998,387910,398234,407914,418358,429418,439782,450646,459646,470690,480318,489470,499570,509142,519122,529766,540122,551214,560758,571286,581142,591654,602454,611542,620406,630498,640286,649750,658370,668138,679082,689726,699530,709022,719130,729038,739838,749910,758998,770234,780942,790942,800190,810590,821378,830766,842154,852194,861906,870846,879310,889578,900118,909250,920126,931898,941630,950682,960702,970050,979846,990058,999898}
correlation            | 0.50730336
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
  • 可変長カラムの場合
CREATE TABLE tbl2 (name text);
-- 126文字
INSERT INTO tbl2 VALUES ('zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz');
ANALYZE tbl2;
SELECT * FROM pg_stats WHERE tablename='tbl2';
  • widthは127

    • 126文字以下なのでヘッダ長が1バイト
-[ RECORD 1 ]----------+-------
schemaname             | public
tablename              | tbl2
attname                | name
inherited              | f
null_frac              | 0
avg_width              | 127
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | 
correlation            | 
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
-- 127文字
UPDATE tbl2 SET name='zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz';
ANALYZE tbl2;
SELECT * FROM pg_stats WHERE tablename='tbl2';
  • widthは131

    • 127文字になるとヘッダ長が4バイトになるので
-[ RECORD 1 ]----------+-------
schemaname             | public
tablename              | tbl2
attname                | name
inherited              | f
null_frac              | 0
avg_width              | 131
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | 
correlation            | 
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

REINDEX

-- DROP TABLE tbl;

CREATE TABLE tbl(id int primary key);
INSERT INTO tbl (id) VALUES(generate_series(1,1000000));
SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl_pkey';
-[ RECORD 1 ]-------
relname   | tbl_pkey
reltuples | 1e+06
relpages  | 2745
  • UPDATEによりインデックスは断片化する
UPDATE tbl SET id=id WHERE id%2=0;
ANALYZE tbl;

SELECT relname,reltuples,relpages FROM pg_class WHERE relname='tbl_pkey';
-[ RECORD 1 ]-------
relname   | tbl_pkey
reltuples | 1e+06
relpages  | 5486
  • 有効データの密度が下がっている
SELECT * FROM pgstatindex('tbl_pkey');
-[ RECORD 1 ]------+---------
version            | 4
tree_level         | 2
index_size         | 44941312
root_block_no      | 412
internal_pages     | 20
leaf_pages         | 5465
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 45.19
leaf_fragmentation | 49.99
  • インデックスはvacuumできない
VACUUM tbl_pkey;
WARNING:  skipping "tbl_pkey" --- cannot vacuum non-tables or special system tables
VACUUM
  • 断片化したインデックスはREINDEXで再構成する
REINDEX (VERBOSE) INDEX tbl_pkey;
INFO:  index "tbl_pkey" was reindexed
DETAIL:  CPU: user: 0.18 s, system: 0.06 s, elapsed: 0.33 s
REINDEX
SELECT * FROM pgstatindex('tbl_pkey');
-[ RECORD 1 ]------+---------
version            | 4
tree_level         | 2
index_size         | 22487040
root_block_no      | 290
internal_pages     | 11
leaf_pages         | 2733
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 90.06
leaf_fragmentation | 0

サーバログ

  • どこに・いつ・何をログ出力する
  • どのようにログを保持する

どこに

# - Where to Log -

#log_destination = 'stderr'		# Valid values are combinations of
					# stderr, csvlog, syslog, and eventlog,
					# depending on platform.  csvlog
					# requires logging_collector to be on.

# This is used when logging to stderr:
#logging_collector = off		# Enable capturing of stderr and csvlog
					# into log files. Required to be on for
					# csvlogs.
					# (change requires restart)

# These are only used if logging_collector is on:
#log_directory = 'log'			# directory where log files are written,
					# can be absolute or relative to PGDATA
#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'	# log file name pattern,
					# can include strftime() escapes
#log_file_mode = 0600			# creation mode for log files,
					# begin with 0 to use octal notation

...

# These are relevant when logging to syslog:
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#syslog_sequence_numbers = on
#syslog_split_messages = on

# This is only relevant when logging to eventlog (win32):
# (change requires restart)
#event_source = 'PostgreSQL'

いつ

# - When to Log -

#log_min_messages = warning		# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   info
					#   notice
					#   warning
					#   error
					#   log
					#   fatal
					#   panic

#log_min_error_statement = error	# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   info
					#   notice
					#   warning
					#   error
					#   log
					#   fatal
					#   panic (effectively off)

#log_min_duration_statement = -1	# -1 is disabled, 0 logs all statements
					# and their durations, > 0 logs only
					# statements running at least this number
					# of milliseconds

#log_transaction_sample_rate = 0.0	# Fraction of transactions whose statements
					# are logged regardless of their duration. 1.0 logs all
					# statements from all transactions, 0.0 never logs.
  • クライアントにエラーを送信する設定
#------------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#------------------------------------------------------------------------------

# - Statement Behavior -

#client_min_messages = notice		# values in order of decreasing detail:
					#   debug5
					#   debug4
					#   debug3
					#   debug2
					#   debug1
					#   log
					#   notice
					#   warning
					#   error
  • client側はlogのレベルが低い

    • cf. server側はlogのレベルがfatalに次いで高い

      • 設定値のほとんどでlogは出力するということ

何を

# - What to Log -

#debug_print_parse = off
#debug_print_rewritten = off
#debug_print_plan = off
#debug_pretty_print = on
#log_checkpoints = off
#log_connections = off
#log_disconnections = off
#log_duration = off
#log_error_verbosity = default		# terse, default, or verbose messages
#log_hostname = off
#log_line_prefix = '%m [%p] '		# special values:
					#   %a = application name
					#   %u = user name
					#   %d = database name
					#   %r = remote host and port
					#   %h = remote host
					#   %p = process ID
					#   %t = timestamp without milliseconds
					#   %m = timestamp with milliseconds
					#   %n = timestamp with milliseconds (as a Unix epoch)
					#   %i = command tag
					#   %e = SQL state
					#   %c = session ID
					#   %l = session line number
					#   %s = session start timestamp
					#   %v = virtual transaction ID
					#   %x = transaction ID (0 if none)
					#   %q = stop here in non-session
					#        processes
					#   %% = '%'
					# e.g. '<%u%%%d> '
#log_lock_waits = off			# log lock waits >= deadlock_timeout
#log_statement = 'none'			# none, ddl, mod, all
#log_replication_commands = off
#log_temp_files = -1			# log temporary files equal or larger
					# than the specified size in kilobytes;
					# -1 disables, 0 logs all temp files
log_timezone = 'Etc/UTC'

どのように保持する

  • ローテーション、追記or上書きの設定など
#log_truncate_on_rotation = off		# If on, an existing log file with the
					# same name as the new log file will be
					# truncated rather than appended to.
					# But such truncation only occurs on
					# time-driven rotation, not on restarts
					# or size-driven rotation.  Default is
					# off, meaning append to existing files
					# in all cases.
#log_rotation_age = 1d			# Automatic rotation of logfiles will
					# happen after that time.  0 disables.
#log_rotation_size = 10MB		# Automatic rotation of logfiles will
					# happen after that much log output.
					# 0 disables.

レプリケーションとホットスタンバイ

  • レプリケーションの目的

    • 高可用性
    • 負荷分散

状態確認

  • 【master】WAL位置
SELECT * FROM pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/3013230
(1 row)
  • 【standby】WAL適用位置
SELECT * FROM pg_last_wal_receive_lsn();
 pg_last_wal_receive_lsn 
-------------------------
 0/3013230
(1 row)
  • 【standby】WAL適用日時
SELECT * FROM pg_last_xact_replay_timestamp()
 pg_last_xact_replay_timestamp 
-------------------------------
 2020-02-28 13:56:58.162382+00
(1 row)

コンフリクト軽減

postgresql.conf

#max_standby_archive_delay = 30s	# max delay before canceling queries
					# when reading WAL from archive;
					# -1 allows indefinite delay
#max_standby_streaming_delay = 30s	# max delay before canceling queries
					# when reading streaming WAL;
					# -1 allows indefinite delay
#vacuum_defer_cleanup_age = 0	# number of xacts by which cleanup is delayed
  • スタンバイで、プライマリからWALを受け取ってディスクに書き込んだが、コンフリクトして適用は不可能な場合、クエリをキャンセルするまでの待ち時間

    • from archive
    • from streaming
  • vacuumせずに残しておくトランザクションの世代の数

    • デフォルトの0だと最新以外消す