PostgreSQL 高度技術者育成テキスト ch2 (基本整理)

OSS活動PostgreSQL勉強メモ

[asin:B00P4WD4QG:detail]


プロセス構造

docker exec 2a459 ps auxfww | grep postgres
postgres    62  0.2  0.0   2472  1688 pts/0    Ss+  13:00   0:00 /bin/sh
postgres     1  0.0  0.0 213900 26180 ?        Ss   13:00   0:00 postgres
postgres    56  0.0  0.0 213900  4016 ?        Ss   13:00   0:00 postgres: checkpointer   
postgres    57  0.0  0.0 213900  5776 ?        Ss   13:00   0:00 postgres: background writer   
postgres    58  0.0  0.0 213900 10040 ?        Ss   13:00   0:00 postgres: walwriter   
postgres    59  0.0  0.0 214440  6756 ?        Ss   13:00   0:00 postgres: autovacuum launcher   
postgres    60  0.0  0.0  68280  4580 ?        Ss   13:00   0:00 postgres: stats collector   
postgres    61  0.0  0.0 214324  6676 ?        Ss   13:00   0:00 postgres: logical replication launcher   
  • loggerプロセスはlogging_collectoronのときのみ

    • ログ出力を行うプロセス
ALTER SYSTEM SET logging_collector='on'; -- then, restart
...
postgres    26  0.0  0.0  68196  4712 ?        Ss   13:07   0:00 postgres: logger   
...
  • archiverプロセスはarchive_modeonのときのみ

    • 不要になったWALを別の領域に保存するプロセス
ALTER SYSTEM SET archive_mode='on'; -- then, restart
...
postgres    32  0.0  0.0  68280  4764 ?        Ss   13:11   0:00 postgres: archiver   
...
  • walsender/walreceiverはレプリケーション時のみ
docker exec streaming_replication_master_1 ps auxfww | grep postgres
...
postgres    34  0.0  0.0 214712  9548 ?        Ss   13:13   0:00 postgres: walsender replication_user 172.30.0.3(47138) streaming 0/3013180
...
docker exec streaming_replication_standby1_1 ps auxfww | grep postgres
postgres    20  0.0  0.0 214580 10640 ?        Ss   13:13   0:00  \_ postgres: walreceiver   streaming 0/3013180
  • logical replication launcher/workerはロジカルレプリケーション時のみ
docker exec logical_replication_sub_1 ps auxfww | grep postgres
postgres   110  0.0  0.0 214100  6756 ?        Ss   13:15   0:00 postgres: logical replication launcher   
postgres   111  0.0  0.0 215604 14760 ?        Ss   13:15   0:00 postgres: logical replication worker for subscription 16390   
  • バックエンドプロセス

    • クライアント(フロントエンド)ごとに立ち上がる
postgres  5240  0.0  0.0 215348 11988 ?        Ss   13:05   0:00 postgres: postgres postgres [local] idle

利用する主なメモリ

  • 共有バッファ

    • ダーティページをメモリ上に持っておくやつ
  • WALバッファ

    • WALを書き込むときのやつ
  • ワークメモリ

    • ソートやハッシュ操作などでバックエンドプロセスが使用
    • 増やすとGroupAggregateがHashAggregateになったりする
  • メンテナンスワークメモリ

    • VACUUMやCREATE INDEXなどで使用

データベースクラスタの構造

/docker:postgres@2a45:/var/lib/postgresql/data $ ls -lA
total 104
-rw-------   1 postgres       postgres     3 2020-02-27 22:00 PG_VERSION
drwx------   5 postgres       postgres  4096 2020-02-27 22:00 base
-rw-------   1 postgres       postgres    44 2020-02-27 22:11 current_logfiles
drwx------   2 postgres       postgres  4096 2020-02-27 22:11 global
drwx------   2 postgres       postgres  4096 2020-02-27 22:11 log
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_commit_ts
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_dynshmem
-rw-------   1 postgres       postgres  4537 2020-02-27 22:00 pg_hba.conf
-rw-------   1 postgres       postgres  1636 2020-02-27 22:00 pg_ident.conf
drwx------   4 postgres       postgres  4096 2020-02-27 22:11 pg_logical
drwx------   4 postgres       postgres  4096 2020-02-27 22:00 pg_multixact
drwx------   2 postgres       postgres  4096 2020-02-27 22:11 pg_notify
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_replslot
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_serial
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_snapshots
drwx------   2 postgres       postgres  4096 2020-02-27 22:07 pg_stat
drwx------   2 postgres       postgres  4096 2020-02-27 22:22 pg_stat_tmp
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_subtrans
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_tblspc
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_twophase
drwx------   3 postgres       postgres  4096 2020-02-27 22:00 pg_wal
drwx------   2 postgres       postgres  4096 2020-02-27 22:00 pg_xact
-rw-------   1 postgres       postgres   133 2020-02-27 22:10 postgresql.auto.conf
-rw-------   1 postgres       postgres 26588 2020-02-27 22:00 postgresql.conf
-rw-------   1 postgres       postgres    36 2020-02-27 22:11 postmaster.opts
-rw-------   1 postgres       postgres    94 2020-02-27 22:11 postmaster.pid

データの格納方法

base/

  • テーブルファイル
/docker:postgres@2a45:/var/lib/postgresql/data $ ls -lA base/1
total 7801
-rw-------   1 postgres       postgres   8192 2020-02-27 22:00 112
-rw-------   1 postgres       postgres   8192 2020-02-27 22:00 113
-rw-------   1 postgres       postgres  81920 2020-02-27 22:00 1247
-rw-------   1 postgres       postgres  24576 2020-02-27 22:00 1247_fsm
-rw-------   1 postgres       postgres   8192 2020-02-27 22:00 1247_vm
-rw-------   1 postgres       postgres 442368 2020-02-27 22:00 1249
-rw-------   1 postgres       postgres  24576 2020-02-27 22:00 1249_fsm
-rw-------   1 postgres       postgres   8192 2020-02-27 22:00 1249_vm
...
  • ページサイズ単位

    • デフォルト設定でPostgreSQLをビルドすると8192B
  • fsm: free space map

    • 再利用できる領域を覚えておくやつ
  • vm: visibility map

    • vacuumをサボるためのやつ
    • インデックスオンリースキャンでも使う
  • インデックスもページ単位

pg_wal/

  • WALファイル領域

    • パフォーマンスを確保しつつACIDのdurabilityを実現するためのやつ

pg_tblspc/

  • テーブル空間へのsymlinkを格納する
  • テーブル空間

    • base/のほか、任意の領域をデータ保存先にできる
    • ユースケース

      • 高価で高速な領域
      • 安価で低速な領域
    • バックアップ時に取得を忘れないよう注意

contribモジュール

  • インストール済のEXTENSIONはpg_extension`ビューで確認可能
SELECT extname,extversion FROM pg_extension;
 extname | extversion 
---------+------------
 plpgsql | 1.0
(1 row)

pgbench

transaction type: <builtin: TPC-B (sort of)>
  • つかってみる
createdb benchdb -- ベンチ用DB作る
pgbench -i benchdb -- 初期化
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 100000 tuples (100%) done (elapsed 0.06 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
pgbench benchdb -- 実施
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 2.321 ms
tps = 430.859910 (including connections establishing)
tps = 463.329766 (excluding connections establishing)

ベンチマーク用DBの内容

\d+
              List of relations
 Schema |       Name       | Type  |  Owner   
--------+------------------+-------+----------
 public | pgbench_accounts | table | postgres
 public | pgbench_branches | table | postgres
 public | pgbench_history  | table | postgres
 public | pgbench_tellers  | table | postgres
(4 rows)
  • 件数・fillfactor
SELECT relname,reltuples,reloptions FROM pg_class WHERE relname LIKE 'pgbench_%';
        relname        | reltuples |    reloptions    
-----------------------+-----------+------------------
 pgbench_accounts      |    100000 | {fillfactor=100}
 pgbench_accounts_pkey |    100000 | 
 pgbench_branches      |         1 | {fillfactor=100}
 pgbench_branches_pkey |         1 | 
 pgbench_history       |         0 | 
 pgbench_tellers       |        10 | {fillfactor=100}
 pgbench_tellers_pkey  |        10 | 
(7 rows)
  • 初期化時のオプションで調整できる
pgbench -i -s 10 -F 70 benchdb
SELECT relname,reltuples,reloptions FROM pg_class WHERE relname LIKE 'pgbench_%';
        relname        | reltuples |   reloptions    
-----------------------+-----------+-----------------
 pgbench_accounts      |     1e+06 | {fillfactor=70}
 pgbench_accounts_pkey |     1e+06 | 
 pgbench_branches      |        10 | {fillfactor=70}
 pgbench_branches_pkey |        10 | 
 pgbench_history       |         0 | 
 pgbench_tellers       |       100 | {fillfactor=70}
 pgbench_tellers_pkey  |       100 | 
(7 rows)

ベンチマーク処理の中身

ALTER SYSTEM SET log_statement='all';
pgbench benchdb
  • ログ
2020-02-27 14:06:52.333 UTC [370] LOG:  statement: BEGIN;
2020-02-27 14:06:52.333 UTC [370] LOG:  statement: UPDATE pgbench_accounts SET abalance = abalance + -3533 WHERE aid = 40109;
2020-02-27 14:06:52.334 UTC [370] LOG:  statement: SELECT abalance FROM pgbench_accounts WHERE aid = 40109;
2020-02-27 14:06:52.334 UTC [370] LOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance + -3533 WHERE tid = 5;
2020-02-27 14:06:52.334 UTC [370] LOG:  statement: UPDATE pgbench_branches SET bbalance = bbalance + -3533 WHERE bid = 1;
2020-02-27 14:06:52.334 UTC [370] LOG:  statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (5, 1, 40109, -3533, CURRENT_TIMESTAMP);
2020-02-27 14:06:52.335 UTC [370] LOG:  statement: END;
  • トランザクションで下記の処理を実施

    1. pgbench_accountsの1行更新
    2. pgbench_accountsの1行参照
    3. pgbench_tellersの1行更新
    4. pgbench_branchesの1行更新
    5. pgbench_historyに1行追加
  • これをvacuum後に10回実施
2020-02-27 14:06:52.318 UTC [369] LOG:  statement: vacuum pgbench_branches
2020-02-27 14:06:52.329 UTC [369] LOG:  statement: vacuum pgbench_tellers
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 2.727 ms
tps = 366.747718 (including connections establishing)
tps = 393.761247 (excluding connections establishing)

負荷の調整

  • 一部オプション抜粋
Options to select what to run:
  -b, --builtin=NAME[@W]   add builtin script NAME weighted at W (default: 1)
                           (use "-b list" to list available scripts)
...
  -S, --select-only        perform SELECT-only transactions
                           (same as "-b select-only")

Benchmarking options:
  -c, --client=NUM         number of concurrent database clients (default: 1)
  -C, --connect            establish new connection for each transaction
...
  -j, --jobs=NUM           number of threads (default: 1)
  -l, --log                write transaction times to log file
...
  -t, --transactions=NUM   number of transactions each client runs (default: 10)
  -T, --time=NUM           duration of benchmark test in seconds

測定結果の見方

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 2.727 ms
tps = 366.747718 (including connections establishing)
tps = 393.761247 (excluding connections establishing)
  • 設定パラメータ
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
  • トランザクション数

    • 完了/予定
number of transactions actually processed: 10/10
  • Transaction per Second
tps = 366.747718 (including connections establishing)
tps = 393.761247 (excluding connections establishing)
  • -C, --connectを指定すると、これらの差分から接続のオーバヘッドを確認できる
pgbench --connect
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 5.937 ms
tps = 168.432116 (including connections establishing)
tps = 256.256503 (excluding connections establishing)

oid2name

oidと名前一覧を出力

  • 引数なしで呼び出し
oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  16384        benchdb  pg_default
  13408       postgres  pg_default
  13407      template0  pg_default
      1      template1  pg_default

テーブル空間一覧

mkdir -m 700 /var/lib/postgresql/tblspc/
CREATE TABLESPACE my_table_space LOCATION '/var/lib/postgresql/tblspc/';
oid2name -s # --tablespaces でも可
All tablespaces:
    Oid  Tablespace Name
------------------------
   1663       pg_default
   1664        pg_global
  16482   my_table_space

指定のデータベースのデータベースオブジェクト一覧

  • postgresデータベースのテーブルのみ
oid2name -d postgres
From database "postgres":
  Filenode        Table Name
----------------------------
     16420  pgbench_accounts
     16421  pgbench_branches
     16481   pgbench_history
     16423   pgbench_tellers
     16483               tbl
  • インデックスも
oid2name -d postgres -i
From database "postgres":
  Filenode             Table Name
---------------------------------
     16420       pgbench_accounts
     16428  pgbench_accounts_pkey
     16421       pgbench_branches
     16424  pgbench_branches_pkey
     16481        pgbench_history
     16423        pgbench_tellers
     16426   pgbench_tellers_pkey
     16483                    tbl
  • ファイルノード指定
oid2name -f 16483
From database "postgres":
  Filenode  Table Name
----------------------
     16483         tbl

注意点

oid2name --help
oid2name helps examining the file structure used by PostgreSQL.

...

Connection options:
  -d, --dbname=DBNAME        database to connect to
  -h, --host=HOSTNAME        database server host or socket directory
  -H                         same as -h, deprecated option
  -p, --port=PORT            database server port number
  -U, --username=USERNAME    connect as specified database user
  • -H (deprecated)

    • 他のツール群では-hでホスト指定するところを、oid2nameは例外的に-Hだったらしい

pgstattuple/pgstatindex

  • 公式/pgstattuple
  • ソースコードはcontrib/pgbench/配下
  • CREATE EXTENSIONで利用
CREATE EXTENSION pgstattuple;
SELECT extname,extversion FROM pg_extension;
   extname   | extversion 
-------------+------------
 plpgsql     | 1.0
 pgstattuple | 1.5
(2 rows)

pgstattuple

SELECT * FROM pgstattuple('tbl');
-[ RECORD 1 ]------+------
table_len          | 8192
tuple_count        | 100
tuple_len          | 2800
tuple_percent      | 34.18
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 4564
free_percent       | 55.71
  • 1レコードあたりのサイズ等算出できる
  • DELETE/INSERTしてdead tupleを増やしてみる
DELETE FROM tbl WHERE col % 2 = 0;
INSERT INTO tbl VALUES (generate_series(2,100,2));
SELECT * FROM pgstattuple('tbl');
  • dead tupleが50行増加
  • free_space減少
-[ RECORD 1 ]------+------
table_len          | 8192
tuple_count        | 100
tuple_len          | 2800
tuple_percent      | 34.18
dead_tuple_count   | 50
dead_tuple_len     | 1400
dead_tuple_percent | 17.09
free_space         | 2764
free_percent       | 33.74
  • もっとDELETE\INSERTしてみる
DELETE FROM tbl WHERE col % 2 = 0;
INSERT INTO tbl VALUES (generate_series(2,100,2));
DELETE FROM tbl WHERE col % 2 = 0;
INSERT INTO tbl VALUES (generate_series(2,100,2));
DELETE FROM tbl WHERE col % 2 = 0;
INSERT INTO tbl VALUES (generate_series(2,100,2));
SELECT * FROM pgstattuple('tbl');
  • 2ページ(2 x 8192B)になった
-[ RECORD 1 ]------+------
table_len          | 16384
tuple_count        | 100
tuple_len          | 2800
tuple_percent      | 17.09
dead_tuple_count   | 50
dead_tuple_len     | 1400
dead_tuple_percent | 8.54
free_space         | 10328
free_percent       | 63.04
  • VACUUMして再度見てみる
VACUUM VERBOSE tbl;
INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 100 nonremovable row versions in 2 out of 2 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 652
There were 200 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
SELECT * FROM pgstattuple('tbl');
  • dead tupleはなくなった
  • VACUUM FULLではないので物理的な領域は回収されない (2ページのまま)
-[ RECORD 1 ]------+------
table_len          | 16384
tuple_count        | 100
tuple_len          | 2800
tuple_percent      | 17.09
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 11928
free_percent       | 72.8
  • VACUUM FULLして再度見てみる
VACUUM FULL VERBOSE tbl;
INFO:  vacuuming "public.tbl"
INFO:  "tbl": found 0 removable, 100 nonremovable row versions in 2 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
SELECT * FROM pgstattuple('tbl');
  • 物理的な領域も回収された
-[ RECORD 1 ]------+------
table_len          | 8192
tuple_count        | 100
tuple_len          | 2800
tuple_percent      | 34.18
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 4564
free_percent       | 55.71

pgstatindex

CREATE INDEX ON tbl using BTREE (col);
SELECT * FROM pgstatindex('tbl_col_idx');
-[ RECORD 1 ]------+------
version            | 4
tree_level         | 0
index_size         | 16384
root_block_no      | 1
internal_pages     | 0
leaf_pages         | 1
empty_pages        | 0
deleted_pages      | 0
avg_leaf_density   | 24.58
leaf_fragmentation | 0
  • 2ページある(2 x 8192B)のはメタページとrootページが1つずつあるから

    • root_block_noが1なのは「1番目のページがroot」の意
  • 木の深さ(tree_level)は0

    • rootがleafでもある(leaf_pagesが1)
    • 中間ノードなし(internal_pagesが0)

注意点

  • pgstattupleモジュールは、情報収集のためにリレーション全体を走査する

    • cf. ANALYZEは全レコード走査はせずサンプリングする
  • 本番環境で利用すると性能に影響を与える可能性があることに注意する