内部構造から学ぶPostgreSQL 設計・運用計画の鉄則, ch. 16 (Performance Tuning) (2/2)

OSS-DB / PostgreSQL study notes

Source: 


Parameter Tuning

[Case 3] Tuning work_mem

CREATE TABLE tbl(col int);
INSERT INTO tbl VALUES (generate_series(1,1000000));
EXPLAIN ANALYZE SELECT MIN(col),AVG(col) FROM tbl GROUP BY col%3;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=130257.34..155257.34 rows=1000000 width=40) (actual time=322.977..449.161 rows=3 loops=1)
   Group Key: ((col % 3))
   ->  Sort  (cost=130257.34..132757.34 rows=1000000 width=8) (actual time=259.981..338.863 rows=1000000 loops=1)
         Sort Key: ((col % 3))
         Sort Method: external merge  Disk: 17696kB
         ->  Seq Scan on tbl  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=38.925..124.728 rows=1000000 loops=1)
 Planning Time: 0.067 ms
 JIT:
   Functions: 7
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.943 ms, Inlining 0.000 ms, Optimization 6.459 ms, Emission 31.840 ms, Total 39.242 ms
 Execution Time: 560.798 ms
(12 rows)
  • work_mem sets the amount of memory available to operations such as sorts and hashes
  • In the plan above:

    • GroupAggregate
    • Sort (external merge)

      • i.e., the sort spilled to disk
SET work_mem = '128MB';
EXPLAIN ANALYZE SELECT MIN(col),AVG(col) FROM tbl GROUP BY col%3;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24425.00..39425.00 rows=1000000 width=40) (actual time=242.029..244.481 rows=3 loops=1)
   Group Key: (col % 3)
   ->  Seq Scan on tbl  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=0.018..97.615 rows=1000000 loops=1)
 Planning Time: 0.044 ms
 Execution Time: 249.735 ms
(5 rows)
  • Increasing work_mem means that:

    • GroupAggregate became HashAggregate
    • the Sort disappeared

      • it was only needed for the GroupAggregate
    • the query got faster (560.798 ms => 249.735 ms)
  • Be careful: work_mem is allocated per sort/hash operation, so total usage scales with the number of concurrent queries (a rough bound is sketched below)

    • overshoot and the system swaps, making things slower instead
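  • A rough worst-case bound can be computed in SQL (a minimal sketch; it assumes one work_mem allocation per connection, while real queries may hold several at once)
-- worst case if every connection ran one sort/hash at full work_mem
SELECT pg_size_pretty(
         current_setting('max_connections')::bigint
         * pg_size_bytes(current_setting('work_mem'))
       ) AS worst_case_sort_hash_memory;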

Finding the optimal work_mem value

  • In the earlier example, the external merge used only about 17MB of disk
  • yet even raising work_mem to 18MB does not eliminate the external merge (the 17MB run below behaves the same)
SET work_mem = '17MB';
EXPLAIN ANALYZE SELECT MIN(col),AVG(col) FROM tbl GROUP BY col%3;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=130257.34..155257.34 rows=1000000 width=40) (actual time=277.090..401.535 rows=3 loops=1)
   Group Key: ((col % 3))
   ->  Sort  (cost=130257.34..132757.34 rows=1000000 width=8) (actual time=214.055..289.812 rows=1000000 loops=1)
         Sort Key: ((col % 3))
         Sort Method: external merge  Disk: 17640kB
         ->  Seq Scan on tbl  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=4.076..89.327 rows=1000000 loops=1)
 Planning Time: 0.045 ms
 JIT:
   Functions: 7
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.706 ms, Inlining 0.000 ms, Optimization 0.193 ms, Emission 3.750 ms, Total 4.650 ms
 Execution Time: 403.967 ms
(12 rows)
  • The reason is that the hash also needs memory, far more than the on-disk spill size suggests (compare the in-memory quicksort below, 71452kB, with the 17640kB written to disk)
  • Adjust the value in steps to home in on the optimum
  • In this example the boundary sat between 122MB and 123MB
SET work_mem = '122MB';
EXPLAIN ANALYZE SELECT MIN(col),AVG(col) FROM tbl GROUP BY col%3;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=116582.84..141582.84 rows=1000000 width=40) (actual time=229.893..330.129 rows=3 loops=1)
   Group Key: ((col % 3))
   ->  Sort  (cost=116582.84..119082.84 rows=1000000 width=8) (actual time=179.160..218.016 rows=1000000 loops=1)
         Sort Key: ((col % 3))
         Sort Method: quicksort  Memory: 71452kB
         ->  Seq Scan on tbl  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=4.613..90.972 rows=1000000 loops=1)
 Planning Time: 0.044 ms
 JIT:
   Functions: 7
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.724 ms, Inlining 0.000 ms, Optimization 0.197 ms, Emission 4.260 ms, Total 5.180 ms
 Execution Time: 331.579 ms
(12 rows)
SET work_mem = '123MB';
EXPLAIN ANALYZE SELECT MIN(col),AVG(col) FROM tbl GROUP BY col%3;
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24425.00..39425.00 rows=1000000 width=40) (actual time=233.830..235.826 rows=3 loops=1)
   Group Key: (col % 3)
   ->  Seq Scan on tbl  (cost=0.00..16925.00 rows=1000000 width=8) (actual time=0.018..92.967 rows=1000000 loops=1)
 Planning Time: 0.038 ms
 Execution Time: 239.939 ms
(5 rows)

[Case 4] Tuning the checkpoint interval

  • Checkpoint

    • the process of writing dirty pages in the shared buffers back to disk
  • When checkpoints occur too frequently, log messages like these appear (a counter-based check is sketched after the log excerpt):
2020-02-26 14:45:41.202 UTC [57] LOG:  checkpoints are occurring too frequently (5 seconds apart)
2020-02-26 14:45:41.202 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".
2020-02-26 14:45:46.440 UTC [57] LOG:  checkpoints are occurring too frequently (5 seconds apart)
2020-02-26 14:45:46.440 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".
2020-02-26 14:45:51.798 UTC [57] LOG:  checkpoints are occurring too frequently (5 seconds apart)
2020-02-26 14:45:51.798 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".
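  • The log is not the only signal; the cumulative checkpoint counters can be queried too (assuming a pre-17 server, where they live in pg_stat_bgwriter; PostgreSQL 17 moved them to pg_stat_checkpointer)
-- checkpoints_req counts checkpoints forced by WAL volume (max_wal_size);
-- many requested checkpoints relative to timed ones points the same way
-- as the log HINT above
SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint
FROM pg_stat_bgwriter;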

Measuring performance with the pgbench benchmark command

  • Initialize with scale factor 100
pgbench -i -s 100
dropping old tables...
creating tables...
generating data...
100000 of 10000000 tuples (1%) done (elapsed 0.05 s, remaining 5.25 s)
200000 of 10000000 tuples (2%) done (elapsed 0.13 s, remaining 6.53 s)
300000 of 10000000 tuples (3%) done (elapsed 0.22 s, remaining 6.96 s)
...
9900000 of 10000000 tuples (99%) done (elapsed 10.75 s, remaining 0.11 s)
10000000 of 10000000 tuples (100%) done (elapsed 10.83 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
  • Run with 100 clients for 5 minutes
pgbench -c 100 -T 300
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 1
duration: 300 s
number of transactions actually processed: 2136661
latency average = 14.044 ms
tps = 7120.707752 (including connections establishing)
tps = 7120.745540 (excluding connections establishing)
  • Checkpoints were occurring frequently during the run:
2020-02-26 14:36:47.152 UTC [57] LOG:  checkpoints are occurring too frequently (14 seconds apart)
2020-02-26 14:36:47.152 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".
2020-02-26 14:37:01.555 UTC [57] LOG:  checkpoints are occurring too frequently (14 seconds apart)
2020-02-26 14:37:01.555 UTC [57] HINT:  Consider increasing the configuration parameter "max_wal_size".
...
  • Set max_wal_size to 2GB and run it again:
psql -c "ALTER SYSTEM SET max_wal_size='2GB'"
psql -c 'SELECT pg_reload_conf();'
pgbench -c 100 -T 300
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: simple
number of clients: 100
number of threads: 1
duration: 300 s
number of transactions actually processed: 1899319
latency average = 15.798 ms
tps = 6329.801232 (including connections establishing)
tps = 6329.834891 (excluding connections establishing)
  • No more checkpoint warnings this time (see below for confirming the setting actually applied)
  • Yet tps somehow dropped (unclear why; perhaps just run-to-run variance)
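  • max_wal_size has sighup context, so pg_reload_conf() is enough to apply it without a restart; the new value can be confirmed before rerunning the benchmark
-- source should read 'configuration file' once the ALTER SYSTEM value is loaded
SELECT name, setting, unit, source, context
FROM pg_settings
WHERE name = 'max_wal_size';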

[Case 5] Tuning the planner statistics

  • ANALYZE samples rows rather than scanning every record
postgres=# select * from pg_settings where name='default_statistics_target';
-[ RECORD 1 ]---+-------------------------------------------------------------------------------------------------------------
name            | default_statistics_target
setting         | 100
unit            | 
category        | Query Tuning / Other Planner Options
short_desc      | Sets the default statistics target.
extra_desc      | This applies to table columns that have not had a column-specific target set via ALTER TABLE SET STATISTICS.
context         | user
vartype         | integer
source          | default
min_val         | 1
max_val         | 10000
enumvals        | 
boot_val        | 100
reset_val       | 100
sourcefile      | 
sourceline      | 
pending_restart | f
  • The default target is set via default_statistics_target

    • user context
  • It can be overridden per column with an ALTER TABLE statement (a histogram check follows the \d+ output)
ALTER TABLE tbl ALTER COLUMN col SET STATISTICS 200;
postgres=# \d+ tbl
                                    Table "public.tbl"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description 
--------+---------+-----------+----------+---------+---------+--------------+-------------
 col    | integer |           |          |         | plain   | 200          | 
Access method: heap
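  • A quick way to see the new target take effect (a sketch; the anyarray cast assumes an int column): pg_stats keeps up to (stats target + 1) histogram bounds per column
ANALYZE tbl;
SELECT attname,
       array_length(histogram_bounds::text::int[], 1) AS n_bounds
FROM pg_stats
WHERE tablename = 'tbl';   -- expect roughly 201 bounds for a target of 200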

[Case 6] Parallel scan

EXPLAIN ANALYZE SELECT MIN(col) FROM tbl;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=4) (actual time=53.599..53.599 rows=1 loops=1)
   ->  Gather  (cost=10633.33..10633.54 rows=2 width=4) (actual time=53.566..55.450 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=9633.33..9633.34 rows=1 width=4) (actual time=36.193..36.193 rows=1 loops=3)
               ->  Parallel Seq Scan on tbl  (cost=0.00..8591.67 rows=416667 width=4) (actual time=0.010..19.194 rows=333333 loops=3)
 Planning Time: 0.049 ms
 Execution Time: 55.474 ms
(8 rows)
  • Two workers were launched; including the leader, three processes split the work evenly

    • rows=333333
  • The number of workers per Gather node:
postgres=# SELECT * FROM pg_settings WHERE name='max_parallel_workers_per_gather'; 
-[ RECORD 1 ]---+-----------------------------------------------------------------
name            | max_parallel_workers_per_gather
setting         | 2
unit            | 
category        | Resource Usage / Asynchronous Behavior
short_desc      | Sets the maximum number of parallel processes per executor node.
extra_desc      | 
context         | user
vartype         | integer
source          | default
min_val         | 0
max_val         | 1024
enumvals        | 
boot_val        | 2
reset_val       | 2
sourcefile      | 
sourceline      | 
pending_restart | f
  • Set it to 1:
SET max_parallel_workers_per_gather=1;
EXPLAIN ANALYZE SELECT MIN(col) FROM tbl;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12777.69..12777.70 rows=1 width=4) (actual time=66.168..66.168 rows=1 loops=1)
   ->  Gather  (cost=12777.57..12777.68 rows=1 width=4) (actual time=66.112..67.713 rows=2 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Partial Aggregate  (cost=11777.57..11777.58 rows=1 width=4) (actual time=52.731..52.731 rows=1 loops=2)
               ->  Parallel Seq Scan on tbl  (cost=0.00..10307.06 rows=588206 width=4) (actual time=0.008..24.513 rows=500000 loops=2)
 Planning Time: 0.050 ms
 Execution Time: 67.737 ms
(8 rows)
  • rows=500000 now

    • a two-way split between 1 worker and 1 leader
  • Set it to 0:
SET max_parallel_workers_per_gather=0;
EXPLAIN ANALYZE SELECT MIN(col) FROM tbl;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16924.38..16924.38 rows=1 width=4) (actual time=104.309..104.309 rows=1 loops=1)
   ->  Seq Scan on tbl  (cost=0.00..14424.50 rows=999950 width=4) (actual time=0.008..46.805 rows=1000000 loops=1)
 Planning Time: 0.041 ms
 Execution Time: 104.326 ms
(4 rows)
  • No more splitting

  • Parallel scans also carry a splitting/coordination cost, so whether they pay off depends on:

    • number of CPUs
    • CPU performance
    • disk performance
  • For example, in the 1-worker plan above, the Gather step adds its own cost and time on top of the partial aggregates:
 Finalize Aggregate  (cost=12777.69..12777.70 rows=1 width=4) (actual time=66.168..66.168 rows=1 loops=1)
   ->  Gather  (cost=12777.57..12777.68 rows=1 width=4) (actual time=66.112..67.713 rows=2 loops=1)
  • Related parameters:
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
...
# - Asynchronous Behavior -
...
#max_worker_processes = 8		# (change requires restart)
#max_parallel_maintenance_workers = 2	# taken from max_parallel_workers
#max_parallel_workers_per_gather = 2	# taken from max_parallel_workers
#parallel_leader_participation = on
#max_parallel_workers = 8		# maximum number of max_worker_processes that
					# can be used in parallel operations
  • The minimum table size for a table to be considered for parallel scan (a quick check follows the config excerpt)

    • these apparently used to be unified as a single "relation" setting in older releases
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
...
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
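  • A quick check of the threshold (a sketch using a hypothetical throwaway table small_tbl): tables below min_parallel_table_scan_size are not planned in parallel
CREATE TABLE small_tbl AS SELECT generate_series(1, 1000) AS col;
ANALYZE small_tbl;
EXPLAIN SELECT MIN(col) FROM small_tbl;   -- expect a plain Seq Scan with no Gather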
  • Cost estimates

    • normally left untouched
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
...
#parallel_tuple_cost = 0.1		# same scale as above
#parallel_setup_cost = 1000.0	# same scale as above

Query Tuning

[Case 7] Tuning user-defined functions

  • Create a user-defined function (my first one)
CREATE OR REPLACE FUNCTION user_func() RETURNS SETOF int AS
$$
SELECT * FROM tbl; $$ LANGUAGE sql;
  • Call it:
SELECT * FROM user_func() limit 1;
 user_func 
-----------
         1
(1 row)

Specifying a row count for a user-defined function

EXPLAIN ANALYZE SELECT * FROM user_func();
                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Function Scan on user_func  (cost=0.25..10.25 rows=1000 width=4) (actual time=132.789..203.562 rows=1000000 loops=1)
 Planning Time: 0.017 ms
 Execution Time: 228.334 ms
(3 rows)
  • PostgreSQL knows nothing about the complexity inside a user-defined function
  • so at function creation time it assumes, by default, that the function returns 1000 rows
  • the function above actually returns 1000000 rows, so the estimate is badly off
  • telling the planner "it's about 1000000 rows" improves the accuracy of the plan
CREATE OR REPLACE FUNCTION user_func() RETURNS SETOF int AS
$$
SELECT * FROM tbl; $$ LANGUAGE sql ROWS 1000000;
  • An ALTER FUNCTION statement also works (the effect can be verified with EXPLAIN, sketched below)
ALTER FUNCTION user_func ROWS 1000000;
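  • Re-running EXPLAIN should now show the Function Scan estimated at rows=1000000 instead of the default 1000
EXPLAIN SELECT * FROM user_func();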

The pg_stat_user_functions view

  • Records the time spent executing user-defined functions
  • Collecting this data requires track_functions to be enabled

    • disabled (none) by default
postgres=# SELECT * FROM pg_settings WHERE name = 'track_functions';
-[ RECORD 1 ]---+---------------------------------------------------------
name            | track_functions
setting         | none
unit            | 
category        | Statistics / Query and Index Statistics Collector
short_desc      | Collects function-level statistics on database activity.
extra_desc      | 
context         | superuser
vartype         | enum
source          | default
min_val         | 
max_val         | 
enumvals        | {none,pl,all}
boot_val        | none
reset_val       | none
sourcefile      | 
sourceline      | 
pending_restart | f
SET track_functions=all;
  • View the collected data:
SELECT * FROM pg_stat_user_functions;
 funcid | schemaname | funcname  | calls | total_time | self_time 
--------+------------+-----------+-------+------------+-----------
  16426 | public     | user_func |     1 |     261.87 |    261.87
(1 row)
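  • The counters are cumulative per function: calling the function again should bump calls to 2 (they persist until pg_stat_reset())
SELECT * FROM user_func() LIMIT 1;
SELECT funcname, calls, total_time, self_time
FROM pg_stat_user_functions
WHERE funcname = 'user_func';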

[Case 8] Adding indexes

  • Using expression (function) indexes

    • create the index on an expression
  • Using partial indexes

    • create the index with a WHERE condition
    • keeps the index smaller than one built over all rows
  • Both are sketched below
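  • A minimal sketch of both, against the tbl table from earlier (index names are made up for illustration):
-- Expression index: the planner can match WHERE col % 3 = ... against it
CREATE INDEX tbl_col_mod3_idx ON tbl ((col % 3));
EXPLAIN SELECT * FROM tbl WHERE col % 3 = 0;

-- Partial index: only rows satisfying the WHERE clause are indexed,
-- so it is far smaller than an index over every row
CREATE INDEX tbl_col_small_idx ON tbl (col) WHERE col < 1000;
EXPLAIN SELECT * FROM tbl WHERE col < 500;   -- a candidate for the partial index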

[Case 9] Clustering table data

  • Clustering (correlation)

    • the correlation between index order and the physical order of the row data in the table
    • when it is lost, needless I/O occurs
  • Build a table with deliberately scrambled physical order:
CREATE TABLE tbl2 (id int primary key, col int);
INSERT INTO tbl2 VALUES (generate_series(1,1000000),generate_series(1,1000000));
DELETE FROM tbl2 WHERE id % 2 = 0;
INSERT INTO tbl2 VALUES (generate_series(2,1000000,2),generate_series(2,1000000,2));

ANALYZE tbl2;
  • Check the correlation

    • it is low
SELECT tablename,attname,correlation FROM pg_stats WHERE tablename = 'tbl2';
 tablename | attname | correlation 
-----------+---------+-------------
 tbl2      | id      |  0.50449115
 tbl2      | col     |  0.50449115
(2 rows)
  • Look at the plan for a query that SELECTs half the rows
EXPLAIN ANALYZE SELECT col FROM tbl2 WHERE id < 500000;
  • The plan is a Seq Scan:
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on tbl2  (cost=0.00..19138.00 rows=498154 width=4) (actual time=0.009..81.759 rows=499999 loops=1)
   Filter: (id < 500000)
   Rows Removed by Filter: 500001
 Planning Time: 0.090 ms
 Execution Time: 94.453 ms
(5 rows)
  • Improve the clustering with CLUSTER:
CLUSTER VERBOSE tbl2 USING tbl2_pkey;
INFO:  clustering "public.tbl2" using index scan on "tbl2_pkey"
INFO:  "tbl2": found 0 removable, 1000000 nonremovable row versions in 6638 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 0.39 s, system: 0.02 s, elapsed: 0.54 s.
  • Run ANALYZE to refresh the statistics (pg_stats still shows the old correlation until then):
postgres=# SELECT tablename,attname,correlation FROM pg_stats WHERE tablename = 'tbl2';
 tablename | attname | correlation 
-----------+---------+-------------
 tbl2      | id      |  0.50449115
 tbl2      | col     |  0.50449115
(2 rows)

postgres=# ANALYZE tbl2;
ANALYZE

postgres=# SELECT tablename,attname,correlation FROM pg_stats WHERE tablename = 'tbl2';
 tablename | attname | correlation 
-----------+---------+-------------
 tbl2      | id      |           1
 tbl2      | col     |           1
(2 rows)
  • Get the plan for the same SELECT statement again:
EXPLAIN ANALYZE SELECT col FROM tbl2 WHERE id < 500000;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tbl2_pkey on tbl2  (cost=0.42..16397.56 rows=498122 width=4) (actual time=0.022..66.706 rows=499999 loops=1)
   Index Cond: (id < 500000)
 Planning Time: 0.092 ms
 Execution Time: 79.227 ms
(4 rows)
  • It switched to an index scan, and execution got slightly faster as well

    • 94.453 ms => 79.227 ms
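  • One caveat: CLUSTER is a one-shot operation, and later INSERT/UPDATE activity erodes the correlation again. The index used is remembered, so re-clustering is just the statement below (note CLUSTER takes an ACCESS EXCLUSIVE lock):
CLUSTER tbl2;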