PostgreSQL 高度技術者育成テキスト ch4 (性能監視とパフォーマンス) (1/2)

OSS-DBPostgreSQL勉強メモ

[asin:B00P4WD4QG:detail]


アクセス統計情報

  • 「何回スキャンした」とか「何行フェッチした」とか「キャッシュヒット率は」とかそういうの

パラメータ

#track_counts = on
#track_activities = on
#stats_temp_directory = 'pg_stat_tmp'
SELECT name,setting,context FROM pg_settings WHERE name IN ('track_counts','track_activities','stats_temp_directory');
         name         |   setting   |  context  
----------------------+-------------+-----------
 stats_temp_directory | pg_stat_tmp | sighup
 track_activities     | on          | superuser
 track_counts         | on          | superuser
(3 rows)
  • stats_temp_directoryをRAMディスク等にすることで性能向上を図ることができる

標準統計情報ビュー

  • 同一トランザクション内では同一結果を取得する
  • pg_stat_clear_snapshot()関数を呼び出すと最新情報に更新

pgstatactivity

SELECT * FROM pg_stat_activity;
 datid | datname  | pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |          xact_start           |          query_start          |         state_change          | wait_event_type |     wait_event      | state  | backend_xid | backend_xmin |              query              |         backend_type         
-------+----------+------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+-----------------+---------------------+--------+-------------+--------------+---------------------------------+------------------------------
       |          |   60 |          |          |                  |             |                 |             | 2020-02-28 05:08:31.562713+00 |                               |                               |                               | Activity        | AutoVacuumMain      |        |             |              |                                 | autovacuum launcher
       |          |   62 |       10 | postgres |                  |             |                 |             | 2020-02-28 05:08:31.563011+00 |                               |                               |                               | Activity        | LogicalLauncherMain |        |             |              |                                 | logical replication launcher
 13408 | postgres | 5930 |       10 | postgres | psql             |             |                 |          -1 | 2020-02-28 13:59:31.380246+00 | 2020-02-28 14:36:17.461076+00 | 2020-02-28 14:36:17.461076+00 | 2020-02-28 14:36:17.461078+00 |                 |                     | active |             |          724 | SELECT * FROM pg_stat_activity; | client backend
       |          |   58 |          |          |                  |             |                 |             | 2020-02-28 05:08:31.562336+00 |                               |                               |                               | Activity        | BgWriterMain        |        |             |              |                                 | background writer
       |          |   57 |          |          |                  |             |                 |             | 2020-02-28 05:08:31.562139+00 |                               |                               |                               | Activity        | CheckpointerMain    |        |             |              |                                 | checkpointer
       |          |   59 |          |          |                  |             |                 |             | 2020-02-28 05:08:31.562557+00 |                               |                               |                               | Activity        | WalWriterMain       |        |             |              |                                 | walwriter
(6 rows)
  • クエリでlike検索してバックエンドプロセスのpidを特定したりできる
SELECT pid,query FROM pg_stat_activity WHERE query ~ 'select';
 pid  |                             query                              
------+----------------------------------------------------------------
 5930 | SELECT pid,query FROM pg_stat_activity WHERE query ~ 'select';
(1 row)
pg_cancel_backend(pid int)
pg_terminate_backend(pid int)
  • ロングトランザクションの特定にはxact_startカラムが有用
SELECT pid,xact_start,current_timestamp - xact_start as duration,query FROM pg_stat_activity;
 pid  |          xact_start           |    duration     |                                             query                                             
------+-------------------------------+-----------------+-----------------------------------------------------------------------------------------------
   60 |                               |                 | 
   62 |                               |                 | 
 5930 | 2020-02-28 14:44:39.646029+00 | 00:00:00        | SELECT pid,xact_start,current_timestamp - xact_start as duration,query FROM pg_stat_activity;
 6266 | 2020-02-28 14:43:26.322961+00 | 00:01:13.323068 | begin;
   58 |                               |                 | 
   57 |                               |                 | 
   59 |                               |                 | 
(7 rows)

pgstatdatabase

SELECT * FROM pg_stat_database WHERE datname='postgres';
-[ RECORD 1 ]---------+------------------------------
datid                 | 13408
datname               | postgres
numbackends           | 2
xact_commit           | 2534
xact_rollback         | 37
blks_read             | 118132
blks_hit              | 44975739
tup_returned          | 33852918
tup_fetched           | 29521
tup_inserted          | 13000518
tup_updated           | 2001360
tup_deleted           | 6000331
conflicts             | 0
temp_files            | 10
temp_bytes            | 80207872
deadlocks             | 0
checksum_failures     | 
checksum_last_failure | 
blk_read_time         | 0
blk_write_time        | 0
stats_reset           | 2020-02-28 05:31:21.975402+00
  • キャッシュヒット率を算出できる
SELECT 100.0*blks_hit/(blks_hit+blks_read) AS cache_hit_percent
  FROM pg_stat_database
 WHERE datname='postgres';
  cache_hit_percent  
---------------------
 99.7380336249316627
(1 row)
  • 共有バッファ上にないデータを取得するとblks_readが増加する
  • が、必ずしもディスクアクセスは発生していないことに留意する

    • OSのファイルシステムキャッシュ

pgstatbgwriter

SELECT * FROM pg_stat_bgwriter;
-[ RECORD 1 ]---------+------------------------------
checkpoints_timed     | 116
checkpoints_req       | 0
checkpoint_write_time | 1475203
checkpoint_sync_time  | 175
buffers_checkpoint    | 56492
buffers_clean         | 0
maxwritten_clean      | 0
buffers_backend       | 65991
buffers_backend_fsync | 0
buffers_alloc         | 67479
stats_reset           | 2020-02-28 05:08:31.361955+00
  • dirty pageの書き戻しの3パターン
  • checkpointer processによるもの

    • buffers_checkpoint
  • backgroud writer processによるもの

    • buffers_clean
    • background writerによる書き戻しは「クリーニング」という
  • backend processによるもの

    • buffers_backend
    • backend processが新しいデータを共有バッファに載せようとして空きがないときに生じる

      • buffers_backendが大きいということは、swapが頻発しているようなもの。shared_buffersのチューニングを検討する

pgstatall_tables

SELECT * FROM pg_stat_all_tables WHERE relname='tbl';
-[ RECORD 1 ]-------+------------------------------
relid               | 16517
schemaname          | public
relname             | tbl
seq_scan            | 4
seq_tup_read        | 2000000
idx_scan            | 0
idx_tup_fetch       | 0
n_tup_ins           | 1000000
n_tup_upd           | 500000
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1000000
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         | 
last_autovacuum     | 2020-02-28 11:40:21.269596+00
last_analyze        | 2020-02-28 11:39:47.092236+00
last_autoanalyze    | 2020-02-28 11:38:20.543509+00
vacuum_count        | 0
autovacuum_count    | 1
analyze_count       | 2
autoanalyze_count   | 1
  • 取れる情報

    • シーケンシャルスキャン1回あたり読み込んだタプル数

      • seq_tup_read / seq_scan
    • VACUUM対象行数

      • n_dead_tup
    • HOT更新比率

      • n_hot_hot_upd / n_hot_upd
  • pg_stat_user_tablespg_stat_sys_tablesも同じ情報
SELECT * FROM pg_views WHERE viewname = 'pg_stat_user_tables';
-[ RECORD 1 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname | pg_catalog
viewname   | pg_stat_user_tables
viewowner  | postgres
definition |  SELECT pg_stat_all_tables.relid,                                                                                                                                 +
           |     pg_stat_all_tables.schemaname,                                                                                                                                +
           |     pg_stat_all_tables.relname,                                                                                                                                   +
           |     pg_stat_all_tables.seq_scan,                                                                                                                                  +
           |     pg_stat_all_tables.seq_tup_read,                                                                                                                              +
           |     pg_stat_all_tables.idx_scan,                                                                                                                                  +
           |     pg_stat_all_tables.idx_tup_fetch,                                                                                                                             +
           |     pg_stat_all_tables.n_tup_ins,                                                                                                                                 +
           |     pg_stat_all_tables.n_tup_upd,                                                                                                                                 +
           |     pg_stat_all_tables.n_tup_del,                                                                                                                                 +
           |     pg_stat_all_tables.n_tup_hot_upd,                                                                                                                             +
           |     pg_stat_all_tables.n_live_tup,                                                                                                                                +
           |     pg_stat_all_tables.n_dead_tup,                                                                                                                                +
           |     pg_stat_all_tables.n_mod_since_analyze,                                                                                                                       +
           |     pg_stat_all_tables.last_vacuum,                                                                                                                               +
           |     pg_stat_all_tables.last_autovacuum,                                                                                                                           +
           |     pg_stat_all_tables.last_analyze,                                                                                                                              +
           |     pg_stat_all_tables.last_autoanalyze,                                                                                                                          +
           |     pg_stat_all_tables.vacuum_count,                                                                                                                              +
           |     pg_stat_all_tables.autovacuum_count,                                                                                                                          +
           |     pg_stat_all_tables.analyze_count,                                                                                                                             +
           |     pg_stat_all_tables.autoanalyze_count                                                                                                                          +
           |    FROM pg_stat_all_tables                                                                                                                                        +
           |   WHERE ((pg_stat_all_tables.schemaname <> ALL (ARRAY['pg_catalog'::name, 'information_schema'::name])) AND (pg_stat_all_tables.schemaname !~ '^pg_toast'::text));

pgstatioall_tables

SELECT * FROM pg_statio_all_tables WHERE relname='tbl';
-[ RECORD 1 ]---+--------
relid           | 16517
schemaname      | public
relname         | tbl
heap_blks_read  | 6643
heap_blks_hit   | 2570806
idx_blks_read   | 8232
idx_blks_hit    | 2679699
toast_blks_read | 
toast_blks_hit  | 
tidx_blks_read  | 
tidx_blks_hit   | 
  • キャッシュヒット率を算出できる

    • テーブルデータ
    • インデックスデータ
    • toastテーブルデータ
    • toastインデックスデータ

pgstatallindexes, pgstatioallindexes

  • テーブル名かインデックス名を指定して、インデックスの統計情報を取得する
SELECT * FROM pg_stat_all_indexes WHERE relname='tbl';
-[ RECORD 1 ]-+---------
relid         | 16517
indexrelid    | 16520
schemaname    | public
relname       | tbl
indexrelname  | tbl_pkey
idx_scan      | 2
idx_tup_read  | 2
idx_tup_fetch | 0
  • ヒット/ミス数取得
  • 利用されていないインデックスを特定できる
SELECT * FROM pg_statio_all_indexes WHERE relname='tbl';
-[ RECORD 1 ]-+---------
relid         | 16517
indexrelid    | 16520
schemaname    | public
relname       | tbl
indexrelname  | tbl_pkey
idx_blks_read | 8235
idx_blks_hit  | 2679702
  • ヒット/ミスしたブロック数取得
SELECT *
  FROM pg_stat_user_indexes 
  NATURAL JOIN pg_statio_user_indexes 
  WHERE relname='tbl';
-[ RECORD 1 ]-+---------
relid         | 16517
indexrelid    | 16520
schemaname    | public
relname       | tbl
indexrelname  | tbl_pkey
idx_scan      | 2
idx_tup_read  | 2
idx_tup_fetch | 0
idx_blks_read | 8235
idx_blks_hit  | 2679702
  • 情報の多くが重複しているのでNATURAL JOINするといい感じ

テーブル/カラム統計情報

  • 一般的に「統計情報」と言ったときに指すのはこれ
  • テーブルのデータ状況に関する集計情報

    • 「何行ある」とか「何ページある」とか「行の平均長は」とかそういうの

pg_class

  • テーブル統計情報が格納される実表

pg_statistic

  • カラム統計情報が格納される実表
  • pg_statsシステムビューの裏側にあるやつ
  • superuserのみアクセス可能

    • 実データの一部が格納されているため
\dp pg_catalog.pg_statistic
Access privileges
-[ RECORD 1 ]-----+--------------------------
Schema            | pg_catalog
Name              | pg_statistic
Type              | table
Access privileges | postgres=arwdDxt/postgres
Column privileges | 
Policies          | 

pg_statsビュー

  • pg_statisticをラップするview

    • 可読性
    • 機密性
\dS+ pg_catalog.pg_stats;
                                 View "pg_catalog.pg_stats"
         Column         |   Type   | Collation | Nullable | Default | Storage  | Description 
------------------------+----------+-----------+----------+---------+----------+-------------
 schemaname             | name     |           |          |         | plain    | 
 tablename              | name     |           |          |         | plain    | 
 attname                | name     |           |          |         | plain    | 
 inherited              | boolean  |           |          |         | plain    | 
 null_frac              | real     |           |          |         | plain    | 
 avg_width              | integer  |           |          |         | plain    | 
 n_distinct             | real     |           |          |         | plain    | 
 most_common_vals       | anyarray |           |          |         | extended | 
 most_common_freqs      | real[]   |           |          |         | extended | 
 histogram_bounds       | anyarray |           |          |         | extended | 
 correlation            | real     |           |          |         | plain    | 
 most_common_elems      | anyarray |           |          |         | extended | 
 most_common_elem_freqs | real[]   |           |          |         | extended | 
 elem_count_histogram   | real[]   |           |          |         | extended | 
View definition:
 SELECT n.nspname AS schemaname,
    c.relname AS tablename,
    a.attname,
    s.stainherit AS inherited,
    s.stanullfrac AS null_frac,
    s.stawidth AS avg_width,
    s.stadistinct AS n_distinct,
        CASE
            WHEN s.stakind1 = 1 THEN s.stavalues1
            WHEN s.stakind2 = 1 THEN s.stavalues2
            WHEN s.stakind3 = 1 THEN s.stavalues3
            WHEN s.stakind4 = 1 THEN s.stavalues4
            WHEN s.stakind5 = 1 THEN s.stavalues5
            ELSE NULL::anyarray
        END AS most_common_vals,
        CASE
            WHEN s.stakind1 = 1 THEN s.stanumbers1
            WHEN s.stakind2 = 1 THEN s.stanumbers2
            WHEN s.stakind3 = 1 THEN s.stanumbers3
            WHEN s.stakind4 = 1 THEN s.stanumbers4
            WHEN s.stakind5 = 1 THEN s.stanumbers5
            ELSE NULL::real[]
        END AS most_common_freqs,
        CASE
            WHEN s.stakind1 = 2 THEN s.stavalues1
            WHEN s.stakind2 = 2 THEN s.stavalues2
            WHEN s.stakind3 = 2 THEN s.stavalues3
            WHEN s.stakind4 = 2 THEN s.stavalues4
            WHEN s.stakind5 = 2 THEN s.stavalues5
            ELSE NULL::anyarray
        END AS histogram_bounds,
        CASE
            WHEN s.stakind1 = 3 THEN s.stanumbers1[1]
            WHEN s.stakind2 = 3 THEN s.stanumbers2[1]
            WHEN s.stakind3 = 3 THEN s.stanumbers3[1]
            WHEN s.stakind4 = 3 THEN s.stanumbers4[1]
            WHEN s.stakind5 = 3 THEN s.stanumbers5[1]
            ELSE NULL::real
        END AS correlation,
        CASE
            WHEN s.stakind1 = 4 THEN s.stavalues1
            WHEN s.stakind2 = 4 THEN s.stavalues2
            WHEN s.stakind3 = 4 THEN s.stavalues3
            WHEN s.stakind4 = 4 THEN s.stavalues4
            WHEN s.stakind5 = 4 THEN s.stavalues5
            ELSE NULL::anyarray
        END AS most_common_elems,
        CASE
            WHEN s.stakind1 = 4 THEN s.stanumbers1
            WHEN s.stakind2 = 4 THEN s.stanumbers2
            WHEN s.stakind3 = 4 THEN s.stanumbers3
            WHEN s.stakind4 = 4 THEN s.stanumbers4
            WHEN s.stakind5 = 4 THEN s.stanumbers5
            ELSE NULL::real[]
        END AS most_common_elem_freqs,
        CASE
            WHEN s.stakind1 = 5 THEN s.stanumbers1
            WHEN s.stakind2 = 5 THEN s.stanumbers2
            WHEN s.stakind3 = 5 THEN s.stanumbers3
            WHEN s.stakind4 = 5 THEN s.stanumbers4
            WHEN s.stakind5 = 5 THEN s.stanumbers5
            ELSE NULL::real[]
        END AS elem_count_histogram
   FROM pg_statistic s
     JOIN pg_class c ON c.oid = s.starelid
     JOIN pg_attribute a ON c.oid = a.attrelid AND a.attnum = s.staattnum
     LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE NOT a.attisdropped AND has_column_privilege(c.oid, a.attnum, 'select'::text) AND (c.relrowsecurity = false OR NOT row_security_active(c.oid));
Options: security_barrier=true
  • 狙った実行計画がうまく選ばれないときの原因究明に利用する

    • データの分布が著しく偏っている場合など

      • ALTER COLUMNSET STATISTICSでサンプリングを増やすことを検討する
SELECT * FROM pg_stats WHERE tablename='tbl';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {6,11112,20381,29603,38720,48218,58267,66937,77704,87383,97340,109027,118322,127614,137654,147661,156475,166068,175875,185442,195534,205481,215361,225577,234928,244671,255300,265418,275266,285158,295130,304705,315066,325755,335769,346576,355716,365933,375686,386432,396559,407528,419104,429297,439898,451131,462276,472714,482634,492001,501715,510535,521108,531142,540705,550849,561133,570801,582099,592533,602856,611872,622566,632553,643258,653579,663135,673316,682994,692543,702465,712022,722747,732361,742607,753056,762753,773339,783232,792908,802985,813022,823096,832868,842635,852173,862259,873031,881590,891798,901543,911009,921113,930949,940710,951101,960961,971411,981154,991145,999959}
correlation            | 0.49950376
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

n_distinct

CREATE TABLE tbl (col int);
INSERT INTO tbl VALUES (generate_series(1,10000));
UPDATE tbl SET col = col / 10 * 10;

ANALYZE tbl;
SELECT * FROM pg_stats WHERE tablename='tbl';
  • マイナス: 異なる値の割合

    • 0, 10, 20, …, 10000 の1001種類を10000行で割ったものなので-0.1001
-[ RECORD 1 ]----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | col
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -0.1001
most_common_vals       | {10,20,30,40,50,60,70,80,90,100,110,120,130,140,150,160,170,180,190,200,210,220,230,240,250,260,270,280,290,300,310,320,330,340,350,360,370,380,390,400,410,420,430,440,450,460,470,480,490,500,510,520,530,540,550,560,570,580,590,600,610,620,630,640,650,660,670,680,690,700,710,720,730,740,750,760,770,780,790,800,810,820,830,840,850,860,870,880,890,900,910,920,930,940,950,960,970,980,990,1000}
most_common_freqs      | {0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001}
histogram_bounds       | {0,1090,1180,1270,1360,1450,1540,1630,1720,1810,1900,1990,2080,2170,2260,2350,2440,2530,2620,2710,2800,2890,2980,3070,3160,3250,3340,3430,3520,3610,3700,3790,3880,3970,4060,4150,4240,4330,4420,4510,4600,4690,4780,4870,4960,5050,5140,5230,5320,5410,5500,5590,5680,5770,5860,5950,6040,6130,6220,6310,6400,6490,6580,6670,6760,6850,6940,7030,7120,7210,7300,7390,7480,7570,7660,7750,7840,7930,8020,8110,8200,8290,8380,8470,8560,8650,8740,8830,8920,9010,9100,9190,9280,9370,9460,9550,9640,9730,9820,9910,10000}
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
  • プラス: 異なる値の種類

    • 0, 100, 200, …, 10000 の101種類
    • 個別値の種類が少ないとこちらになるみたい
UPDATE tbl SET col = col / 100 * 100;

ANALYZE tbl;
SELECT * FROM pg_stats WHERE tablename='tbl';
-[ RECORD 1 ]----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | tbl
attname                | col
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | 101
most_common_vals       | {100,200,300,400,500,600,700,800,900,1000,1100,1200,1300,1400,1500,1600,1700,1800,1900,2000,2100,2200,2300,2400,2500,2600,2700,2800,2900,3000,3100,3200,3300,3400,3500,3600,3700,3800,3900,4000,4100,4200,4300,4400,4500,4600,4700,4800,4900,5000,5100,5200,5300,5400,5500,5600,5700,5800,5900,6000,6100,6200,6300,6400,6500,6600,6700,6800,6900,7000,7100,7200,7300,7400,7500,7600,7700,7800,7900,8000,8100,8200,8300,8400,8500,8600,8700,8800,8900,9000,9100,9200,9300,9400,9500,9600,9700,9800,9900,0}
most_common_freqs      | {0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0099}
histogram_bounds       | 
correlation            | 0.96761876
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 

実行計画

  • EXPLAIN ANALYZE

    • 問い合わせを実際に実行して実行時間等添えるやつ
    • INSERT,UPDATE,DELETEは実際に挿入、更新、削除される
    • 実データに影響を与えたくない場合はトランザクション内で実行してROLLBACKする
CREATE TABLE tbl (id serial primary key, col int);

BEGIN;

EXPLAIN ANALYZE INSERT INTO tbl VALUES(generate_series(1,1000000));
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Insert on tbl  (cost=0.00..5000.02 rows=1000000 width=8) (actual time=2431.284..2431.284 rows=0 loops=1)
   ->  ProjectSet  (cost=0.00..5000.02 rows=1000000 width=8) (actual time=0.002..70.898 rows=1000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.022 ms
 Execution Time: 2431.304 ms
(5 rows)
ROLLBACK;
SELECT COUNT(*) FROM tbl;
 count 
-------
     0
(1 row)

ビットマップスキャン

  • インデックススキャンでは、インデックスとテーブルデータとを行き来する

    • 本の索引と本文とを行き来する感じ
    • なので、大量行をフェッチする場合は非効率
  • ビットマップスキャンの流れ

    1. まずインデックスに対して(シーケンシャル)アクセスを行う
    2. 取得対象のデータの格納位置と対応するビットマップを作成する
    3. ビットマップにシーケンシャルアクセスを行う
  • デメリット

    • ビットマップを作るぶんのオーバヘッド
  • メリット

    • 取得行数が多くても効率的
    • ビットマップスキャンでは、ビットマップ同士を論理演算することで、条件式でORが指定されてもインデックスを利用できる

      • cf. 単純なBtreeインデックスはOR条件で利用できない
EXPLAIN ANALYZE SELECT * FROM tbl WHERE id > 999990;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Index Scan using tbl_pkey on tbl  (cost=0.42..8.62 rows=11 width=8) (actual time=0.003..0.004 rows=10 loops=1)
   Index Cond: (id > 999990)
 Planning Time: 0.059 ms
 Execution Time: 0.013 ms
(4 rows)
EXPLAIN ANALYZE SELECT * FROM tbl WHERE id < 10;
                                                  QUERY PLAN                                                  
--------------------------------------------------------------------------------------------------------------
 Index Scan using tbl_pkey on tbl  (cost=0.42..8.58 rows=9 width=8) (actual time=0.003..0.004 rows=9 loops=1)
   Index Cond: (id < 10)
 Planning Time: 0.072 ms
 Execution Time: 0.014 ms
(4 rows)
EXPLAIN ANALYZE SELECT * FROM tbl WHERE id < 10 OR id > 999990;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=9.01..85.28 rows=20 width=8) (actual time=0.009..0.012 rows=19 loops=1)
   Recheck Cond: ((id < 10) OR (id > 999990))
   Heap Blocks: exact=2
   ->  BitmapOr  (cost=9.01..9.01 rows=20 width=0) (actual time=0.006..0.007 rows=0 loops=1)
         ->  Bitmap Index Scan on tbl_pkey  (cost=0.00..4.49 rows=9 width=0) (actual time=0.002..0.002 rows=9 loops=1)
               Index Cond: (id < 10)
         ->  Bitmap Index Scan on tbl_pkey  (cost=0.00..4.51 rows=11 width=0) (actual time=0.003..0.003 rows=10 loops=1)
               Index Cond: (id > 999990)
 Planning Time: 0.084 ms
 Execution Time: 0.028 ms
(10 rows)
  • Btreeインデックス自体はORが使えないので、ビットマップスキャンしない場合は、別々に取得してくっつけるしかない
EXPLAIN ANALYZE (SELECT * FROM tbl WHERE id < 10) UNION ALL (SELECT * FROM tbl WHERE id > 999990);
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.42..17.50 rows=20 width=8) (actual time=0.003..0.009 rows=19 loops=1)
   ->  Index Scan using tbl_pkey on tbl  (cost=0.42..8.58 rows=9 width=8) (actual time=0.002..0.004 rows=9 loops=1)
         Index Cond: (id < 10)
   ->  Index Scan using tbl_pkey on tbl tbl_1  (cost=0.42..8.62 rows=11 width=8) (actual time=0.002..0.003 rows=10 loops=1)
         Index Cond: (id > 999990)
 Planning Time: 0.098 ms
 Execution Time: 0.020 ms
(7 rows)

結合

Nested Loop Join

EXPLAIN ANALYZE SELECT * FROM (SELECT generate_series(1,10) AS id) t1 JOIN tbl t2 ON t1.id = t2.id;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.42..84.59 rows=10 width=12) (actual time=0.015..0.026 rows=10 loops=1)
   ->  ProjectSet  (cost=0.00..0.07 rows=10 width=4) (actual time=0.002..0.003 rows=10 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
   ->  Index Scan using tbl_pkey on tbl t2  (cost=0.42..8.44 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=10)
         Index Cond: (id = (generate_series(1, 10)))
 Planning Time: 0.098 ms
 Execution Time: 0.042 ms
(7 rows)
  • 駆動表の行数が少なく、内部表のインデックスを利用できる場合は速い

Hash Join

EXPLAIN ANALYZE SELECT * FROM (SELECT generate_series(1,10) AS id) t1 JOIN tbl t2 ON t1.id = t2.id + 1;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=0.29..19925.29 rows=50000 width=12) (actual time=0.017..126.043 rows=9 loops=1)
   Hash Cond: ((t2.id + 1) = (generate_series(1, 10)))
   ->  Seq Scan on tbl t2  (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.008..56.648 rows=1000000 loops=1)
   ->  Hash  (cost=0.17..0.17 rows=10 width=4) (actual time=0.006..0.006 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  ProjectSet  (cost=0.00..0.07 rows=10 width=4) (actual time=0.002..0.003 rows=10 loops=1)
               ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
 Planning Time: 0.059 ms
 Execution Time: 126.063 ms
(9 rows)
  • 先のクエリで内部表のインデックスが効かないとこれになった
  • hashを作るぶんのオーバヘッドがあるが、ルックアップは速い
   ->  Hash  (cost=0.17..0.17 rows=10 width=4) (actual time=0.006..0.006 rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
  • Batchesが2以上だと一時ファイルが生成される

    • work_memを増やすことで回避できる

Merge Join

EXPLAIN ANALYZE SELECT * FROM tbl t1 JOIN tbl t2 using (id) order by id;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.85..97744.85 rows=1000000 width=12) (actual time=0.011..428.027 rows=1000000 loops=1)
   Merge Cond: (t1.id = t2.id)
   ->  Index Scan using tbl_pkey on tbl t1  (cost=0.42..41372.43 rows=1000000 width=8) (actual time=0.005..118.542 rows=1000000 loops=1)
   ->  Index Scan using tbl_pkey on tbl t2  (cost=0.42..41372.43 rows=1000000 width=8) (actual time=0.004..106.024 rows=1000000 loops=1)
 Planning Time: 0.128 ms
 Execution Time: 451.885 ms
(6 rows)
  • それぞれソートして突き合わせるやつ
  • 結合キーのBtreeインデックスが効く場合はソートが生じないので速い