Elasticsearchチュートリアル -- 集計

Elasticsearch勉強メモ

Analyze results with aggregations

[https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-aggregations.html:embed:cite]

キーワードで集計

curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}'
{
  "took" : 49,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
          "doc_count" : 24
        }
      ]
    }
  }
}
  • "size": 0指定につき、hits.hitsは0件
  • 代わりに"aggregations"がある
...
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
...
  • group_by_stateという名前で集計
  • 集計条件は、state(州)のキーワードの一致
{
  ...
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 743,
      "buckets" : [
        {
          "key" : "TX",
          "doc_count" : 30
        },
        {
          "key" : "MD",
          "doc_count" : 28
        },
        {
          "key" : "ID",
          "doc_count" : 27
        },
        {
          "key" : "AL",
          "doc_count" : 25
        },
        {
          "key" : "ME",
          "doc_count" : 25
        },
        {
          "key" : "TN",
          "doc_count" : 25
        },
        {
          "key" : "WY",
          "doc_count" : 25
        },
        {
          "key" : "DC",
          "doc_count" : 24
        },
        {
          "key" : "MA",
          "doc_count" : 24
        },
        {
          "key" : "ND",
          "doc_count" : 24
        }
      ]
    }
  }
}
  • 州ごとのドキュメント数が集計される
  • デフォルト降順の10件
  • 一番多いTX(テキサス)でキーワード一致検索してみる
curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "query": {
    "match": {
      "state": "TX"
    }
  }
}'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 30,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
  • 確かに30件
...
    "total" : {
      "value" : 30,
      "relation" : "eq"
    },
...

集計組み合わせ

  • 州ごとにドキュメントを集計 (外側のaggs)
  • 各州の最低・平均・最高残高を集計 (内側のaggs)
  • 平均残高上位3州を表示 (表示するのが州なので外側のaggs)
curl -XGET 'localhost:9200/bank/_search?pretty' -H 'Content-Type: application/json' -d '{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "size": 3,
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "min_balance": {
          "min": {
            "field": "balance"
          }
        },
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        },
        "max_balance": {
          "max": {
            "field": "balance"
          }
        }
      }
    }
  }
}'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_state" : {
      "doc_count_error_upper_bound" : -1,
      "sum_other_doc_count" : 956,
      "buckets" : [
        {
          "key" : "CO",
          "doc_count" : 14,
          "min_balance" : {
            "value" : 8840.0
          },
          "max_balance" : {
            "value" : 49119.0
          },
          "average_balance" : {
            "value" : 32460.35714285714
          }
        },
        {
          "key" : "NE",
          "doc_count" : 16,
          "min_balance" : {
            "value" : 17128.0
          },
          "max_balance" : {
            "value" : 47091.0
          },
          "average_balance" : {
            "value" : 32041.5625
          }
        },
        {
          "key" : "AZ",
          "doc_count" : 14,
          "min_balance" : {
            "value" : 6176.0
          },
          "max_balance" : {
            "value" : 49205.0
          },
          "average_balance" : {
            "value" : 31634.785714285714
          }
        }
      ]
    }
  }
}