目录

ElasticSearch聚合简介

简介

聚合是 Elasticsearch 除搜索以外,提供的针对 ES 数据进行统计分析功能。通过聚合,我们可以得到数据的概览,是分析和总结全套的数据,而不是寻找单个文档。

Elasticsearch 的主要优点就是:

  • 实时性高,和 Hadoop 相比会更快
  • 性能高,可直接通过提供的API就能得到分析结果

https://img.zhengyua.cn/img/202203081024738.png

聚合的分类

集合的分类主要有以下:

  • Bucket Aggregation:一些列满足特定条件的文档的集合
  • Metric Aggregation:一些数学运算,可以对文档字段进行统计分析
  • Pipeline Aggregation:对其他的聚合结果进行二次聚合
  • Matrix Aggregation:支持对多个字段的操作并提取一个结果矩阵

Bucket&Metric

与SQL进行比较:

SELECT COUNT(brand) => Metric:一些系列的统计方法 FROM cars GROUP by band => Bucket:一组满足条件的文档

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
#价格统计信息+天气信息
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs":{
    "flight_dest":{
      "terms":{
        "field":"DestCountry"
      },
      "aggs":{
        "stats_price":{
          "stats":{
            "field":"AvgTicketPrice"
          }
        },
        "wather":{
          "terms": {
            "field": "DestWeather",
            "size": 5
          }
        }

      }
    }
  }
}


#返回如下:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "flight_dest" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3187,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371,
          "max_price" : {
            "value" : 1195.3363037109375
          },
          "min_price" : {
            "value" : 100.57646942138672
          },
          "avg_price" : {
            "value" : 586.9627099618385
          }
        },
        {
          "key" : "US",
          "doc_count" : 1987,
          "max_price" : {
            "value" : 1199.72900390625
          },
          "min_price" : {
            "value" : 100.14596557617188
          },
          "avg_price" : {
            "value" : 595.7743908825026
          }
        },
        {
          "key" : "CN",
          "doc_count" : 1096,
          "max_price" : {
            "value" : 1198.4901123046875
          },
          "min_price" : {
            "value" : 102.90382385253906
          },
          "avg_price" : {
            "value" : 640.7101617033464
          }
        },
        {
          "key" : "CA",
          "doc_count" : 944,
          "max_price" : {
            "value" : 1198.8525390625
          },
          "min_price" : {
            "value" : 100.5572509765625
          },
          "avg_price" : {
            "value" : 648.7471090413757
          }
        },
        {
          "key" : "JP",
          "doc_count" : 774,
          "max_price" : {
            "value" : 1199.4913330078125
          },
          "min_price" : {
            "value" : 103.97209930419922
          },
          "avg_price" : {
            "value" : 650.9203447346847
          }
        },
        {
          "key" : "RU",
          "doc_count" : 739,
          "max_price" : {
            "value" : 1196.7423095703125
          },
          "min_price" : {
            "value" : 101.0040054321289
          },
          "avg_price" : {
            "value" : 662.9949632162009
          }
        },
        {
          "key" : "CH",
          "doc_count" : 691,
          "max_price" : {
            "value" : 1196.496826171875
          },
          "min_price" : {
            "value" : 101.3473129272461
          },
          "avg_price" : {
            "value" : 575.1067587028537
          }
        },
        {
          "key" : "GB",
          "doc_count" : 449,
          "max_price" : {
            "value" : 1197.78564453125
          },
          "min_price" : {
            "value" : 111.34574890136719
          },
          "avg_price" : {
            "value" : 650.5326856005696
          }
        },
        {
          "key" : "AU",
          "doc_count" : 416,
          "max_price" : {
            "value" : 1197.6326904296875
          },
          "min_price" : {
            "value" : 102.2943115234375
          },
          "avg_price" : {
            "value" : 669.5588319668403
          }
        },
        {
          "key" : "PL",
          "doc_count" : 405,
          "max_price" : {
            "value" : 1185.43701171875
          },
          "min_price" : {
            "value" : 104.28328704833984
          },
          "avg_price" : {
            "value" : 662.4497233072917
          }
        }
      ]
    }
  }
}

Bucket

Elasticsearch 提供了很多类型的 Bucket,可以多种方式划分文档:

  • Term&Range:时间/年龄区间/地理位置等

比如下面这些例子:

  • 杭州属于浙江/一个演员属于男或女性
  • 嵌套关系 - 杭州属于浙江属于中国属于亚洲
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
#按照目的地进行分桶统计
GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs":{
        "flight_dest":{
            "terms":{
                "field":"DestCountry"
            }
        }
    }
}

#返回如下:
{
  "took" : 33,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "flight_dest" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3187,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371
        },
        {
          "key" : "US",
          "doc_count" : 1987
        },
        {
          "key" : "CN",
          "doc_count" : 1096
        },
        {
          "key" : "CA",
          "doc_count" : 944
        },
        {
          "key" : "JP",
          "doc_count" : 774
        },
        {
          "key" : "RU",
          "doc_count" : 739
        },
        {
          "key" : "CH",
          "doc_count" : 691
        },
        {
          "key" : "GB",
          "doc_count" : 449
        },
        {
          "key" : "AU",
          "doc_count" : 416
        },
        {
          "key" : "PL",
          "doc_count" : 405
        }
      ]
    }
  }
}

Metric

Metric 会基于数据集计算结果,除了支持在字段上进行计算,同样支持在脚本(painless script)产生的结果之上进行计算

  • 大部分 Metric 是数学计算,仅输出一个值
    • min/max/sum/avg/cardinality
  • 部分 metric 支持输出多个数值
    • stats/percentiles/percentile_ranks
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
#查看航班目的地的统计信息,增加平均,最高最低价格
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs":{
    "flight_dest":{
      "terms":{
        "field":"DestCountry"
      },
      "aggs":{
        "avg_price":{
          "avg":{
            "field":"AvgTicketPrice"
          }
        },
        "max_price":{
          "max":{
            "field":"AvgTicketPrice"
          }
        },
        "min_price":{
          "min":{
            "field":"AvgTicketPrice"
          }
        }
      }
    }
  }
}

#返回如下:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "flight_dest" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 3187,
      "buckets" : [
        {
          "key" : "IT",
          "doc_count" : 2371,
          "max_price" : {
            "value" : 1195.3363037109375
          },
          "min_price" : {
            "value" : 100.57646942138672
          },
          "avg_price" : {
            "value" : 586.9627099618385
          }
        },
        {
          "key" : "US",
          "doc_count" : 1987,
          "max_price" : {
            "value" : 1199.72900390625
          },
          "min_price" : {
            "value" : 100.14596557617188
          },
          "avg_price" : {
            "value" : 595.7743908825026
          }
        },
        {
          "key" : "CN",
          "doc_count" : 1096,
          "max_price" : {
            "value" : 1198.4901123046875
          },
          "min_price" : {
            "value" : 102.90382385253906
          },
          "avg_price" : {
            "value" : 640.7101617033464
          }
        },
        {
          "key" : "CA",
          "doc_count" : 944,
          "max_price" : {
            "value" : 1198.8525390625
          },
          "min_price" : {
            "value" : 100.5572509765625
          },
          "avg_price" : {
            "value" : 648.7471090413757
          }
        },
        {
          "key" : "JP",
          "doc_count" : 774,
          "max_price" : {
            "value" : 1199.4913330078125
          },
          "min_price" : {
            "value" : 103.97209930419922
          },
          "avg_price" : {
            "value" : 650.9203447346847
          }
        },
        {
          "key" : "RU",
          "doc_count" : 739,
          "max_price" : {
            "value" : 1196.7423095703125
          },
          "min_price" : {
            "value" : 101.0040054321289
          },
          "avg_price" : {
            "value" : 662.9949632162009
          }
        },
        {
          "key" : "CH",
          "doc_count" : 691,
          "max_price" : {
            "value" : 1196.496826171875
          },
          "min_price" : {
            "value" : 101.3473129272461
          },
          "avg_price" : {
            "value" : 575.1067587028537
          }
        },
        {
          "key" : "GB",
          "doc_count" : 449,
          "max_price" : {
            "value" : 1197.78564453125
          },
          "min_price" : {
            "value" : 111.34574890136719
          },
          "avg_price" : {
            "value" : 650.5326856005696
          }
        },
        {
          "key" : "AU",
          "doc_count" : 416,
          "max_price" : {
            "value" : 1197.6326904296875
          },
          "min_price" : {
            "value" : 102.2943115234375
          },
          "avg_price" : {
            "value" : 669.5588319668403
          }
        },
        {
          "key" : "PL",
          "doc_count" : 405,
          "max_price" : {
            "value" : 1185.43701171875
          },
          "min_price" : {
            "value" : 104.28328704833984
          },
          "avg_price" : {
            "value" : 662.4497233072917
          }
        }
      ]
    }
  }
}

参考

https://time.geekbang.org/course/intro/100030501?tab=catalog