ElasticSearch聚合简介¶
简介¶
聚合是 Elasticsearch 除搜索以外,提供的针对 ES 数据进行统计分析功能。通过聚合,我们可以得到数据的概览,是分析和总结全套的数据,而不是寻找单个文档。
Elasticsearch 的主要优点就是:
- 实时性高,和 Hadoop 相比会更快
- 性能高,可直接通过提供的API就能得到分析结果

聚合的分类¶
集合的分类主要有以下:
- Bucket Aggregation:一些列满足特定条件的文档的集合
- Metric Aggregation:一些数学运算,可以对文档字段进行统计分析
- Pipeline Aggregation:对其他的聚合结果进行二次聚合
- Matrix Aggregation:支持对多个字段的操作并提取一个结果矩阵
Bucket&Metric¶
与SQL进行比较:
SELECT COUNT(brand) => Metric:一些系列的统计方法 FROM cars GROUP by band => Bucket:一组满足条件的文档
JSON
#价格统计信息+天气信息
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs":{
"flight_dest":{
"terms":{
"field":"DestCountry"
},
"aggs":{
"stats_price":{
"stats":{
"field":"AvgTicketPrice"
}
},
"wather":{
"terms": {
"field": "DestWeather",
"size": 5
}
}
}
}
}
}
#返回如下:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"flight_dest" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 3187,
"buckets" : [
{
"key" : "IT",
"doc_count" : 2371,
"max_price" : {
"value" : 1195.3363037109375
},
"min_price" : {
"value" : 100.57646942138672
},
"avg_price" : {
"value" : 586.9627099618385
}
},
{
"key" : "US",
"doc_count" : 1987,
"max_price" : {
"value" : 1199.72900390625
},
"min_price" : {
"value" : 100.14596557617188
},
"avg_price" : {
"value" : 595.7743908825026
}
},
{
"key" : "CN",
"doc_count" : 1096,
"max_price" : {
"value" : 1198.4901123046875
},
"min_price" : {
"value" : 102.90382385253906
},
"avg_price" : {
"value" : 640.7101617033464
}
},
{
"key" : "CA",
"doc_count" : 944,
"max_price" : {
"value" : 1198.8525390625
},
"min_price" : {
"value" : 100.5572509765625
},
"avg_price" : {
"value" : 648.7471090413757
}
},
{
"key" : "JP",
"doc_count" : 774,
"max_price" : {
"value" : 1199.4913330078125
},
"min_price" : {
"value" : 103.97209930419922
},
"avg_price" : {
"value" : 650.9203447346847
}
},
{
"key" : "RU",
"doc_count" : 739,
"max_price" : {
"value" : 1196.7423095703125
},
"min_price" : {
"value" : 101.0040054321289
},
"avg_price" : {
"value" : 662.9949632162009
}
},
{
"key" : "CH",
"doc_count" : 691,
"max_price" : {
"value" : 1196.496826171875
},
"min_price" : {
"value" : 101.3473129272461
},
"avg_price" : {
"value" : 575.1067587028537
}
},
{
"key" : "GB",
"doc_count" : 449,
"max_price" : {
"value" : 1197.78564453125
},
"min_price" : {
"value" : 111.34574890136719
},
"avg_price" : {
"value" : 650.5326856005696
}
},
{
"key" : "AU",
"doc_count" : 416,
"max_price" : {
"value" : 1197.6326904296875
},
"min_price" : {
"value" : 102.2943115234375
},
"avg_price" : {
"value" : 669.5588319668403
}
},
{
"key" : "PL",
"doc_count" : 405,
"max_price" : {
"value" : 1185.43701171875
},
"min_price" : {
"value" : 104.28328704833984
},
"avg_price" : {
"value" : 662.4497233072917
}
}
]
}
}
}
Bucket¶
Elasticsearch 提供了很多类型的 Bucket,可以多种方式划分文档:
- Term&Range:时间/年龄区间/地理位置等
比如下面这些例子:
- 杭州属于浙江/一个演员属于男或女性
- 嵌套关系 - 杭州属于浙江属于中国属于亚洲
JSON
#按照目的地进行分桶统计
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs":{
"flight_dest":{
"terms":{
"field":"DestCountry"
}
}
}
}
#返回如下:
{
"took" : 33,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"flight_dest" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 3187,
"buckets" : [
{
"key" : "IT",
"doc_count" : 2371
},
{
"key" : "US",
"doc_count" : 1987
},
{
"key" : "CN",
"doc_count" : 1096
},
{
"key" : "CA",
"doc_count" : 944
},
{
"key" : "JP",
"doc_count" : 774
},
{
"key" : "RU",
"doc_count" : 739
},
{
"key" : "CH",
"doc_count" : 691
},
{
"key" : "GB",
"doc_count" : 449
},
{
"key" : "AU",
"doc_count" : 416
},
{
"key" : "PL",
"doc_count" : 405
}
]
}
}
}
Metric¶
Metric 会基于数据集计算结果,除了支持在字段上进行计算,同样支持在脚本(painless script)产生的结果之上进行计算:
- 大部分 Metric 是数学计算,仅输出一个值
- min/max/sum/avg/cardinality
- 部分 metric 支持输出多个数值
- stats/percentiles/percentile_ranks
JSON
#查看航班目的地的统计信息,增加平均,最高最低价格
GET kibana_sample_data_flights/_search
{
"size": 0,
"aggs":{
"flight_dest":{
"terms":{
"field":"DestCountry"
},
"aggs":{
"avg_price":{
"avg":{
"field":"AvgTicketPrice"
}
},
"max_price":{
"max":{
"field":"AvgTicketPrice"
}
},
"min_price":{
"min":{
"field":"AvgTicketPrice"
}
}
}
}
}
}
#返回如下:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"flight_dest" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 3187,
"buckets" : [
{
"key" : "IT",
"doc_count" : 2371,
"max_price" : {
"value" : 1195.3363037109375
},
"min_price" : {
"value" : 100.57646942138672
},
"avg_price" : {
"value" : 586.9627099618385
}
},
{
"key" : "US",
"doc_count" : 1987,
"max_price" : {
"value" : 1199.72900390625
},
"min_price" : {
"value" : 100.14596557617188
},
"avg_price" : {
"value" : 595.7743908825026
}
},
{
"key" : "CN",
"doc_count" : 1096,
"max_price" : {
"value" : 1198.4901123046875
},
"min_price" : {
"value" : 102.90382385253906
},
"avg_price" : {
"value" : 640.7101617033464
}
},
{
"key" : "CA",
"doc_count" : 944,
"max_price" : {
"value" : 1198.8525390625
},
"min_price" : {
"value" : 100.5572509765625
},
"avg_price" : {
"value" : 648.7471090413757
}
},
{
"key" : "JP",
"doc_count" : 774,
"max_price" : {
"value" : 1199.4913330078125
},
"min_price" : {
"value" : 103.97209930419922
},
"avg_price" : {
"value" : 650.9203447346847
}
},
{
"key" : "RU",
"doc_count" : 739,
"max_price" : {
"value" : 1196.7423095703125
},
"min_price" : {
"value" : 101.0040054321289
},
"avg_price" : {
"value" : 662.9949632162009
}
},
{
"key" : "CH",
"doc_count" : 691,
"max_price" : {
"value" : 1196.496826171875
},
"min_price" : {
"value" : 101.3473129272461
},
"avg_price" : {
"value" : 575.1067587028537
}
},
{
"key" : "GB",
"doc_count" : 449,
"max_price" : {
"value" : 1197.78564453125
},
"min_price" : {
"value" : 111.34574890136719
},
"avg_price" : {
"value" : 650.5326856005696
}
},
{
"key" : "AU",
"doc_count" : 416,
"max_price" : {
"value" : 1197.6326904296875
},
"min_price" : {
"value" : 102.2943115234375
},
"avg_price" : {
"value" : 669.5588319668403
}
},
{
"key" : "PL",
"doc_count" : 405,
"max_price" : {
"value" : 1185.43701171875
},
"min_price" : {
"value" : 104.28328704833984
},
"avg_price" : {
"value" : 662.4497233072917
}
}
]
}
}
}
参考¶
https://time.geekbang.org/course/intro/100030501?tab=catalog