Elasticsearch Histogram field type: usage and caveats

Histogram

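First, the documentation link: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/histogram.html

Searching the web for elasticsearch Histogram turns up two different things:

- the Histogram field type
- the Histogram aggregation

There are many results for the aggregation and very few for the field type, so this post mainly records how to use the Histogram field type and what to watch out for. PS: some points in this post are still not understood and need investigation, so the post will be updated over time.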

Histogram field type

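A histogram field is defined by two paired arrays, values and counts.
Note the following:

- values is stored as double and must be in ascending order.
- counts must contain integers, each of them positive or zero.
- The two arrays must have the same length, because their elements correspond one to one.
- Nested arrays and sorting are not supported.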
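
The rules above can be checked on the client side before a document is sent. The following is only a minimal sketch: validate_histogram is a hypothetical helper, not part of any Elasticsearch client, and it assumes that only a decreasing step in values is an error (whether repeated values are accepted is not covered in this post).

```python
def validate_histogram(values, counts):
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    # Both arrays must line up element by element.
    if len(values) != len(counts):
        problems.append("values and counts must have the same length")
    # Assumption: only a decreasing step is rejected, equal neighbours pass.
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            problems.append(f"values must be in increasing order, got {cur} after {prev}")
    # counts must be integers that are positive or zero.
    for c in counts:
        if isinstance(c, bool) or not isinstance(c, int):
            problems.append(f"counts must be integers, got {c!r}")
        elif c < 0:
            problems.append(f"counts must be >= 0, got {c}")
    return problems


print(validate_histogram([0.1, 0.2, 0.3, 0.4, 0.5], [3, 7, 23, 12, 6]))
print(validate_histogram([0.1, 0.2, 0.1, 0.4, 0.5], [3, 7, 23, 12, 6]))
```

Elasticsearch performs its own validation at index time, so a check like this only serves to catch mistakes earlier.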

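Histogram data is stored as binary doc values rather than being indexed, which allows faster aggregation. Its size in bytes is at most 13 * the length of the arrays; for a five-element histogram that is at most 65 bytes.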

Quick start

Add the mapping

PUT histogram_test
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

Add data

PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}
PUT histogram_test/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}

Error example

Incorrect: the values array is not in increasing order

PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.1, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}
 
***********result************** 
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
    }
  },
  "status" : 400
}

Incorrect: counts contains a number less than 0

PUT histogram_test/_doc/3
{
  "my_text" : "histogram_3",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1], 
      "counts" : [3, 7, 23, 12, -6] 
   }
}
 
***********result**************
 
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
    }
  },
  "status" : 400
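}

Aggregation

A histogram field supports the following aggregations:

- min aggregation
- max aggregation
- sum aggregation
- value_count aggregation
- avg aggregation
- percentiles aggregation (PS: not yet understood, to be investigated)
- percentile ranks aggregation (PS: not yet understood, to be investigated)
- boxplot aggregation (PS: not yet understood, to be investigated)
- histogram aggregation
- range aggregation (PS: not yet understood, to be investigated)

min aggregation

Returns the smallest element of values across the matching documents.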

GET /histogram_test/_search
{
  "aggs": {
    "min_latency": {
      "min": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
 
 "aggregations" : {
    "min_latency" : {
      "value" : 0.1
    }
  }

max

Returns the largest element of values across the matching documents.

GET /histogram_test/_search
{
  "aggs": {
    "max_histogram": {
      "max": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "max_histogram" : {
      "value" : 1.0
    }
  }

sum

Multiplies each element of values by the count at the same position in counts, then adds up all the products.
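
To make the arithmetic concrete, here is a small sketch that reproduces the sum by hand for the two documents indexed in Quick start. The names docs and histogram_sum are illustrative only; this code does not talk to Elasticsearch.

```python
# The two histograms indexed in Quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_sum(docs):
    """Sum of value * count over every position of every histogram."""
    return sum(v * c for d in docs for v, c in zip(d["values"], d["counts"]))


# Rounded to hide floating-point noise: 16.4 + 19.4
print(round(histogram_sum(docs), 6))  # 35.8
```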

GET /histogram_test/_search
{
  "aggs": {
    "sum_histogram": {
      "sum": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "sum_histogram" : {
      "value" : 35.8
    }
  }

value_count

Adds up all the elements of counts.

GET /histogram_test/_search
{
  "aggs": {
    "count_histogram": {
      "value_count": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
  "aggregations" : {
    "count_histogram" : {
      "value" : 102
    }
  }

avg

Multiplies each number in the values array by its associated count in the counts array, then averages these values over all histograms. It can be understood as sum / count.
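
The sum / count reading can be verified by hand for the two Quick start documents. This is a sketch with illustrative names (docs, histogram_value_count, histogram_avg) and runs without Elasticsearch.

```python
# The two histograms indexed in Quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_value_count(docs):
    """Total number of observations: the sum of every counts element."""
    return sum(c for d in docs for c in d["counts"])


def histogram_avg(docs):
    """Weighted sum of the values divided by the total count."""
    weighted = sum(v * c for d in docs for v, c in zip(d["values"], d["counts"]))
    return weighted / histogram_value_count(docs)


print(histogram_value_count(docs))  # 102
print(histogram_avg(docs))
```

The second printed number should agree with the avg_histogram value returned by the request, up to floating-point rounding.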

GET /histogram_test/_search
{
  "aggs": {
    "avg_histogram": {
      "avg": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "avg_histogram" : {
      "value" : 0.3509803921568627
    }
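  }

histogram aggregation

Counts how many observations fall into each bucket, based on values. Each bucket's doc_count is the sum of the counts that land in it.
interval is the width of each bucket.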
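
The bucketing can be imitated in a few lines. This sketch assumes the bucket key is floor(value / interval) * interval with no offset, and the names docs and histogram_buckets are illustrative only.

```python
import math
from collections import Counter

# The two histograms indexed in Quick start.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_buckets(docs, interval):
    """Map each bucket key to the sum of the counts that fall into it."""
    buckets = Counter()
    for d in docs:
        for v, c in zip(d["values"], d["counts"]):
            # Assumed key formula: round the value down to a multiple of interval.
            key = math.floor(v / interval) * interval
            buckets[key] += c
    return dict(sorted(buckets.items()))


print(histogram_buckets(docs, 0.5))  # {0.0: 90, 0.5: 6, 1.0: 6}
```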

GET /histogram_test/_search
{
  "aggs": {
    "histogram_histogram": {
      "histogram": {
        "field": "my_histogram",
        "interval": 0.5
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "histogram_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 90
        },
        {
          "key" : 0.5,
          "doc_count" : 6
        },
        {
          "key" : 1.0,
          "doc_count" : 6
        }
      ]
    }
  }

Query

Only specific queries are available for this field type.

exists query

GET /histogram_test/_search
{
  "query": {
    "exists": {
      "field": "my_histogram"
    }
  }
}

END

The parts of this post marked as to be investigated will be filled in later. Discussion is welcome.
