Using min_doc_count and extended_bounds with the Elasticsearch date_histogram aggregation





Overview

For time-based statistical analysis in Elasticsearch, the aggregation you will encounter most often is date_histogram.



date_histogram

Groups documents into buckets by time interval. It supports intervals such as year, quarter, month, week, and day, and it accepts a time zone setting.
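As a rough illustration of the options just described, the following Python sketch builds a date_histogram aggregation body as a plain dictionary. The helper name `build_date_histogram` is hypothetical, and the field name `date` and the `interval` key simply follow the example later in this post.

```python
import json


def build_date_histogram(field, interval, time_zone=None):
    """Return a date_histogram aggregation body as a plain dict (hypothetical helper)."""
    body = {"field": field, "interval": interval}
    # The time zone is optional; include it only when the caller supplies one.
    if time_zone is not None:
        body["time_zone"] = time_zone
    return {"date_histogram": body}


# Monthly buckets whose boundaries are computed in the Asia/Shanghai time zone.
agg = build_date_histogram("date", "month", time_zone="Asia/Shanghai")
print(json.dumps(agg, indent=2))
```

The resulting dictionary can be placed under an aggregation name inside the `aggregations` section of a search request.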



min_doc_count

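Defaults to 0. This is the minimum number of documents a bucket must contain to be included in the response. With a value of 0, empty buckets are returned too: any interval between the first and last bucket that has no data is filled in automatically with a doc_count of 0. The typical use is to return these empty buckets so that client code does not have to fill the gaps itself.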
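The effect of min_doc_count can be modeled in a few lines of Python. This is only a sketch of the observable behavior, not Elasticsearch's implementation; the function names and the sample counts are made up for illustration.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_buckets(counts, min_doc_count=0):
    """Fill gaps between the first and last month that contain data,
    then drop every bucket whose count is below min_doc_count."""
    if not counts:
        return []
    first, last = min(counts), max(counts)
    filled = [(m, counts.get(m, 0)) for m in month_range(first, last)]
    return [(m, c) for m, c in filled if c >= min_doc_count]


counts = {(2013, 1): 8, (2013, 4): 3}  # made-up sample data

with_gaps_filled = monthly_buckets(counts, min_doc_count=0)
only_non_empty = monthly_buckets(counts, min_doc_count=1)
print(with_gaps_filled)
print(only_non_empty)
```

With `min_doc_count=0` the months without data appear with a count of 0; with `min_doc_count=1` only the months that actually contain documents remain.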



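extended_bounds (extending the bucket range)

This setting is only meaningful when min_doc_count is 0.

Used together with min_doc_count, it forces empty buckets to be returned across the range from min to max, even where no documents exist.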
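To see why extended_bounds depends on min_doc_count, here is a small Python model of monthly bucketing. It is a sketch under simplifying assumptions (months as `(year, month)` tuples, hypothetical function names, made-up counts) and does not reproduce Elasticsearch's internals.

```python
def month_range(start, end):
    """Yield (year, month) tuples from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def monthly_buckets(counts, min_doc_count=0, extended_bounds=None):
    """Model of gap filling with an optional (min, max) pair of extended bounds."""
    months = list(counts)
    if extended_bounds is not None:
        # The bounds only widen the range; they never remove buckets that hold data.
        months.extend(extended_bounds)
    if not months:
        return []
    filled = [(m, counts.get(m, 0)) for m in month_range(min(months), max(months))]
    # Buckets added by the bounds have a count of 0, so any min_doc_count > 0 drops them.
    return [(m, c) for m, c in filled if c >= min_doc_count]


counts = {(2013, 1): 8}  # made-up sample data
bounds = ((2012, 11), (2013, 2))

extended = monthly_buckets(counts, min_doc_count=0, extended_bounds=bounds)
not_extended = monthly_buckets(counts, min_doc_count=1, extended_bounds=bounds)
print(extended)
print(not_extended)
```

In this model the buckets that the bounds add are always empty, so the final filter removes them as soon as min_doc_count is greater than 0.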



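Example

The query below covers the time range from 0 (the epoch) up to, but not including, 2020-03-01. It aggregates by month in the Asia/Shanghai time zone and returns empty buckets as well. It uses the legacy interval parameter, which Elasticsearch 7.2 and later deprecate in favor of calendar_interval and fixed_interval.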

{
    "size": 0,
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "date": {
                            "from": 0,
                            "to": "2020-03-01",
                            "include_lower": true,
                            "include_upper": false,
                            "boost": 1
                        }
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1
        }
    },
    "aggregations": {
        "date": {
            "date_histogram": {
                "field": "date",
                "time_zone": "Asia/Shanghai",
                "interval": "1M",
                "offset": 0,
                "order": {
                    "_key": "asc"
                },
                "keyed": false,
                "min_doc_count": 0,
                "extended_bounds": {
                    "min": "2020-01-01",
                    "max": "2020-04-01"
                }
            }
        }
    }
}
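For readers on newer clusters, the following Python sketch builds a roughly equivalent request body. It assumes Elasticsearch 7.2 or later, where `calendar_interval` replaces `interval` and the range query is written with `gte` and `lt`. The helper name `build_search_body` is hypothetical, and settings that only restated defaults in the original request are left out.

```python
import json


def build_search_body(upper, bounds_min, bounds_max):
    """Build a monthly date_histogram search body (hypothetical helper)."""
    return {
        "size": 0,  # aggregation results only, no hits
        "query": {
            "bool": {
                "must": [
                    # Lower bound inclusive, upper bound exclusive.
                    {"range": {"date": {"gte": 0, "lt": upper}}}
                ]
            }
        },
        "aggs": {
            "date": {
                "date_histogram": {
                    "field": "date",
                    "time_zone": "Asia/Shanghai",
                    "calendar_interval": "month",
                    "min_doc_count": 0,
                    "extended_bounds": {"min": bounds_min, "max": bounds_max},
                }
            }
        },
    }


body = build_search_body("2020-03-01", "2020-01-01", "2020-04-01")
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of a `_search` request against the index that holds the `date` field.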

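Query result (excerpt). The second and third buckets contain no data and are filled in automatically with a doc_count of 0.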

{
    "took": 63,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 32294,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "date": {
            "buckets": [
                {
                    "key_as_string": "2013-01-01T00:00:00.000+08:00",
                    "key": 1356969600000,
                    "doc_count": 8
                },
                {
                    "key_as_string": "2013-02-01T00:00:00.000+08:00",
                    "key": 1359648000000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2013-03-01T00:00:00.000+08:00",
                    "key": 1362067200000,
                    "doc_count": 0
                },
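The bucket keys in the response are epoch milliseconds. The Python sketch below converts the keys from the excerpt above back into month labels and lists the months that came back empty. It uses a fixed UTC+8 offset as a stand-in for Asia/Shanghai, which is an assumption that holds for the dates shown here; the function name `bucket_start` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset standing in for Asia/Shanghai (assumption: no DST in this period).
SHANGHAI = timezone(timedelta(hours=8))


def bucket_start(key_ms):
    """Convert a bucket key in epoch milliseconds to a datetime at UTC+8."""
    return datetime.fromtimestamp(key_ms / 1000, tz=SHANGHAI)


# Buckets copied from the response excerpt above.
buckets = [
    {"key": 1356969600000, "doc_count": 8},
    {"key": 1359648000000, "doc_count": 0},
    {"key": 1362067200000, "doc_count": 0},
]

rows = [(bucket_start(b["key"]).strftime("%Y-%m"), b["doc_count"]) for b in buckets]
empty_months = [month for month, count in rows if count == 0]
print(rows)
print(empty_months)
```

Because every month is present in the response, client code can iterate over the buckets directly without checking for missing intervals.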



Adjusting the extended_bounds range

Change the min of extended_bounds to 2012-11-01.



Result (excerpt)

{
    "took": 9,
    "timed_out": false,
    "_shards": {
        "total": 6,
        "successful": 6,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 32294,
        "max_score": 0.0,
        "hits": []
    },
    "aggregations": {
        "date": {
            "buckets": [
                {
                    "key_as_string": "2012-11-01T00:00:00.000+08:00",
                    "key": 1351699200000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2012-12-01T00:00:00.000+08:00",
                    "key": 1354291200000,
                    "doc_count": 0
                },
                {
                    "key_as_string": "2013-01-01T00:00:00.000+08:00",
                    "key": 1356969600000,
                    "doc_count": 8
                },

The result set now contains two additional buckets:

{
    "key_as_string": "2012-11-01T00:00:00.000+08:00",
    "key": 1351699200000,
    "doc_count": 0
},
{
    "key_as_string": "2012-12-01T00:00:00.000+08:00",
    "key": 1354291200000,
    "doc_count": 0
},

Both of these buckets have a doc_count of 0.



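Summary

extended_bounds extends the range of buckets that is returned and forces the added buckets to be filled with 0. That is why it is only meaningful when min_doc_count is 0.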



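Copyright notice: This is an original article by tianshishangxin1, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.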