Elasticsearch DSL Explained

Prerequisites
- 1 Basic concepts
SELECT
WHERE
- logical operator
- filter
GROUP BY
- metric
- buckets
- nested
HAVING
ORDER BY
LIMIT
Case studies

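Before you start, make sure you understand what each part of the following SQL statement means and what the statement does, because this statement is the outline for everything covered here.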

SELECT
	FIELD1,
	AVG(FIELD2) AS AVG_VAL
FROM
	TABLE
WHERE
	FIELD1 = 'ELASTICSEARCH'
GROUP BY
	FIELD1
HAVING
	AVG_VAL > 20
ORDER BY
	FIELD1 DESC
LIMIT 0, 10

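Prerequisites

1 Basic concepts

Index

Roughly equivalent to a database in a relational database system.

Type

Roughly equivalent to a table in a relational database. Newer versions of ES de-emphasize types: the relationship between an index and its types has changed from one-to-many to one-to-one (since 6.x a newly created index can hold only a single type).

Data types

Basic types: text, keyword, date, long, double, boolean, ip

JSON types: object, nested

Specialized types: geo_point, geo_shape, completion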
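
To make the type names above more concrete, here is a minimal sketch of an index mapping, written as a Python dictionary that serializes to the JSON body of an index-creation request. The field-to-type assignments are assumptions chosen for illustration (the article does not state the real mapping), `summary`, `topics`, and `location` are hypothetical fields, and the typeless `mappings.properties` layout assumes ES 7.x or later.

```python
import json

# Hypothetical mapping: the types below are illustrative assumptions,
# not the actual mapping of the index used in this article.
mapping = {
    "mappings": {
        "properties": {
            "fileName": {"type": "keyword"},       # exact-match string
            "summary": {"type": "text"},           # analyzed full-text string (hypothetical field)
            "starttime": {"type": "date"},
            "duration": {"type": "long"},
            "typeOfService": {"type": "keyword"},
            "topics": {                            # array of objects queried as a unit (hypothetical field)
                "type": "nested",
                "properties": {"name": {"type": "keyword"}},
            },
            "location": {"type": "geo_point"},     # hypothetical field
        }
    }
}

print(json.dumps(mapping, indent=2))
```

The distinction that matters most for the rest of this article is text versus keyword: term-level filters and terms aggregations are normally run against keyword fields.
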

Query API

Single document

GET /index/type/id

Multiple documents

GET/POST /index/type/_search

Query statement

JSON is the air that ES breathes!

GET /analysis/_search
{
  "_source": {   ---SELECT
    "includes": ["fileName","starttime","duration","repNo","repGroup","typeOfService"],
    "excludes": ["blankKeyword","keyword","topicHitDetail"]
  }, 
  "query": {   ---WHERE
    "bool": {
      "filter": {
        "term": {
          "typeOfService": "转账"
        }
      }
    }
  },
  "aggs": {  ---GROUP BY
    "class_buckets": {  ---HAVING
      "filter": {
        "range": {
          "duration": {
            "gte": 600
          }
        }
      },
      "aggs": {
        "class_count": {
          "terms": {
            "field": "classfication_f"
          },
          "aggs": {
            "count": {
              "value_count": {
                "field": "classfication_f"
              }
            },
            "count_filter":{
              "bucket_selector": {   ---HAVING
                "buckets_path": {
                  "count":"count"
                },
                "script": "params.count>=1000"
              }
            }
          }
        }
      }
    }
  },
  "from": 0, ---LIMIT
  "size": 10, 
  "sort": [    ---ORDER BY
    {
      "starttime": {
        "order": "desc"
      }
    }
  ]
}

SELECT

Pass the fields you want (or do not want) as a JSON array.

Specify the fields to include

"_source": ["fileName","starttime","duration","repNo","repGroup","typeOfService"]

Specify the fields to exclude

"_source": {
  "excludes": ["blankKeyword","keyword","topicHitDetail"]
}

Specify both the fields to include and the fields to exclude

"_source": {
  "includes": ["fileName","starttime","duration","repNo","repGroup","typeOfService"],
  "excludes": ["blankKeyword","keyword","topicHitDetail"]
}
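
As a rough illustration of the three forms above, the following Python sketch builds the corresponding request bodies as plain dictionaries. The helper name `source_filter` is hypothetical and not part of any Elasticsearch client. Only the `_source`, `includes`, and `excludes` keys come from the DSL itself.

```python
import json

def source_filter(includes=None, excludes=None):
    """Build the "_source" part of a search body (hypothetical helper)."""
    if not includes and not excludes:
        # No filtering requested: omit "_source" so the full document is returned.
        return {}
    if includes and not excludes:
        # Shorthand form: a bare array means "return only these fields".
        return {"_source": list(includes)}
    spec = {}
    if includes:
        spec["includes"] = list(includes)
    if excludes:
        spec["excludes"] = list(excludes)
    return {"_source": spec}

wanted = ["fileName", "starttime", "duration"]
unwanted = ["blankKeyword", "keyword", "topicHitDetail"]

only_includes = source_filter(includes=wanted)
only_excludes = source_filter(excludes=unwanted)
both = source_filter(includes=wanted, excludes=unwanted)

print(json.dumps(both, indent=2))
```

The returned dictionary can be merged with the `query`, `aggs`, `from`, `size`, and `sort` keys to form a complete search body.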

WHERE

logical operator

Logical relation	Keyword
AND	must, filter
OR	should
NOT	must_not

These clauses can be combined with one another through the bool keyword, e.g.

"query": {
    "bool": {
      "must": [
        {}
      ], 
      "must_not": [
        {}
      ],
      "should": [
        {"bool": {
          "must": [
            {"term": {
              "callNumber": {
                "value": "95533"
              }
            }}
          ]
        }}
      ], 
      "filter": [
        { "term": {
          "typeOfService": "转账"
          }
        }
      ]
    }
  }
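
To connect the bool clauses back to SQL, here is a minimal Python sketch that assembles a nested bool query for one WHERE expression. The helpers `term` and `bool_query` are hypothetical conveniences, and the values '95588' and 'test' are made up for the example.

```python
import json

def term(field, value):
    """Exact-match clause on a single value."""
    return {"term": {field: value}}

def bool_query(**clauses):
    """Assemble a bool query from must/should/must_not/filter lists,
    leaving out any clause list that is empty (hypothetical helper)."""
    return {"bool": {name: list(items) for name, items in clauses.items() if items}}

# WHERE typeOfService = '转账'
#   AND (callNumber = '95533' OR callNumber = '95588')
#   AND NOT repGroup = 'test'
query = bool_query(
    filter=[
        term("typeOfService", "转账"),
        # A bool holding only "should" clauses requires at least one of them to match.
        bool_query(should=[term("callNumber", "95533"), term("callNumber", "95588")]),
    ],
    must_not=[term("repGroup", "test")],
)

print(json.dumps({"query": query}, ensure_ascii=False, indent=2))
```

The sketch puts the AND conditions under filter rather than must because filter clauses do not contribute to the relevance score, which suits exact-match conditions like these.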

filter

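Filter name	Description
Exists	Whether the field exists
Ids	Match by document ID
Prefix	Prefix match
Range	Range
Regexp	Regular expression
Script	Script
Term	Exact match on a single value
Terms	Exact match on multiple values
WildCard	Wildcard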
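
For a feel of how several of the filters in the table look in a request, the following Python sketch lists one clause per filter type, with a rough SQL equivalent in the comment. The field values (IDs, prefixes, patterns, the second service name) are made up for the example. Only the clause names and their structure come from the query DSL.

```python
import json

filters = [
    {"exists": {"field": "repNo"}},                      # repNo IS NOT NULL
    {"ids": {"values": ["1", "2"]}},                     # _id IN ('1', '2')
    {"prefix": {"fileName": "2019"}},                    # fileName LIKE '2019%'
    {"range": {"duration": {"gte": 600, "lt": 1800}}},   # duration >= 600 AND duration < 1800
    {"term": {"typeOfService": "转账"}},                  # typeOfService = '转账'
    {"terms": {"repGroup": ["A", "B"]}},                 # repGroup IN ('A', 'B')
    {"wildcard": {"fileName": "*.wav"}},                 # fileName LIKE '%.wav'
]

# All clauses inside "filter" must match, so this body ANDs them together.
body = {"query": {"bool": {"filter": filters}}}

print(json.dumps(body, ensure_ascii=False, indent=2))
```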

More query methods

GROUP BY

metric

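Aggregation function	Description
avg	Average
min	Minimum
max	Maximum
sum	Sum
value_count	Number of values
cardinality	Number of distinct values
percentiles	Percentiles
stats	Summary statistics (count, min, max, avg, sum)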
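
The metric aggregations in the table can be requested side by side in a single search. The sketch below is one possible body, built as a Python dictionary. The aggregation names (`avg_duration` and so on) are arbitrary labels chosen for the example, and the choice of `duration` and `repNo` as target fields is an assumption.

```python
import json

body = {
    "size": 0,  # return no hits, only aggregation results
    "aggs": {
        "avg_duration": {"avg": {"field": "duration"}},
        # cardinality is an approximate distinct count.
        "distinct_reps": {"cardinality": {"field": "repNo"}},
        # percentiles is also approximate; "percents" selects which ones to compute.
        "duration_pcts": {"percentiles": {"field": "duration", "percents": [50, 95, 99]}},
        # stats returns count, min, max, avg, and sum in one response object.
        "duration_stats": {"stats": {"field": "duration"}},
    },
}

print(json.dumps(body, indent=2))
```

In SQL terms this corresponds roughly to SELECT AVG(duration), COUNT(DISTINCT repNo), ... with no GROUP BY, so the whole result set forms a single group.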

buckets

Bucket type	Description
terms	Group by field value
histogram	Group by a fixed numeric interval
date_histogram	Group by a fixed date interval
range	Group by ranges
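
Each bucket type in the table plays the role of a GROUP BY expression. The following Python sketch shows one aggregation of each kind. The aggregation names, interval values, and range boundaries are assumptions picked for illustration, and `calendar_interval` assumes ES 7.2 or later (older versions use `interval` instead).

```python
import json

aggs = {
    # GROUP BY typeOfService, keeping the 10 largest groups
    "by_service": {"terms": {"field": "typeOfService", "size": 10}},
    # GROUP BY FLOOR(duration / 60): one bucket per 60-unit step
    "by_duration_step": {"histogram": {"field": "duration", "interval": 60}},
    # GROUP BY calendar day of starttime
    "by_day": {"date_histogram": {"field": "starttime", "calendar_interval": "day"}},
    # GROUP BY hand-picked ranges; "from" is inclusive, "to" is exclusive
    "by_length": {
        "range": {
            "field": "duration",
            "ranges": [{"to": 60}, {"from": 60, "to": 600}, {"from": 600}],
        }
    },
}

body = {"size": 0, "aggs": aggs}
print(json.dumps(body, indent=2))
```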

nested

Aggregates over nested structures

Elasticsearch aggregations
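
A nested aggregation first steps into the nested objects via `path`, and ordinary bucket or metric aggregations then run inside it. The sketch below assumes a hypothetical nested field `topics` with a keyword sub-field `topics.name`. Neither is confirmed by the article's index.

```python
import json

body = {
    "size": 0,
    "aggs": {
        "topics": {
            # Switch the aggregation context to the nested "topics" objects.
            "nested": {"path": "topics"},
            "aggs": {
                # Inside the nested context, fields are addressed by their full path.
                "by_topic_name": {"terms": {"field": "topics.name"}}
            },
        }
    },
}

print(json.dumps(body, indent=2))
```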

HAVING

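Filtering inside aggs: a filter aggregation restricts which documents enter the buckets, and a bucket_selector keeps only the buckets whose metric satisfies a condition.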
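
To tie this back to the SQL outline at the top (GROUP BY FIELD1 HAVING AVG_VAL > 20), here is a minimal sketch using bucket_selector, expressed as a Python dictionary. FIELD1 and FIELD2 are the placeholder names from that SQL statement, and the aggregation names `by_field1`, `avg_val`, and `having_avg` are arbitrary labels.

```python
import json

body = {
    "size": 0,
    "aggs": {
        "by_field1": {
            "terms": {"field": "FIELD1"},                 # GROUP BY FIELD1
            "aggs": {
                "avg_val": {"avg": {"field": "FIELD2"}},  # AVG(FIELD2) AS AVG_VAL
                "having_avg": {
                    "bucket_selector": {
                        # Expose the sibling metric to the script as params.avg_val.
                        "buckets_path": {"avg_val": "avg_val"},
                        # HAVING AVG_VAL > 20: buckets failing the test are dropped.
                        "script": "params.avg_val > 20",
                    }
                },
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

The bucket_selector sits next to the metric it tests, inside the bucket aggregation, because it runs once per bucket after the metrics for that bucket have been computed.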

ORDER BY

Sorting query results

"sort": [ 
    {
      "starttime": {
        "order": "desc"
      }
    }
  ]

Sorting aggregation buckets

"order": [
{"_key": "asc"}
]
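
Besides `_key`, a terms aggregation can also order its buckets by `_count` or by the value of a sub-aggregation. The following Python sketch shows the latter. The aggregation names and the choice of `typeOfService` and `duration` as fields are assumptions made for the example.

```python
import json

body = {
    "size": 0,
    "aggs": {
        "by_service": {
            "terms": {
                "field": "typeOfService",
                # Order buckets by the sub-aggregation defined below, largest average first.
                # Alternatives: {"_key": "asc"} or {"_count": "desc"}.
                "order": {"avg_duration": "desc"},
            },
            "aggs": {"avg_duration": {"avg": {"field": "duration"}}},
        }
    },
}

print(json.dumps(body, indent=2))
```

This is roughly SELECT typeOfService, AVG(duration) ... GROUP BY typeOfService ORDER BY AVG(duration) DESC.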

LIMIT

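For performance reasons, from/size paging (the equivalent of LIMIT) cannot page through an entire result set. With the default configuration, Elasticsearch lets you page through at most 10,000 documents; this threshold (the index.max_result_window setting) can be changed. If you need the complete result set, fetch it with the scroll API.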
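
The two approaches can be sketched as follows. `page_to_from` converts a page number into from/size and rejects requests beyond the window. `scroll_all` walks the scroll protocol (initial search with a scroll keep-alive, repeated calls to /_search/scroll, then clearing the scroll). Both function names are hypothetical, and `send(method, path, body)` is an assumed transport callable standing in for whatever HTTP client you use. The in-memory `fake_send_factory` exists only so the sketch runs without a cluster.

```python
def page_to_from(page, size, max_window=10000):
    """Translate a 1-based page number into from/size, enforcing the window limit."""
    start = (page - 1) * size
    if start + size > max_window:
        raise ValueError("from + size exceeds the result window; use scroll instead")
    return {"from": start, "size": size}

def scroll_all(send, index, query, size=1000, keep_alive="1m"):
    """Yield every hit using the scroll protocol.

    `send(method, path, body)` is a hypothetical transport callable that
    performs the HTTP request and returns the decoded JSON response.
    """
    resp = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": query})
    while resp["hits"]["hits"]:
        yield from resp["hits"]["hits"]
        resp = send("POST", "/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})
    # Release the server-side scroll context once the results are exhausted.
    send("DELETE", "/_search/scroll", {"scroll_id": resp["_scroll_id"]})

def fake_send_factory(docs, page_size):
    """In-memory stand-in for a cluster, for demonstration only."""
    state = {"pos": 0}
    def send(method, path, body):
        if method == "DELETE":
            return {"succeeded": True}
        chunk = docs[state["pos"]:state["pos"] + page_size]
        state["pos"] += len(chunk)
        return {"_scroll_id": "demo", "hits": {"hits": chunk}}
    return send

docs = [{"_id": str(i)} for i in range(25)]
hits = list(scroll_all(fake_send_factory(docs, 10), "analysis", {"match_all": {}}, size=10))
print(len(hits))           # 25
print(page_to_from(3, 10)) # {'from': 20, 'size': 10}
```

In real use, `send` would wrap an HTTP client pointed at the cluster. The loop itself stays the same.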

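Case studies

1. Query

2. Aggregation

2.1 Simple bucketing

2.2 Bucketing by fixed date interval

2.3 Bucketing by time period

2.4 Sub-aggregation queries

2.5 Metric aggregations

Focus on cardinality and percentiles

2.6 Nested aggregations

3. Recommendation algorithms

Search recommendation

Related reading on search-time recommendation

Index recommendation

Related reading on index-time recommendation

4. Scoring algorithms

TF/IDF

Vector space model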

Original link: https://blog.csdn.net/zzl394935072/article/details/88920361