Table of contents

Bulk indexing with bulk
- bulk API
- Testing bulk
Simple queries
- match_all
- from/size
- match
- match_phrase
- bool
Summary

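Bulk indexing with bulk

bulk performs multiple index or delete operations in a single API call. This reduces per-request overhead and can greatly increase indexing speed.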

bulk API

The request body has the following format:

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

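The available actions are:

index

Requires source data on the next line. Adds a new document or replaces an existing one.
create

Requires source data on the next line. Adds a new document, and fails if a document with the same ID already exists.
update

Requires a partial document or script on the next line. Updates an existing document.
delete

Takes no source line.

Note: the last line of data must end with a newline character \n. Each newline may be preceded by a carriage return \r. When sending requests to the _bulk endpoint, set the Content-Type header to application/x-ndjson.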
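The format above can be sketched in Python as a small serializer. This is only an illustration under stated assumptions: build_bulk_body is a hypothetical helper, not part of any Elasticsearch client, and the documents are made-up sample values.

```python
import json

def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON bulk body.

    Hypothetical helper for illustration. source must be None for delete,
    which takes no source line; every other action needs one.
    """
    lines = []
    for action, metadata, source in operations:
        # First line of each operation: the action and its metadata.
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            if source is not None:
                raise ValueError("delete takes no source line")
        else:
            if source is None:
                raise ValueError(f"{action} requires a source line")
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with \n.
    return "".join(line + "\n" for line in lines)

body = build_bulk_body([
    ("index", {"_id": "1"}, {"account_number": 1, "firstname": "Amber"}),
    ("create", {"_id": "2"}, {"account_number": 2, "firstname": "Sample"}),
    ("update", {"_id": "1"}, {"doc": {"age": 33}}),
    ("delete", {"_id": "2"}, None),
])
print(body, end="")
```

Each operation contributes one or two lines, so the four operations here produce seven lines in total.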

Testing bulk

Download the official accounts.json sample data:

wget -O accounts.json "https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true"

The file contains randomly generated bank account records in the following format:

{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}
...

Use the following _bulk request to index the account data into the bank index:

curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/bank/_bulk?pretty&refresh" --data-binary "@accounts.json"
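For readers who prefer a script to curl, the same request could be prepared with Python's standard library. This is a minimal sketch: make_bulk_request is a hypothetical helper, the host and index simply mirror the curl command, and the payload is a shortened stand-in for the real file. The part that sends the request is left commented out because it needs a running cluster.

```python
import urllib.request

def make_bulk_request(ndjson_bytes, base_url="http://localhost:9200", index="bank"):
    """Build (but do not send) a POST request to the _bulk endpoint.

    Hypothetical helper; base_url and index are assumptions that mirror
    the curl command.
    """
    url = f"{base_url}/{index}/_bulk?pretty&refresh"
    return urllib.request.Request(
        url,
        data=ndjson_bytes,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )

# Shortened stand-in payload; the last line still ends with \n.
payload = b'{"index":{"_id":"1"}}\n{"account_number":1,"balance":39225}\n'
request = make_bulk_request(payload)
print(request.get_method(), request.full_url)

# To send the real file (requires a running cluster):
# with open("accounts.json", "rb") as f:
#     request = make_bulk_request(f.read())
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```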

List the indices in Kibana:
```
# List the indices; ?v adds a header row to the output
GET _cat/indices?v
```
The data was indexed successfully, 1000 documents in total.

Simple queries

Once some data has been indexed into Elasticsearch, we can query it by sending requests to the _search endpoint.

match_all

For example, retrieve every document in the bank index, sorted by age and then by account number, both in descending order:

GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "age": { "order": "desc" } },
    { "account_number": { "order": "desc" } }
  ]
}

The response looks like this (only the first 10 hits are returned by default):

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "998",
        "_score" : null,
        "_source" : {
          "account_number" : 998,
          "balance" : 16869,
          "firstname" : "Letha",
          "lastname" : "Baker",
          "age" : 40,
          "gender" : "F",
          "address" : "206 Llama Court",
          "employer" : "Dognosis",
          "email" : "lethabaker@dognosis.com",
          "city" : "Dunlo",
          "state" : "WV"
        },
        "sort" : [
          40,
          998
        ]
      },
      ...
    ]
  }
}

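The response includes the following information:

took: how long Elasticsearch took to run the query, in milliseconds
timed_out: whether the search request timed out
_shards: how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
hits.total.value: how many matching documents were found
hits.max_score: the highest relevance score among the matching documents (1.0 for match_all; null in the response above because the results are sorted by field rather than by score)
hits.hits._score: the relevance score of the document (1.0 for match_all; null when sorting by field)
hits.hits.sort: the sort values of the document (here age and account_number), which determine its position in the results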
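As an illustration of how these fields might be read in client code, the sketch below walks a trimmed copy of the response shown earlier. The summarize function and its output keys are hypothetical names chosen for this example.

```python
# Trimmed copy of the response shown earlier (JSON null becomes None).
response = {
    "took": 3,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1000, "relation": "eq"},
        "max_score": None,
        "hits": [
            {
                "_index": "bank",
                "_id": "998",
                "_score": None,
                "_source": {"account_number": 998, "firstname": "Letha", "age": 40},
                "sort": [40, 998],
            }
        ],
    },
}

def summarize(resp):
    """Collect the fields described above into a flat dictionary."""
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "failed_shards": resp["_shards"]["failed"],
        "total": hits["total"]["value"],
        "ids": [hit["_id"] for hit in hits["hits"]],
        # sort is only present when the request specified a sort.
        "sort_values": [hit.get("sort") for hit in hits["hits"]],
    }

summary = summarize(response)
print(summary)
```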

from/size

from: the offset of the first hit to return, starting at 0
size: the number of hits to return per request, 10 by default

GET /bank/_search
{
  "query": {"match_all": {}},
  "from": 10,
  "size": 10
}
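The relationship between a page number and from/size can be sketched as follows. Both page_params and paged_match_all are hypothetical helpers, and pages are assumed to be numbered from 1.

```python
def page_params(page, page_size=10):
    """Translate a 1-based page number into from/size values."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {"from": (page - 1) * page_size, "size": page_size}

def paged_match_all(page, page_size=10):
    """Build a match_all request body for the given page."""
    body = {"query": {"match_all": {}}}
    body.update(page_params(page, page_size))
    return body

# Page 2 with 10 hits per page gives the same body as the request above.
second_page = paged_match_all(2)
print(second_page)
```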

match

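match runs a conditional query on a field. The query text is analyzed (split into terms) automatically, and a document matches if the field contains any of the terms.

For example, find customers whose address contains "mill" or "lane" (19 hits):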

GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
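The "any term" behaviour can be imitated locally with a deliberately simplified model. The tokenize function below only lowercases and splits on whitespace, which is an assumption standing in for a real Elasticsearch analyzer, and the addresses are illustrative samples.

```python
def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()

def match(field_value, query_text):
    """Return True if the field contains at least one of the query terms."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query_text))

addresses = [
    "880 Holmes Lane",
    "198 Mill Lane",
    "990 Mill Road",
    "671 Bristol Street",
]
matched = [address for address in addresses if match(address, "mill lane")]
print(matched)
```

Any address containing either term is kept, so only "671 Bristol Street" is dropped.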

match_phrase

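match_phrase is a phrase query. The query text is still analyzed, but a document matches only if the field contains all of the terms, adjacent to each other and in the same order.

For example, find customers whose address contains the phrase "mill lane" (1 hit):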

GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
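Phrase matching can be imitated in the same simplified way: the terms must appear consecutively and in order. As before, tokenize is only a rough stand-in for a real analyzer, match_phrase is a hypothetical function, and the addresses are illustrative.

```python
def tokenize(text):
    # Simplified stand-in for an analyzer: lowercase, split on whitespace.
    return text.lower().split()

def match_phrase(field_value, phrase):
    """Return True if the phrase terms occur consecutively and in order."""
    field_terms = tokenize(field_value)
    phrase_terms = tokenize(phrase)
    n = len(phrase_terms)
    if n == 0:
        return False
    # Slide a window of n terms across the field and compare.
    return any(
        field_terms[i:i + n] == phrase_terms
        for i in range(len(field_terms) - n + 1)
    )

addresses = ["880 Holmes Lane", "198 Mill Lane", "12 Lane Mill"]
phrase_hits = [a for a in addresses if match_phrase(a, "mill lane")]
print(phrase_hits)
```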

bool

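bool builds complex queries by combining multiple query clauses.

must: the clause must match
must_not: the clause must not match
should: the clause should (preferably) match
filter: the clause must match, but is used only to filter documents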

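Each must, should, and must_not element in a bool query is called a query clause.

How well a document matches the must and should clauses determines its relevance score. The higher the score, the better the document fits the search criteria. By default, Elasticsearch returns documents sorted by relevance score.

must_not excludes documents and does not affect the relevance score.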

For example, find customers who are 40 years old, whose state is not "ID", and whose city is preferably "Ferney":

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 40 } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ],
      "should": [
        { "match": { "city": "Ferney" } }
      ]
    }
  }
}
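When such queries are assembled in code, a small builder can keep the clause lists tidy. The sketch below only constructs the request body as a dictionary; bool_query is a hypothetical helper, not a client library function, and it assumes the age of 40 stated in the example description.

```python
import json

def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Build a bool query body, omitting clause types that are empty."""
    clauses = {
        "must": must,
        "must_not": must_not,
        "should": should,
        "filter": filter_,
    }
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}

body = bool_query(
    must=[{"match": {"age": 40}}],
    must_not=[{"match": {"state": "ID"}}],
    should=[{"match": {"city": "Ferney"}}],
)
print(json.dumps(body, indent=2))
```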

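filter selects documents by the given criteria without affecting the relevance score.

For example, find customers with a balance between 2000 and 2500 (inclusive):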

GET /bank/_search
{
  "query": {
    "bool": {
      "filter": [
        {"range": {
          "balance": {
            "gte": 2000,
            "lte": 2500
          }
        }}
      ]
    }
  }
}
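The bounds in a range clause can be illustrated with a local check: gte and lte are inclusive, while gt and lt are exclusive. The in_range function and the account balances below are made up for this sketch and do not come from the sample data.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Mimic the bounds of a range clause on a single numeric value."""
    if gte is not None and value < gte:
        return False
    if gt is not None and value <= gt:
        return False
    if lte is not None and value > lte:
        return False
    if lt is not None and value >= lt:
        return False
    return True

# Made-up balances, keyed by a hypothetical account number.
balances = {101: 1999, 102: 2000, 103: 2250, 104: 2500, 105: 2501}
selected = [
    account
    for account, balance in balances.items()
    if in_range(balance, gte=2000, lte=2500)
]
print(selected)
```

Because both bounds are inclusive, balances of exactly 2000 and 2500 are kept.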

Summary

Elasticsearch is concise and easy to use. There is always more to learn, so keep at it!

Original link: https://blog.csdn.net/YaoRoy/article/details/105297661