Getting Started with Elasticsearch

1-Getting Started with Elasticsearch

2-First Look at Elasticsearch

2.1-Problems with Database Queries

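Searching with plain database queries has two problems:

Poor performance: a fuzzy query with a leading wildcard cannot use an index, so the database does a full table scan.
Weak functionality:
- With "华为手机" (Huawei phone) as the search condition, the query below finds nothing unless a title contains that exact phrase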
```
select * from goods where title like '%华为手机%'
```
- "华为手机" has to be split into the two words "华为" and "手机", and each word queried separately
```
select * from goods where title like '%华为%' or title like '%手机%'
```
- However, a LIKE query in a relational database such as MySQL does not split the search phrase into words for you
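
The limitation can be reproduced outside the database. The following is a minimal Java sketch, not MySQL itself: `like` and `likeAny` are hypothetical helpers that imitate the two SQL statements above with plain substring checks, and the title "华为5G手机" is made-up sample data.

```java
import java.util.List;

class LikeQuerySketch {
    // Imitates `title like '%keyword%'`: true only when the keyword
    // appears in the title as one contiguous substring.
    static boolean like(String title, String keyword) {
        return title.contains(keyword);
    }

    // Imitates `like '%a%' or like '%b%'`: true when any of the
    // hand-split words appears in the title.
    static boolean likeAny(String title, List<String> words) {
        return words.stream().anyMatch(title::contains);
    }

    public static void main(String[] args) {
        String title = "华为5G手机"; // hypothetical product title
        System.out.println(like(title, "华为手机"));                  // false
        System.out.println(likeAny(title, List.of("华为", "手机")));  // true
    }
}
```

The second call only succeeds because the caller split the phrase by hand, which is exactly the step a search engine has to automate.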

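Elasticsearch (ES) solves these problems with an inverted index. For example, JD.com keeps its product information in Elasticsearch, which lets it return search results very quickly.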

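2.2-Inverted Index

Forward index: go from the poem 《静夜思》 to its line "床前明月光", and from the line to the character "前".

Inverted index (also called a reverse index): tokenize the documents and record, for each term, the ids of the documents that contain it.

Start by tokenizing "床前明月光":

Tokenization splits a piece of text into separate terms according to some rule.
Each resulting term records the poem line it came from.

An inverted index is therefore built by splitting the poem lines into individual terms, so that a line can be found from any of its terms.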
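
To make the idea concrete, here is a small Java sketch of an inverted index over the four lines of 《静夜思》. It is an illustration only: the class name `InvertedIndexSketch` is hypothetical, and the tokenizer is deliberately naive (every character is one term), which is an assumption of this sketch and not how ES or Lucene tokenize.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndexSketch {
    // term -> ids of the lines that contain it
    private final Map<String, Set<Integer>> postings = new LinkedHashMap<>();

    // Naive tokenizer (assumption): each character is treated as one term.
    void add(int id, String line) {
        line.codePoints()
            .mapToObj(cp -> new String(Character.toChars(cp)))
            .forEach(term -> postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id));
    }

    // Looks up a term and returns the ids of the lines containing it.
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("床前明月光", "疑是地上霜", "举头望明月", "低头思故乡");
        InvertedIndexSketch index = new InvertedIndexSketch();
        for (int i = 0; i < lines.size(); i++) {
            index.add(i + 1, lines.get(i)); // ids start at 1
        }
        System.out.println(index.lookup("前")); // [1]
        System.out.println(index.lookup("月")); // [1, 3]
    }
}
```

The lookup goes from a term to the lines that contain it, which is the reverse of the forward direction (poem, then line, then character).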

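2.3-How ES Stores and Queries Data

ES has to solve the two database problems from section 2.1:

Poor performance: a fuzzy query with a leading wildcard cannot use an index and scans the whole table.
Weak functionality: a search for "华为手机" finds nothing unless a title contains that exact phrase.

How storage and search work:

Storage: the title of each stored record is tokenized, and each term is recorded together with the ids of the records that contain it (the inverted index).
Search: when "华为手机" is used as the keyword, ES automatically tokenizes it as well ("华为", "手机"), looks up each term in the inverted index, and returns every match; in this example, the records with ids 1, 2 and 3.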

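2.4-ES Concepts in Detail

Lucene: an open-source library for full-text indexing and search, supported and provided by the Apache Software Foundation.

Elasticsearch is a search server built on Lucene. It hides Lucene's complexity and exposes a RESTful interface for searching.

ES or Solr: which one to choose?

1. If your company already uses Solr and it meets your needs, there is no reason to switch.

2. If your company is about to build full-text search, consider ES first; large-scale search such as GitHub's runs on it.

Overview

A distributed, highly scalable, near-real-time search and data analytics engine
RESTful web interface: create, read, update and delete through HTTP requests
Written in Java and originally released as open source under the Apache license (later versions changed the license); a popular enterprise search engine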

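Use cases

Search: querying massive amounts of data

1) The user enters a search keyword in the front end

2) The front end sends an HTTP request to the application server

3) The application server sends an HTTP RESTful request to the ES cluster

4) The ES cluster retrieves the data from its indices

Log data analysis
Real-time data analysis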

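2.5 Differences Between ES and MySQL

•MySQL supports transactions and Elasticsearch does not, so a delete in ES cannot be rolled back.

•Elasticsearch has no foreign keys; if your data needs strong consistency, use it with caution.

•Elasticsearch and MySQL do different jobs: MySQL stores the data (inserts, updates, deletes), and Elasticsearch searches it.

Common ways to sync MySQL data into ES:

Writing to ES through the Java API
Logstash, officially recommended by ES
Canal, open-sourced by Alibaba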

3-Starting Elasticsearch

3.1-Starting ES

Check whether Elasticsearch is already running:

ps -ef|grep elastic

Start ES:

# Elasticsearch refuses to start as root, so switch to the ithe user first
su ithe
# password=ithe

cd /opt/elasticsearch-7.4.0/bin
./elasticsearch  # start ES

URL:

192.168.52.128:9200

3.2-Starting the ES Companion Tool

Clone the remote terminal session and start Kibana:

# change to Kibana's bin directory
cd /opt/kibana-7.4.0-linux-x86_64/bin
# start Kibana
./kibana --allow-root

Open in a browser: http://192.168.52.128:5601/

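4-Core Concepts (Key Topic)

1 Index

Where Elasticsearch stores data. It can be thought of as a database in a relational database system.

2 Type

A type is like a kind of table, such as a user table or a role table. In Elasticsearch 7.x the default type is _doc.

- In ES 5.x an index can have multiple types.

- In ES 6.x an index can have only one type.

- From ES 7.x on, types are deprecated and being phased out; requests no longer specify one, and the default is _doc.

3 Mapping

A mapping defines the type of each field, the analyzer each field uses, and so on. It corresponds to a table schema in a relational database.

4 Document

The smallest unit of data in Elasticsearch, usually shown as JSON. One document corresponds to one row in a MySQL table.

5 Inverted index

An inverted index consists of a list of all the distinct terms that appear in the documents and, for each term, the list of ids of the documents that contain it.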

Comparison with MySQL

index: database
type: table
mapping: table schema
document: row

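5-Operating ES with Scripts (Key Topic)

5.1-RESTful Style Recap

1. REST (Representational State Transfer) is a set of architectural constraints and principles. An application or design that satisfies them is RESTful. In practice it serves as a convention for defining interfaces.

2. It is based on HTTP.

3. Resources are represented in XML or JSON.

4. Every URI identifies one resource.

5. The client operates on server-side resources with four HTTP verbs, GET, POST, PUT and DELETE:

GET: retrieve a resource (query)

POST: create a resource (insert)

PUT: update a resource (modify)

DELETE: delete a resource (delete)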
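
As a rough illustration of "one URI per resource, one verb per operation", the sketch below builds (but never sends) four HTTP requests with the JDK's `java.net.http` API (Java 11 or later). The class name `RestVerbSketch`, the helper `request`, and the resource path `/person/_doc/1` are assumptions made for this example; the base address is the one used elsewhere in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

class RestVerbSketch {
    // Assumed base URL, taken from the ES address used in this article.
    static final String BASE = "http://192.168.52.128:9200";

    // Builds (does not send) a request for one resource URI with the given verb.
    static HttpRequest request(String method, String path, String jsonBody) {
        HttpRequest.BodyPublisher body = jsonBody == null
                ? BodyPublishers.noBody()
                : BodyPublishers.ofString(jsonBody);
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .method(method, body)
                .build();
    }

    public static void main(String[] args) {
        String doc = "{\"name\":\"张三\",\"age\":18}";
        HttpRequest[] requests = {
            request("POST", "/person/_doc", doc),      // create
            request("GET", "/person/_doc/1", null),    // read
            request("PUT", "/person/_doc/1", doc),     // update
            request("DELETE", "/person/_doc/1", null)  // delete
        };
        for (HttpRequest r : requests) {
            System.out.println(r.method() + " " + r.uri().getPath());
        }
    }
}
```

Nothing is sent over the network here; passing one of these requests to `HttpClient.send` would perform the actual call against a running server.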

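5.2-Index Operations

Use Kibana to operate ES: http://192.168.52.128:5601/

Kibana is a web client for ES, comparable to SQLyog for MySQL.

# create an index
PUT person
# view an index
GET person
# delete an index (this also deletes all of its data, like DROP DATABASE in MySQL)
DELETE person
# list all indices
GET _all

# delete every index whose name starts with c (wildcard)
DELETE /c*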

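5.3-ES Data Types

Simple data types

String

text: analyzed (tokenized); cannot be used in aggregations by default
keyword: not analyzed; the whole value is stored as a single term; supports aggregations

Numeric: long, integer, double, etc.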


Boolean: boolean
Binary: binary

Range types

integer_range, float_range, long_range, double_range, date_range

Date: date

Complex data types

Array [ ]: nested (for arrays of JSON objects)
Object { }: object (for single JSON objects)

Note: the type of an existing field cannot be changed.

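5.4-Mapping Operations

5.4.1 Adding a mapping

# delete the index (this also deletes all of its data, like DROP DATABASE in MySQL)
DELETE person

# create the index
PUT person

# view the index
GET person

# add a mapping (like adding columns to a table)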
PUT /person/_mapping
{
    "properties":{
        "name":{
            "type":"text"
        },
        "age":{
            "type":"integer"
        }
    }
}

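5.4.2 Viewing a mapping

# view only the mapping (the "table structure")
GET person/_mapping
# view the whole index; the output includes the mapping
GET person

5.4.3 Creating an Index Together with Its Mapping

# create the index and its mapping in one request (like defining the columns when the database is created, since there is only one "table", type=_doc)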
PUT /person
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}

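5.5-Document Operations

5.5.1 Adding/Updating a Document

# with an explicit id: if no document with id=1 exists it is inserted; otherwise the existing document is replaced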
PUT /person/_doc/1
{
  "name":"张三",
  "age":18,
  "address":"北京海淀区"
}

# add a document without specifying an id (ES generates one)
POST /person/_doc/
{
  "name":"王五",
  "age":18,
  "address":"北京"
}

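5.5.2 Viewing Documents (Simple Lookup)

# get a document by id
GET /person/_doc/1
# get all documents (query without conditions)
GET /person/_search

5.5.3 Deleting a Document

# delete the document with the given id
DELETE /person/_doc/1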

6-Analyzers

6.1 Analyzers-Introduction


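6.2-IK Analyzer

A Chinese analyzer

•IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java

•It is a Maven-based project

•It claims a processing speed of 600,000 characters per second

•It supports user-defined dictionary extensions

•Download: https://github.com/medcl/elasticsearch-analysis-ik/archive/v7.4.0.zip

6.3-Using the IK Analyzer

The IK analyzer has two segmentation modes: ik_max_word and ik_smart

1. ik_max_word

# Mode 1: ik_max_word
# Splits the text at the finest granularity; for example "乒乓球明年总冠军" becomes "乒乓球", "乒乓", "球", "明年", "总冠军", "冠军".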
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "乒乓球明年总冠军"
}

Output of the ik_max_word analyzer:

{
  "tokens" : [
    {
      "token" : "乒乓球",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "乒乓",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "球",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "明年",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "总冠军",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "冠军",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 5
    }
  ]
}

2. ik_smart

# Mode 2: ik_smart
# Splits the text at the coarsest granularity; for example "乒乓球明年总冠军" becomes "乒乓球", "明年", "总冠军".
GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "乒乓球明年总冠军"
}

Output of the ik_smart analyzer:

{
  "tokens" : [
    {
      "token" : "乒乓球",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "明年",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "总冠军",
      "start_offset" : 5,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}

As shown, ik_smart splits the text "乒乓球明年总冠军" into [乒乓球] [明年] [总冠军].

For this text, that segmentation reads more naturally and is what we want.
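
The difference between the two granularities can be imitated with a toy dictionary-based segmenter. The sketch below is not the IK algorithm: `SegmentationSketch`, its six-word dictionary, and the two strategies (emit every dictionary word at every offset versus greedy forward maximum matching) are assumptions chosen only so that the toy output lines up with the two results shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SegmentationSketch {
    // Hypothetical mini dictionary; a real analyzer dictionary is far larger.
    static final Set<String> DICT = Set.of("乒乓球", "乒乓", "球", "明年", "总冠军", "冠军");
    static final int MAX_LEN = 3; // length of the longest dictionary word

    // Fine-grained: emit every dictionary word found at every start offset, longest first.
    static List<String> fine(String text) {
        List<String> tokens = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = Math.min(MAX_LEN, text.length() - start); len >= 1; len--) {
                String candidate = text.substring(start, start + len);
                if (DICT.contains(candidate)) {
                    tokens.add(candidate);
                }
            }
        }
        return tokens;
    }

    // Coarse-grained: greedy forward maximum matching, so tokens never overlap.
    static List<String> coarse(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int matched = 1; // fall back to a single character when nothing matches
            for (int len = Math.min(MAX_LEN, text.length() - start); len >= 1; len--) {
                if (DICT.contains(text.substring(start, start + len))) {
                    matched = len;
                    break;
                }
            }
            tokens.add(text.substring(start, start + matched));
            start += matched;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "乒乓球明年总冠军";
        System.out.println(fine(text));   // [乒乓球, 乒乓, 球, 明年, 总冠军, 冠军]
        System.out.println(coarse(text)); // [乒乓球, 明年, 总冠军]
    }
}
```

The fine-grained list contains overlapping tokens, which increases the chance that a query term hits the index, while the coarse-grained list covers the text exactly once.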

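6.4 Querying Documents with the IK Analyzer

6.4.1 Preparing Test Data

1. Create the index, add the mapping, and set the IK analyzer on the address field

# delete the index if it already exists
DELETE person

# create the index with a mapping that specifies the analyzer (like adding columns to a table)
# name:    keyword, not analyzed
# address: text, analyzed with ik_max_word; text fields cannot be used in aggregations by default (similar to GROUP BY / SUM in SQL)
PUT person
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "address": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}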

GET person

2. Add documents

# add a few documents to work with
# each request specifies the id
POST /person/_doc/1
{
  "name":"张三",
  "age":18,
  "address":"北京海淀区"
}

POST /person/_doc/2
{
  "name":"李四",
  "age":18,
  "address":"北京朝阳区"
}

POST /person/_doc/3
{
  "name":"王五",
  "age":18,
  "address":"北京昌平区"
}

POST /person/_doc/4
{
  "name":"李雷",
  "age":18,
  "address":"华为5G手机"
}

3. Query the documents

GET /person/_search

4. Check the tokenization

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "北京昌平"
}

6.4.2 term Query-Keyword

Term query: term does not analyze (split) the query condition

GET /person/_search
{
  "query": {
    "term": {
      "address": {
        "value": "北京昌平"
      }
    }
  }
}

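6.4.3 match Query-Full Text

Full-text query: match

A full-text query analyzes the query condition: it first tokenizes the condition, then queries with each term and takes the union of the results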

GET /person/_search
{
  "query": {
    "match": {
      "address": "北京昌平"
    }
  }
}
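# 1. tokenize the query condition "北京昌平": 北京, 昌平
# 2. query with each resulting term

To summarize:

Term query: term

A term query does not analyze the query condition; a document matches only when a term in the index is exactly equal to the query string.

Full-text query: match

A full-text query analyzes the query condition: it first tokenizes the condition, then queries with each term and takes the union (OR) of the results.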
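
The contrast can be simulated over a hand-written inverted index. This is only a sketch of the lookup logic, not of ES internals: the class `TermVsMatchSketch` is hypothetical, and the terms in `ADDRESS_INDEX` are assumptions about what the analyzer might have produced for the three Beijing addresses in the test data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class TermVsMatchSketch {
    // Hand-written postings for the address field: term -> document ids.
    // The terms are assumed, not taken from a real analyzer run.
    static final Map<String, Set<Integer>> ADDRESS_INDEX = Map.of(
            "北京", Set.of(1, 2, 3),
            "海淀区", Set.of(1),
            "朝阳区", Set.of(2),
            "昌平", Set.of(3),
            "昌平区", Set.of(3));

    // term query: the query string is looked up as-is, as one single term.
    static Set<Integer> term(String value) {
        return new TreeSet<>(ADDRESS_INDEX.getOrDefault(value, Set.of()));
    }

    // match query: each already-analyzed query term is looked up
    // and the hits are unioned (OR).
    static Set<Integer> match(List<String> analyzedTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (String t : analyzedTerms) {
            hits.addAll(ADDRESS_INDEX.getOrDefault(t, Set.of()));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(term("北京昌平"));                // [] : no such single term
        System.out.println(match(List.of("北京", "昌平")));  // [1, 2, 3]
    }
}
```

Under these assumptions the term lookup finds nothing because "北京昌平" was never stored as one term, while the match lookup returns every document that contains either "北京" or "昌平".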

7-Java API (Key Topic)

7.1-Integrating ES with Spring Boot

① Create a Spring Boot project

② Add the Elasticsearch dependencies

<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.1.8.RELEASE</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>

    <!-- Elasticsearch dependencies -->
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-client</artifactId>
        <version>7.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>7.4.0</version>
    </dependency>
    
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>

    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.4</version>
    </dependency>

</dependencies>

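③ Test

Write the configuration class ElasticSearchConfig

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties(prefix = "elasticsearch")
public class ElasticSearchConfig {

    // bound from elasticsearch.host and elasticsearch.port in application.yml
    private String host;
    private int port;

    // setters are required for @ConfigurationProperties binding
    public void setHost(String host) { this.host = host; }
    public void setPort(int port) { this.port = port; }

    @Bean
    public RestHighLevelClient client() {
        return new RestHighLevelClient(RestClient.builder(
                new HttpHost(host, port, "http")
        ));
    }
}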

ES connection settings: resources\application.yml

elasticsearch:
  host: 192.168.52.128
  port: 9200

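Write the unit test class: ElasticsearchDemoApplicationTests

Note: if the IDE marks the @Autowired RestHighLevelClient with a red underline, the cause is that the configuration class and the test class are in different packages.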

@RunWith(SpringRunner.class)
@SpringBootTest(classes = ElasticsearchDemoApplication.class)
public class ElasticsearchDemoApplicationTests {

    @Autowired
    private RestHighLevelClient client;

    @Test
    public void contextLoads() {
        System.out.println(client);
    }
}

7.2-Creating an Index

1. Create an index

Mind the import: org.elasticsearch.client.indices.CreateIndexRequest

/**
 * Create an index
 */
@Test
public void addIndex() throws IOException {
    // 1. get the object for index operations from the client
    IndicesClient indicesClient = client.indices();
    // 2. run the operation and get the response
    CreateIndexRequest createRequest = new CreateIndexRequest("ithe");
    CreateIndexResponse response = indicesClient.create(createRequest,
            RequestOptions.DEFAULT);

    // 3. check the result in the response
    System.out.println(response.isAcknowledged());
}

2. Create an index together with its mapping

/**
 * Create an index together with its mapping
 */
@Test
public void addIndexAndMapping() throws IOException {
    // 1. get the object for index operations from the client
    IndicesClient indices = client.indices();
    // 2. run the operation and get the response
    CreateIndexRequest createIndexRequest = new CreateIndexRequest("test");
    // 2.1 set the mappings
    String mapping = "{\n" +
            "      \"properties\" : {\n" +
            "        \"address\" : {\n" +
            "          \"type\" : \"text\",\n" +
            "          \"analyzer\" : \"ik_max_word\"\n" +
            "        },\n" +
            "        \"age\" : {\n" +
            "          \"type\" : \"long\"\n" +
            "        },\n" +
            "        \"name\" : {\n" +
            "          \"type\" : \"keyword\"\n" +
            "        }\n" +
            "      }\n" +
            "    }";
    createIndexRequest.mapping(mapping, XContentType.JSON);

    CreateIndexResponse createIndexResponse = indices.create(createIndexRequest, RequestOptions.DEFAULT);
    // 3. check the result in the response
    System.out.println(createIndexResponse.isAcknowledged());
}

7.3-Getting, Deleting and Checking an Index

Get an index

/**
 * Get an index
 */
@Test
public void queryIndex() throws IOException {
    IndicesClient indices = client.indices();

    GetIndexRequest getRequest = new GetIndexRequest("test");
    GetIndexResponse response = indices.get(getRequest, RequestOptions.DEFAULT);
    Map<String, MappingMetaData> mappings = response.getMappings();
    // print each index name together with its mapping
    for (String key : mappings.keySet()) {
        System.out.println(key + "===" + mappings.get(key).getSourceAsMap());
    }
}

Delete an index

/**
 * Delete an index
 */
@Test
public void deleteIndex() throws IOException {
    IndicesClient indices = client.indices();
    DeleteIndexRequest deleteRequest = new DeleteIndexRequest("ithe");
    AcknowledgedResponse delete = indices.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}

Check whether an index exists

/**
 * Check whether an index exists
 */
@Test
public void existIndex() throws IOException {
    IndicesClient indices = client.indices();

    GetIndexRequest getIndexRequest = new GetIndexRequest("ithe");
    boolean exists = indices.exists(getIndexRequest, RequestOptions.DEFAULT);

    System.out.println(exists);
}

7.4-Adding Documents

1. Add a document using a Map as the data

/**
 * Add a document using a Map as the data
 */
@Test
public void addDoc() throws IOException {
    // the document data, as a map
    Map<String, Object> data = new HashMap<>();
    data.put("address", "北京昌平");
    data.put("name", "大胖");
    data.put("age", 20);

    // 1. build the index request for the document
    IndexRequest request = new IndexRequest("test").id("1").source(data);
    // send it and get the response
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);

    // print the id from the response
    System.out.println(response.getId());
}

2. Add a document using an object as the data

The object is serialized with fastjson:

<!-- fastjson dependency -->
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.4</version>
</dependency>

// convert the object to JSON
String data = JSON.toJSONString(p);

/**
 * Add a document using an object as the data
 */
@Test
public void addDoc2() throws IOException {
    // the document data, as a Java object
    Person p = new Person();
    p.setId("2");
    p.setName("小胖2222");
    p.setAge(30);
    p.setAddress("陕西西安");

    // convert the object to JSON
    String data = JSON.toJSONString(p);

    // 1. build the index request for the document
    IndexRequest request = new IndexRequest("test").id(p.getId()).source(data,
            XContentType.JSON);
    // send it and get the response
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);

    // print the id from the response
    System.out.println(response.getId());
}

7.5-Updating, Getting and Deleting Documents

1. Update a document: when a document is indexed, an existing id is overwritten and a new id is added

/**
 * Update a document: when a document is indexed, an existing id is overwritten and a new id is added
 */
@Test
public void updateDoc() throws IOException {
    Person person = new Person();
    person.setId("2");
    person.setName("李四");
    person.setAge(20);
    person.setAddress("北京三环车王");

    String data = JSON.toJSONString(person);

    IndexRequest request = new IndexRequest("test").id(person.getId()).source(data, XContentType.JSON);
    IndexResponse response = client.index(request, RequestOptions.DEFAULT);
    System.out.println(response.getId());
}


2. Get a document by id

/**
 * Get a document by id
 */
@Test
public void getDoc() throws IOException {

    // set the index and the id of the document to fetch
    GetRequest getRequest = new GetRequest("test", "2");

    GetResponse response = client.get(getRequest, RequestOptions.DEFAULT);
    System.out.println(response.getSourceAsString());
}

3. Delete a document by id

/**
 * Delete a document by id
 */
@Test
public void delDoc() throws IOException {

    // set the index and the id of the document to delete
    DeleteRequest deleteRequest = new DeleteRequest("test", "1");

    DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(response.getId());
}

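Create or update:

IndexRequest request = new IndexRequest("index_name").id("");  // fill in the document id
client.index(request, RequestOptions.DEFAULT);

Get:

GetRequest request = new GetRequest("index_name").id("");  // fill in the document id
client.get(request, RequestOptions.DEFAULT);

Delete:

DeleteRequest request = new DeleteRequest("index_name").id("");  // fill in the document id
client.delete(request, RequestOptions.DEFAULT);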

Original link: https://blog.csdn.net/qq_45181415/article/details/115186993
