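I. Overview

1.1 Background

Elasticsearch (literally "elastic search") is an open-source search engine built on Lucene. Its vendor describes it as a distributed search and analytics engine.

Lucene is widely regarded as one of the most advanced, best-performing, and most feature-complete search engine libraries available, open source or proprietary.

But Lucene is only a library. To use it you have to work in Java and integrate it into your own application. It is also very complex: you need a solid understanding of information retrieval to see how it works.

Elasticsearch is also written in Java and uses Lucene for indexing and searching, but its goal is to make full-text search easy by hiding Lucene's complexity behind a simple, consistent RESTful API.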

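Who uses Elasticsearch

GitHub
CCTV
Wikipedia
Baidu

Elasticsearch vs. Solr

Solr relies on ZooKeeper for distributed coordination; Elasticsearch has distributed coordination built in.
Solr accepts more data formats; Elasticsearch only accepts JSON.
Solr ships with more features out of the box; Elasticsearch focuses on core functionality and leaves most advanced features to third parties.
Solr is faster than Elasticsearch for traditional search over existing data; Elasticsearch is faster than Solr for real-time search.

Pick whichever fits the scenario.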

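1.2 Getting started

Installation

Download the archive from the official site, unpack it, and run the elasticsearch script in the bin directory: the shell script on Linux, the .bat file on Windows.

Once started, Elasticsearch listens on ports 9300 (node-to-node transport) and 9200 (HTTP). Open port 9200 in a browser to check that it is running, as shown in the figure below.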

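GUI plugin

Elasticsearch can be managed through a graphical interface.

Using npm

Steps

Download the elasticsearch-head plugin.

It is written in JavaScript and runs on Node.js, so install Node.js and check that it works:

node -v

Go into the head directory and start the server:

grunt server

If grunt is missing, install it globally. Grunt is a Node.js-based build tool.

npm install -g grunt-cli

Running grunt straight away still fails because the dependencies have not been installed. The following command downloads everything listed in package.json:

npm install

Now grunt server starts successfully.

Once it is up, clicking Connect still does not work.

The cause is a cross-origin restriction: CORS has to be enabled on the Elasticsearch side.

Add the following two lines to config/elasticsearch.yml and restart Elasticsearch:

http.cors.enabled: true
http.cors.allow-origin: "*"

If you would rather not enable CORS, head can instead be installed as an Elasticsearch plugin (site plugins are only supported by Elasticsearch versions before 5.0).

Done.

Summary

Download the elasticsearch-head plugin.
Set up node and grunt, and install head's dependencies.
Start the server.
Enable CORS.
Open head in the browser.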

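Core concepts

Elasticsearch is document-oriented: it stores whole objects or documents, and it indexes the contents of every document so that they can be searched.

In Elasticsearch, documents can be indexed, searched, sorted, and filtered.

Comparison

Relational database
1. Databases
2. Tables
3. Rows
4. Columns
Elasticsearch
1. Indices
2. Types (roughly tables)
3. Documents
4. Fields

Key terms

index: an index, roughly a database
type: roughly a table
field: a field
mapping: the schema of an index, for example a field's type, default value, analyzer, and whether it is indexed (that is, whether it can be searched through the generated index)
document: a document
nrt: near real time. There is a slight delay, usually under one second, between indexing a document and being able to search for it.
cluster: a cluster, made up of one or more nodes
node: a node, that is, one running Elasticsearch instance, typically one per server
shards: an index can hold more data than the hardware of a single node allows; splitting an index into several pieces is called sharding
replicas: copies of shards. Any node can go down, so each primary shard can have replica copies held on other nodes.

On this machine, create an index with 5 shards and 1 replica of each shard.

Because there is only one node, the replicas cannot be assigned: a replica is never placed on the same node as its primary.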
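The shard arithmetic behind this can be sketched in a few lines of Java. This is only an illustration under the rule stated above (a replica never shares a node with its primary); the class and method names are made up for the example and are not part of any Elasticsearch API.

```java
public class ShardMath {
    /** Shard copies the index asks for: every primary plus its replicas. */
    static int totalShardCopies(int primaries, int replicasPerPrimary) {
        return primaries * (1 + replicasPerPrimary);
    }

    /**
     * Shard copies that can actually be placed. Two copies of the same shard
     * never share a node, so each shard gets at most one copy per node.
     */
    static int assignableCopies(int primaries, int replicasPerPrimary, int nodes) {
        int copiesPerShard = Math.min(1 + replicasPerPrimary, nodes);
        return primaries * copiesPerShard;
    }

    public static void main(String[] args) {
        int primaries = 5;
        int replicas = 1;
        System.out.println("copies requested:      " + totalShardCopies(primaries, replicas));
        System.out.println("assignable on 1 node:  " + assignableCopies(primaries, replicas, 1));
        System.out.println("assignable on 2 nodes: " + assignableCopies(primaries, replicas, 2));
    }
}
```

With 5 primaries and 1 replica each, the index asks for 10 shard copies; a single node can hold only the 5 primaries, and a second node is enough to place the other 5.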

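Elasticsearch is normally managed by sending JSON over HTTP. The head plugin has a built-in compound query tab for this, but Postman is usually more convenient.

The Chrome extension is enough, as shown in the figure.

From here on, Postman replaces the HTTP query tab built into head.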

II. Managing indices

2.0 Querying and deleting

// GET http://localhost:9200/<index-name>
// DELETE http://localhost:9200/<index-name>

PUT: create

DELETE: delete

GET: query

POST: query (search requests with a JSON body)
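For readers who prefer code to Postman, the GET and DELETE requests can be built with the JDK's own HTTP client (Java 11+). This is a minimal sketch, not the method used in this post: the class name, the helper names, and the index name blog are placeholders, and the base URL assumes the local node from the installation step.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class IndexRequests {
    // Assumption: a local node listening on the default HTTP port.
    static final String BASE_URL = "http://localhost:9200";

    /** GET /<index>: returns the settings and mappings of the index. */
    static HttpRequest getIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index))
                .GET()
                .build();
    }

    /** DELETE /<index>: removes the whole index. */
    static HttpRequest deleteIndex(String index) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest get = getIndex("blog");
        HttpRequest delete = deleteIndex("blog");
        // Only print the requests; nothing is sent to the server here.
        System.out.println(get.method() + " " + get.uri());
        System.out.println(delete.method() + " " + delete.uri());
    }
}
```

To actually send one of these, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while Elasticsearch is running.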

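2.1 Creating an index with mappings

Capturing the traffic while creating an index in head shows that it is a PUT request, following the RESTful API convention. The request looks like this.

You can create an index on its own, or create it together with its mappings.

Before Elasticsearch 7.0, the mappings could name a custom type:

// PUT http://localhost:9200/<index-name>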
{
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 1
	},
	"mappings": {
		"article": { // type: a custom type name goes here
			"properties": { // document
				"id": { // field
					"type": "long",
					"store": true
				},
				"title": {
					"type": "text",
					"store": true,
					"index": true,
					"analyzer": "standard"
				},
				"content": {
					"type": "text",
					"store": true,
					"index": true,
					"analyzer": "standard"
				}
			}
		}
	}
}

In Elasticsearch 7.12, which I am using, custom type names are not allowed by default; every index uses the single type _doc.

// PUT http://localhost:9200/<index-name>
{
	"settings": {
		"number_of_shards": 5,
		"number_of_replicas": 1
	},
	"mappings": {
		"properties": { // document
			"id": { // field
				"type": "long",
				"store": true
			},
			"title": {
				"type": "text",
				"store": true,
				"index": true,
				"analyzer": "standard"
			},
			"content": {
				"type": "text",
				"store": true,
				"index": true,
				"analyzer": "standard"
			}
		}
	}
}
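The same typeless request can also be assembled in Java. The sketch below only builds the JSON body and the PUT request with the JDK HTTP client (Java 11+) and does not send anything; the class name, helper names, and the index name blog are assumptions made for the example, while the settings and field definitions mirror the body above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CreateIndexRequest {
    /** Definition shared by the title and content fields. */
    static String textField() {
        return "{\"type\":\"text\",\"store\":true,\"index\":true,\"analyzer\":\"standard\"}";
    }

    /** Request body: settings plus typeless (7.x style) mappings. */
    static String body(int shards, int replicas) {
        return "{"
                + "\"settings\":{\"number_of_shards\":" + shards
                + ",\"number_of_replicas\":" + replicas + "},"
                + "\"mappings\":{\"properties\":{"
                + "\"id\":{\"type\":\"long\",\"store\":true},"
                + "\"title\":" + textField() + ","
                + "\"content\":" + textField()
                + "}}}";
    }

    /** PUT <baseUrl>/<index> with a JSON body. */
    static HttpRequest build(String baseUrl, String index, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        String json = body(5, 1);
        HttpRequest request = build("http://localhost:9200", "blog", json);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(json);
    }
}
```

Building the body by string concatenation keeps the example free of dependencies; in a real project a JSON library would be the safer choice.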

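To use a custom type name on 7.x anyway, change the URL as follows:

// PUT http://localhost:9200/<index-name>?include_type_name=true
// This parameter is deprecated and was removed in 8.0, so it is better to stick with the default.

2.2 Adding mappings to an existing index

For an index that has no mappings yet, or to add fields to an existing mapping (the type of a field that is already mapped cannot be changed this way):

// PUT http://localhost:9200/<index-name>/_mappings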
{
	"properties": {
		"content": {
			"type": "text",
			"store": true,
			"analyzer": "standard"
		},
		"id": {
			"type": "long",
			"store": true
		},
		"title": {
			"type": "text",
			"store": true,
			"analyzer": "standard"
		}
	}
}

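2.3 Creating, updating, and fetching documents

Add a document to an index. If a document with the same _id already exists it is replaced as a whole; otherwise it is created.

// Create or update: PUT http://localhost:9200/aa/_doc/1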
{
	"id":1,
	"title":"嘎嘎",
	"content":"哟西"
}
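// Fetch: GET http://localhost:9200/aa/_doc/1

Note
_id (the document ID in the URL) and id (a field in the document body) are two different things; just keep them consistent.
If no ID is given in the URL (which requires POST instead of PUT), _id becomes an auto-generated random string.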
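2.4 Searching by term

This example uses the standard analyzer, which splits Chinese text into single characters, so a term query only matches one character at a time.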

// POST http://localhost:9200/aa/_doc/_search
{
    "query":{
        "term":{
            "content":"西"
        }
    }
}
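Why the single character 西 matches while the full string 哟西 would not can be shown with a toy model. The code below is a deliberately simplified stand-in for the standard analyzer, not its real implementation: it emits each Han character as its own token and ignores everything else, then checks whether a term equals one of the tokens exactly, which is how a term query behaves. All names are hypothetical, and the sample text is written with Unicode escapes to keep the source ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;

public class SingleCharTerms {
    /** Toy tokenizer: one token per Han character, all other characters dropped. */
    static List<String> tokenizeHan(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)
                .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    /** A term query is not analyzed: it must equal one indexed token exactly. */
    static boolean termMatches(String indexedText, String term) {
        return tokenizeHan(indexedText).contains(term);
    }

    public static void main(String[] args) {
        String content = "\u54DF\u897F"; // the two-character content value from section 2.3
        String lastChar = "\u897F";      // its second character, used in the term query

        System.out.println(tokenizeHan(content).size()); // two single-character tokens
        System.out.println(termMatches(content, lastChar));
        System.out.println(termMatches(content, content));
    }
}
```

The two-character term finds nothing because no single token equals it, which is the reason for searching one character at a time until a Chinese analyzer such as IK is installed.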

2.5 query_string search

The query text is analyzed first and then searched; under the hood this is Lucene's QueryParser.

// POST http://localhost:9200/aa/_doc/_search
{
	"query":{
		"query_string":{
			"default_field":"content",
			"query":"么西么西"
		}
	}
}

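2.6 Basic queries built into head

must: with one condition, it has to match; with several, they are combined with AND

must_not: must not match, i.e. negation

should: with one condition, it should match; with several, they are combined with OR

match_all: match every document

_doc.id: the id field under the _doc type

term: search by exact term

query_string: analyzed search; the query text is analyzed before searching

range: range query

fuzzy: fuzzy query

wildcard: wildcard query

prefix: prefix query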
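The way must, must_not, and should combine can be modelled with plain Java predicates. This is a sketch of the matching logic only, under two simplifying assumptions: scoring is ignored, and filter clauses are left out. The names are invented for the example. It follows the default bool query behaviour in which should clauses are required (at least one must match) only when there are no must clauses.

```java
import java.util.List;
import java.util.function.Predicate;

public class BoolQuerySketch {
    static boolean matches(String doc,
                           List<Predicate<String>> must,
                           List<Predicate<String>> mustNot,
                           List<Predicate<String>> should) {
        // must: every clause has to match (AND)
        boolean allMust = must.stream().allMatch(p -> p.test(doc));
        // must_not: no clause may match (negation)
        boolean noMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
        // should: at least one clause has to match (OR), but only when
        // there is no must clause; otherwise it would only affect scoring
        boolean shouldRequired = must.isEmpty() && !should.isEmpty();
        boolean shouldOk = !shouldRequired || should.stream().anyMatch(p -> p.test(doc));
        return allMust && noMustNot && shouldOk;
    }

    /** Hypothetical clause: the document text contains the given word. */
    static Predicate<String> contains(String word) {
        return doc -> doc.contains(word);
    }

    public static void main(String[] args) {
        String doc = "elasticsearch cluster";
        System.out.println(matches(doc,
                List.of(contains("elastic"), contains("cluster")), List.of(), List.of()));
        System.out.println(matches(doc,
                List.of(), List.of(contains("cluster")), List.of()));
        System.out.println(matches(doc,
                List.of(), List.of(), List.of(contains("solr"), contains("elastic"))));
    }
}
```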

2.7 Viewing analyzer output in Elasticsearch

Check how the standard analyzer tokenizes a string:

// POST http://localhost:9200/_analyze
{
	"analyzer":"standard",
	"text":"代码改变世界,hello world"
}

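2.8 Adding the IK analyzer to Elasticsearch

Download the IK analyzer plugin (elasticsearch-analysis-ik) in the version that matches your Elasticsearch.

Unpack it into Elasticsearch's plugins folder and restart Elasticsearch. The folder name is arbitrary; ik-analyzer is a descriptive choice.

It has two tokenization modes: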

ik_smart
ik_max_word

// POST http://localhost:9200/_analyze
{
	"analyzer":"ik_smart",
	"text":"代码改变世界,hello world"
}

Compare the output of ik_smart with that of ik_max_word.

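III. Elasticsearch clusters

3.1 Cluster concepts

Cluster

A cluster is a group of nodes that together hold all of the data and jointly provide indexing and search.

A cluster is identified by a unique name, which defaults to elasticsearch. A node joins a cluster only by being configured with that cluster's name.

Node

A node is one server in the cluster. As part of the cluster it stores data and takes part in indexing and search.

Like a cluster, a node is identified by a name, assigned at startup. In early versions the default was the name of a random Marvel character; in 7.x it defaults to the machine's hostname.

Shards and replicas

An index can hold more data than the hardware of a single node allows; splitting an index into several pieces is called sharding.

Any node can go down, so each shard can have replica copies. Within a cluster a replica is never placed on the same node as its primary, so losing one server does not lose data.

The figure below illustrates shards and replicas.

3.2 Setting up a cluster

Make three copies of Elasticsearch, then add or change the following in each copy's elasticsearch.yml: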

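# Configuration for one of the three nodes
# Cluster name: must be identical on every node of the same cluster
cluster.name: ElasticSearch
# Node name: must be unique within the cluster
node.name: node-2
network.host: 127.0.0.1
# Nodes on the same machine need different port numbers
# HTTP port
http.port: 9202
# Port for communication between nodes
transport.tcp.port: 9302
# Addresses of the nodes used for discovery
# (deprecated name in 7.x, which prefers discovery.seed_hosts)
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300","127.0.0.1:9301","127.0.0.1:9302"]

When working with a cluster, head only needs to connect to the address of one node.

Replicas of a shard are stored on different nodes. In head, boxes with a thick border are primary shards and boxes with a thin border are replicas; see the figure below.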

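IV. Programming against Elasticsearch

There are two ways to talk to Elasticsearch: the RESTful API and the Java API. The Java API is gradually being deprecated, so this section is only for learning purposes. The Java packages have been downgraded from the current 7.12.1 to 7.0.0.

4.1 Managing Elasticsearch from code

Creating an index

Steps

Create a Maven project and add the dependencies.
Write a test method that creates the index:
1. Create a Settings object. It holds the configuration, mainly the cluster name.
2. Create a client object and create the index.
3. Close the client object.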

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>top.meethigher</groupId>
    <artifactId>es-first</artifactId>
    <version>1.0</version>


    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch -->
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>7.12.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.elasticsearch.client/transport -->
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>7.12.1</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-api -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.30</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-simple -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-simple</artifactId>
            <version>1.7.30</version>
            <scope>test</scope>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.logging.log4j/log4j-to-slf4j -->
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-to-slf4j</artifactId>
            <version>2.14.1</version>
        </dependency>
    </dependencies>
</project>

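Code

import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Test; // JUnit 4, which must be on the test classpath

import java.net.InetAddress;
import java.net.UnknownHostException;

public class ElasticSearchClientTest {
    @Test
    public void createIndex() throws UnknownHostException {
        // 1. Create a Settings object. It holds the configuration, mainly the cluster name.
        Settings settings = Settings.builder()
                .put("cluster.name", "es") // must match cluster.name in elasticsearch.yml
                .build();
        // 2. Create a client object
        PreBuiltTransportClient client = new PreBuiltTransportClient(settings);
        // Register every node so the client keeps working if one of them goes down
        client.addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9300));
        client.addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9301));
        client.addTransportAddress(new TransportAddress(InetAddress.getByName("127.0.0.1"), 9302));
        // Create the index
        client.admin().indices().prepareCreate("index-hello").get();
        // 3. Close the client object
        client.close();
    }
}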

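V. Spring Data Elasticsearch

5.1 Introduction

Spring Data

Spring Data is an open-source framework that simplifies database access and supports cloud services.

Its main goal is to make data access quick and convenient, with support for MapReduce frameworks and cloud data services.

Environment setup

Managing indices

Create an entity class and annotate it so that it maps to a document.
Create a DAO: an interface that extends ElasticsearchRepository.
Write the test code.

Elasticsearch version mismatches cost me a lot of time searching on Baidu, and this part is unfinished.
I did not feel I was learning much from it, and with all the version problems on top I gave up on it in this post.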