ElasticSearch Distributed Full-Text Search

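Table of Contents

ElasticSearch Distributed Full-Text Search
I. ElasticSearch Overview
II. ElasticSearch Installation
    1. Using ElasticSearch on Windows
    2. Installing Kibana
    3. The IK analyzer plugin
III. REST-Style Operations
    1. Index commands
    2. Document commands
IV. Integrating with Spring Boot
    1. Project integration steps
    2. Index API operations
    3. Document API operations
V. JD Search (Hands-on Project)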

I. ElasticSearch Overview

Who uses it:

ElasticSearch vs. Solr:

About ElasticSearch:

About Solr:

Comparison:

Summary:

ES core concepts:

ElasticSearch itself is a cluster:

Summary of core concepts:

Index
Field types (mapping)
Documents

II. ElasticSearch Installation

ElasticSearch official website

ElasticSearch official download page

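1. Using ElasticSearch on Windows

bin      # startup scripts
config   # configuration files
	log4j2              logging configuration
	jvm.options         JVM settings
	elasticsearch.yml   ElasticSearch configuration; the default port is 9200
lib      # required jar files
logs     # log files
modules  # functional modules
plugins  # plugins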

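To start the ElasticSearch service, double-click the startup file (elasticsearch.bat in the bin directory).

Test it by visiting localhost:9200.

Install head, a visual interface plugin for ElasticSearch.
Download link

The head plugin needs a Node.js environment, so install Node.js first.
Node.js installation guide

After installing Node.js, run this command in the head root directory: cnpm install

Start head: npm run start

Visit localhost:9100. It cannot connect to our ElasticSearch service at localhost:9200 until the cross-origin (CORS) restriction is resolved.

Modify the ElasticSearch configuration (in config/elasticsearch.yml, enable CORS with http.cors.enabled: true and http.cors.allow-origin: "*"):

Restart ElasticSearch and the connection now succeeds.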

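2. Installing Kibana

What is Kibana?

Kibana official website

Kibana download page
Note: the Kibana version must match the ElasticSearch version.

Unzip the download:

Double-click kibana.bat in the bin directory to start it (Kibana also runs on Node.js):

Test by visiting http://localhost:5601

All the commands that follow are entered here:

We can also switch the interface to Chinese (English is the default). Edit this file (config/kibana.yml):

Add this line (i18n.locale: "zh-CN"):

Restart Kibana and the interface is now in Chinese.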

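3. The IK analyzer plugin

What is the IK analyzer?

IK analyzer download link

Unzip it into the plugins directory of ElasticSearch:

Restart ES and the startup log shows that the IK plugin was loaded.

Test it in Kibana:

The text we entered is split into words automatically. How do we add our own custom words? Edit the configuration file under the installed ik plugin:

For example, create a new dictionary file named myself.dic:

Restart ES and test the result: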

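III. REST-Style Operations

About the REST style:

1. Index commands

Create an index:

PUT /index name (comparable to a database name)/type name (removed in later versions)/document id
{request body}

Open the head page again. A new entry test1 appears, which means the index test1 was created successfully.

Click Data Browse and open test1 to see the document we just created.

Does the name field need an explicit type, the way a column does in a relational database?
ElasticSearch offers the following data types:

Create an index with explicit mapping rules:

Get index information: GET [index]

Default type when creating an index: PUT /[index name]/[type name (optional, defaults to _doc)]/[document id]

This also shows that when a document field has no declared type, ES assigns a default field type for us.

GET /_cat/indices?v lists detailed information about all indices.

How do we modify data?

Delete an index: DELETE [index]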

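2. Document commands

Add a document (the index is created if it does not exist yet):

Create a second document: if the id is still 1, the existing document with id = 1 is overwritten.

Get a document: GET /index/type/document id

Updating data: besides overwriting with PUT as described above, it is better to update with POST and _update, which changes only the fields that are sent and leaves every other field at its original value.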
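The difference between the two update styles can be modeled without a cluster. The following is a minimal sketch in plain Java over in-memory maps, not the ElasticSearch client; the class UpdateSemantics and the methods putOverwrite and postUpdate are hypothetical names chosen for illustration, and only flat fields are covered.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory model of the two update styles; this is NOT the ElasticSearch client.
final class UpdateSemantics {

    // PUT with an existing id: the stored document is replaced by the request body.
    static Map<String, Object> putOverwrite(Map<String, Object> stored, Map<String, Object> body) {
        return new HashMap<>(body);
    }

    // POST with _update: only the fields present in the body change.
    static Map<String, Object> postUpdate(Map<String, Object> stored, Map<String, Object> body) {
        Map<String, Object> merged = new HashMap<>(stored);
        merged.putAll(body);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new HashMap<>();
        stored.put("name", "kuangshen");
        stored.put("age", 3);

        Map<String, Object> body = new HashMap<>();
        body.put("name", "kuangshen-java");

        System.out.println(putOverwrite(stored, body)); // the age field is gone
        System.out.println(postUpdate(stored, body));   // the age field is kept
    }
}
```

With PUT, a field that is left out of the request body disappears from the document, which is why _update is the safer choice for changing a single field.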

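Search:

Search by id

GET test1/user/1

Query results include a score column that indicates relevance: the better the match, the higher the score.

Structure of a query result:

Return only the fields we want in the results:

Sorted queries

Paginated queries (the offset starts at 0, like indices in every data structure you have learned):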
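Because the offset is 0-based while page numbers shown to users usually start at 1, a conversion step is needed. A minimal sketch, assuming a 1-based page number; the class Paging and the method fromOffset are hypothetical names, not part of any ElasticSearch API.

```java
// Converts a 1-based page number into the 0-based "from" offset of a search request.
final class Paging {

    static int fromOffset(int pageNo, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageNo, 1); // treat page 0 and negative pages as page 1
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // 0
        System.out.println(fromOffset(3, 10)); // 20
    }
}
```

Passing the page number itself as the offset would skip the first hit on page 1 and make consecutive pages overlap.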

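Boolean query: must (and). Every condition has to match, like where id = 1 and name = xxx

Boolean query: should (or). At least one condition has to match, like where id = 1 or name = xxx

Boolean query: must_not (not). No condition may match.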
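The three clause types map onto "all", "any" and "none". The sketch below models that logic with plain Java predicates over an in-memory document; it is not the ElasticSearch query DSL, and the class BoolClauses with its methods is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// In-memory model of bool-query clauses; this is NOT the ElasticSearch query DSL.
final class BoolClauses {

    // must: every condition has to match (and)
    static <T> Predicate<T> must(List<Predicate<T>> conditions) {
        return doc -> conditions.stream().allMatch(c -> c.test(doc));
    }

    // should: at least one condition has to match (or)
    static <T> Predicate<T> should(List<Predicate<T>> conditions) {
        return doc -> conditions.stream().anyMatch(c -> c.test(doc));
    }

    // must_not: no condition may match (not)
    static <T> Predicate<T> mustNot(List<Predicate<T>> conditions) {
        return doc -> conditions.stream().noneMatch(c -> c.test(doc));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", 1);
        doc.put("name", "xxx");

        Predicate<Map<String, Object>> idIsOne = d -> Integer.valueOf(1).equals(d.get("id"));
        Predicate<Map<String, Object>> nameIsYyy = d -> "yyy".equals(d.get("name"));
        List<Predicate<Map<String, Object>>> conditions = Arrays.asList(idIsOne, nameIsYyy);

        System.out.println(must(conditions).test(doc));    // false: the name differs
        System.out.println(should(conditions).test(doc));  // true: the id matches
        System.out.println(mustNot(conditions).test(doc)); // false: the id matches
    }
}
```

In ElasticSearch itself, should acts as a plain "or" only when the bool query has no must or filter clause; next to those clauses a matching should condition raises the score but is not required.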

Filters

-	gt		greater than
-	gte		greater than or equal to
-	lt		less than
-	lte		less than or equal to
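The four operators can be combined on one field to form an interval. A minimal sketch of that behavior on integer values, written as plain Java rather than with the ElasticSearch client; the class RangeFilter is a hypothetical name.

```java
// Toy model of the range operators gt, gte, lt and lte; this is NOT the ElasticSearch client.
final class RangeFilter {
    private Integer gt;   // null means "no bound set"
    private Integer gte;
    private Integer lt;
    private Integer lte;

    RangeFilter gt(int bound)  { this.gt = bound;  return this; }
    RangeFilter gte(int bound) { this.gte = bound; return this; }
    RangeFilter lt(int bound)  { this.lt = bound;  return this; }
    RangeFilter lte(int bound) { this.lte = bound; return this; }

    // A value passes only if it satisfies every bound that was set.
    boolean matches(int value) {
        if (gt != null && value <= gt) return false;
        if (gte != null && value < gte) return false;
        if (lt != null && value >= lt) return false;
        if (lte != null && value > lte) return false;
        return true;
    }

    public static void main(String[] args) {
        RangeFilter age = new RangeFilter().gte(10).lt(25);
        System.out.println(age.matches(10)); // true: 10 >= 10
        System.out.println(age.matches(25)); // false: 25 is not < 25
    }
}
```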

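Multiple conditions on one field

Exact lookup

Summary: term supports exact matching only (the search text is not analyzed); match is a fuzzy query (the search text is analyzed into tokens first); match against a keyword-type field behaves as an exact match.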
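The summary follows from how values are indexed: a text field is analyzed into tokens, while a keyword field is stored as one whole term. The toy model below illustrates this in plain Java. Its lowercase-and-split-on-whitespace "analyzer" is a stand-in assumed for the example, not a real ElasticSearch analyzer, and the class TermVsMatch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model of term vs. match; the analyzer here is a simplified stand-in.
final class TermVsMatch {

    // text field: the value is analyzed into lowercase tokens
    static Set<String> indexAsText(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase().split("\\s+")));
    }

    // keyword field: the whole value is stored as a single term
    static Set<String> indexAsKeyword(String value) {
        return Collections.singleton(value);
    }

    // term: the search text is looked up exactly as given
    static boolean term(Set<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match on a text field: the search text is analyzed and any token may hit
    static boolean matchText(Set<String> indexedTerms, String query) {
        for (String token : indexAsText(query)) {
            if (indexedTerms.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> text = indexAsText("Kuang Shen Java");
        Set<String> keyword = indexAsKeyword("Kuang Shen Java");

        System.out.println(term(text, "Kuang Shen Java"));    // false: only single tokens are indexed
        System.out.println(matchText(text, "java book"));     // true: the token "java" hits
        System.out.println(term(keyword, "Kuang Shen Java")); // true: the whole value is one term
        System.out.println(term(keyword, "Kuang"));           // false: partial values do not match
    }
}
```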

Exact match on multiple values:

Highlighted queries

Highlighting with custom tags
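Custom tags simply replace the markup that is wrapped around each matched term. The sketch below imitates that effect with a plain string replacement; real highlighting is performed by ElasticSearch on analyzed tokens. The class Highlighter and the red span tag are assumptions made for the example.

```java
// Toy model of highlighting with custom tags; this is NOT how ElasticSearch computes highlights.
final class Highlighter {

    // Wraps every literal occurrence of term in preTag and postTag.
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        String result = highlight("learn java with kuangshen", "java",
                "<span style='color:red'>", "</span>");
        System.out.println(result); // learn <span style='color:red'>java</span> with kuangshen
    }
}
```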

IV. Integrating with Spring Boot

1. Project integration steps

Create a Maven project named elasticsearch-study and add the pom dependencies:



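<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.6.2</version>
        <relativePath/>
    </parent>
    <groupId>com.study</groupId>
    <artifactId>elasticsearch-study</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>elasticsearch-study</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.72</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <excludes>
                        <exclude>
                            <groupId>org.projectlombok</groupId>
                            <artifactId>lombok</artifactId>
                        </exclude>
                    </excludes>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>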

Main application class

package com.study.elasticsearchstudy;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ElasticsearchStudyApplication {

    public static void main(String[] args) {
        SpringApplication.run(ElasticsearchStudyApplication.class, args);
    }

}

Add a RestHighLevelClient configuration class (connection settings)

package com.study.elasticsearchstudy.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){

        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")
                )
        );
        return client;
    }

}

Write a test class that creates the index kuang_index

package com.study.elasticsearchstudy;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;

@SpringBootTest
class ElasticsearchStudyApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    void contextLoads() throws IOException {

        CreateIndexRequest request = new CreateIndexRequest("kuang_index");

        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(createIndexResponse);
    }

}

Test result: the console prints the response, and the newly created index appears on the head page.

2. Index API operations

Create an index

@Configuration
public class ElasticSearchConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){

        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")
                )
        );
        return client;
    }
}

@SpringBootTest
class ElasticsearchStudyApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    void contextLoads() throws IOException {

        CreateIndexRequest request = new CreateIndexRequest("kuang_index");

        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);

        System.out.println(createIndexResponse);
    }

}

Get an index (check whether it exists)

	
    @Test
    public void testExistIndex() throws IOException {

        GetIndexRequest request = new GetIndexRequest("kuang_index");

        // true if the index exists
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);

        System.out.println(exists);
    }

Delete an index

	
    @Test
    public void testDeleteIndex() throws IOException{
        DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("test13");

        AcknowledgedResponse delete = restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
        // isAcknowledged() returns true when the deletion succeeded
        System.out.println(delete.isAcknowledged());
    }

3. Document API operations

Add an entity class: User

package com.study.elasticsearchstudy.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;

@Data
@AllArgsConstructor
public class User {
    private String name;
    private int age;

}

Add a document

    @Test
    public void testAddDocument() throws IOException {
        User user = new User("狂神",3);
        IndexRequest request = new IndexRequest("kuang_index");
        request.id("1");
        request.timeout(TimeValue.timeValueSeconds(1));

        // put our data into the request as JSON
        request.source(JSON.toJSONString(user), XContentType.JSON);

        // send the request with the client and get the response
        IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);

        System.out.println(index.toString());
        System.out.println(index.status());

    }

Check whether a document exists

    @Test
    public void testIsExists() throws IOException {
        GetRequest getRequest = new GetRequest("kuang_index", "1");
        // do not fetch the _source of the document
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        // do not fetch any stored fields either
        getRequest.storedFields("_none_");

        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);

        System.out.println(exists);

    }

Get a document

    @Test
    public void testGetDocument() throws IOException {
        GetRequest getRequest = new GetRequest("kuang_index", "1");
        GetResponse getResponse = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        System.out.println(getResponse.getSourceAsString()); // print the document content
        System.out.println(getResponse); // print the whole response, including metadata

    }

Update a document

 
    @Test
    public void testUpdateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("kuang_index", "1");
        updateRequest.timeout("1s");

        User user = new User("狂神说Java", 18);
        // only the fields contained in this partial document are changed
        updateRequest.doc(JSON.toJSONString(user),XContentType.JSON);

        UpdateResponse update = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(update.status());
    }

Delete a document

    @Test
    void testDeleteRequest() throws IOException {
        DeleteRequest request = new DeleteRequest("kuang_index", "1");
        request.timeout("1s");

        DeleteResponse delete = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        System.out.println(delete.status());

    }

Bulk import


    @Test
    void testBulkRequest() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");

        ArrayList<User> users = new ArrayList<>();
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));
        users.add(new User("kaungsheng",3));

        for (int i = 0 ; i < users.size();i++) {
            // for bulk updates or bulk deletes, swap in the corresponding request type here
            bulkRequest.add(new IndexRequest("kuang_index")
                    .id(""+(i+1)) // if no id is set, ES generates a random, unique id
                    .source(JSON.toJSONString(users.get(i)),XContentType.JSON));

        }
        BulkResponse bulkItemResponses = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);

        // hasFailures(): false = every item succeeded, true = at least one item failed
        System.out.println(bulkItemResponses.hasFailures());

    }

Conditional search

    
    @Test
    void testSearch() throws IOException {
        SearchRequest searchRequest = new SearchRequest("kuang_index");
        // build the search conditions
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // query conditions can be built with the QueryBuilders utility
        // QueryBuilders.termQuery: exact match
        // QueryBuilders.matchAllQuery(): match everything
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "kaungsheng");
//        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        searchRequest.source(searchSourceBuilder);

        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(search.getHits()));
        System.out.println("====================================");
        for (SearchHit documentFields : search.getHits().getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }

V. JD Search (Hands-on Project)

Create a Maven project named elasticsearch.jd

pom


        
        
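    <dependencies>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.10.2</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-thymeleaf</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>

        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-devtools</artifactId>
            <scope>runtime</scope>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.76</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>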

application.properties

server.port=9090
# disable the Thymeleaf cache
spring.thymeleaf.cache=false

Main application class

package com.study.elasticsearchjd;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ElasticsearchJdApplication {

    public static void main(String[] args) {
        SpringApplication.run(ElasticsearchJdApplication.class, args);
    }
}

Configuration class

package com.study.elasticsearchjd.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient(){
        RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
        return client;
    }
}

pojo

package com.study.elasticsearchjd.pojo;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@NoArgsConstructor
@AllArgsConstructor
public class Content {
    private String title;
    private String img;
    private String price;
}

Utility class (scrapes product data from the web page)

package com.study.elasticsearchjd.utils;

import com.study.elasticsearchjd.pojo.Content;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class HtmlParseUtil {

    public static void main(String[] args) throws IOException {
        HtmlParseUtil.parseJD("java").forEach(System.out::println);
    }
    public static List<Content> parseJD(String keyword) throws IOException {
        // requires network access
        // request URL
        String url = "http://search.jd.com/search?keyword=" + keyword;
        // 1. parse the page (jsoup returns an object like the browser's document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // everything JavaScript can do with document is available on this object
        // 2. get the element by id
        Element j_goodsList = document.getElementById("J_goodsList");
        // list that stores the content of every li
        List<Content> contents = new ArrayList<>();
        if (j_goodsList == null) {
            // the page contains no product list, so there is nothing to parse
            return contents;
        }
        // 3. get every li inside the J_goodsList ul
        Elements lis = j_goodsList.getElementsByTag("li");
//        System.out.println(lis);
        // 4. read img, price and name from each li
        for (Element li : lis) {
            // the site lazy-loads images, so read data-lazy-img instead of src
            String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img"); // first image in the li
            String name = li.getElementsByClass("p-name").eq(0).text();
            String price = li.getElementsByClass("p-price").eq(0).text();
            // wrap the values in an object
            Content content = new Content(name,img,price);
            // add it to the list
            contents.add(content);
        }
//        System.out.println(contents);
        // 5. return the list
        return contents;
    }

}

Service class

package com.study.elasticsearchjd.service;

import com.alibaba.fastjson.JSON;
import com.study.elasticsearchjd.pojo.Content;
import com.study.elasticsearchjd.utils.HtmlParseUtil;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// service logic
@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // scrape the products for a keyword and store them in ES
    public Boolean parseContent(String keywords) throws IOException {
        List<Content> contents = HtmlParseUtil.parseJD(keywords);
        // put the scraped data into ES
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("2m");

        for (int i = 0; i < contents.size(); i++) {
            bulkRequest.add(
                    new IndexRequest("jd_good").source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        return !bulk.hasFailures();
    }

    
    public List<Map<String, Object>> searchPage(String keyword,int pageNo,int pageSize) throws IOException {
        if (pageNo<=1){
            pageNo = 1;
        }
        // conditional search
        SearchRequest searchRequest = new SearchRequest("jd_good");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // pagination: from is a 0-based offset, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);

        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // run the search
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // collect the hits
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            list.add(hit.getSourceAsMap());
        }

        return list;
    }

	
    // same search as searchPage, with the matched terms in title highlighted
    public List<Map<String, Object>> searchPageForHighLight(String keyword,int pageNo,int pageSize) throws IOException {
        if (pageNo<=1){
            pageNo = 1;
        }
        // conditional search
        SearchRequest searchRequest = new SearchRequest("jd_good");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // pagination: from is a 0-based offset, so convert the 1-based page number
        searchSourceBuilder.from((pageNo - 1) * pageSize);
        searchSourceBuilder.size(pageSize);

        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
        searchSourceBuilder.query(termQueryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // highlighting
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false); // highlight title even when the query matched another field
        // custom highlight tags (assumed markup: a red span)
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        searchSourceBuilder.highlighter(highlightBuilder);

        // run the search
        searchRequest.source(searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // collect the hits
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original result
            // replace the original title with the highlighted version
            if (title != null){
                Text[] fragments = title.fragments();
                String n_title = "";
                for (Text text : fragments) {
                    n_title += text;
                }
                sourceAsMap.put("title",n_title);
            }

            list.add(sourceAsMap);
        }

        return list;
    }



}

controller

package com.study.elasticsearchjd.controller;

import com.study.elasticsearchjd.pojo.Content;
import com.study.elasticsearchjd.service.ContentService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

import java.io.IOException;
import java.util.List;
import java.util.Map;

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                           @PathVariable("pageNo") int pageNo,
                                           @PathVariable("pageSize") int pageSize
                                           ) throws IOException {
        return contentService.searchPage(keyword,pageNo,pageSize);
    }
}

Rendering the highlighted field in Vue:

The result is shown below, with the matched terms highlighted:
