
2021-09-29


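I'm a beginner, and this post records what I have learned. If you spot any mistakes, please point them out and I will correct them promptly!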

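Core Concepts
  • Index
    Roughly equivalent to a database in MySQL (an analogy from the era of mapping types; since types were removed in 7.x, an index holds one kind of document and is closer to a single table)

  • Document
    Equivalent to one row of data in MySQL

Note: in ElasticSearch, documents and request bodies are all expressed as JSON.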

Architecture

Still learning about this...

IK Analyzer
  • Coarsest-grained segmentation, ik_smart. Example:
GET _analyze     // run text through an analyzer
{
  "analyzer": "ik_smart",    // analyzer to use
  "text": "罗老师喜欢讲张三"  // text to analyze
}

// Response
{
  "tokens" : [
    {
      "token" : "罗",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "老师",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "喜欢",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "讲",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "张三",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
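In the response above, `start_offset` and `end_offset` are character positions in the original text (the end is exclusive) and `position` is the token's ordinal in the token stream. A minimal sketch that recomputes those fields by scanning the text, assuming the tokens are non-overlapping and in text order (true for the `ik_smart` output here); `TokenOffsets` is a hypothetical helper, not part of any Elasticsearch API:

```java
import java.util.ArrayList;
import java.util.List;

class TokenOffsets {
    // One analyzed token, mirroring the fields in the _analyze response
    static final class Token {
        final String token;
        final int startOffset; // index of the first char in the original text
        final int endOffset;   // index one past the last char (exclusive)
        final int position;    // ordinal of the token in the token stream

        Token(String token, int startOffset, int endOffset, int position) {
            this.token = token;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.position = position;
        }

        @Override
        public String toString() {
            return token + "[" + startOffset + "," + endOffset + ")@" + position;
        }
    }

    // Assumes the terms occur in the text in order and do not overlap
    static List<Token> withOffsets(String text, List<String> terms) {
        List<Token> out = new ArrayList<>();
        int cursor = 0;
        for (int pos = 0; pos < terms.size(); pos++) {
            String term = terms.get(pos);
            int start = text.indexOf(term, cursor);
            if (start < 0) {
                throw new IllegalArgumentException("term not found in order: " + term);
            }
            out.add(new Token(term, start, start + term.length(), pos));
            cursor = start + term.length();
        }
        return out;
    }
}
```

Feeding it 罗老师喜欢讲张三 and the five `ik_smart` terms reproduces the offsets above, e.g. 老师 at [1,3) with position 1 and 张三 at [6,8) with position 4.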
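  • Finest-grained segmentation, ik_max_word. Example:
GET _analyze
{
  "analyzer": "ik_max_word",   // exhaustively emits every word IK recognizes, so tokens may overlap (here the extra 三 is tagged as a numeral)
  "text": "罗老师喜欢讲张三"
}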

// Response
{
  "tokens" : [
    {
      "token" : "罗",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "老师",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "喜欢",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "讲",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "张三",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "CN_WORD",
      "position" : 4
    },
    {
      "token" : "三",
      "start_offset" : 7,
      "end_offset" : 8,
      "type" : "TYPE_CNUM",
      "position" : 5
    }
  ]
}
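The difference between the two modes can be illustrated with a toy dictionary segmenter. This is a sketch of two classic techniques swapped in for illustration, not IK's actual implementation: `smart` uses forward maximum matching (one token per span, like `ik_smart`), and `maxWord` emits every dictionary word at every start position (overlaps allowed, like `ik_max_word`). The class name and the dictionary are hypothetical; 三 is put in the dictionary only to mimic the extra token that IK's numeral handling produces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ToySegmenter {
    // Longest dictionary word length; bounds the matching window
    private static int maxLen(Set<String> dict) {
        return dict.stream().mapToInt(String::length).max().orElse(1);
    }

    // Coarse split: forward maximum matching, one token per text span (ik_smart-like)
    static List<String> smart(String text, Set<String> dict) {
        int max = maxLen(dict);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(max, text.length() - i);
            // Shrink the window until it is a dictionary word or a single char
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    // Exhaustive split: every dictionary word at every start, overlaps allowed (ik_max_word-like)
    static List<String> maxWord(String text, Set<String> dict) {
        int max = maxLen(dict);
        List<String> out = new ArrayList<>();
        int coveredUntil = 0; // chars before this index are covered by an emitted token
        for (int i = 0; i < text.length(); i++) {
            boolean found = false;
            for (int len = Math.min(max, text.length() - i); len >= 1; len--) {
                String w = text.substring(i, i + len);
                if (dict.contains(w)) {
                    out.add(w);
                    coveredUntil = Math.max(coveredUntil, i + len);
                    found = true;
                }
            }
            // A char not covered by any dictionary word becomes a single-char token
            if (!found && i >= coveredUntil) {
                out.add(text.substring(i, i + 1));
                coveredUntil = i + 1;
            }
        }
        return out;
    }
}
```

With the dictionary {老师, 喜欢, 张三, 三}, `smart` splits 罗老师喜欢讲张三 into 罗 / 老师 / 喜欢 / 讲 / 张三 and `maxWord` additionally emits 三, matching the token lists of the two responses above.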
Basic Commands
Index Commands
PUT /test1             // create index test1
DELETE /test1          // delete index test1
GET /_cat/indices?v    // list indices with their health, status and document counts
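The console commands above are plain REST calls, so any HTTP client can issue them. A minimal sketch using the JDK's `java.net.http` client, assuming a single local node at `http://127.0.0.1:9200` (the host and port used in the Spring Boot configuration class); `IndexHttpRequests` is a hypothetical helper that only builds the requests:

```java
import java.net.URI;
import java.net.http.HttpRequest;

class IndexHttpRequests {
    // Assumed base URL of a single local node
    static final String BASE = "http://127.0.0.1:9200";

    // PUT /{index}: create an index with default settings
    static HttpRequest create(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // DELETE /{index}: delete the index and all of its documents
    static HttpRequest delete(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index))
                .DELETE()
                .build();
    }

    // GET /_cat/indices?v: tabular index list with a header row
    static HttpRequest listIndices() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_cat/indices?v"))
                .GET()
                .build();
    }
}
```

To actually send one, pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and read the response body.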
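Document Commands
// In 7.x and later, custom types are deprecated; the typeless forms use _doc, e.g. PUT /test1/_doc/1
PUT /test1/doc/1            // insert a single document    index: test1    type: doc    id: 1
{
  "name": "张三"
}

GET /test1/doc/1            // get the document

POST /test1/doc/1/_update    // partial update of the document (7.x+: POST /test1/_update/1)
{ 
  "doc": {"name": "张三2号"}
}

DELETE /test1/doc/1          // delete the document


POST /test1/doc/_bulk        // bulk insert (7.x+: POST /test1/_bulk)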
{"index":{"_id": "1"}}
{"name": "张三1号","age":1,"tel": 111,"father":"1号父亲","nickName":"小三三1号"}
{"index":{"_id": "2"}}
{"name": "张三2号","age":2,"tel": 222,"father":"2号父亲","nickName":"小三三2号"}
{"index":{"_id": "3"}}
{"name": "张三3号","age":3,"tel": 333,"father":"3号父亲","nickName":"小三三3号"}
{"index":{"_id": "4"}}
{"name": "张三4号","age":4,"tel": 444,"father":"4号父亲","nickName":"小三三4号"}
{"index":{"_id": "5"}}
{"name": "张三5号","age":5,"tel": 555,"father":"5号父亲","nickName":"小三三5号"}
{"index":{"_id": "6"}}
{"name": "张三6号","age":6,"tel": 666,"father":"6号父亲","nickName":"小三三6号"}
{"index":{"_id": "7"}}
{"name": "张三7号","age":7,"tel": 777,"father":"7号父亲","nickName":"小三三7号"}
{"index":{"_id": "8"}}
{"name": "张三8号","age":8,"tel": 888,"father":"8号父亲","nickName":"小三三8号"}
{"index":{"_id": "9"}}
{"name": "张三9号","age":9,"tel": 999,"father":"9号父亲","nickName":"小三三9号"}
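The `_bulk` body above is newline-delimited JSON: an action line such as `{"index":{"_id": "1"}}` followed by the document source, and the whole body must end with a newline. A minimal sketch that builds such a body from pre-serialized sources; `BulkBody` is a hypothetical helper, and it assumes each source is already single-line JSON and each id needs no JSON escaping:

```java
import java.util.Map;

class BulkBody {
    // Builds an NDJSON _bulk body: one action line + one source line per document
    static String build(Map<String, String> sourcesById) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : sourcesById.entrySet()) {
            // Assumes ids contain no characters that need JSON escaping
            sb.append("{\"index\":{\"_id\":\"").append(e.getKey()).append("\"}}\n");
            // Assumes the source is single-line JSON; an embedded newline would break NDJSON
            sb.append(e.getValue()).append('\n');
        }
        return sb.toString(); // ends with '\n', as the _bulk API requires
    }
}
```

Pass a `LinkedHashMap` if the documents should appear in insertion order.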
Search Commands
Basic Search
  • Match all documents
GET /test1/_search
{
  "query": {"match_all": {}}
}
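  • Match all, paginated
GET /test1/_search
{
  "query": {"match_all": {}},
  "from": 1,    // number of hits to skip (0-based offset, not a page number), so this returns hits 2-6
  "size": 5     // number of hits to return
}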
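Because `from` is an offset rather than a page number, a 1-based page has to be converted first. A minimal sketch; `Paging.fromFor` is a hypothetical helper. Note that by default `from + size` may not exceed the index setting `index.max_result_window` (10,000).

```java
class Paging {
    // Converts a 1-based page number into the 0-based "from" offset
    static int fromFor(int pageNo, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int page = Math.max(pageNo, 1); // treat page numbers below 1 as the first page
        return (page - 1) * pageSize;
    }
}
```

For example, page 3 with a page size of 5 gives `"from": 10`, which returns hits 11-15.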
  • Sort by a field in descending order
GET /test1/_search
{
  "query": {"match_all": {}},
  "sort": {"tel": "desc"}
}
  • Return only selected fields
GET /test1/_search
{
  "query": {"match_all": {}},
  "_source": ["name" , "tel"]
}
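  • Match query
// A match query takes a single field, so the text and numeric cases are separate requests
GET /test1/_search
{
  "query": {
    "match": {"father": "三三3号"}   // text field: the query text is analyzed, and a document containing any of its terms matches (full-text, not fuzzy)
  }
}
GET /test1/_search
{
  "query": {
    "match": {"tel": 222}           // numeric field: exact value match
  }
}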
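On a text field, `match` analyzes the query and, with the default `operator` of `or`, matches any document that shares at least one term with it; that is why it can look "fuzzy". A minimal sketch with a toy analyzer that emits one term per non-whitespace character (an assumption standing in for the field's real analyzer); `ToyMatch` is hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

class ToyMatch {
    // Toy analyzer: one term per non-whitespace character (assumption, not a real ES analyzer)
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        text.codePoints()
                .filter(cp -> !Character.isWhitespace(cp))
                .forEach(cp -> terms.add(new String(Character.toChars(cp))));
        return terms;
    }

    // Text field: match if the document shares at least one analyzed term (operator OR)
    static boolean matchText(String fieldValue, String query) {
        Set<String> docTerms = analyze(fieldValue);
        for (String t : analyze(query)) {
            if (docTerms.contains(t)) {
                return true;
            }
        }
        return false;
    }

    // Numeric field: the query value is compared as an exact value
    static boolean matchNumber(long fieldValue, long query) {
        return fieldValue == query;
    }
}
```

With this toy analyzer, the query 三三3号 matches every `father` value in the sample data, because each one contains 号.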
  • Phrase match
GET /test1/_search
{
  "query": {
    "match_phrase": {
      "father": "号 父亲"    // the terms must appear next to each other, in this order
    }
  }
}
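Unlike `match`, `match_phrase` requires all query terms to appear at consecutive positions in the same order (with the default `slop` of 0). A minimal sketch on pre-tokenized term lists, assuming the field and query have already been analyzed (for example into single characters); `ToyPhrase` is hypothetical and does a linear scan, whereas Elasticsearch uses the term positions stored in its index:

```java
import java.util.List;

class ToyPhrase {
    // True if queryTerms occur in docTerms at consecutive positions, in order
    static boolean matchPhrase(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return false; // simplification: an empty phrase matches nothing
        }
        for (int start = 0; start + queryTerms.size() <= docTerms.size(); start++) {
            boolean all = true;
            for (int k = 0; k < queryTerms.size(); k++) {
                if (!docTerms.get(start + k).equals(queryTerms.get(k))) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }
}
```

For a value such as 3号父亲 split into 3 / 号 / 父 / 亲, the phrase 号 父 亲 matches, while 父 号 (wrong order) and 3 父 (not adjacent) do not.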
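Advanced Search
  • Compound (bool) query
// must: every clause must match
// should: at least one clause should match (required only when there is no must or filter clause)
// must_not: none of the clauses may match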

GET /test1/_search
{
  "query": {
    "bool": {
      "must": [
        {"match":{"father": "号"}},
        {"match":{"father": "亲"}}
      ],
      "must_not": [
        {"match":{"name": "2"}}
      ]
    }
  }
}
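The matching rules of a `bool` query can be sketched as a predicate evaluator. This is a toy under stated assumptions: it models only whether a document matches (not scoring), leaves out the `filter` clause (which behaves like `must` without affecting scores), and `ToyBool` is hypothetical. The `should` rule follows Elasticsearch's default: at least one `should` clause is required only when there are no `must` or `filter` clauses.

```java
import java.util.List;
import java.util.function.Predicate;

class ToyBool {
    // Decides whether one document matches a bool query (scoring is not modeled)
    static <T> boolean matches(T doc,
                               List<Predicate<T>> must,
                               List<Predicate<T>> should,
                               List<Predicate<T>> mustNot) {
        for (Predicate<T> p : must) {
            if (!p.test(doc)) {
                return false; // every must clause has to match
            }
        }
        for (Predicate<T> p : mustNot) {
            if (p.test(doc)) {
                return false; // no must_not clause may match
            }
        }
        if (must.isEmpty() && !should.isEmpty()) {
            // Without must clauses, at least one should clause is required
            return should.stream().anyMatch(p -> p.test(doc));
        }
        return true; // otherwise should clauses would only affect scoring
    }
}
```

The query above corresponds to two `must` predicates (`father` contains 号 and 亲) and one `must_not` predicate (`name` contains 2).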
  • Filter
GET /test1/_search            // keep only documents whose tel is between 300 and 500 (both bounds inclusive)
{
  "query": {
    "bool": {
      "must":{"match_all": {}},
      "filter":{
        "range": {
          "tel": {
            "gte": 300,
            "lte": 500
          }
        }
      }
    }
  }
}
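`gte` and `lte` are inclusive bounds (`gt` and `lt` are the exclusive variants), and clauses in `filter` context only include or exclude documents without contributing to the score. A minimal sketch of the same inclusive range check over the sample `tel` values; `ToyRange` is hypothetical:

```java
import java.util.List;
import java.util.stream.Collectors;

class ToyRange {
    // Keeps values with gte <= v <= lte, like the inclusive range query above
    static List<Long> filterRange(List<Long> values, long gte, long lte) {
        return values.stream()
                .filter(v -> v >= gte && v <= lte)
                .collect(Collectors.toList());
    }
}
```

Applied to the nine sample `tel` values (111 through 999), it keeps 333 and 444, i.e. the documents 张三3号 and 张三4号.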
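Integrating ElasticSearch with Spring Boot
  • Add the dependency

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>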

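  • Configure the ElasticSearch client
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {
    // Client for a single local ElasticSearch node on the default HTTP port;
    // Spring calls its close() method on shutdown (inferred destroy method)
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")));
    }
}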
  • Crawl the data (or load it from a database)

  • Put the data into an ES index

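import com.alibaba.fastjson.JSON;   // JSON.toJSONString comes from Alibaba fastjson
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.xcontent.XContentType;
import java.util.List;

// Method of a service class with an injected RestHighLevelClient restHighLevelClient;
// HtmlParseUtil is the author's own crawler class
public Boolean parseContent(String keywords) throws Exception {
    List<?> contentList = new HtmlParseUtil().parseJD(keywords);
    // Put the crawled data into ES
    // Create a bulk request
    BulkRequest bulkRequest = new BulkRequest();
    // Request settings: 2-minute timeout
    bulkRequest.timeout("2m");
    // Add one index request per item
    for (Object content : contentList) {
        bulkRequest.add(new IndexRequest("jd_goods")     // target index
                .source(JSON.toJSONString(content), XContentType.JSON));
    }
    // Send the request with the client
    BulkResponse bulkResponse = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    return !bulkResponse.hasFailures();   // true only if every item succeeded
}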
  • Retrieve data with search and highlighting
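// Package paths as in the early 7.x high-level REST client
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public List<Map<String, Object>> searchPageHighlighter(String keyword, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Search request against the jd_goods index
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    // Builder for the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Paging: from is a 0-based hit offset, not a page number
    searchSourceBuilder.from((pageNo - 1) * pageSize);
    searchSourceBuilder.size(pageSize);
    // Exact term match: the keyword is not analyzed, so on an analyzed text field it only
    // matches a single indexed token (QueryBuilders.matchQuery gives full-text matching)
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
    // Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    highlightBuilder.requireFieldMatch(false);   // highlight query terms in this field even if the query targeted another field
    highlightBuilder.preTags("<em>");            // tag placed before each highlighted term (example value; <em> is also ES's default)
    highlightBuilder.postTags("</em>");          // matching closing tag
    // Fragments are excerpts; for long titles, highlightBuilder.numOfFragments(0) returns the whole field
    searchSourceBuilder.highlighter(highlightBuilder);
    // Attach the conditions to the request
    searchRequest.source(searchSourceBuilder);
    // Execute the search
    SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Parse the hits into a list of source maps
    List<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : searchResponse.getHits().getHits()) {
        // Highlighted fragments of this hit, keyed by field name
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        HighlightField title = highlightFields.get("title");
        // Original (non-highlighted) document source
        Map<String, Object> sourceAsMap = hit.getSourceAsMap();
        // Replace the plain title with the highlighted one
        if (title != null) {
            StringBuilder newTitle = new StringBuilder();
            for (Text text : title.fragments()) {
                newTitle.append(text);
            }
            sourceAsMap.put("title", newTitle.toString());
        }
        list.add(sourceAsMap);
    }
    return list;
}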
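To see what the pre and post tags do to a hit, here is a toy highlighter that wraps every occurrence of a term in the raw text. It is a sketch only: Elasticsearch's highlighters work on analyzed terms and return fragments, while this does case-sensitive substring matching on the whole string, and `ToyHighlighter` is hypothetical.

```java
class ToyHighlighter {
    // Wraps every case-sensitive occurrence of term in text with preTag/postTag
    static String highlight(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text;
        }
        StringBuilder sb = new StringBuilder();
        int from = 0;
        int idx;
        while ((idx = text.indexOf(term, from)) >= 0) {
            sb.append(text, from, idx).append(preTag).append(term).append(postTag);
            from = idx + term.length();
        }
        sb.append(text.substring(from));
        return sb.toString();
    }
}
```

For example, highlighting `java` in `java book java` with `<em>`/`</em>` yields `<em>java</em> book <em>java</em>`, the kind of string that ends up in the `title` field returned by `searchPageHighlighter`.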
Source Code

Link

Reposted from www.mshxw.com; please credit the source when reposting.
Article URL: https://www.mshxw.com/it/278148.html