Setting up Elasticsearch, and a real-world search endpoint that sorts terms buckets by the maximum score within each group

1 Install Elasticsearch

1.1 Extract the archive

tar -zxvf elasticsearch-7.6.1-linux-x86_64.tar.gz

1.2 Edit the configuration files

ulimit -Hn 65536
vim /etc/security/limits.conf

* soft nofile 65536
* hard nofile 65536
* soft memlock unlimited
* hard memlock unlimited
vim /etc/sysctl.conf
vm.max_map_count=655360
vm.swappiness=0
sysctl -p

vim config/elasticsearch.yml
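# cluster name
cluster.name: shunteng-test
# node name
node.name: test-node-1
# data directory
path.data: /u01/install/elasticsearch/data
# log directory
path.logs: /u01/install/elasticsearch/log
# lock the process memory so it is never swapped out
bootstrap.memory_lock: true
# the official docs note that the node "will need to bind to a non-loopback address"
network.host: 10.10.**  # replace with your own machine's address
# port for node-to-node (transport) communication
transport.tcp.port: 9700

# HTTP port
http.port: 9200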

1.3 Complete configuration

cluster.name: shunteng-test
node.name: test-node-2
path.data: /u01/install/elasticsearch/data
path.logs: /u01/install/elasticsearch/log
bootstrap.memory_lock: true
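network.host: 10.10.**  # this machine's address
http.port: 9200
discovery.seed_hosts: ["10.10.**", "10.10.**"]  # replace with your own machines' addresses
cluster.initial_master_nodes: ["test-node-1", "test-node-2"]
# note: 7.x nodes accept this setting but ignore it
discovery.zen.minimum_master_nodes: 1
action.destructive_requires_name: true
# port for node-to-node (transport) communication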
transport.tcp.port: 9700
node.max_local_storage_nodes: 3

1.4 Start Elasticsearch

./bin/elasticsearch -d

1.5 Check that it started correctly

curl "http://10.10.**:9200/_cat/health?v"  # replace with your own machine's address

1.6 Java version issues

If JAVA_HOME is set when Elasticsearch starts and points to a Java version other than the one Elasticsearch requires, you can edit the bin/elasticsearch-env script and remove the JAVA_HOME branch shown below, so that the bundled JDK is always used:

if [ ! -z "$JAVA_HOME" ]; then
  JAVA="$JAVA_HOME/bin/java"
  JAVA_TYPE="JAVA_HOME"
else
  # keep the lines in between
fi

1.7 Install the IK analysis plugin

cd elasticsearch-7.6.1/plugins/
mkdir ik
cd ik/
cp /u01/install/elasticsearch-analysis-ik-7.6.1.zip elasticsearch-analysis-ik-7.6.1.zip
unzip elasticsearch-analysis-ik-7.6.1.zip
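# restart Elasticsearch so the plugin is loaded, then check that it is listed
curl "http://10.10.**:9200/_cat/plugins"  # replace with your own machine's address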

2 Install Kibana

2.1 Extract Kibana

tar -xzf kibana-7.6.1-linux-x86_64.tar.gz
cd kibana-7.6.1-linux-x86_64/config/
vim kibana.yml

2.2 Kibana configuration file

server.port: 5601
server.host: "0.0.0.0"
server.name: "kibana-test"
# Elasticsearch nodes to connect to
elasticsearch.hosts: ["http://10.10.**:9200", "http://10.10.**:9200"]
elasticsearch.requestTimeout: 99999
i18n.locale: "zh-CN"

2.3 Write the Kibana logs to /u01/logs/kibana

cd /u01/logs/
mkdir kibana
cd kibana/
Start Kibana:
nohup /u01/install/kibana-7.6.1-linux-x86_64/bin/kibana > /u01/logs/kibana/kibana.log 2>&1 &

Find the Kibana process:
ps -auxf|grep kibana

Kibana home page: http://10.10.*:5601/app/kibana#/home

3 Common Elasticsearch operations

3.1 Index operations

Create an index
PUT index_test
Get an index
GET index_test
Delete an index
DELETE index_test
Close an index
POST index_test/_close
Open an index
POST index_test/_open

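3.2 Mapping operations

Add a mapping to an existing index
PUT index_test_2/_mappings
{
  "properties": {
    "test": {
      "type": "text"
    }
  }
}

The IK plugin provides two analyzers: ik_max_word and ik_smart.
Create an index together with its mapping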
PUT article_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma": {
          "type": "pattern",
          "pattern": ",|，"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "update_time": {
        "type": "date"
      },
      "title": {
        "type": "text",
        "analyzer": "ik_smart",
        "search_analyzer": "ik_smart"
      },
      "keywords": {
        "type": "text",
        "analyzer": "comma",
        "search_analyzer": "ik_smart"
      },
      "category_id": {
        "type": "long",
        "null_value": 0
      }
    }
  }
}

3.3 Add a document

PUT index_test/_doc/1520
{
	  "id" :1520,
	  "content":"测试",
	  "title":"测试"
}

GET /index_test/_search
{
	 "query": {
		  "match": {
		   "id":1520
		  }
	 }
}

Delete a document by id
DELETE /index_test/_doc/1520

3.4 Bucket aggregation

POST seller_settle_index/_search
{
    "size":0,
    "query":{
        "bool":{
            "filter":[
                {
                    "term":{
                        "isSettle":{
                            "value":1,
                            "boost":1
                        }
                    }
                }
            ],
            "adjust_pure_negative":true,
            "boost":1
        }
    },
    "aggregations":{
        "agg":{
            "date_histogram":{
                "field":"updatetime",
                "format":"yyyy-MM-dd",
                "calendar_interval":"1d",
                "offset":0,
                "order":{
                    "_key":"asc"
                },
                "keyed":false,
                "min_doc_count":1
            },
            "aggregations":{
                "payAmount":{
                    "sum":{
                        "field":"payAmount"
                    }
                },
                "amount":{
                    "sum":{
                        "field":"amount"
                    }
                },
                "fee":{
                    "sum":{
                        "field":"fee"
                    }
                },
                "platformCharges":{
                    "sum":{
                        "field":"platformCharges"
                    }
                },
                "bucket_field":{
                    "bucket_sort":{
                        "from":0,
                        "size":10,
                        "gap_policy":"SKIP"
                    }
                }
            }
        },
        "count":{
            "cardinality":{
                "field":""
            }
        }
    }
}
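
To make the shape of the result easier to picture, here is a minimal Java sketch of what a daily date_histogram with a sum sub-aggregation and a bucket_sort window computes, done in memory rather than in Elasticsearch. The Settlement class and its sample values are hypothetical; only the payAmount field, the ascending key order and the from/size window are taken from the request.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DailySumSketch {

    /** Hypothetical stand-in for one settled document. */
    static final class Settlement {
        final LocalDate day;
        final double payAmount;

        Settlement(LocalDate day, double payAmount) {
            this.day = day;
            this.payAmount = payAmount;
        }
    }

    /**
     * Groups documents by calendar day, sums payAmount per day, orders the
     * buckets by key ascending, then keeps the window [from, from + size).
     * Days without documents never get a bucket, which mirrors min_doc_count 1.
     */
    static List<Map.Entry<LocalDate, Double>> dailySums(List<Settlement> docs, int from, int size) {
        // TreeMap keeps its keys sorted ascending, like "order": {"_key": "asc"}.
        Map<LocalDate, Double> sums = new TreeMap<>();
        for (Settlement s : docs) {
            sums.merge(s.day, s.payAmount, Double::sum);
        }
        List<Map.Entry<LocalDate, Double>> buckets = new ArrayList<>(sums.entrySet());
        int start = Math.min(from, buckets.size());
        int end = Math.min(start + size, buckets.size());
        return new ArrayList<>(buckets.subList(start, end));
    }

    public static void main(String[] args) {
        List<Settlement> docs = List.of(
                new Settlement(LocalDate.of(2020, 3, 2), 10.0),
                new Settlement(LocalDate.of(2020, 3, 1), 5.0),
                new Settlement(LocalDate.of(2020, 3, 2), 2.5));
        for (Map.Entry<LocalDate, Double> bucket : dailySums(docs, 0, 10)) {
            System.out.println(bucket.getKey() + " -> " + bucket.getValue());
        }
    }
}
```

The other sums in the request (amount, fee, platformCharges) would follow the same pattern, one running total per day and field.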

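4 Real-world application

The use case is product search. Products from paying users must come first, but no single user may contribute so many products that a whole page is filled with that user's items.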

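4.1 Problems with term queries

A term query does not analyze the search text, so it can fail to match a text field whose value was split into tokens at index time. Possible solutions:

1 Use a match query directly. The search text is then analyzed, so the query also returns other, unwanted documents.

2 Query the field's ".keyword" sub-field. Elasticsearch maps string values to text by default; "fields" lets a single field be indexed in more than one way, so the same value is stored once analyzed and once not analyzed, and keyword is the unanalyzed form.

3 Define the mapping by hand before writing a new field, which guarantees that the field has the type you need.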
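
The difference between the three options can be illustrated with a small in-memory Java sketch. It is a toy model under loud assumptions: the analyze method only lowercases and splits on whitespace, which is far simpler than ik_smart or any real analyzer, and all class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TermVsKeywordSketch {

    /** Toy analyzer (assumption): lowercase, then split on whitespace. */
    static Set<String> analyze(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    /** A term query looks the search text up as-is among the indexed tokens. */
    static boolean termMatchesTextField(String indexedValue, String searchText) {
        return analyze(indexedValue).contains(searchText);
    }

    /** A keyword field keeps the whole value as one unanalyzed term. */
    static boolean termMatchesKeywordField(String indexedValue, String searchText) {
        return indexedValue.equals(searchText);
    }

    /** A match query analyzes the search text and accepts any shared token. */
    static boolean matchQueryMatchesTextField(String indexedValue, String searchText) {
        Set<String> indexed = analyze(indexedValue);
        for (String token : analyze(searchText)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String stored = "Mechanical Keyboard";
        // The exact value is not one of the tokens, so the term query misses.
        System.out.println(termMatchesTextField(stored, "Mechanical Keyboard"));
        // The keyword sub-field holds the exact value, so the term query hits.
        System.out.println(termMatchesKeywordField(stored, "Mechanical Keyboard"));
        // The match query hits on the shared token even though the product differs.
        System.out.println(matchQueryMatchesTextField(stored, "mechanical pencil"));
    }
}
```

The last call shows the drawback of option 1: a match query trades exactness for recall, which is why the keyword sub-field or an explicit mapping is the better fit for exact lookups.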

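4.2 Aliases

Paid and free products are kept in separate indices, which reduces the cost of queries that filter by membership type.
Some queries still need to search both indices at once, so the two indices share an alias.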

POST /_aliases
{
"actions": [
    {
      "add": {
      "index": "index_product_payed",
      "alias": "index_product_all"
      }
    },
    {
      "add": {
      "index": "index_product",
      "alias": "index_product_all"
      }
    }
  ]
}

4.3 In practice: group products by userId, apply a custom score, and sort by that score

POST index_product_payed/_search
{
  "from": 0,
  "size": 0,
  "query": {
    "function_score": { // custom scoring
      "query": {
        "bool": {
          "should": [
            {
              "term": {
                "productName": {
                  "value": "机械",
                  "boost": 300
                }
              }
            }
          ],
          "adjust_pure_negative": true,
          "boost": 1
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "productScore", // a score the backend computes and writes into each document at indexing time, used to influence ranking
            "factor": 1,
            "missing": 1,
            "modifier": "sqrt"
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "multiply",
      "max_boost": 3.4028235e+38,
      "boost": 1
    }
  },
  "aggregations": {
    "group_by_userId": {
      "terms": {
        "field": "userId",
        "size": 400, // number of buckets to return
        "min_doc_count": 1,
        "shard_min_doc_count": 0,
        "show_term_doc_count_error": false,
        "order": [ // sort buckets by the maximum value inside each bucket
          {
            "maxSource": "desc"
          },
          {
            "_key": "asc"
          }
        ]
      },
      "aggregations": {
        "maxSource": {
          "max": { // _score is computed at query time and is not a document field, so it has to be read through "script";
                   // to use a document field such as "price" instead, write "field": "price" directly
            "script": {
              "source": "_score"
            }
          }
        },
        "onlyOne": {
          "top_hits": {
            "from": 0,
            "size": 1, // return only one document per bucket
            "version": false,
            "seq_no_primary_term": false,
            "explain": false, // true returns the scoring details, at a performance cost
            "_source": { // fields to return
              "includes": [
                "productName",
                "companyName",
                "companyAddr",
                "imgs",
                "price",
                "majorProd",
                "majorProdProp",
                "manageModel",
                "id",
                "minordernum",
                "productAttr",
                "shopId",
                "userId",
                "enName",
                "isPromote",
                "updateTime",
                "categoryId",
                "minordernum",
                "calCeil"
              ],
              "excludes": []
            },
              "sort": [
                { // sort by score within the bucket
                 "_score":{
                    "order": "desc"
                  }
                }
              ]
          }
        }
      }
    }
  }
}
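
The effect of this request can be reproduced in memory with a short Java sketch: score every hit, keep the best hit per user, then sort users by their best score (descending) and by userId (ascending) for ties. This is only an illustration under assumptions: the Product class, its field names and the sample scores are hypothetical, and queryScore stands in for whatever relevance score the inner bool query would produce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupByUserSketch {

    /** Hypothetical stand-in for one matching product document. */
    static final class Product {
        final long id;
        final String userId;
        final double queryScore;   // relevance score of the inner query
        final Double productScore; // null when the field is missing

        Product(long id, String userId, double queryScore, Double productScore) {
            this.id = id;
            this.userId = userId;
            this.queryScore = queryScore;
            this.productScore = productScore;
        }

        /** queryScore * sqrt(factor * productScore), with factor = 1 and missing = 1. */
        double score() {
            double field = productScore == null ? 1.0 : productScore;
            return queryScore * Math.sqrt(1.0 * field);
        }
    }

    /** Best product per user, ordered by best score desc, then userId asc. */
    static List<Product> bestPerUser(List<Product> hits, int maxBuckets) {
        Map<String, Product> best = new HashMap<>();
        for (Product p : hits) {
            // Keep whichever product of this user scores higher (top_hits size 1).
            best.merge(p.userId, p, (a, b) -> b.score() > a.score() ? b : a);
        }
        List<Product> buckets = new ArrayList<>(best.values());
        buckets.sort((a, b) -> {
            int byScore = Double.compare(b.score(), a.score()); // maxSource desc
            return byScore != 0 ? byScore : a.userId.compareTo(b.userId); // _key asc
        });
        return new ArrayList<>(buckets.subList(0, Math.min(maxBuckets, buckets.size())));
    }

    public static void main(String[] args) {
        List<Product> hits = List.of(
                new Product(1, "u1", 2.0, 4.0),
                new Product(2, "u1", 3.0, 1.0),
                new Product(3, "u2", 5.0, null),
                new Product(4, "u3", 1.0, 16.0));
        for (Product p : bestPerUser(hits, 400)) {
            System.out.println(p.userId + " -> product " + p.id + " score " + p.score());
        }
    }
}
```

Because each user contributes exactly one product, a user with many strong products can no longer fill a page, which is the requirement stated at the start of this section.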

4.4 Java implementation

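// Imports for the Elasticsearch 7.x high-level REST client, Lucene, SLF4J and Spring.
// StringUtils, CollectionUtils, BeanUtil, JedisUtil and the project's own types
// (ProductsQuery, IndexProduct, IndexProductService, ProductSearchType, SearchWords)
// also need imports; the post does not say which packages they come from.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.Explanation;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.core.CountRequest;
import org.elasticsearch.client.core.CountResponse;
import org.elasticsearch.common.lucene.search.function.CombineFunction;
import org.elasticsearch.common.lucene.search.function.FieldValueFactorFunction;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FieldValueFactorFunctionBuilder;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.script.Script;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.BucketOrder;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket;
import org.elasticsearch.search.aggregations.metrics.TopHits;
import org.elasticsearch.search.aggregations.metrics.TopHitsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service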
public class IndexProductServiceImpl implements IndexProductService{

    protected Logger log = LoggerFactory.getLogger(getClass());

    static String listIncludes[] = {"productName","companyName","companyAddr","imgs","price","majorProd","majorProdProp","manageModel",
    		"id","minordernum","productAttr","shopId","userId","enName","isPromote","updateTime","categoryId","minordernum","calCeil"};
    
    static String imgIncludes[] = {"productName","companyName","companyAddr","imgs","price","manageModel","id","minordernum",
    		"shopId","userId","enName","isPromote","updateTime","categoryId","minordernum","calCeil"};
    
    // default boost for a category id match: 300
    static Integer categoryIdBoost = 300;
    
    @Autowired
    private RestHighLevelClient restHighLevelClient;
    
    private static final String payIndexName = "index_product_payed";
    
    private static final String AllIndexName = "index_product_all";
    
    private static final String freeIndexName = "index_product";
    
    private static final Integer aggDefaultNum = 400;
    
    private static final Integer compAddrBoost = 0;
    
    private static final String CACHEKEY = "groupNum_kw_";
    
    private static final Integer freeUserType = 0;
    
    private static final Integer vipUserType = 1;
    
    @Autowired
    private JedisUtil searchGroupNumCache;
    
    
    private List matchIndexAll(ProductsQuery productsQuery) throws IOException {
    	SearchRequest productSearch = new SearchRequest(AllIndexName);
    	// search source (query, paging, returned fields)
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        Integer pageNum = null == productsQuery.getPageNum() ? 1 : productsQuery.getPageNum();
        Integer pageSize = null == productsQuery.getPageSize() ? 40 : productsQuery.getPageSize();
        // the parentheses matter: "pageNum - 1 * pageSize" would evaluate to pageNum - pageSize
        Integer offset = (pageNum - 1) * pageSize;
    	MatchQueryBuilder matchQuery = QueryBuilders.matchQuery("productName", productsQuery.getKw());
    	// non-aggregated query
    	if(productsQuery.getOffset() != null) {
    		offset = productsQuery.getOffset() + offset;
    	}
    	sourceBuilder.from(offset);
    	sourceBuilder.size(pageSize);
    	// choose the fields to return
    	if(StringUtils.isBlank(productsQuery.getView()) || "list".equals(productsQuery.getView())){
    		sourceBuilder.fetchSource(listIncludes, null);
		}
		else {
			sourceBuilder.fetchSource(imgIncludes, null);
		}
    	sourceBuilder.query(matchQuery);
    	// attach the search source to the request; without this the query and paging are never sent
    	productSearch.source(sourceBuilder);
    	// send the request
        SearchResponse response = restHighLevelClient.search(productSearch, RequestOptions.DEFAULT);
        List list=new ArrayList<>();
        // TODO: convert to the proper type when debugging
        for (SearchHit hit : response.getHits().getHits()) {
        	IndexProduct index = BeanUtil.mapToBean(hit.getSourceAsMap(), IndexProduct.class, true);
        	index.setId(Long.valueOf(hit.getId()));
        	list.add(index);
        }
		return list;
    }

    @Override
    public List query(ProductsQuery productsQuery) throws IOException {
    	
    	// many search words and a long keyword: answer with a plain match query over both indices
    	if(CollectionUtils.isNotEmpty(productsQuery.getSearchWords()) && productsQuery.getSearchWords().size() > 5 && StringUtils.isNotBlank(productsQuery.getKw()) && productsQuery.getKw().length() > 12) {
    		return matchIndexAll(productsQuery);
    	}
    	Long startime = System.currentTimeMillis();
        // create the search request
    	SearchRequest productSearch;
    	// equals() avoids comparing boxed Integer values by reference
    	if(productsQuery.getUserType() == null || vipUserType.equals(productsQuery.getUserType())) {
    		productSearch = new SearchRequest(payIndexName);
    	}else {
    		productSearch = new SearchRequest(freeIndexName);
    		productsQuery.setQueryType(ProductSearchType.NO_AGG_SEARCH.getId());
    	}
        // search source
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        // paging
    	Integer pageNum = null == productsQuery.getPageNum() ? 1 : productsQuery.getPageNum();
        Integer pageSize = null == productsQuery.getPageSize() ? 40 : productsQuery.getPageSize();
        Integer needCount = pageNum * pageSize;
        // build the query conditions
        BoolQueryBuilder query = buildBoolQuery(productsQuery);
        
        FieldValueFactorFunctionBuilder fieldQuery = new FieldValueFactorFunctionBuilder("productScore");
        // extra score factor = sqrt(1 * productScore); a missing productScore counts as 1
        fieldQuery.factor(1).missing(1).modifier(FieldValueFactorFunction.Modifier.SQRT);
        FunctionScoreQueryBuilder functionScoreQuery = QueryBuilders.functionScoreQuery(query, fieldQuery);
        functionScoreQuery.boostMode(CombineFunction.MULTIPLY);
        
        if(productsQuery.getQueryType() == ProductSearchType.NO_AGG_SEARCH.getId()) {
        	// non-aggregated query
        	Integer offset = (pageNum -1)*pageSize;
        	if(productsQuery.getOffset() != null) {
        		offset = productsQuery.getOffset() + offset;
        	}
        	sourceBuilder.from(offset);
        	sourceBuilder.size(pageSize);
        	
        	// choose the fields to return
        	if(StringUtils.isBlank(productsQuery.getView()) || "list".equals(productsQuery.getView())){
        		sourceBuilder.fetchSource(listIncludes, null);
    		}
    		else {
    			sourceBuilder.fetchSource(imgIncludes, null);
    		}
        }else {
        	// beyond the default aggregated window, fetch the rest from the free index
        	if(needCount > aggDefaultNum) {
        		productsQuery.setUserType(freeUserType);
            	productsQuery.setPageSize(pageSize);
            	// page number within the free index: ceiling of (needCount - aggDefaultNum) / 40
            	Integer freeNeed = needCount - aggDefaultNum;
            	Integer freePageNum = freeNeed % 40 == 0 ? freeNeed / 40 : freeNeed / 40 + 1;
            	productsQuery.setPageNum(freePageNum);
            	Object numCache = searchGroupNumCache.getObj(CACHEKEY + productsQuery.getKw());
            	if(numCache == null) {
            		// no cached count of paid results yet: run one query to populate it
            		ProductsQuery onePageQuery = BeanUtil.copyProperties(productsQuery, ProductsQuery.class);
            		onePageQuery.setPageNum(1);
            		onePageQuery.setPageSize(1);
            		query(onePageQuery);
            		numCache = searchGroupNumCache.getObj(CACHEKEY + productsQuery.getKw());
            	}
            	Integer offset = (Integer)numCache;
        		productsQuery.setOffset(offset);
            	log.info("take time :"+ (System.currentTimeMillis()-startime));
            	return query(productsQuery);
        	}
        	// aggregated query: grouped searches return no top-level hits
        	sourceBuilder.from(0);
        	sourceBuilder.size(0);
        	AggregationBuilder aggregation = AggregationBuilders.terms("group_by_userId").field("userId").minDocCount(1).size(aggDefaultNum)
        			// sort order: false is DESC, true is ASC
        			.order(BucketOrder.aggregation("maxSource", false));
        	// keep only the best document of each bucket
        	TopHitsAggregationBuilder topHitsAggregation = AggregationBuilders.topHits("onlyOne").size(1);
        	if(productsQuery.getExpain()) {
        		topHitsAggregation.explain(true);
            }
        	// choose the fields to return
        	if(StringUtils.isBlank(productsQuery.getView()) || "list".equals(productsQuery.getView())){
        		topHitsAggregation.fetchSource(listIncludes, null);
    		}
    		else {
    			topHitsAggregation.fetchSource(imgIncludes, null);
    		}
        	if(productsQuery.getQueryType() == ProductSearchType.AGG_BY_PRICE.getId()) {
        		// sort buckets by price instead of score
        		aggregation.subAggregation(AggregationBuilders.max("maxSource").field("price"));
        	}else {
        		// _score is not a document field, so it is read through a script
        		aggregation.subAggregation(AggregationBuilders.max("maxSource").script(new Script("_score")));
        	}
        	aggregation.subAggregation(topHitsAggregation);
        	sourceBuilder.aggregation(aggregation);
        }
        
        // set the query
        if(productsQuery.getQueryType() == ProductSearchType.AGG_BY_PRICE.getId()) {
        	sourceBuilder.query(query);
        }else {
        	sourceBuilder.query(functionScoreQuery);
        }
        // attach the search source to the request
        productSearch.source(sourceBuilder);
        // send the request
        SearchResponse response = restHighLevelClient.search(productSearch, RequestOptions.DEFAULT);
        List list=new ArrayList<>();
        // TODO: convert to the proper type when debugging
        if(productsQuery.getQueryType() == ProductSearchType.NO_AGG_SEARCH.getId()) {
        // parse the non-aggregated result
        	for (SearchHit hit : response.getHits().getHits()) {
        		IndexProduct index = BeanUtil.mapToBean(hit.getSourceAsMap(), IndexProduct.class, true);
        		index.setId(Long.valueOf(hit.getId()));
        		list.add(index);
        	}
        }else {
        	Aggregations aggregations = response.getAggregations();
        	List aggList = aggregations.asList();
        	if(CollectionUtils.isEmpty(aggList)) {
        		log.info("take time :"+ (System.currentTimeMillis()-startime));
        		return list;
        	}else {
        		// parse the aggregated result; aggregations are looked up by name rather than by position
        		Terms terms = aggregations.get("group_by_userId");
        		for (Bucket bucket : terms.getBuckets()) {
        			TopHits topHits = bucket.getAggregations().get("onlyOne");
        			SearchHits hits = topHits.getHits();
        			if(hits.getHits() != null && hits.getHits().length > 0) {
        				for(SearchHit hit:hits.getHits()) {
        					IndexProduct index = BeanUtil.mapToBean(hit.getSourceAsMap(), IndexProduct.class, true);
        					if(productsQuery.getExpain()) {
        						Explanation expain = hit.getExplanation();
        						log.error("pid_"+ index.getId() + "_explanation:" + expain.toString());
        					}
                    		index.setId(Long.valueOf(hit.getId()));
                    		list.add(index);
            			}
        			}
        		}
        	}
        	searchGroupNumCache.setObj(CACHEKEY + productsQuery.getKw(), aggDefaultNum - list.size());
        	// if there are not enough paid results, top up from the free index, up to 400 items
            if(list.size() < needCount) {
            	productsQuery.setUserType(freeUserType);
            	Integer pageSise = needCount - list.size(); 
            	productsQuery.setPageSize(pageSise);
            	productsQuery.setPageNum(1);
            	Integer offset = (Integer) searchGroupNumCache.getObj(CACHEKEY + productsQuery.getKw());
            	productsQuery.setOffset(offset);
            	list.addAll(query(productsQuery));
            }
        }
        log.info("take time :"+ (System.currentTimeMillis()-startime));
        if(list.size() > pageNum * pageSize) {
    		return list.subList((pageNum-1) * pageSize, pageNum * pageSize);
    	}
        return list;
    }

	@Override
	public Long queryCount(ProductsQuery productsQuery) throws IOException {
		// count request for the paid index
		CountRequest vipCountSearch = new CountRequest(payIndexName);
		BoolQueryBuilder query = buildBoolQuery(productsQuery);
		vipCountSearch.query(query);
		// send the request
        CountResponse vipResponse = restHighLevelClient.count(vipCountSearch, RequestOptions.DEFAULT);
        Long vipCount = vipResponse.getCount();
        // count request for the free index
  		CountRequest freeCountSearch = new CountRequest(freeIndexName);
  		freeCountSearch.query(query);
  		// send the request
        CountResponse freeResponse = restHighLevelClient.count(freeCountSearch, RequestOptions.DEFAULT);
        Long freeCount = freeResponse.getCount();
		return vipCount + freeCount;
	}
	
	private BoolQueryBuilder buildBoolQuery(ProductsQuery productsQuery){
		
        BoolQueryBuilder query = QueryBuilders.boolQuery();
        if (StringUtils.isNotBlank(productsQuery.getKw())){
            query.should(QueryBuilders.termQuery("productName", productsQuery.getKw()));
        }
        // search words matched against the product name and keywords
        if (CollectionUtils.isNotEmpty(productsQuery.getSearchWords())){
        	for(SearchWords kw : productsQuery.getSearchWords()) {
        		if(kw != null && StringUtils.isNotBlank(kw.getWord())) {
        			query.should(QueryBuilders.termQuery("productName", kw.getWord()).boost(kw.getBoost()));
        			query.should(QueryBuilders.termQuery("keywords", kw.getWord()).boost(kw.getBoost()));
        		}
        	}
        }
        // category ids
        if (CollectionUtils.isNotEmpty(productsQuery.getCategoryIds())){
        	for(Long id : productsQuery.getCategoryIds()) {
        		if(id != null) {
        			query.should(QueryBuilders.termQuery("categoryId", id).boost(categoryIdBoost));
        		}
        	}
        }
        // product attributes
        if (CollectionUtils.isNotEmpty(productsQuery.getFeatures())){
        	for(String featureWord : productsQuery.getFeatures()) {
        		if(StringUtils.isNotBlank(featureWord)) {
        			query.should(QueryBuilders.termQuery("productAttrValue", featureWord));
        		}
        	}
        }
        if (productsQuery.getUserSource() != null) {
        	query.must(QueryBuilders.termQuery("userSource", productsQuery.getUserSource()));
        }
        if (productsQuery.getCityId() != null) {
        	query.must(QueryBuilders.termQuery("cityId", productsQuery.getCityId()));
        }
        if (productsQuery.getProvinceId() != null) {
        	query.must(QueryBuilders.termQuery("provinceId", productsQuery.getProvinceId()));
        }
        if (productsQuery.getCategoryName() != null) {
        	query.must(QueryBuilders.termQuery("categoryName", productsQuery.getCategoryName()));
        }
        
        
        
        return query;
	}

}
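
The paging arithmetic in this service is easy to get wrong because of operator precedence, so here is a small standalone Java sketch of the two computations involved. It assumes the intent is the usual one, offset = (pageNum - 1) * pageSize and a ceiling division for the number of pages; the class and method names are illustrative only.

```java
class PagingSketch {

    /** Zero-based offset of the first item on a one-based page. */
    static int offset(int pageNum, int pageSize) {
        return (pageNum - 1) * pageSize;
    }

    /** Pages needed to hold `remaining` items: integer ceiling division. */
    static int pagesNeeded(int remaining, int pageSize) {
        return (remaining + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        int pageNum = 11;
        int pageSize = 40;
        int aggDefaultNum = 400; // size of the aggregated (paid) window

        // Without parentheses, multiplication binds first: 11 - (1 * 40).
        int unparenthesized = pageNum - 1 * pageSize;
        System.out.println("pageNum - 1 * pageSize   = " + unparenthesized);
        System.out.println("(pageNum - 1) * pageSize = " + offset(pageNum, pageSize));

        // Items requested beyond the paid window, and the free-index page they fall on.
        int needCount = pageNum * pageSize;
        int remaining = needCount - aggDefaultNum;
        System.out.println("free page number         = " + pagesNeeded(remaining, pageSize));
    }
}
```

Pulling the arithmetic into small named methods like these also makes it straightforward to unit test the edge cases (exact multiples, first page, empty remainder).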

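5 Performance optimizations used

At first this grouped query took several thousand milliseconds, which was clearly unacceptable, so performance tuning became necessary.

1 Numeric fields that need exact matching were changed to keyword fields.

2 eager_global_ordinals was enabled on the field used for grouping.

The official documentation describes this setting as follows. By default the global ordinals dictionary is loaded during the first search; with eager_global_ordinals set, it is rebuilt on every refresh and stays in memory, which reduces grouping time. The trade-off is a higher cost for indexing and refresh. Because the operation resembles a groupBy that groups on identical values, the setting does not suit indices with heavy inserts and updates, nor fields with many distinct values where the same value rarely repeats.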

"userId" : {
          "type" : "keyword",
          "eager_global_ordinals" : true
        }
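
As a rough intuition for why ordinals speed up grouping, the following Java sketch replaces each distinct keyword value with a small integer and counts documents in a plain array. It is a deliberately simplified model, not how Lucene or Elasticsearch implement global ordinals internally, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class OrdinalsSketch {

    final List<String> dictionary;        // ordinal -> term, in sorted order
    final Map<String, Integer> ordinalOf; // term -> ordinal

    /** Builds the dictionary up front, the step that "eager" moves to refresh time. */
    OrdinalsSketch(List<String> values) {
        dictionary = new ArrayList<>(new TreeSet<>(values));
        ordinalOf = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            ordinalOf.put(dictionary.get(i), i);
        }
    }

    /** Counts documents per term; every value must already be in the dictionary. */
    int[] countPerOrdinal(List<String> docValues) {
        int[] counts = new int[dictionary.size()];
        for (String value : docValues) {
            counts[ordinalOf.get(value)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> userIds = List.of("u2", "u1", "u2", "u3", "u2");
        OrdinalsSketch ordinals = new OrdinalsSketch(userIds);
        int[] counts = ordinals.countPerOrdinal(userIds);
        for (int i = 0; i < counts.length; i++) {
            System.out.println(ordinals.dictionary.get(i) + " -> " + counts[i]);
        }
    }
}
```

The sketch also hints at the cost side: the dictionary must be rebuilt whenever the set of values changes, and it grows with the number of distinct values.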

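3 At the business level, narrow the search conditions so that no single condition matches a huge number of documents.

Paid and free products are kept in separate indices, which reduces the cost of queries that filter by membership type.

After these optimizations, queries take around 50 ms once the builder has been cached, and under 10 ms once the query result itself is cached.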
