What Is ElasticSearch (Big Data Systems)

ElasticSearch

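1. What is ElasticSearch
2. ElasticSearch use cases
3. ElasticSearch compared with Solr
4. Installing Elasticsearch
5. ElasticSearch client operations
6. The IK analyzer
- Introduction to the IK analyzer
- Installing the IK analyzer
7. Spring Data ElasticSearch
- 7.1 Spring Data ElasticSearch
- 7.2 Getting started example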

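1. What is ElasticSearch

Elasticsearch, or es for short, is an open-source, highly scalable, distributed full-text search engine. It stores and retrieves data in near real time and scales well: a cluster can grow to hundreds of servers and handle petabytes of data. es is written in Java and uses Lucene at its core for all indexing and search functionality, but its goal is to hide Lucene's complexity behind a simple RESTful API so that full-text search becomes easy.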
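
To make "full-text search" a little more concrete, here is a minimal sketch of an inverted index, the kind of structure Lucene-style engines are built around. It is a toy under stated assumptions: the class name InvertedIndexSketch, the whitespace/punctuation tokenizer, and the AND-only query semantics are all invented for illustration and are not how Lucene or es is actually implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (hypothetical, for illustration only):
// maps each term to the sorted set of ids of the documents that contain it.
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Lower-cases the text and splits it on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Adds every term of the document to the postings map.
    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the ids of documents that contain every term of the query (AND semantics).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            TreeSet<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Elasticsearch is a distributed search engine");
        index.add(2, "Lucene is a search library");
        index.add(3, "Elasticsearch is built on Lucene");
        System.out.println(index.search("search"));               // prints [1, 2]
        System.out.println(index.search("elasticsearch lucene")); // prints [3]
    }
}
```

A lookup only touches the postings of the query terms instead of scanning every document, which is the basic reason this kind of index makes text search fast.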

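2. ElasticSearch use cases

GitHub: in early 2013 GitHub dropped Solr and adopted ElasticSearch for large-scale search. "GitHub uses ElasticSearch to search 20TB of data, including 1.3 billion files and 130 billion lines of code."
Wikipedia: launched a core search architecture based on elasticsearch.
SoundCloud: "SoundCloud uses ElasticSearch to provide instant and accurate music search for 180 million users."
Baidu: Baidu uses ElasticSearch widely for text data analysis. It collects metrics from all Baidu servers together with user-defined data, and runs multi-dimensional analysis and visualization over them to help locate instance-level or business-level anomalies. It currently covers more than 20 internal business lines (including casio, cloud analytics, Wangmeng, forecasting, Wenku, Zhidahao, Wallet, and risk control), with up to 100 machines and 200 ES nodes in a single cluster and more than 30TB of data imported per day.
Sina: uses ES to analyze and process 3.2 billion real-time log entries.
Alibaba / Wacai: ES is used to build an in-house log collection and analysis system.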

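3. ElasticSearch compared with Solr

Solr relies on Zookeeper for distributed management, while Elasticsearch has distributed coordination built in.
Solr supports more data formats, while Elasticsearch only supports JSON.
Solr officially provides more features, while Elasticsearch focuses on core functionality and leaves most advanced features to third-party plugins.
Solr performs better than Elasticsearch in traditional search applications, but is noticeably less efficient for real-time search.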
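
Because Elasticsearch accepts documents as JSON over its REST API, indexing a document boils down to an HTTP request with a JSON body. The sketch below only builds the request line and the body as strings; it sends nothing. The class IndexRequestSketch and its helpers are hypothetical, the index and type names lxs_blog and article are borrowed from the Spring Data example later in this article, and the /{index}/{type}/{id} path is the 5.x-style document endpoint.

```java
// Hypothetical helper for illustration: builds (but does not send) an index request.
final class IndexRequestSketch {

    // Escapes the characters that may not appear raw inside a JSON string.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Request line for storing a document under a given index, type and id.
    static String requestLine(String index, String type, long id) {
        return "PUT /" + index + "/" + type + "/" + id;
    }

    // JSON body with the two text fields used by the Article example.
    static String body(String title, String content) {
        return "{\"title\":\"" + escapeJson(title)
                + "\",\"content\":\"" + escapeJson(content) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(requestLine("lxs_blog", "article", 1));
        System.out.println(body("Hello \"es\"", "first document"));
    }
}
```

In a real project a JSON library or the es client would produce the body; the hand-written escaping here just keeps the sketch free of dependencies.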

4. Installing Elasticsearch

(1) Pull the docker image
docker pull elasticsearch:5.6.8

(2) Create and start the es container
docker run -id --name=es -p 9200:9200 -p 9300:9300 elasticsearch:5.6.8

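Next, enable remote connections
After the installation above, es still cannot be used from client programs: since version 5, elasticsearch does not accept remote connections by default, and a program that connects directly gets the following error: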

failed to load elasticsearch nodes :
org.elasticsearch.client.transport.NoNodeAvailableException: None of the
configured nodes are available: [{#transport#-1}{5ttLpMhkRjKLkvoY7ltUWg}

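We need to change the es configuration to enable remote connections. First open a shell in the container:

docker exec -it es /bin/bash

Inside the container, run:
cd config
vi elasticsearch.yml
If the vi command is not found, install vim inside the container first.
Uncomment the line transport.host: 0.0.0.0
Add the line cluster.name: my-elasticsearch
Save the file and exit the editor.
Then leave the container and restart it:
docker restart es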

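Finally, add the cross-origin (CORS) configuration
Edit the configuration file elasticsearch.yml under elasticsearch/config, add the following three lines, and restart:
http.cors.enabled: true
http.cors.allow-origin: "*"
network.host: 192.168.220.100
Where:
http.cors.enabled: true allows cross-origin access to elasticsearch; the default is false.
http.cors.allow-origin: "*" sets the origins that are allowed cross-origin access ("*" means any origin).

Then restart es once more:
docker restart es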

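5. ElasticSearch client operations

Unlike Solr, ElasticSearch does not ship with a graphical interface. We can install the ElasticSearch head plugin to get one and use it to browse index data. The plugin can be installed online or locally; this article uses the local installation. For elasticsearch 5.x and later, installing head requires node and grunt.
1) Download the head plugin: https://github.com/mobz/elasticsearch-head
2) Unpack the elasticsearch-head-master archive into any directory, as long as it is separate from the elasticsearch installation directory
3) Download nodejs: https://nodejs.org/en/download/
4) Install grunt as a global command. Grunt is a project build tool based on Node.js.
Run the following in a cmd console:

npm install -g grunt-cli

5) Go to the elasticsearch-head-master directory and start head by running the following at the command prompt:

npm install
grunt server

6) Open a browser and go to http://localhost:9100; the head page should appear.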

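6. The IK analyzer

Introduction to the IK analyzer

IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IKAnalyzer has gone through three major versions. It started as a Chinese segmentation component built around the open-source project Lucene, combining dictionary-based segmentation with grammar analysis. The newer IKAnalyzer 3.0 became a general-purpose segmentation component for Java, independent of the Lucene project, while still providing a default implementation optimized for Lucene.
IK analyzer 3.0 has the following features:
1) It uses its own "forward-iteration finest-granularity segmentation algorithm" and can process 600,000 characters per second.
2) It uses a multi-subprocessor analysis mode that supports segmentation of English letters (IP addresses, emails, URLs), numbers (dates, common Chinese quantifiers, Roman numerals, scientific notation), and Chinese words (person and place names).
3) Mixed Chinese-English text is not handled very well and is awkward to process, since it needs an additional query. Dictionary storage is optimized, supports personal entries, and uses less memory.
4) It supports user-defined dictionary extensions.
5) It provides IKQueryParser, a query analyzer optimized for Lucene full-text retrieval. It uses an ambiguity analysis algorithm to optimize how query keywords are arranged and combined, which can greatly improve Lucene's hit rate.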

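Installing the IK analyzer

(1) Install the IK analyzer
Download the IK analyzer from https://github.com/medcl/elasticsearch-analysis-ik/releases
Upload the archive to the server, unzip it, and rename the extracted directory to ik:

unzip elasticsearch-analysis-ik-5.6.8.zip
mv elasticsearch ik

Copy the ik directory into the plugins directory of the docker container (es is the container name used above), then restart the container so the plugin is loaded:

docker cp ./ik es:/usr/share/elasticsearch/plugins
docker restart es

ik_max_word: splits the text at the finest granularity

For example, "中华人民共和国人民大会堂" (the Great Hall of the People of the People's Republic of China) is split into 中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 共和国, 大会堂, 大会, 会堂, and so on.

ik_smart: splits the text at the coarsest granularity

For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂.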
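
The difference between the two granularities can be sketched with a toy dictionary-based segmenter. This is not IK's actual algorithm or dictionary: the class ToySegmenter, its eleven-word dictionary, and the simple matching strategies are assumptions made up for illustration. The fine-grained mode emits every dictionary word found at every position, and the coarse mode greedily takes the longest dictionary word at each position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy segmenter (hypothetical, not the IK implementation).
final class ToySegmenter {
    // Sample sentence from the article.
    static final String SAMPLE = "中华人民共和国人民大会堂";

    // Hypothetical mini dictionary; the real IK dictionary is far larger.
    static final Set<String> DICT = new HashSet<>(Arrays.asList(
            "中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民",
            "共和国", "人民大会堂", "大会堂", "大会", "会堂"));

    // Length of the longest dictionary entry, used to bound the lookups.
    static final int MAX_WORD_LEN = 7;

    // Fine-grained: emit every dictionary word that starts at every position,
    // longest candidates first.
    static List<String> fineGrained(String text) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            int limit = Math.min(text.length(), start + MAX_WORD_LEN);
            for (int end = limit; end > start; end--) {
                String candidate = text.substring(start, end);
                if (DICT.contains(candidate)) {
                    out.add(candidate);
                }
            }
        }
        return out;
    }

    // Coarse-grained: greedy forward longest match; a character that starts no
    // dictionary word is emitted as a single-character token.
    static List<String> coarseGrained(String text) {
        List<String> out = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + MAX_WORD_LEN);
            while (end > start + 1 && !DICT.contains(text.substring(start, end))) {
                end--;
            }
            out.add(text.substring(start, end));
            start = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fineGrained(SAMPLE));
        System.out.println(coarseGrained(SAMPLE));
    }
}
```

With this toy dictionary the coarse mode yields the two tokens 中华人民共和国 and 人民大会堂, while the fine mode yields twelve overlapping tokens. The real ik_max_word output differs in detail, so treat this only as an intuition for "finest" versus "coarsest" splitting.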

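7. Spring Data ElasticSearch

Spring Data is an open-source framework that simplifies database access and supports cloud services. Its main goal is to make data access quick and convenient, with support for map-reduce frameworks and cloud data services. Spring Data greatly simplifies JPA code: data access and manipulation can be implemented with almost no hand-written implementation. Besides CRUD, it also covers common features such as paging and sorting.
Spring Data website: http://projects.spring.io/spring-data/

7.1 Spring Data ElasticSearch

Spring Data ElasticSearch builds on the Spring Data API to simplify working with elasticSearch by wrapping the native elasticSearch client API. The Spring Data Elasticsearch project provides integration with the search engine. Its key feature is a POJO-centric model for interacting with Elasticsearch documents and for easily writing a repository-style data access layer.
Official website: http://projects.spring.io/spring-data-elasticsearch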

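7.2 Getting started example

1) Add the Spring Data ElasticSearch dependency (pom.xml)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.1.16.RELEASE</version>
		<relativePath/>
	</parent>
	<groupId>com.example</groupId>
	<artifactId>demo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>demo</name>
	<description>Demo project for Spring Boot</description>
	<properties>
		<java.version>1.8</java.version>
	</properties>
	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
	</dependencies>
	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>
</project>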

2) Application class and configuration file

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoApplication {
	 public static void main(String[] args) {
 		SpringApplication.run(DemoApplication.class, args);
	 }
}

application.yml:

spring:
  data:
    elasticsearch:
      cluster-name: my-elasticsearch
      cluster-nodes: 192.168.220.100:9300

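3) Write the Article entity

package com.example.demo.entity;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Maps to documents of type "article" in the "lxs_blog" index
@Document(indexName = "lxs_blog", type = "article")
public class Article {
    @Id
    @Field(type = FieldType.Long, store = true)
    private long id;
    // title and content are analyzed with the IK analyzer in ik_smart mode
    @Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
    private String title;
    @Field(type = FieldType.Text, store = true, analyzer = "ik_smart")
    private String content;

    // getters and setters omitted
}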

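4) Write the Dao

package com.example.demo.repositories;

import com.example.demo.entity.Article;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import java.util.List;

public interface ArticleRepository extends ElasticsearchRepository<Article, Long> {
    // Derived query methods called by the test class; Spring Data builds the queries from the method names
    List<Article> findByTitle(String title);
    List<Article> findByTitleOrContent(String title, String content, Pageable pageable);
}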

5) Create the test class

package com.example.demo;

import com.example.demo.entity.Article;
import com.example.demo.repositories.ArticleRepository;
import org.elasticsearch.index.query.QueryBuilders;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.List;
import java.util.Optional;

@RunWith(SpringRunner.class)
@SpringBootTest
public class DemoApplicationTests {
    @Autowired
    private ArticleRepository articleRepository;
    @Autowired
    private ElasticsearchTemplate template;

    @Test
    public void createIndex() throws Exception {
        // Create the index and configure the mapping
        template.createIndex(Article.class);
        // Configure the mapping separately if needed
        // template.putMapping(Article.class);
    }

    @Test
    public void addDocument() throws Exception {
        for (int i = 10; i <= 20; i++) {
            // Create an Article object
            Article article = new Article();
            article.setId(i);
            article.setTitle("女护士路遇昏迷男子跪地抢救：救人是职责更是本能" + i);
            article.setContent("这是一个美丽的女护士妹妹" + i);
            // Write the document to the index
            articleRepository.save(article);
        }
    }

    @Test
    public void deleteDocumentById() throws Exception {
        // Delete a single document by id
        // articleRepository.deleteById(1L);
        // Delete all documents
        articleRepository.deleteAll();
    }

    @Test
    public void findAll() throws Exception {
        Iterable<Article> articles = articleRepository.findAll();
        articles.forEach(a -> System.out.println(a));
    }

    @Test
    public void testFindById() throws Exception {
        Optional<Article> optional = articleRepository.findById(10L);
        Article article = optional.get();
        System.out.println(article);
    }

    @Test
    public void testFindByTitle() throws Exception {
        List<Article> list = articleRepository.findByTitle("女护士");
        list.forEach(a -> System.out.println(a));
    }

    @Test
    public void testFindByTitleOrContent() throws Exception {
        // The page index is zero-based, so this requests the second page of 5 hits
        Pageable pageable = PageRequest.of(1, 5);
        articleRepository.findByTitleOrContent("title", "女护士", pageable)
                .forEach(a -> System.out.println(a));
    }

    @Test
    public void testNativeSearchQuery() throws Exception {
        // Build the query object
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.queryStringQuery("女护士").defaultField("title"))
                .withPageable(PageRequest.of(0, 15))
                .build();
        // Execute the query
        List<Article> articleList = template.queryForList(query, Article.class);
        articleList.forEach(a -> System.out.println(a));
    }
}
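
One detail in the tests above is easy to miss: the page index passed to PageRequest.of is zero-based, so PageRequest.of(1, 5) asks for the second page of five hits, while PageRequest.of(0, 15) asks for the first fifteen. The sketch below shows the arithmetic only. PageWindowSketch is a hypothetical helper, not part of Spring Data, and the total of 11 hits assumes that all documents with ids 10 to 20 match the query.

```java
// Hypothetical helper for illustration: which hits does a zero-based page cover?
final class PageWindowSketch {

    // Zero-based position of the first hit on the page (the search "from" offset).
    static int from(int page, int size) {
        return page * size;
    }

    // Number of hits that actually land on the page for a given total.
    static int hitsOnPage(int page, int size, int totalHits) {
        int from = from(page, size);
        int to = Math.min(totalHits, from + size);
        return Math.max(0, to - from);
    }

    public static void main(String[] args) {
        int totalHits = 11; // assumption: all 11 documents (ids 10..20) match
        for (int page = 0; page < 4; page++) {
            System.out.println("page " + page + ": from=" + from(page, 5)
                    + ", hits=" + hitsOnPage(page, 5, totalHits));
        }
    }
}
```

Under that assumption, page 1 with size 5 starts at offset 5 and holds five hits, page 2 holds the single remaining hit, and page 3 is empty.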

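That concludes our introduction to ElasticSearch. It is useful not only for full-text search but also for large-scale log collection, and terabytes of data are no trouble for elasticsearch.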
