1. Installation
Huawei Cloud mirrors and other downloads
ElasticSearch: https://mirrors.huaweicloud.com/elasticsearch/?C=N&O=D (all components below must match this version)
logstash: https://mirrors.huaweicloud.com/logstash/?C=N&O=D (not used here)
kibana: https://mirrors.huaweicloud.com/kibana/?C=N&O=D (download the same version as ES)
IK analyzer: https://github.com/medcl/elasticsearch-analysis-ik/releases (download the same version as ES)
head plugin: https://github.com/mobz/elasticsearch-head/archive/master.zip
1.1 head plugin
git clone https://github.com/mobz/elasticsearch-head.git
Unzip the archive (or use the clone) and enter the directory
npm install
npm run start
open http://localhost:9100
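1.2 Installing elasticsearch
Unzip the archive
Edit the elasticsearch.yml configuration file to allow cross-origin access
http.cors.enabled: true
http.cors.allow-origin: "*"
Start elasticsearch.bat
Visit http://localhost:9200
1.3 Installing kibana
Unzip the archive
Go to kibana-7.9.2-windows-x86_64\x-pack\plugins\translations\translations and copy the name of the Chinese locale file
Edit kibana.yml in the config directory and add
i18n.locale: "zh-CN"
Start kibana.bat
Visit http://localhost:5601
1.4 Installing the IK analyzer
Unzip the archive
Create an ik folder under elasticsearch/plugins and move the unzipped IK analyzer files into it
Restart elasticsearch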
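If elasticsearch exits immediately on restart, the likely cause is that the elasticsearch version the IK analyzer depends on differs from the installed elasticsearch version. Check the elasticsearch version and download the matching IK release.
1.4.1 Testing the analyzer
The ik plugin is loaded when elasticsearch starts.
Open the kibana console
GET _analyze
{
  "analyzer": "ik_smart",
  "text": "今天星期六"
}
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "今天星期六"
}
Run each request separately to see its result.
The two analysis modes
ik_max_word: splits the text at the finest granularity and exhausts the possible combinations; suited to Term Query;
ik_smart: splits the text at the coarsest granularity, giving only a rough segmentation; suited to Phrase queries.
- You can add a custom dictionary file xxx.dic under the ik plugin directory and register it in IKAnalyzer.cfg.xml; words from the custom dictionary are then kept intact during analysis.
2. ElasticSearch REST-style operations
REST style overview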
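| method | URL | description |
|---|---|---|
| GET | http://localhost:9200/index-name/type-name/document-id | fetch the given document |
| PUT | http://localhost:9200/index-name/type-name/document-id | create a document with the given id |
| POST | http://localhost:9200/index-name/type-name | create a document with a random id |
| DELETE | http://localhost:9200/index-name/type-name/document-id | delete the given document |
| POST | http://localhost:9200/index-name/type-name/document-id/_update | update the given document |
| POST | http://localhost:9200/index-name/type-name/_search | query all data |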
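The URL patterns in the table can be sketched as a small helper. This is only an illustration: the class name `EsUrls` and its methods are hypothetical, and it assumes a local node on the HTTP port 9200 mentioned in the installation steps.

```java
// Hypothetical helper that assembles the REST paths listed in the table.
final class EsUrls {
    // Assumption: elasticsearch runs locally on its HTTP port 9200.
    static final String BASE = "http://localhost:9200";

    // GET / PUT / DELETE target: /index/type/id
    static String document(String index, String type, String id) {
        return BASE + "/" + index + "/" + type + "/" + id;
    }

    // POST target for a partial update: /index/type/id/_update
    static String update(String index, String type, String id) {
        return document(index, type, id) + "/_update";
    }

    // POST target for a search: /index/type/_search
    static String search(String index, String type) {
        return BASE + "/" + index + "/" + type + "/_search";
    }

    public static void main(String[] args) {
        System.out.println("GET  " + document("test1", "type1", "1"));
        System.out.println("POST " + update("test1", "type1", "1"));
        System.out.println("POST " + search("test1", "type1"));
    }
}
```

The helper only builds strings; sending the request is left to whichever HTTP client is in use.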
How elasticsearch maps to a relational database
| DB | elasticsearch |
|---|---|
| database | index (indices) |
| table | type (deprecated) |
| schema | mapping |
| rows | documents |
| columns | fields |
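Creating an index
PUT /index-name/type-name/document-id
{
request body
}
-------------------------------------
PUT /test1/type1/1
{
"name":"周六",
"age":"3"
}
Press ctrl + enter to run the request. On success, the index test1 shows up in the head UI on port 9100.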
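elasticsearch field types
| type | keywords |
|---|---|
| string | keyword, text |
| numeric | long, integer, short, byte, double, float, half_float, scaled_float |
| date | date |
| boolean | boolean |
| binary | binary |
Create an index with explicit field types; for any field without a declared type, es assigns a default type.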
PUT /test2
{
"mappings": {
"properties": {
"name":{
"type": "text"
},
"birthday":{
"type": "date"
}
}
}
}
Modifying a document with PUT
PUT /test1/type1/1
{
"name":"周六",
"age":"4"
}
2.2 post
Updating a document
With POST and the _update endpoint, only the fields that need to change have to be sent.
POST /test1/type1/1
{
"name":"张三"
}
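// If the id already exists, the request above is treated as an update, but it replaces the whole document: fields missing from the body are lost, exactly as with PUT.
------------------------------------
POST /test1/type1/1/_update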
{
"doc":{
"name":"张三"
}
}
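The difference between the two requests above can be modelled in memory. This is a simplified sketch, not how elasticsearch is implemented: the class `UpdateSemantics` is hypothetical, documents are flat string maps, and nested objects are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of "replace the document" vs "partial update".
final class UpdateSemantics {
    // Sample stored document, mirroring /test1/type1/1 from the text.
    static Map<String, String> sample() {
        Map<String, String> doc = new HashMap<>();
        doc.put("name", "周六");
        doc.put("age", "4");
        return doc;
    }

    // PUT, or POST without _update: the stored document becomes exactly the body.
    static Map<String, String> replace(Map<String, String> stored, Map<String, String> body) {
        return new HashMap<>(body);
    }

    // POST .../_update with {"doc": ...}: listed fields are overwritten, the rest is kept.
    static Map<String, String> partialUpdate(Map<String, String> stored, Map<String, String> doc) {
        Map<String, String> merged = new HashMap<>(stored);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> change = new HashMap<>();
        change.put("name", "张三");
        System.out.println(replace(sample(), change));       // age is gone
        System.out.println(partialUpdate(sample(), change)); // age is kept
    }
}
```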
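2.3 delete
Deleting an index or a document
DELETE test2
-------------
DELETE test2/type-name/id
2.4 get
Getting index and document information
GET test1
------------
GET /test1/type1/1
------------
GET /test1/type1/_search?q=name:小明
2.5 Exact and fuzzy queries
(Add a few documents first for testing)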
GET test2/user/_search
{
"query":{
"match": {
"name": "小明"
}
}
}
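match performs a fuzzy match on the analyzed text and returns every candidate. Each hit carries a _score: the higher the score, the better the match, and results are sorted by score by default.
Restricting the returned fields
Filter: return only the age and dec fields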
GET test2/user/_search
{
"query":{
"match": {
"name": "小明"
}
},
"_source":["age","dec"]
}
Sorting (results ordered by age, descending)
GET test2/user/_search
{
"query":{
"match": {
"name": "小明"
}
},
"sort":[
{
"age":{
"order":"desc"
}
}
]
}
Pagination
GET test2/user/_search
{
"query":{
"match": {
"name": "小明"
}
},
"sort":[
{
"age":{
"order":"desc"
}
}
],
"from" :0,
"size":2
}
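In the request above, from is a document offset rather than a page number. A minimal sketch of the conversion from a 1-based page number, assuming a hypothetical helper `Paging` that is not part of the original text:

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset.
final class Paging {
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1);   // treat anything below 1 as the first page
        return (page - 1) * pageSize;     // number of documents to skip
    }

    public static void main(String[] args) {
        // Page 1 with size 2 corresponds to "from": 0, "size": 2 in the request above.
        System.out.println(from(1, 2));
        System.out.println(from(3, 2));
    }
}
```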
Note: by default es does not support sorting or aggregating on text fields, so fielddata has to be enabled on the age field:
PUT /test2/_mapping?pretty
{
"properties": {
"age": {
"type": "text",
"fielddata": true
}
}
}
Boolean queries combine several conditions: must behaves like AND, should behaves like OR
must / should queries
GET test2/user/_search
{
"query":{
"bool": {
"must": [
{
"match": {
"name": "明"
}
},
{
"match": {
"age": "10"
}
}
]
}
}
}
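The AND / OR reading of must and should can be illustrated with plain predicates. This is a loose in-memory analogy only: `BoolSketch`, `Person` and the two sample predicates are hypothetical, and scoring, analysis and minimum_should_match are ignored.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical in-memory analogy for bool queries: must = all clauses, should = any clause.
final class BoolSketch {
    // Stand-in for a document with a name and an age.
    static final class Person {
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
    }

    static final Predicate<Person> NAME_HAS_MING = p -> p.name.contains("明");
    static final Predicate<Person> AGE_IS_10 = p -> p.age == 10;

    // must: every clause has to match (AND).
    @SafeVarargs
    static <T> Predicate<T> must(Predicate<T>... clauses) {
        return item -> Arrays.stream(clauses).allMatch(c -> c.test(item));
    }

    // should: at least one clause has to match (OR).
    @SafeVarargs
    static <T> Predicate<T> should(Predicate<T>... clauses) {
        return item -> Arrays.stream(clauses).anyMatch(c -> c.test(item));
    }

    public static void main(String[] args) {
        Person p = new Person("小明", 12);
        System.out.println(must(NAME_HAS_MING, AGE_IS_10).test(p));
        System.out.println(should(NAME_HAS_MING, AGE_IS_10).test(p));
    }
}
```

In a real bool query that also contains must or filter clauses, should clauses become optional by default and mainly influence the score, so the OR reading applies most directly to a query made of should clauses only.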
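Add a filter: age greater than or equal to 10 and less than or equal to 20. lt / gt mean less than / greater than; a trailing e adds "or equal to".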
GET test2/user/_search
{
"query":{
"bool": {
"should": [
{
"match": {
"name": "明"
}
}
],
"filter": [
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
]
}
}
}
Fuzzy query: returns documents whose tags contain 游泳 or 足; the terms are separated by a space
GET /test2/user/_search
{
"query":{
"match": {
"tags": "游泳 足"
}
}
}
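term is an exact query: it looks up the given term directly in the inverted index without analyzing it, whereas match analyzes the query text first.
Create an index whose fields use different types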
PUT /test3
{
"mappings": {
"properties": {
"name":{
"type": "text"
},
"dec":{
"type": "keyword"
}
}
}
}
Insert data
PUT /test3/_doc/2
{
"name":"qwe",
"dec":"qwe11"
}
-----------
PUT /test3/_doc/1
{
"name":"qwe",
"dec":"qwe1"
}
Comparing the analyzers
GET _analyze
{
"analyzer": "keyword"
, "text": "qwe"
}
------------
GET _analyze
{
"analyzer": "standard"
, "text": "qwe"
}
standard tokenizes the text, keyword does not
term queries on a keyword field and on a text field
GET /test3/_search
{
"query": {
"term": {
"dec": {
"value": "qwe"
}
}
}
}
-----------------
GET /test3/_search
{
"query": {
"term": {
"name": {
"value": "qwe"
}
}
}
}
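The query on name returns results while the query on dec returns none. A keyword field is not analyzed, so a term query only matches its exact stored value (qwe1 or qwe11 here), whereas the text field name is indexed as the token qwe.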
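The behaviour can be reproduced with a toy model of the two field types. This is a rough sketch under stated assumptions: `TermSketch` is hypothetical, and `standardTokens` only approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model: what ends up in the inverted index for text vs keyword fields.
final class TermSketch {
    // Rough stand-in for the standard analyzer: lowercase, split on non letters/digits.
    static List<String> standardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A keyword field is indexed as one single, unmodified term.
    static List<String> keywordTokens(String text) {
        return Collections.singletonList(text);
    }

    // A term query matches only if the exact value is one of the indexed terms.
    static boolean term(List<String> indexedTerms, String value) {
        return indexedTerms.contains(value);
    }

    public static void main(String[] args) {
        System.out.println(term(standardTokens("qwe"), "qwe"));  // text field "name"
        System.out.println(term(keywordTokens("qwe1"), "qwe"));  // keyword field "dec"
    }
}
```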
term query with multiple conditions
GET /test3/_search
{
  "query":{
    "bool":{
      "should":[
        {
          "term":{
            "<field 1>": "<value 1>"
          }
        },
        {
          "term":{
            "<field 2>": "<value 2>"
          }
        }
      ]
    }
  }
}
2.6 Highlighting
GET test2/user/_search
{
"query":{
"match": {
"name": "小明"
}
},
"highlight":{
"pre_tags": "
Maven dependencies (pom.xml), listed as groupId:artifactId[:version]
Elasticsearch version: 7.9.2
Java version: 1.8
org.springframework.boot:spring-boot-starter-data-elasticsearch
org.springframework.boot:spring-boot-starter-web
org.springframework.boot:spring-boot-devtools (scope runtime, optional true)
org.projectlombok:lombok (optional true)
org.springframework.boot:spring-boot-starter-test (scope test)
com.alibaba:fastjson:1.2.76
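3.2 Creating the configuration class
Register a RestHighLevelClient bean in the Spring container
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchConfig {
    // Client pointing at the local node started earlier
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}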
3.3 Testing the API
Inject the restHighLevelClient into the test class
@Autowired
@Qualifier("restHighLevelClient")
RestHighLevelClient client;
Creating an index
@Test
void createIndex() throws IOException {
    // 1. Build the create-index request
    CreateIndexRequest request = new CreateIndexRequest("tes");
    // 2. Execute the request through the client
    CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse);
}
Checking whether an index exists
@Test
void existIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("tes");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}
Deleting an index
@Test
void deleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("tes");
    AcknowledgedResponse ack = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(ack);
}
}
Document APIs
- Create the User class
@Data
@AllArgsConstructor
@NoArgsConstructor
public class User {
    private String name;
    private int age;
}
Adding a document
@Test
void addDoc() throws IOException {
    // Create the object
    User user = new User("qwe", 11);
    // Create the request
    IndexRequest request = new IndexRequest("test_api");
    // Set the document id; a random id is generated if none is set
    request.id("1");
    //request.timeout("60s");
    // Serialize the object to JSON and declare the content type
    request.source(JSON.toJSONString(user), XContentType.JSON);
    IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
    System.out.println(indexResponse.toString());
    System.out.println(indexResponse.status());
}
Checking whether a document exists, and fetching it
@Test
void existDoc() throws IOException {
GetRequest request = new GetRequest("test_api", "1");
boolean exists = client.exists(request, RequestOptions.DEFAULT);
System.out.println(exists);
}
@Test
void getDoc() throws IOException {
GetRequest request = new GetRequest("test_api", "1");
GetResponse response = client.get(request, RequestOptions.DEFAULT);
System.out.println(response.getSourceAsString());
}
Deleting a document
@Test
void docDel() throws IOException {
DeleteRequest deleteRequest = new DeleteRequest("test_api","1");
DeleteResponse response = client.delete(deleteRequest, RequestOptions.DEFAULT);
System.out.println(response.status());
}
Bulk insert
@Test
void docBulk() throws IOException {
    BulkRequest request = new BulkRequest();
    List<User> userArrayList = new ArrayList<>();
    userArrayList.add(new User("zhangsan1", 1));
    userArrayList.add(new User("zhangsan2", 2));
    userArrayList.add(new User("zhangsan3", 3));
    userArrayList.add(new User("zhangsan4", 4));
    // One IndexRequest per user, with ids 1..n
    for (int i = 0; i < userArrayList.size(); i++) {
        request.add(new IndexRequest("test_api")
                .id("" + (i + 1))
                .source(JSON.toJSONString(userArrayList.get(i)), XContentType.JSON));
    }
    BulkResponse response = client.bulk(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}
Conditional search
@Test
void docSearch() throws IOException {
    SearchRequest request = new SearchRequest("test_api");
    // Build the search conditions
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // All query types are available through QueryBuilders, for example:
    // HighlightBuilder: highlighting
    // MatchAllQueryBuilder: match every document
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", "zhangsan1");
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(TimeValue.timeValueSeconds(60));
    request.source(searchSourceBuilder);
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSONString(response.getHits()));
    System.out.println("=========================");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
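4. Crawler
4.1 Create a new project; the pom dependencies are the same as above
4.2 Add the configuration
server.port=9090
spring.thymeleaf.cache=false
4.3 Import the front-end page resources
ES materials: https://pan.baidu.com/s/1PT3jLvCksOhq7kgAKzQm7g (extraction code: s824)
4.4 Import the jsoup HTML parsing package
org.jsoup:jsoup:1.14.3
4.5 Create the parsing utility class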
import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HTMLParseUtil {
    public static void main(String[] args) throws IOException {
        // 1. The request URL: https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=java";
        // 2. Parse the page (the Document returned by Jsoup mirrors the browser's document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // 3. Select the content by element id
        Element element = document.getElementById("J_goodsList");
        //System.out.println(element.html());
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        for (Element el : elements) {
            // The site lazy-loads images, so the real URL is in data-lazy-img rather than src
            //String img = el.getElementsByTag("img").eq(0).attr("src");
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String name = el.getElementsByClass("p-name").eq(0).text();
            System.out.println("==============================");
            System.out.println(price);
            System.out.println(name);
            System.out.println(img);
        }
    }
}
4.6 Create the Content entity class to hold the parsed data
@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
private String name;
private String price;
private String img;
}
4.7 Refactor the parsing method
public class HTMLParseUtil {
    public static void main(String[] args) throws IOException {
        // The inline parsing from 4.5 now lives in ParseJD
        new HTMLParseUtil().ParseJD("vue").forEach(System.out::println);
    }
    public List<Content> ParseJD(String keyword) throws IOException {
        // 1. The request URL, e.g. https://search.jd.com/Search?keyword=java
        String url = "https://search.jd.com/Search?keyword=" + keyword;
        // 2. Parse the page (the Document returned by Jsoup mirrors the browser's document object)
        Document document = Jsoup.parse(new URL(url), 30000);
        // 3. Select the content by element id
        Element element = document.getElementById("J_goodsList");
        //System.out.println(element.html());
        List<Content> contents = new ArrayList<>();
        // Get all li elements
        Elements elements = element.getElementsByTag("li");
        for (Element el : elements) {
            // The site lazy-loads images, so the real URL is in data-lazy-img rather than src
            //String img = el.getElementsByTag("img").eq(0).attr("src");
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = el.getElementsByClass("p-price").eq(0).text();
            String name = el.getElementsByClass("p-name").eq(0).text();
            contents.add(new Content(name, price, img));
        }
        return contents;
    }
}
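The method above concatenates the keyword into the URL as-is, which is fine for plain ASCII words such as vue or java. For keywords containing spaces or non-ASCII characters, the value would normally be URL-encoded first. A minimal sketch, assuming a hypothetical helper `SearchUrl` that is not part of the original code:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the search URL with a URL-encoded keyword.
final class SearchUrl {
    static String forKeyword(String keyword) {
        try {
            // The two-argument String overload is used so the code also compiles on Java 8.
            return "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forKeyword("java"));
        System.out.println(forKeyword("spring boot"));
    }
}
```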
4.7 Bulk-insert the results into es
Service layer method
public Boolean parseContent(String keyword) throws IOException {
    List<Content> contents = new HTMLParseUtil().ParseJD(keyword);
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("2m");
    // One IndexRequest per parsed item; ids are generated by es
    for (Content content : contents) {
        bulkRequest.add(new IndexRequest("jd_goods")
                .source(JSON.toJSONString(content), XContentType.JSON));
    }
    BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
    return !response.hasFailures();
}
Calling it from the controller
@RestController
public class ContentController {
@Autowired
ContentService contentService;
@GetMapping("/parse/{keyword}")
public Boolean contentAdd(@PathVariable("keyword") String keyword) throws IOException {
return contentService.parseContent(keyword);
}
}
Visit http://localhost:9090/parse/vue; a response of true means the documents were added successfully.
4.8 Implementing search
Service layer method
public List
Controller layer endpoint
@GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
public List<Map<String, Object>> searchPage(@PathVariable String keyword,
@PathVariable int pageNo,
@PathVariable int pageSize) throws IOException {
return contentService.searchPage(keyword, pageNo, pageSize);
}
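4.8.1 Include vue.min.js and axios.min.js
Update the front-end page: include vue and bind the data
(Page template, markup not preserved: title "狂神说Java-ES仿京东实战"; navigation entries 狂神说Java, 狂神说前端, 狂神说Linux, 狂神说大数据, 狂神聊理财; each result item renders {{result.price}} and {{result.name}}.)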
4.9 Highlighted search
Modify the search method so that the name in each hit is replaced by the version wrapped in es highlight tags, which the front end then renders.
public List<Map<String, Object>> searchPageHL(String keyword, int pageNo, int pageSize) throws IOException {
    if (pageNo < 1) {
        pageNo = 1;
    }
    // Conditional search
    SearchRequest searchRequest = new SearchRequest("jd_goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Pagination: from is a document offset, so convert the 1-based page number
    searchSourceBuilder.from((pageNo - 1) * pageSize);
    searchSourceBuilder.size(pageSize);
    // Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("name");
    highlightBuilder.preTags("");
    highlightBuilder.postTags("");
    highlightBuilder.requireFieldMatch(false); // whether only fields matched by the query are highlighted
    searchSourceBuilder.highlighter(highlightBuilder);
    // Exact (term) query
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
    searchSourceBuilder.query(termQueryBuilder);
    searchSourceBuilder.timeout(TimeValue.timeValueSeconds(10));
    // Execute the search
    searchRequest.source(searchSourceBuilder);
    SearchResponse response = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // Collect the results
    List<Map<String, Object>> list = new ArrayList<>();
    for (SearchHit hit : response.getHits().getHits()) {
        Map<String, HighlightField> highlightFieldMap = hit.getHighlightFields();
        HighlightField name = highlightFieldMap.get("name");
        Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // the original hit source
        if (name != null) {
            // Join the highlighted fragments and overwrite the plain name
            StringBuilder highlighted = new StringBuilder();
            for (Text t : name.fragments()) {
                highlighted.append(t);
            }
            sourceAsMap.put("name", highlighted.toString());
        }
        list.add(sourceAsMap);
    }
    return list;
}
5. Done
This article is based on the 狂神说 elasticsearch tutorial on Bilibili.
Please credit the source when reposting.



