Elasticsearch Basics

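1. Core Concepts

Index
Similar data goes into the same index and dissimilar data goes into different indices. An index can be loosely compared to a database in a relational system.
Type
A type indicates which category a document belongs to within an index. A common analogy is that a type is like a database table, for example a dept table or a user table.
Note that major ES versions differ considerably:
ES 5.x: one index can have multiple types.
ES 6.x: one index can have only one type.
ES 7.x and later: the type concept is being phased out.
Mapping
A mapping defines the type and other attributes of each field, comparable to a table schema in a relational database.
Common data types: text, keyword, numeric types (such as long, integer, double), range, boolean, date, geo_point, ip, nested, object. Arrays need no dedicated type: any field can hold multiple values of the same type.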

2. Basic Operations

Elasticsearch exposes a REST-style API, so every API call is simply an HTTP request that you can issue with any HTTP tool.
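
Because each operation is a plain HTTP request, any HTTP client will do. The following is a minimal Python sketch, using only the standard library, that prepares (but does not send) a create-index request. The address http://localhost:9200, the index name demo-index, and the settings value are assumptions for illustration.

```python
import json
import urllib.request

# Assumed address of a local node; adjust for your cluster.
ES_URL = "http://localhost:9200"


def build_request(method, path, body=None):
    """Prepare an Elasticsearch REST call as a plain HTTP request object."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


# Hypothetical index name and settings, for illustration only.
req = build_request("PUT", "/demo-index", {"settings": {"number_of_replicas": 1}})
print(req.get_method(), req.full_url)

# Sending it requires a running node:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The same helper covers the other operations in this section by changing the method and path, for example HEAD to check existence or DELETE to remove an index.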

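2.1 Index Operations

2.1.1 Create an index

PUT /<index_name>
{ "settings": { "<setting_name>": "<setting_value>" } }

2.1.2 Check whether an index exists

HEAD /<index_name>

2.1.3 View an index

GET /<index_name>

2.1.4 View multiple indices

GET /<index_name_1>,<index_name_2>,<index_name_3>,...

2.1.5 Close an index

POST /<index_name>/_close

2.1.6 Open an index

POST /<index_name>/_open

2.1.7 Delete an index

DELETE /<index_name>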

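2.2 Mapping Operations

2.2.1 Create mapping fields

PUT /<index_name>/_mapping
{
	"properties": {
		"<field_name>": {
			"type": "<type>",
			"index": true,
			"store": true,
			"analyzer": "<analyzer>"
		}
	}
}

Field name: any name you choose. Each field takes a number of attributes, for example:

type: the data type, such as text, long, short, date, integer, or object
index: whether the field is indexed; defaults to true
store: whether the field value is stored separately from _source; defaults to false
analyzer: the analyzer used to tokenize the field (applies to text fields)

2.2.2 View mapping fields

GET /<index_name>/_mapping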

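2.2.3 Create an index and its mapping fields in one request

PUT /<index_name>
{
	"settings": {
		"<setting_name>": "<setting_value>"
	},
	"mappings": {
		"properties": {
			"<field_name>": {
				"<mapping_attribute>": "<attribute_value>"
			}
		}
	}
}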
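
To make the placeholders concrete, here is a small Python sketch that builds one possible request body for this call. The field names (name, price, timestamp) and the settings values are hypothetical, chosen only to show the shape of the JSON.

```python
import json

# Hypothetical index layout: field names and values are illustrative only.
create_index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "standard"},
            "price": {"type": "double"},
            "timestamp": {"type": "date"},
        }
    },
}

# This JSON is what would follow the PUT line in the template above.
print(json.dumps(create_index_body, indent=2))
```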

2.3 Document Operations

2.3.1 Add a document (explicit id)

POST /lagou-company-index/_doc/1
{
	"name": "百度",
	"job": "小度用户运营经理",
	"payment": "30000",
	"logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm8374.png"
}

2.3.2 Add a document (auto-generated id)

POST /<index_name>/_doc
{ "field":"value" }

2.3.3 View a single document

GET /<index_name>/_doc/{id}

2.3.4 View all documents

POST /<index_name>/_search
{
	"query": {
		"match_all": {}
	}
}

2.3.5 Customize the returned fields with _source

GET /lagou-company-index/_doc/1?_source=name

2.3.6 Update a document (full replacement)

PUT /<index_name>/_doc/{id}
{
	"name": "百度",
	"job": "小度用户运营经理",
	"payment": "30000",
	"logo": "http://www.lgstatic.com/thubnail_120x120/i/image/M00/21/3E/CgpFT1kVdzeAJNbUAABJB7x9sm8374.png"
}

2.3.7 Update a document (partial)

POST /<index_name>/_update/{id}
{ 
	"doc":{ 
		"field":"value" 
	} 
}
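
The difference between the full and the partial update is easiest to see on a small example. The Python sketch below is a simplified model of the two behaviours, not Elasticsearch code: a full update replaces the stored document, while a partial update merges the fields under "doc" into it. The new payment value "35000" is a made-up example.

```python
def full_update(existing, new_doc):
    """Model of PUT .../_doc/{id}: the stored document is replaced as a whole."""
    return dict(new_doc)


def partial_update(existing, doc):
    """Model of POST .../_update/{id}: fields under "doc" are merged in."""
    merged = dict(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_update(merged[key], value)  # merge nested objects
        else:
            merged[key] = value  # overwrite or add the field
    return merged


stored = {"name": "百度", "job": "小度用户运营经理", "payment": "30000"}

replaced = full_update(stored, {"payment": "35000"})
patched = partial_update(stored, {"payment": "35000"})

print(replaced)  # only the fields that were sent remain
print(patched)   # other fields are kept, payment is changed
```

In practice this means a full update must resend every field, whereas a partial update only needs the fields that change.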

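2.3.8 Delete a document

DELETE /<index_name>/_doc/{id}

Query DSL

Elasticsearch provides a full query DSL (Domain Specific Language) based on JSON for defining queries. Think of the query DSL as an AST (abstract syntax tree) of queries made up of two kinds of clauses:

Leaf query clauses
Leaf query clauses look for a particular value in a particular field, such as the match, term, or range queries.
Compound query clauses
Compound query clauses wrap other leaf or compound queries and are used either to combine multiple queries logically (such as the bool or dis_max query) or to alter their behavior (such as the constant_score query).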

Match all (match_all query)

POST /lagou-company-index/_search
{
	"query": {
		"match_all": {}
	}
}

query: the query object
match_all: matches every document

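Full-text search

Full-text queries search analyzed text fields, such as the body of an email or a product description.

Match query

The match query is the standard query for full-text search and supports fuzzy matching on a field; phrase search is handled by the related match_phrase query. A match query accepts text, numbers, or dates, analyzes the input, and builds a boolean query from the resulting terms. The operator parameter sets how the terms are combined (or, and; the default is or).

or relationship
A match query analyzes the query text into terms and then searches; the terms are combined with or.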

POST /lagou-property/_search 
{ 
	"query":{ 
		"match":{ "title":"小米电视4A" } 
	} 
}

and relationship
When a more precise match is needed, the relationship can be switched to and:

POST /lagou-property/_search 
{
	"query": {
		"match": {
			"title": {
				"query": "小米电视4A",
				"operator": "and"
			}
		}
	}
}
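
The effect of the operator can be illustrated with a toy model in Python. This is a sketch of the matching logic only, not of how Elasticsearch scores or stores documents. The term list assumed for "小米电视4A" and the three sample titles are hypothetical; a real analyzer may split the text differently.

```python
def matches(doc_terms, query_terms, operator="or"):
    """Return True if a document's terms satisfy the query terms under the operator."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)


# Assumed analyzer output for "小米电视4A"; a real analyzer may split differently.
query_terms = ["小米", "电视", "4a"]

# Hypothetical documents, already reduced to their title terms.
docs = {
    "小米手机": {"小米", "手机"},
    "小米电视4A": {"小米", "电视", "4a"},
    "海信电视": {"海信", "电视"},
}

or_hits = [title for title, terms in docs.items() if matches(terms, query_terms, "or")]
and_hits = [title for title, terms in docs.items() if matches(terms, query_terms, "and")]

print(or_hits)   # every title sharing at least one term
print(and_hits)  # only titles containing all terms
```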

Phrase search (match_phrase query)

GET /lagou-property/_search 
{
	"query": {
		"match_phrase": {
			"title": "小米电视"
		}
	}
}
GET /lagou-property/_search 
{
	"query": {
		"match_phrase": {
			"title": "小米 4A"
		}
	}
}
GET /lagou-property/_search 
{
	"query": {
		"match_phrase": {
			"title": {
				"query": "小米 4A",
				"slop": 2
			}
		}
	}
}

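query_string query

The query_string query is an advanced query that matches against the full text of a document without naming a field, and it can also be restricted to specific fields.

# Default fields vs. a specified field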
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "2699"
		}
	}
}
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "2699",
			"default_field": "title"
		}
	}
}
# Boolean operators
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "手机 OR 小米",
			"default_field": "title"
		}
	}
}
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "手机 AND 小米",
			"default_field": "title"
		}
	}
}
# Fuzzy matching
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "大米~1",
			"default_field": "title"
		}
	}
}
# Multiple fields
GET /lagou-property/_search 
{
	"query": {
		"query_string": {
			"query": "2699",
			"fields": ["title", "price"]
		}
	}
}

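Term-level search

Term-level queries find documents based on precise values in structured data, such as date ranges, IP addresses, prices, or product IDs.
Unlike full-text queries, term-level queries do not analyze the search terms. Instead, they match the exact terms stored in a field.

Term query

The term query finds documents whose specified field contains the given term.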

POST /book/_search 
{
	"query": {
		"term": {
			"name": "solr"
		}
	}
}

Multiple-term search (terms query)

GET /book/_search 
{
	"query": {
		"terms": {
			"name": ["solr", "elasticsearch"]
		}
	}
}

Range search

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: query weight

GET /book/_search 
{
	"query": {
		"range": {
			"price": {
				"gte": 10,
				"lte": 200,
				"boost": 2.0
			}
		}
	}
}
GET /book/_search 
{
	"query": {
		"range": {
			"timestamp": {
				"gte": "now-2d/d",
				"lt": "now/d"
			}
		}
	}
}
GET book/_search 
{
	"query": {
		"range": {
			"timestamp": {
				"gte": "18/08/2020",
				"lte": "2021",
				"format": "dd/MM/yyyy||yyyy"
			}
		}
	}
}

Non-null search (exists query)

Finds documents in which the specified field has a non-null value, comparable to column IS NOT NULL in SQL.

GET /book/_search 
{
	"query": {
		"exists": {
			"field": "price"
		}
	}
}

Prefix search (prefix query)

GET /book/_search 
{
	"query": {
		"prefix": {
			"name": "so"
		}
	}
}

Wildcard search (wildcard query)

GET /book/_search 
{
	"query": {
		"wildcard": {
			"name": "so*r"
		}
	}
}
GET /book/_search 
{
	"query": {
		"wildcard": {
			"name": {
				"value": "lu*",
				"boost": 2
			}
		}
	}
}

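Regular-expression search (regexp query)

The regexp query runs a term-level query using a regular expression. Note that a badly written regexp can put heavy load on the server. A pattern that starts with .*, for example, has to be checked against every term in the inverted index, which is close to a full table scan and will be slow. Where possible, begin the pattern with a literal prefix.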

GET /book/_search 
{
	"query": {
		"regexp": {
			"name": "s.*"
		}
	}
}
GET /book/_search 
{
	"query": {
		"regexp": {
			"name": {
				"value": "s.*",
				"boost": 1.2
			}
		}
	}
}

Fuzzy search (fuzzy query)

GET /book/_search 
{
	"query": {
		"fuzzy": {
			"name": "so"
		}
	}
}
GET /book/_search 
{
	"query": {
		"fuzzy": {
			"name": {
				"value": "so",
				"boost": 1.0,
				"fuzziness": 2
			}
		}
	}
}
GET /book/_search 
{
	"query": {
		"fuzzy": {
			"name": {
				"value": "sorl",
				"boost": 1.0,
				"fuzziness": 2
			}
		}
	}
}
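
The fuzziness value is an edit distance. As a rough illustration, the Python sketch below computes the classic Levenshtein distance between the search values used above and the term "solr". It is a simplified stand-in: by default the fuzzy query also counts a swap of two adjacent characters as a single edit (its transpositions option), so Elasticsearch treats "sorl" as closer to "solr" than this function reports.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or keep
        prev = curr
    return prev[-1]


for candidate in ("so", "sorl"):
    print(candidate, "->", "solr", edit_distance(candidate, "solr"))
```

Both values are within two plain edits of "solr", which is why the requests that set fuzziness to 2 can match it. When fuzziness is left at its default of AUTO, very short values such as "so" are expected to match exactly.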

Search by ids (ids query)

GET /book/_search
{
	"query": {
		"ids": {
			"values": ["1", "3"]
		}
	}
}

Compound search

1) constant_score query

Wraps another query and assigns a constant score to every matching document.

GET /book/_search
{
	"query": {
		"term": {
			"description": "solr"
		}
	}
}
GET /book/_search
{
	"query": {
		"constant_score": {
			"filter": {
				"term": {
					"description": "solr"
				}
			},
			"boost": 1.2
		}
	}
}

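2) Boolean search

The bool query combines multiple query clauses into one query using boolean logic. Available keywords:

must: the clause must match
filter: the clause must match, but it runs in filter context and does not contribute to the score
should: the clause should match (logical or)
must_not: the clause must not match; it runs in filter context and does not contribute to the score

POST /book/_search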
{
	"query": {
		"bool": {
			"must": {
				"match": {
					"description": "java"
				}
			},
			"filter": {
				"term": {
					"name": "solr"
				}
			},
			"must_not": {
				"range": {
					"price": {
						"gte": 200,
						"lte": 300
					}
				}
			},
			"minimum_should_match": 1,
			"boost": 1.0
		}
	}
}

minimum_should_match sets the minimum number of should clauses a document must match. With minimum_should_match set to 1, at least one of the should clauses must be satisfied.
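
The request above sets minimum_should_match but contains no should clause, so the following Python sketch shows one way a bool body with should clauses could be assembled. The helper name bool_query and the two should terms are hypothetical, introduced only for illustration.

```python
import json


def bool_query(must=None, filters=None, should=None, must_not=None,
               minimum_should_match=None):
    """Assemble a bool query body, leaving out clause types that are not used."""
    clauses = {"must": must, "filter": filters, "should": should, "must_not": must_not}
    body = {name: value for name, value in clauses.items() if value}
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


# The should terms are hypothetical values for illustration.
request_body = bool_query(
    must=[{"match": {"description": "java"}}],
    must_not=[{"range": {"price": {"gte": 200, "lte": 300}}}],
    should=[
        {"term": {"name": "solr"}},
        {"term": {"name": "elasticsearch"}},
    ],
    minimum_should_match=1,
)
print(json.dumps(request_body, indent=2))
```

With this body, a document has to match the must clause, stay outside the must_not price range, and match at least one of the two should terms.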

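Sorting

Sorting by relevance score

By default, results are sorted by relevance, with the most relevant documents first. Sorting by relevance requires relevance to be expressed as a number. In Elasticsearch, the relevance score is a floating-point number returned in the search results as _score, and the default order is _score descending. The second request below sorts by relevance score in ascending order instead: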

POST /book/_search 
{
	"query": {
		"match": {
			"description": "solr"
		}
	}
}
POST /book/_search 
{
	"query": {
		"match": {
			"description": "solr"
		}
	},
	"sort": [{
		"_score": {
			"order": "asc"
		}
	}]
}

Sorting by field value

POST /book/_search
{
	"query": {
		"match_all": {}
	},
	"sort": [{
		"price": {
			"order": "desc"
		}
	}]
}

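Multi-level sorting

Suppose we want results ordered on more than one key. The request below sorts the matches by price first and then, for equal prices, by timestamp, both descending. A relevance key can be added the same way by listing _score in the sort array.

POST /book/_search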
{
	"query": {
		"match_all": {}
	},
	"sort": [{
		"price": {
			"order": "desc"
		}
	}, {
		"timestamp": {
			"order": "desc"
		}
	}]
}

Pagination

POST /book/_search
{
	"query": {
		"match_all": {}
	},
	"size": 2,
	"from": 0
}
POST /book/_search
{
	"query": {
		"match_all": {}
	},
	"sort": [{
		"price": {
			"order": "desc"
		}
	}],
	"size": 2,
	"from": 2
}

size: number of results per page
from: offset of the first result on the current page, computed as from = (pageNum - 1) * size
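
The offset formula can be wrapped in a small helper. This is a minimal Python sketch; the function name page_body is hypothetical, and page numbers are assumed to start at 1.

```python
def page_body(page_num, size, query=None):
    """Build a search body for a 1-based page number."""
    if page_num < 1 or size < 1:
        raise ValueError("page_num and size must be positive")
    return {
        "query": query if query is not None else {"match_all": {}},
        "from": (page_num - 1) * size,  # offset of the first hit on this page
        "size": size,
    }


print(page_body(1, 2))  # first page, same from/size as the first request above
print(page_body(2, 2))  # second page, same from/size as the second request above
```

Keep in mind that from + size paging is meant for shallow pages: by default Elasticsearch rejects requests where from + size exceeds the index.max_result_window setting (10000).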

Highlighting

POST /book/_search
{
	"query": {
		"match": {
			"name": "elasticsearch"
		}
	},
	"highlight": {
		"pre_tags": "",
		"post_tags": "",
		"fields": [{
			"name": {}
		}]
	}
}
POST /book/_search
{
	"query": {
		"match": {
			"name": "elasticsearch"
		}
	},
	"highlight": {
		"pre_tags": "",
		"post_tags": "",
		"fields": [{
			"name": {}
		}, {
			"description": {}
		}]
	}
}
POST /book/_search
{
	"query": {
		"query_string": {
			"query": "elasticsearch"
		}
	},
	"highlight": {
		"pre_tags": "",
		"post_tags": "",
		"fields": [{
			"name": {}
		}, {
			"description": {}
		}]
	}
}
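
The pre_tags and post_tags values in the requests above are empty strings. As a sketch of what a filled-in highlight section can look like, the Python snippet below builds a request body with explicit tags. The <em> and </em> tags are an assumption for illustration (they are also the Elasticsearch defaults), not necessarily the tags the requests above were meant to use, and the helper name highlight_body is hypothetical.

```python
import json


def highlight_body(field, text, pre_tag="<em>", post_tag="</em>"):
    """Build a match query on one field with highlighting for that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


body = highlight_body("name", "elasticsearch")
print(json.dumps(body, indent=2))
```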
