Elasticsearch Basics

Installing Elasticsearch

Installing Elasticsearch on Linux:

Extract the archive

tar -xvf elasticsearch-7.3.0-linux-x86_64.tar.gz

Edit the configuration file

vim /app/elasticsearch-7.3.0/config/elasticsearch.yml

node.name: node-1
# Set this to the Alibaba Cloud instance's private IP
network.host: 127.0.0.1
http.port: 9200
cluster.initial_master_nodes: ["node-1"]

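On Alibaba Cloud, network.host must be set to the instance's private IP. This fixes: org.elasticsearch.bootstrap.StartupException: BindTransportException[Failed to bind to [9300-9400]]; nested: BindException[Cannot assign requested address];

In the security group, open port 9200 for inbound traffic.

Add a user (Elasticsearch refuses to start as root)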

useradd zhangsan
passwd zhangsan
123456

This fixes: org.elasticsearch.bootstrap.StartupException: java.lang.RuntimeException: can not run elasticsearch as root

Change the owner of the Elasticsearch directory

chown -R zhangsan /app/elasticsearch-7.3.0

Edit /etc/sysctl.conf

vim /etc/sysctl.conf

# Append at the end of the file: vm.max_map_count=655360
# Run sysctl -p to apply the change
sysctl -p

This fixes: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Start

# To run Elasticsearch in the background as a daemon, append the -d option
./elasticsearch -d

# Force stop
# ps -ef | grep elasticsearch
# kill -9 <PID>

Installing Kibana

Installing Kibana on Linux:

Extract the archive

tar -xvf kibana-7.3.0-linux-x86_64.tar.gz

Edit the configuration file

vim /app/kibana-7.3.0-linux-x86_64/config/kibana.yml

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://127.0.0.1:9200"]

In the security group, open port 5601 for inbound traffic.

Change the owner of the Kibana directory

chown -R zhangsan /app/kibana-7.3.0-linux-x86_64/

Start

# Start in the background
nohup ./kibana &

# Start in the foreground
./kibana

# Force stop: find the process listening on port 5601
netstat -tunlp | grep 5601
# kill -9 <PID>

Setting passwords for Elasticsearch and Kibana

Add the following to elasticsearch.yml

xpack.security.enabled: true
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: true

By default, the command below asks you to set passwords for six built-in accounts:

elastic; kibana; logstash_system; beats_system; apm_system; remote_monitoring_user

./elasticsearch-setup-passwords interactive

Update kibana.yml

elasticsearch.username: "elastic"
elasticsearch.password: "123456"

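Basic Elasticsearch operations

Index

An index is a collection of documents that share broadly similar characteristics. Each index is identified by a name, which must be all lowercase.

Create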

PUT /users

number_of_shards: number of primary shards

number_of_replicas: number of replicas of each primary shard

PUT /products
{
  "settings": {
    "number_of_shards": 1, 
    "number_of_replicas": 0
  }
}

Query

Get a single index

GET /users

Get multiple indices

GET /users,products

List all indices

GET /_cat/indices?v

Check whether an index exists

HEAD /users

Modify

Close an index

POST /users/_close

Open an index

POST /users/_open

Delete

Delete a single index

DELETE /users

Delete multiple indices

DELETE /users,products

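Mapping

A mapping defines how a document and the fields it contains are stored and indexed.

Basic types

Strings: keyword and text
Integers: integer and long
Floating point: float and double
Boolean: boolean
Dates: date

Create

When the index already exists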

PUT /users/_mapping
{
  "properties": {
    "name": {
      "type": "keyword"
    },
    "address": {
      "type": "text"
    }
  }
}

When the index does not exist yet

PUT /products
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "title": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "description": {
        "type": "text"
      }
    }
  }
}

Query

Get the mapping of a single index

GET /products/_mapping

Get all mappings

GET /_mapping

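Modify

An existing mapping can only be extended with new fields. To change a field that already exists, delete the index and create it again with the new mapping.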

PUT /products/_mapping
{
  "properties": {
    "create_date": {
      "type": "date"
    }
  }
}

Documents

A document is a single record in an index, and it is the smallest unit that can be indexed.

Create

With a specified id

POST /products/_doc/1
{
  "id":1,
  "title":"小浣熊",
  "price":1.5,
  "description":"小浣熊真好吃"
}

With an auto-generated id

POST /products/_doc
{
  "id":2,
  "title":"辣条",
  "price":0.5,
  "description":"辣条真辣"
}

The response contains the generated id: "_id" : "iiraon8BaSTE-aoRlJbb".

Get by id

GET /products/_doc/iiraon8BaSTE-aoRlJbb

Modify

Overwrite the existing document (fields missing from the request body are dropped)

POST /products/_doc/1
{
  "title":"辣条"
}

Update specific fields

POST /products/_update/1
{
  "doc": {
    "title": "辣条",
    "description": "辣条真好吃"
  }
}

Delete by id

DELETE /products/_doc/iiraon8BaSTE-aoRlJbb

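Bulk operations on documents

Each JSON object must stay on a single line, and consecutive JSON objects must be separated by a newline.

The operations are independent of one another: if one fails, the others still run, and the response reports the error for the failed item.

Bulk insert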

POST /products/_bulk
{"index":{"_id":2}}
{"title":"小浣熊","price":1,"description":"小浣熊真好吃"}
{"index":{"_id":3}}
{"title":"薯片","price":5,"description":"薯片好吃"}
{"index":{"_id":4}}
{"title":"小瓜子","price":8,"description":"小瓜子真好吃"}

Mixed operations

Index a document with id=6, delete the document with id=3, and set the title of the document with id=4 to 瓜子.

POST /products/_bulk
{"index":{"_id":6}}
{"title":"小浣熊","price":1,"description":"小浣熊真好吃"}
{"delete":{"_id":3}}
{"update":{"_id":4}}
{"doc":{"title":"瓜子"}}
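The bulk format rules above (one JSON object per line, a newline between objects) are easy to get wrong when the body is assembled by hand. The following is a minimal sketch of a helper that serializes operations into such a body. The function name build_bulk_body and the tuple layout are assumptions made for illustration and are not part of any Elasticsearch client.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a newline-delimited bulk body.

    `source` is None for actions that have no second line, such as delete.
    """
    lines = []
    for action, metadata, source in operations:
        # Action line: json.dumps without indent always yields a single line.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            # Source line: placed directly after its action line.
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_id": 6}, {"title": "小浣熊", "price": 1, "description": "小浣熊真好吃"}),
    ("delete", {"_id": 3}, None),
    ("update", {"_id": 4}, {"doc": {"title": "瓜子"}}),
])
print(body, end="")
```

The resulting string can be sent as the body of POST /products/_bulk with the Content-Type application/x-ndjson.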

Query DSL

Common queries

Match all [match_all]

GET /products/_search
{
  "query": {
    "match_all": {}
  }
}

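Term query [term]

For a keyword field, the search value must match the entire stored value exactly.

For a text field, the search value is compared against the individual terms produced by the analyzer.

In ES, text is the only basic type that is analyzed; values of the other types are not tokenized.

The standard analyzer, which ES uses by default, treats every single Chinese character as a separate term.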

GET /products/_search
{
  "query": {
    "term": {
      "title": {
        "value": "薯片"
      }
    }
  }
}
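The difference between term matching on keyword and text fields can be illustrated with a toy model. This is only a sketch: toy_standard_analyze is a hypothetical, heavily simplified stand-in for the standard analyzer (lowercase, one token per CJK character, one token per run of ASCII letters or digits), not the real tokenization algorithm.

```python
import re


def toy_standard_analyze(text):
    """Simplified stand-in for the standard analyzer (an assumption for illustration)."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


def term_matches_keyword(stored_value, search_value):
    # keyword field: the whole stored value is the only term.
    return stored_value == search_value


def term_matches_text(stored_value, search_value):
    # text field: the search value must equal one of the analyzed terms.
    return search_value in toy_standard_analyze(stored_value)


print(toy_standard_analyze("薯片好吃"))
print(term_matches_keyword("薯片", "薯片"))   # whole-value match on a keyword field
print(term_matches_text("薯片好吃", "薯片"))  # no single term equals the two-character value
print(term_matches_text("薯片好吃", "薯"))    # a single character is a term
```

Under this model, a term query for 薯片 finds a keyword title but not a text description, which is why the highlight example later in this document searches the description field for the single character 薯.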

Range query [range]

GET /products/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 0,
        "lte": 10
      }
    }
  }
}

Prefix query [prefix]

GET /products/_search
{
  "query": {
    "prefix": {
      "title": {
        "value": "小"
      }
    }
  }
}

Wildcard query [wildcard]

* matches any sequence of characters, including none; ? matches exactly one character

GET /products/_search
{
  "query": {
    "wildcard": {
      "title": {
        "value": "小*"
      }
    }
  }
}

Multi-id query [ids]

GET /products/_search
{
  "query": {
    "ids": {
      "values": [1,2,3]
    }
  }
}

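Fuzzy query [fuzzy]

By default (fuzziness AUTO), the allowed edit distance depends on the length of the search term:

Length 0 to 2: no edits allowed, the term must match exactly

Length 3 to 5: one single-character edit allowed

Length greater than 5: two single-character edits allowed

Set fuzziness to specify the maximum edit distance explicitly (0, 1 or 2)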

GET /products/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "薯片",
        "fuzziness": 2
      }
    }
  }
}
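The default length thresholds for fuzziness can be written down as a small function. The sketch below pairs them with a plain Levenshtein distance to show how a candidate term would be accepted or rejected. It is a simplification: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which this version does not. All function names are hypothetical.

```python
def auto_fuzziness(term):
    """Maximum edit distance allowed for `term` under the default AUTO rule."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(search_term, indexed_term):
    """True if indexed_term is within the AUTO edit distance of search_term."""
    return edit_distance(search_term, indexed_term) <= auto_fuzziness(search_term)


print(fuzzy_matches("chips", "chip"))  # length 5, one deletion
print(fuzzy_matches("薯片", "薯条"))     # length 2, no edits allowed
```

This also explains why the request above sets fuzziness explicitly: with the default rule, a two-character value such as 薯片 would only match exactly.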

Regular expression query [regexp]

GET /products/_search
{
  "query": {
    "regexp": {
      "title": "小.*"
    }
  }
}

Boolean query [bool]

must: the clause has to match

must_not: the clause must not match

should: OR semantics

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"ids": {"values": [1,2,3]}},
        {"term": {"title": {"value": "小浣熊"}}}
      ]
    }
  }
}

GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        {"ids": {"values": [1,2,3]}},
        {"term": {"title": {"value": "小瓜子"}}}
      ]
    }
  }
}
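How must, must_not and should combine can be sketched with a toy evaluator in which each clause is a plain predicate. This is an illustration under simplifying assumptions: scoring and filter clauses are left out, and the sample documents are hypothetical. One detail it does model is that should clauses are only required (at least one must match) when the bool query has no must clause; next to a must clause they are optional.

```python
def bool_matches(doc, must=(), should=(), must_not=()):
    """Toy evaluation of a bool query; each clause is a function doc -> bool."""
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # Without a must clause, at least one should clause has to match.
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


def id_in_1_2_3(doc):
    return doc["id"] in {1, 2, 3}


def title_is(value):
    return lambda doc: doc["title"] == value


# Hypothetical documents, used only for this illustration.
doc1 = {"id": 1, "title": "小浣熊"}
doc4 = {"id": 4, "title": "小瓜子"}
doc5 = {"id": 5, "title": "薯片"}

# must: both clauses have to match.
print(bool_matches(doc1, must=[id_in_1_2_3, title_is("小浣熊")]))
# should only: one matching clause is enough.
print(bool_matches(doc4, should=[id_in_1_2_3, title_is("小瓜子")]))
print(bool_matches(doc5, should=[id_in_1_2_3, title_is("小瓜子")]))
```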

Match query [match]

The query text is analyzed first, and the resulting terms are then matched against the field

GET /products/_search
{
  "query": {
    "match": {
      "description": "薯片"
    }
  }
}

Multi-field query [multi_match]

The query string is analyzed, and the resulting terms are matched against each field listed in fields

GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "小辣",
      "fields": ["title","description"]
    }
  }
}

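Query string query [query_string]

Matches documents without requiring a field to be specified. The match can also be restricted to specific fields with fields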

GET /products/_search
{
  "query": {
    "query_string": {
      "fields": ["description","title"], 
      "query": "薯片"
    }
  }
}

Highlighting [highlight]

To highlight every matching field, use "fields": {"*":{}}

GET /products/_search
{
  "query": {
    "term": {
      "description": {
        "value": "薯"
      }
    }
  },
  "highlight": {
    "fields": {"description": {}},
    "pre_tags": "",
    "post_tags": ""
  }
}

Limit the number of hits [size]

ES returns 10 hits by default

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "size": 2
}

Pagination [from]

from is the number of hits to skip and starts at 0

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "size": 2,
  "from": 1
}
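Because from counts hits rather than pages, the request above (from 1, size 2) skips one hit and returns the second and third. A page number has to be converted into an offset first. The helper below is a minimal sketch of that conversion; the name page_request and the 1-based page convention are assumptions for illustration.

```python
def page_request(page, page_size):
    """Build a match_all search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": {"match_all": {}},
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }


print(page_request(1, 2))
print(page_request(3, 2))
```

Note that by default Elasticsearch rejects requests where from + size exceeds 10000 (the index.max_result_window setting), so this style of paging is meant for shallow result lists.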

Sort by a field [sort]

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

Return only selected fields [_source]

GET /products/_search
{
  "query": {
    "match_all": {}
  },
  "_source": ["title","description"]
}

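Filter DSL

ES has two kinds of search clauses: query and filter.

A query computes a relevance score for every matching document and, by default, sorts the results by that score.

A filter only selects the documents that match. It computes no score, and its results can be cached, so filters are usually faster.

Term filter [term]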

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "term": {
          "title": "薯片"
        }
      }
    }
  }
}

Multi-term filter [terms]

The listed terms are combined with OR

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "terms": {
          "title": [
            "薯片",
            "辣条"
          ]
        }
      }
    }
  }
}

Range filter [range]

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 0,
            "lte": 5
          }
        }
      }
    }
  }
}

Multi-id filter [ids]

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "ids": {
          "values": [
            1,2,3
          ]
        }
      }
    }
  }
}

Non-null filter [exists]

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        {"match_all": {}}
      ],
      "filter": {
        "exists": {
          "field": "title"
        }
      }
    }
  }
}
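The filter examples above all share the same bool wrapper, and filter also accepts a list of clauses, in which case every clause has to match. The sketch below builds such a request body in Python. The helper name filtered_search is hypothetical, and the combination of filters is chosen only for illustration.

```python
import json


def filtered_search(filters, query=None):
    """Build a search body whose bool query applies every clause in `filters`."""
    return {
        "query": {
            "bool": {
                # Scored part of the request; match_all when nothing is given.
                "must": [query if query is not None else {"match_all": {}}],
                # Unscored part; all filter clauses must match.
                "filter": list(filters),
            }
        }
    }


body = filtered_search([
    {"terms": {"title": ["薯片", "辣条"]}},
    {"range": {"price": {"gte": 0, "lte": 5}}},
    {"exists": {"field": "title"}},
])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted as the body of GET /products/_search.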

Aggregations

Group by a field and count documents [aggs]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1
          },
          {
            "from": 1,
            "to": 5
          },
          {
            "from": 5,
            "to": 10
          }
        ]
      }
    }
  }
}

Emulating HAVING after GROUP BY

Group the documents by price range, compute the average price of each group, and keep only the groups whose average is >= 1

GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_group": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 0,
            "to": 1
          },
          {
            "from": 1,
            "to": 5
          },
          {
            "from": 5,
            "to": 10
          }
        ]
      },
      "aggs": {
        "custom_params": {
          "avg": {
            "field": "price"
          }
        },
        "having": {
          "bucket_selector": {
            "buckets_path": {
              "custom_avg_price": "custom_params"
            },
            "script": {
              "source": "params.custom_avg_price >=1"
            }
          }
        }
      }
    }
  }
}
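What the range aggregation, the avg sub-aggregation and the bucket_selector do together can be reproduced on a plain list. The following is a sketch with hypothetical sample prices, not output from a real cluster. It relies on one documented detail of the range aggregation: from is inclusive and to is exclusive. Empty groups are simply skipped in this toy version.

```python
from statistics import mean


def range_buckets_having(prices, ranges, min_avg):
    """Group prices into [low, high) ranges, average each group,
    and keep only the groups whose average is >= min_avg."""
    kept = {}
    for low, high in ranges:
        members = [p for p in prices if low <= p < high]
        if not members:
            continue  # nothing to average in an empty group
        average = mean(members)
        if average >= min_avg:
            kept[f"{low}-{high}"] = average
    return kept


prices = [0.5, 1, 1.5, 5, 8]  # hypothetical sample data
result = range_buckets_having(prices, [(0, 1), (1, 5), (5, 10)], min_avg=1)
print(result)  # {'1-5': 1.25, '5-10': 6.5}
```

The 0-1 group contains only 0.5, so its average is below 1 and the group is dropped, which is the HAVING behavior the request above asks for.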

Maximum [max]

GET /products/_search
{
  "aggs": {
    "custom_params": {
      "max": {
        "field": "price"
      }
    }
  }
}

Minimum [min]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "min": {
        "field": "price"
      }
    }
  }
}

Sum [sum]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "sum": {
        "field": "price"
      }
    }
  }
}

Average [avg]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "avg": {
        "field": "price"
      }
    }
  }
}

Count of values [value_count]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "value_count": {
        "field": "price"
      }
    }
  }
}

All at once (max, min, sum, avg, count) [stats]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "stats": {
        "field": "price"
      }
    }
  }
}

Distinct count [cardinality]

GET /products/_search
{
  "size": 0, 
  "aggs": {
    "custom_params": {
      "cardinality": {
        "field": "price"
      }
    }
  }
}
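As a reference for reading the responses of the metric aggregations above, the sketch below computes the same figures on a hypothetical list of prices. The function names are made up for this illustration. One difference worth keeping in mind: this distinct count is exact, whereas the cardinality aggregation in Elasticsearch returns an approximate count on large data sets.

```python
def stats(values):
    """Return count, min, max, avg and sum, mirroring the keys of the stats aggregation."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


def distinct_count(values):
    """Exact number of distinct values."""
    return len(set(values))


prices = [0.5, 1, 5, 8, 1]  # hypothetical sample data
print(stats(prices))
print(distinct_count(prices))
```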

Analyzers

Built-in analyzers

POST /_analyze
{
  "analyzer": "...",
  "text":"汉语 is an Ancient Language !"
}

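analyzer	Notes
standard	Default analyzer. Splits English into words and Chinese into single characters. Lowercases. Removes punctuation.
simple	Splits on any non-letter character, so English is split into words and Chinese is split at spaces. Lowercases. Removes punctuation.
whitespace	Splits both English and Chinese on whitespace only. Does not lowercase. Keeps punctuation.
stop	Same as simple, and additionally removes stop words such as is, an, a, the.
keyword	No tokenization; the whole input is kept as a single term.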

IK analyzer

Installation

./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.3.0/elasticsearch-analysis-ik-7.3.0.zip

Basic usage

ik_max_word

Finest-grained segmentation

Result: 【薯】【片】【真好】【好吃】

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "薯片真好吃"
}

ik_smart

Coarsest-grained segmentation

Result: 【薯】【片】【真】【好吃】

POST _analyze
{
  "analyzer": "ik_smart",
  "text": "薯片真好吃"
}

Extension words

vim custom_ext_word.dic

Add 【薯片】 to this file

Stop words

vim custom_stop_word.dic

Add 【真】 to this file

/app/elasticsearch-7.3.0/config/analysis-ik

vim IKAnalyzer.cfg.xml

Register the custom dictionary files in this XML file

custom_ext_word.dic

custom_stop_word.dic

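ik_max_word result: 【薯片】【真好】【好吃】

ik_smart result: 【薯片】【好吃】

Synonyms

vim custom_synonym_word.dic

Add 【薯片,马铃薯片】 to this file

Extension words and stop words change how IK tokenizes text, so their effect is visible directly in _analyze. Synonyms are applied by a token filter inside a custom analyzer defined on an index.

Their effect therefore only shows up when searching an index that uses that analyzer.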

PUT /snacks
{
  "settings": {
    "analysis": {
      "filter": {
        "word_sync": {
          "type": "synonym",
          "synonyms_path": "analysis-ik/custom_synonym_word.dic"
        }
      },
      "analyzer": {
        "ik_sync_max_word": {
          "filter": [
            "word_sync"
          ],
          "type": "custom",
          "tokenizer": "ik_max_word"
        },
        "ik_sync_smart": {
          "filter": [
            "word_sync"
          ],
          "type": "custom",
          "tokenizer": "ik_smart"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "ik_sync_max_word"
      }
    }
  }
}

GET /snacks/_search
{
  "query": {
    "match": {
      "title": "马铃薯片"
    }
  }
}
