Advanced Elasticsearch Topics

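Contents

Aggregation queries - aggs
    Data preparation
    Sum - sum
    Average - avg
    Distinct count - cardinality
    Multiple aggregations in one query
    Summary statistics - stats
    Grouped counts - terms
    Top hits - top_hits
    Range buckets - range
Search suggestions
Autocomplete and prefix search - suggest - prefix
Highlighting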

Aggregation queries - aggs

Data preparation

Create the index:

PUT /employee
{
  "mappings": {
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "keyword"
      },
      "job": {
        "type": "keyword"
      },
      "age": {
        "type": "integer"
      },
      "gender": {
        "type": "keyword"
      }
    }
  }
}

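Insert the documents (the sal field is not declared in the mapping, so Elasticsearch maps it dynamically as long):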

POST /employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "female"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}

Sum - sum

Query: the total salary of all employees

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "sum_sal": {
      "sum": {
        "field": "sal"
      }
    }
  }
}

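aggs: short for aggregations. Like query, it is a fixed keyword; it marks the request as an aggregation query.

sum_sal: a name we choose ourselves; the aggregated value is returned under this name.

sum: a fixed keyword naming the aggregation type. Think of it as a function; ES ships with many built-in aggregations.

size: a search returns the matching documents (hits) along with the aggregation results, which appear at the bottom of the response. Setting size to 0 suppresses the hits so that only the aggregation results are shown.

Query result:

The result appears under aggregations at the bottom of the response: sum_sal = 212500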
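
As a sanity check, the same total can be recomputed locally from the bulk data. This is a minimal sketch in plain Python (no Elasticsearch client involved); the variable names are chosen here for illustration.

```python
# Salaries copied from the _bulk request, in _id order (1..16).
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# Local equivalent of {"sum": {"field": "sal"}} over all documents.
sum_sal = sum(salaries)
print(sum_sal)  # 212500
```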

Average - avg

Query: the average salary of all employees

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "avg_sal": {
      "avg": {
        "field": "sal"
      }
    }
  }
}

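avg: returns the mean, like MySQL's AVG() aggregate function.

The query is simple, so no result screenshot is included.

Distinct count - cardinality

Query: how many distinct jobs there are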

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "cardi_job": {
      "cardinality": {
        "field": "job"
      }
    }
  }
}

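cardinality: counts the distinct values of a field, comparable to MySQL's COUNT(DISTINCT ...). The count is approximate and can be slightly off for fields with a very large number of distinct values.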
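
For the sample employees, the expected answer can be worked out by hand. The sketch below does an exact distinct count in plain Python; Elasticsearch itself uses an approximate algorithm (HyperLogLog++), which should give the same answer for a set this small.

```python
# Job of each employee, copied from the _bulk request in _id order.
jobs = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
        "dba", "html", "java", "java", "html", "java", "dba", "java"]

# Exact distinct count: deduplicate, then count.
cardi_job = len(set(jobs))
print(cardi_job)  # 3
```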

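Multiple aggregations in one query

Query: the maximum, minimum, and average ticket price (AvgTicketPrice) across the flights in the Kibana sample flight data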

GET /kibana_sample_data_flights/_search
{
  "size": 0, 
  "aggs": {
    "max_ticket_price": {
      "max": {
        "field": "AvgTicketPrice"
      }
    },
    "min_ticket_price": {
      "min": {
        "field": "AvgTicketPrice"
      }
    },
    "avg_ticket_price": {
      "avg": {
        "field": "AvgTicketPrice"
      }
    }
  }
}

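Kibana can load three official sample data sets as indices: e-commerce orders, web logs, and flights.

PS: why are some indices green and others yellow? See http://www.jwsblog.com/archives/59.html

Summary statistics - stats

A single aggregation that returns the count, sum, average, maximum, and minimum in one query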

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "sal_info": {
      "stats": {
        "field": "sal"
      }
    }
  }
}

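Query result: the response reports count, min, max, avg, and sum for the field.

Note: stats only works on numeric fields; it cannot be applied to non-numeric fields.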
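
The five values that stats reports for sal can be reproduced from the sample data. A minimal plain-Python sketch, assuming the 16 documents from the bulk request are the only ones in the index:

```python
# Salaries copied from the _bulk request, in _id order (1..16).
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# The same five numbers that the stats aggregation returns.
sal_info = {
    "count": len(salaries),
    "min": min(salaries),
    "max": max(salaries),
    "avg": sum(salaries) / len(salaries),
    "sum": sum(salaries),
}
print(sal_info)
# {'count': 16, 'min': 2000, 'max': 23000, 'avg': 13281.25, 'sum': 212500}
```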

Grouped counts - terms

Query: count flights per destination country (a grouped count), comparable to MySQL's SELECT DestCountry, COUNT(*) ... GROUP BY DestCountry

GET /kibana_sample_data_flights/_search
{
  "size": 0, 
  "aggs": {
    "count_dest_country": {
      "terms": {
        "field": "DestCountry",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

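Note: terms in aggs is very different from terms in query. The aggregation groups documents into buckets by field value, while the query filters documents that match any of the given values.

Terms aggregation in ES: https://www.cnblogs.com/xing901022/p/4947436.html

Nested query 1: flight counts per destination country, broken down by destination weather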

GET /kibana_sample_data_flights/_search
{
  "size": 0, 
  "aggs": {
    "count_dest_country": {
      "terms": {
        "field": "DestCountry",
        "order": {
          "_count": "desc"
        }
      },
      "aggs": {
        "weather_count": {
          "terms": {
            "field": "DestWeather"
          }
        }
      }
    }
  }
}

Nested query 2: the gender breakdown for each job, with salary statistics per gender

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "job_info": {
      "terms": {
        "field": "job"
      },
      "aggs": {
        "gender_info": {
          "terms": {
            "field": "gender"
          },
          "aggs": {
            "sal_info": {
              "stats": {
                "field": "sal"
              }
            }
          }
        }
      }
    }
  }
}
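
To make the nesting concrete, the sketch below rebuilds the same job > gender > stats hierarchy in plain Python from the bulk data. It illustrates the bucket structure only; it is not what Elasticsearch runs internally, and the helper name stats is chosen here.

```python
from collections import defaultdict

# (job, gender, sal) triples copied from the _bulk request, in _id order.
employees = [
    ("java", "female", 8000), ("html", "female", 18000),
    ("java", "male", 12000), ("dba", "female", 15000),
    ("dba", "male", 16000), ("java", "female", 20000),
    ("dba", "male", 7000), ("html", "female", 14000),
    ("dba", "female", 19000), ("html", "male", 22000),
    ("java", "female", 23000), ("java", "male", 2000),
    ("html", "female", 3000), ("java", "female", 6000),
    ("dba", "male", 4500), ("java", "male", 23000),
]

# Outer terms buckets on job, inner terms buckets on gender.
buckets = defaultdict(lambda: defaultdict(list))
for job, gender, sal in employees:
    buckets[job][gender].append(sal)


def stats(values):
    """Return the five numbers the stats aggregation reports."""
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": sum(values) / len(values), "sum": sum(values)}


# Innermost level: salary stats for every (job, gender) bucket.
job_info = {job: {gender: stats(sals) for gender, sals in genders.items()}
            for job, genders in buckets.items()}

for job, genders in job_info.items():
    for gender, info in genders.items():
        print(job, gender, info)
```

Each aggs level in the request corresponds to one level of grouping here: job first, then gender inside each job, then the metric on the innermost bucket.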

Top hits - top_hits

Query: the 2 oldest employees

Method 1:

GET /employee/_search
{
  "size": 2, 
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

Method 2:

GET /employee/_search
{
  "size": 0, 
  "aggs": {
    "top_age": {
      "top_hits": {
        "size": 2,
        "sort": [
          {
            "age": {
              "order": "desc"
            }
          }
        ]
      }
    }
  }
}
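
Both methods return the same two documents for this data. The plain-Python sketch below shows the expected pair; the practical difference is that top_hits is an aggregation, so it can also be nested under a bucket aggregation such as terms to return the top documents per bucket.

```python
# (name, age) pairs copied from the _bulk request, in _id order.
employees = [
    ("Bob", 21), ("Rod", 31), ("Gaving", 24), ("King", 26),
    ("Jonhson", 29), ("Douge", 41), ("cutting", 27), ("Bona", 22),
    ("Shyon", 20), ("James", 18), ("Golsling", 32), ("Lily", 24),
    ("Jack", 23), ("Rose", 36), ("Will", 38), ("smith", 32),
]

# Sort by age descending and keep the first 2, like size=2 with sort on age.
top_age = sorted(employees, key=lambda e: e[1], reverse=True)[:2]
print(top_age)  # [('Douge', 41), ('Will', 38)]
```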

Range buckets - range

Each bucket includes its from value and excludes its to value, so adjacent buckets share a boundary and no salary falls into a gap.

GET /employee/_search
{
  "size": 0,
  "aggs": {
    "sal_info": {
      "range": {
        "field": "sal",
        "ranges": [
          {
            "key": "0 <= sal < 5000",
            "from": 0,
            "to": 5000
          },
          {
            "key": "5000 <= sal < 10000",
            "from": 5000,
            "to": 10000
          },
          {
            "key": "10000 <= sal < 15000",
            "from": 10000,
            "to": 15000
          }
        ]
      }
    }
  }
}
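
The range aggregation treats every bucket as half-open: from is included and to is excluded. A minimal plain-Python sketch of that rule over the sample salaries, using contiguous boundaries at 0, 5000, 10000 and 15000 (boundaries chosen here for illustration):

```python
# Salaries copied from the _bulk request, in _id order (1..16).
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]

# (from, to) pairs; a value s belongs to a bucket when from <= s < to.
ranges = [(0, 5000), (5000, 10000), (10000, 15000)]

buckets = {
    f"{lo} <= sal < {hi}": sum(1 for s in salaries if lo <= s < hi)
    for lo, hi in ranges
}
print(buckets)
# {'0 <= sal < 5000': 3, '5000 <= sal < 10000': 3, '10000 <= sal < 15000': 2}
```

A salary of exactly 15000 lands in none of these buckets because the upper bound is exclusive; covering it would need a further bucket starting at 15000.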

Search suggestions

When a search term is misspelled and the search returns no results, we would like ES to suggest a corrected term.

GET /es_jd_goods/_search
{
  "suggest": {
    "title_suggestion": {
      "text": "elasticsearh",
      "term": {
        "field": "name",
        "suggest_mode": "popular"
      }
    }
  }
}

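suggest: the fixed keyword for suggestion requests

text: the search text as the user typed it

suggest_mode has three values: missing, popular, and always

    missing (the default) only suggests for input terms that are not in the index. popular only suggests terms that occur in more documents than the input term. always suggests any matching term, whether or not the input term exists.

Note: with the default missing mode, suggestions are only given when the input term does not exist in the inverted index.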
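
The term suggester proposes indexed terms that are within a small edit distance of the input. The sketch below is a plain Levenshtein distance in Python, meant only to show why "elasticsearch" is a close candidate for the misspelled "elasticsearh"; it is an illustration of the idea, not the suggester's actual implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# One missing letter ("c") separates the typo from the correct term.
print(edit_distance("elasticsearh", "elasticsearch"))  # 1
```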

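Autocomplete and prefix search - suggest - prefix

Autocomplete is extremely performance-sensitive: a request is sent to look up matches for every character the user types.
For this reason ES does not use the inverted index here; it uses a separate in-memory data structure (an FST) built for fast prefix lookups.

Note: the field must be mapped with the type completion, so the mapping has to be defined before the data is indexed into ES.

PS: ES does not allow the type of an existing field to be changed in a mapping. If a field's type has to change, the index must be rebuilt and the data reloaded, as shown below:

# View the current index mapping and copy it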
GET /es_jd_goods/_mapping
GET /es_jd_goods/_search

# Create a temporary index with the field type changed
PUT /es_jd_temp
{
  "mappings": {
    "properties": {
      "createTime": {
        "type": "long"
      },
      "id": {
        "type": "long"
      },
      "imgUrl": {
        "type": "keyword"
      },
      "modifyTime": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "analyzer": "ik_max_word"
      },
      "price": {
        "type": "double"
      },
      "shopName": {
        "type": "completion"
      },
      "valid": {
        "type": "boolean"
      }
    }
  }
}

# View the temporary index mapping and copy it
GET /es_jd_temp/_mapping
GET /es_jd_temp/_search

# Copy the data into the temporary index
POST _reindex
{
  "source": {
    "index": "es_jd_goods"
  },
  "dest": {
    "index": "es_jd_temp"
  }
}

# Delete the original index
DELETE es_jd_goods

# Recreate the index
PUT /es_jd_goods
{
  "mappings": {
    "properties": {
      "createTime": {
        "type": "long"
      },
      "id": {
        "type": "long"
      },
      "imgUrl": {
        "type": "keyword"
      },
      "modifyTime": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "term_vector": "with_positions_offsets",
        "analyzer": "ik_max_word"
      },
      "price": {
        "type": "double"
      },
      "shopName": {
        "type": "completion"
      },
      "valid": {
        "type": "boolean"
      }
    }
  }
}

GET /es_jd_goods/_mapping
GET /es_jd_goods/_search

# Copy the data from the temporary index into the final index
POST _reindex
{
  "source": {
    "index": "es_jd_temp"
  },
  "dest": {
    "index": "es_jd_goods"
  }
}

# Delete the temporary index
DELETE es_jd_temp

This is a very common scenario: as on Baidu or JD, typing the first few characters of a search makes suggested keywords appear below the search box.

GET /es_jd_goods/_search
{
  "_source": [
    "shopName"
  ],
  "suggest": {
    "prefix_suggestion": {
      "prefix": "电子",
      "completion": {
        "field": "shopName",
        "skip_duplicates": true,
        "size": 10
      }
    }
  }
}
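
The effect of prefix, size, and skip_duplicates can be mimicked with a sorted list and a binary search. This is a rough plain-Python sketch of prefix completion in general, not of the structure ES uses internally; the function name complete and the shop names are made up for illustration.

```python
from bisect import bisect_left


def complete(sorted_names, prefix, size=10, skip_duplicates=True):
    """Return up to `size` names starting with `prefix` from a sorted list."""
    # All names sharing the prefix sit next to each other in sorted order,
    # starting at the first position that is not smaller than the prefix.
    start = bisect_left(sorted_names, prefix)
    results = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break
        if skip_duplicates and name in results:
            continue
        results.append(name)
        if len(results) == size:
            break
    return results


# Hypothetical shop names, including one duplicate.
shop_names = sorted(["电子工业出版社", "电子工业出版社",
                     "电子科技大学出版社", "人民邮电出版社"])
print(complete(shop_names, "电子"))
# ['电子工业出版社', '电子科技大学出版社']
```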

Highlighting

GET /es_jd_goods/_search
{
  "query": {
    "multi_match": {
      "query": "设计",
      "fields": ["name","shopName"]
    }
  },
  "highlight": {
    "pre_tags": "",
    "post_tags": "", 
    "fields": {
      "name": {}, 
      "shopName": {
        "pre_tags": "",
        "post_tags": ""
      }
    }
  }
}

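highlight: wraps the matched terms in the returned fragments with tags; it applies to fields with textual content, such as text fields

pre_tags: the tag inserted before each highlighted term (<em> when omitted)

post_tags: the tag inserted after each highlighted term (</em> when omitted)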
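
What the two tag settings do to a returned fragment can be shown with simple string wrapping. The plain-Python sketch below assumes the default <em> tags and a made-up product title; the real highlighter works on analyzed terms rather than raw substrings.

```python
def highlight(text, term, pre_tag="<em>", post_tag="</em>"):
    """Wrap every occurrence of `term` in `text` with the given tags."""
    return text.replace(term, f"{pre_tag}{term}{post_tag}")


# Hypothetical title matched by the query term used in the request.
print(highlight("Java 设计模式", "设计"))
# Java <em>设计</em>模式
```

Passing other values for pre_tag and post_tag, for example a span with a CSS class, changes only the wrapper around the matched term.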
