This is actually expected. If you run the document through the analyze API, you get a better picture of what is happening:

    GET suggest_index/_analyze?text=coding like a master&analyzer=suggests_analyzer

This is the output:
    {
      "tokens": [
        { "token": "coding", "start_offset": 0, "end_offset": 6, "type": "word", "position": 1 },
        { "token": "coding like", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 1 },
        { "token": "coding like a", "start_offset": 0, "end_offset": 13, "type": "shingle", "position": 1 },
        { "token": "like", "start_offset": 7, "end_offset": 11, "type": "word", "position": 2 },
        { "token": "like a", "start_offset": 7, "end_offset": 13, "type": "shingle", "position": 2 },
        { "token": "like a master", "start_offset": 7, "end_offset": 20, "type": "shingle", "position": 2 },
        { "token": "a", "start_offset": 12, "end_offset": 13, "type": "word", "position": 3 },
        { "token": "a master", "start_offset": 12, "end_offset": 20, "type": "shingle", "position": 3 },
        { "token": "master", "start_offset": 14, "end_offset": 20, "type": "word", "position": 4 }
      ]
    }

As you can see, a token `coding` is generated for the text, so it is in your index. It is not the case that the suggested term is missing from your index. If you strictly want phrase search, you may want to consider the keyword tokenizer. For example, if you change your mapping to something like this:
    {
      "settings": {
        "index": {
          "analysis": {
            "analyzer": {
              "suggests_analyzer": {
                "tokenizer": "lowercase",
                "filter": ["lowercase", "asciifolding", "shingle_filter"],
                "type": "custom"
              },
              "raw_analyzer": {
                "tokenizer": "keyword",
                "filter": ["lowercase", "asciifolding"]
              }
            },
            "filter": {
              "shingle_filter": {
                "min_shingle_size": 2,
                "max_shingle_size": 3,
                "type": "shingle"
              }
            }
          }
        }
      },
      "mappings": {
        "my_type": {
          "properties": {
            "suggest_field": {
              "analyzer": "suggests_analyzer",
              "type": "string",
              "fields": {
                "raw": {
                  "analyzer": "raw_analyzer",
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }

then this query will give you the expected result:
    {
      "DidYouMean": {
        "text": "codning lke a master",
        "phrase": {
          "field": "suggest_field.raw",
          "size": 1,
          "gram_size": 1
        }
      }
    }

It will not show anything for "coding like a boss".
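To see why `coding` ends up as a term of its own, here is a rough Python model of the two analysis chains. It is a sketch, not Lucene: it assumes that whitespace splitting plus lowercasing is close enough to the `lowercase` tokenizer for this sample text, skips `asciifolding` (the text is plain ASCII), and mimics a shingle filter with its default `output_unigrams: true`, using the `min_shingle_size: 2` / `max_shingle_size: 3` values from the mapping. The function names are made up for illustration.

```python
# Toy model of the two analyzers in the mapping above (not Lucene code).
# Assumption: whitespace split + lower() approximates the "lowercase" tokenizer
# for this ASCII sample; asciifolding is a no-op here and is skipped.


def shingle_tokens(text, min_size=2, max_size=3):
    """Mimic a shingle filter with output_unigrams=true: at each position,
    emit the single word, then shingles of min_size..max_size words."""
    words = text.lower().split()
    tokens = []
    for i in range(len(words)):
        tokens.append(words[i])  # unigram, reported as type "word"
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                tokens.append(" ".join(words[i:i + size]))  # type "shingle"
    return tokens


def keyword_tokens(text):
    """Mimic keyword tokenizer + lowercase filter: the whole input is one token."""
    return [text.lower()]


if __name__ == "__main__":
    text = "coding like a master"
    print(shingle_tokens(text))  # the shingle analyzer's terms
    print(keyword_tokens(text))  # the .raw sub-field's single term
```

The shingle model yields the same nine tokens, in the same order, as the `_analyze` output above, whereas the keyword-style chain yields only `coding like a master`. That is why the `.raw` sub-field knows whole phrases but not individual words.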
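EDIT 1

From your comments, and from running some phrase suggestions against my own dataset, I think a better approach is to use the `collate` option that the phrase suggester provides. With it, each suggestion is checked against a query, and a suggestion is only returned if that query matches at least one document in the index. I have also added stemmers to the mapping so that only root words are considered. I am using `light_english` because it is less aggressive; the stemmer token filter documentation has more on the available languages.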
The analysis section of the mapping now looks like this:

    "analysis": {
      "analyzer": {
        "suggests_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "english_possessive_stemmer",
            "light_english_stemmer",
            "asciifolding",
            "shingle_filter"
          ],
          "type": "custom"
        }
      },
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        },
        "shingle_filter": {
          "min_shingle_size": 2,
          "max_shingle_size": 4,
          "type": "shingle"
        }
      }
    }

Now this query will give you the desired result:
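    {
      "suggest": {
        "text": "appel on the tabel",
        "simple_phrase": {
          "phrase": {
            "field": "suggest_field",
            "size": 5,
            "collate": {
              "query": {
                "inline": {
                  "match_phrase": {
                    "{{field_name}}": "{{suggestion}}"
                  }
                }
              },
              "params": { "field_name": "suggest_field" },
              "prune": false
            }
          }
        }
      },
      "size": 0
    }

This will give you back "apple on the table". The collate query here is a `match_phrase` query, so each suggested phrase is run against the index. If you set `"prune": true`, you get all suggestions back regardless of whether they match, each with a `collate_match` flag telling you whether a matching document was found. You may also want to consider a `stop` filter to deal with stop words.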
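The collate and prune behaviour can be modelled in a few lines of Python. This is a hypothetical sketch, not Elasticsearch code: the in-memory `docs` list stands in for the index, `matches_phrase` is a crude stand-in for `match_phrase` (consecutive words after lowercasing, with no stemming or other analysis), and `collate` is an illustrative name, not an API.

```python
# Toy model of the phrase suggester's collate step (not Elasticsearch code).
# Assumptions: `docs` is a hypothetical stand-in for the index, and
# match_phrase is approximated as "all words appear consecutively in a doc".


def matches_phrase(doc, phrase):
    """Crude match_phrase stand-in: consecutive word match after lowercasing."""
    doc_words = doc.lower().split()
    phrase_words = phrase.lower().split()
    n = len(phrase_words)
    return any(doc_words[i:i + n] == phrase_words
               for i in range(len(doc_words) - n + 1))


def collate(suggestions, docs, prune=False):
    """Run the 'query' for every candidate suggestion.

    prune=False (the default): drop candidates that match no document.
    prune=True: keep every candidate and flag it with collate_match.
    """
    results = []
    for text in suggestions:
        hit = any(matches_phrase(doc, text) for doc in docs)
        if prune:
            results.append({"text": text, "collate_match": hit})
        elif hit:
            results.append({"text": text})
    return results


if __name__ == "__main__":
    docs = ["An apple on the table"]  # hypothetical indexed text
    candidates = ["apple on the table", "apple on the label"]  # hypothetical candidates
    print(collate(candidates, docs))              # only the phrase found in a doc
    print(collate(candidates, docs, prune=True))  # both, each with collate_match
```

With `prune=False` only candidates that hit a document survive, mirroring the default `"prune": false`; with `prune=True` every candidate is returned and the flag tells you which ones actually exist in the data.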
Hope this helps!



