Lucene: giving more weight to terms closer to the beginning of the title

I would use a SpanFirstQuery, which matches terms near the beginning of a field. Like all span queries, it relies on term positions, which Lucene indexes by default.

Let's test it on its own: you only need to provide your SpanTermQuery and the maximum position at which the term may be found (one in my example).

    SpanTermQuery spanTermQuery = new SpanTermQuery(new Term("title", "lucene"));
    SpanFirstQuery spanFirstQuery = new SpanFirstQuery(spanTermQuery, 1);

Given your two documents, this query will find only the first one, titled "Lucene: Homepage", provided you analyzed it with StandardAnalyzer.
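To make the position check concrete, here is a minimal plain-Java sketch of the test that SpanFirstQuery(spanTermQuery, 1) performs. It is a simplified model, not Lucene code: the names SpanFirstSketch, tokenize and matchesFirst are hypothetical, and the tokenizer is only a rough stand-in for StandardAnalyzer. It assumes a single term occupies the span [position, position + 1), so an end of 1 accepts only the first position in the field. The two titles are the ones from my test further down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Simplified model of the "span first" position check; this is NOT Lucene code.
class SpanFirstSketch {

    // Hypothetical stand-in for analysis: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // True if `term` occurs at a position whose span [pos, pos + 1)
    // ends at or before `end`.
    static boolean matchesFirst(String title, String term, int end) {
        List<String> tokens = tokenize(title);
        for (int pos = 0; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals(term) && pos + 1 <= end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The term is the first token here, so the check passes.
        System.out.println(matchesFirst(
                "Lucene: I have a really hard question about it", "lucene", 1));
        // The term is the last token here, so the check fails.
        System.out.println(matchesFirst(
                "I have a question about lucene", "lucene", 1));
    }
}
```

Raising the end argument widens the window: in this model, an end of 6 would also accept the second title, where the term sits at position 5.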

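Now we can combine the SpanFirstQuery above with a normal text query, so that the span query only influences the score. You can do that easily with a BooleanQuery, adding the span query as a SHOULD clause like this: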

    Term term = new Term("title", "lucene");
    TermQuery termQuery = new TermQuery(term);
    SpanFirstQuery spanFirstQuery = new SpanFirstQuery(new SpanTermQuery(term), 1);

    BooleanQuery booleanQuery = new BooleanQuery();
    booleanQuery.add(new BooleanClause(termQuery, BooleanClause.Occur.MUST));
    booleanQuery.add(new BooleanClause(spanFirstQuery, BooleanClause.Occur.SHOULD));
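The effect of mixing a MUST clause with a SHOULD clause can be modeled in a few lines of plain Java. This is only a toy model under stated assumptions: the class MustShouldSketch, its methods, the fixed scores of 1.0 and the title "Unrelated title" are all hypothetical, and Lucene's real scoring formula is different. What it does show is the shape of the behavior: the required clause decides whether a document is a hit at all, and the optional clause only adds to the score of documents that already match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model of MUST + SHOULD; the scores are illustrative, not Lucene's.
class MustShouldSketch {

    // Hypothetical stand-in for analysis: lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns -1 when the required clause fails; otherwise 1.0 for the
    // required match plus 1.0 more when the optional clause also matches.
    static double score(String title, String term) {
        List<String> tokens = tokenize(title);
        boolean must = tokens.contains(term);             // models the TermQuery
        boolean should = !tokens.isEmpty()
                && tokens.get(0).equals(term);            // models SpanFirstQuery(..., 1)
        if (!must) {
            return -1.0;
        }
        return 1.0 + (should ? 1.0 : 0.0);
    }

    // Keeps only the hits and orders them by descending score.
    static List<String> rank(List<String> titles, String term) {
        List<String> hits = new ArrayList<>();
        for (String title : titles) {
            if (score(title, term) >= 0) {
                hits.add(title);
            }
        }
        hits.sort(Comparator.comparingDouble((String t) -> score(t, term)).reversed());
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "I have a question about lucene",
                "Lucene: I have a really hard question about it",
                "Unrelated title"); // hypothetical non-matching document
        for (String title : rank(titles, "lucene")) {
            System.out.println(title + " - score: " + score(title, "lucene"));
        }
    }
}
```

Both titles containing the term remain hits, the one starting with it ranks first, and the title without the term is dropped because the required clause fails.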

There are probably other ways to achieve the same result, perhaps with a CustomScoreQuery or with custom scoring code, but this seems to me the simplest one.
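As a rough idea of what custom scoring code might compute, the plain-Java sketch below derives a boost that shrinks the further the term is from the start of the title. Everything in it is an assumption made for illustration: the class PositionBoostSketch, the method boost and the formula 1 / (1 + position) are hypothetical and are not part of Lucene or of CustomScoreQuery.

```java
import java.util.Locale;

// Hypothetical position-based boost; an illustration, not a Lucene API.
class PositionBoostSketch {

    // Returns 1 / (1 + position of the first occurrence of `term`),
    // or 0.0 when the term does not occur in the title.
    static double boost(String title, String term) {
        int position = 0;
        for (String token : title.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by a leading separator
            }
            if (token.equals(term)) {
                return 1.0 / (1 + position);
            }
            position++;
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(boost("Lucene: I have a really hard question about it", "lucene"));
        System.out.println(boost("I have a question about lucene", "lucene"));
    }
}
```

Unlike the SHOULD clause, which is all or nothing, a function of this kind grades every position, at the cost of more code to maintain.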

The code I used for testing prints the following output (scores included). It runs the TermQuery alone first, then the SpanFirstQuery alone, and finally the combined BooleanQuery:

    ------ TermQuery --------
    Total hits: 2
    title: I have a question about lucene - score: 0.26010898
    title: Lucene: I have a really hard question about it - score: 0.22295055
    ------ SpanFirstQuery --------
    Total hits: 1
    title: Lucene: I have a really hard question about it - score: 0.15764984
    ------ BooleanQuery: TermQuery (MUST) + SpanFirstQuery (SHOULD) --------
    Total hits: 2
    title: Lucene: I have a really hard question about it - score: 0.26912516
    title: I have a question about lucene - score: 0.09196242

Here is the complete code:

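    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.FieldInfo;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexableField;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.search.spans.SpanFirstQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    // Written against the Lucene 4.1 API; the class name is arbitrary.
    public class SpanFirstQueryExample {

        public static void main(String[] args) throws Exception {
            Directory directory = FSDirectory.open(new File("data"));
            index(directory);

            IndexReader indexReader = DirectoryReader.open(directory);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            Term term = new Term("title", "lucene");

            // 1. Plain term query: matches the term anywhere in the title.
            System.out.println("------ TermQuery --------");
            TermQuery termQuery = new TermQuery(term);
            search(indexSearcher, termQuery);

            // 2. Span query: matches only when the term is at the start of the title.
            System.out.println("------ SpanFirstQuery --------");
            SpanFirstQuery spanFirstQuery = new SpanFirstQuery(new SpanTermQuery(term), 1);
            search(indexSearcher, spanFirstQuery);

            // 3. Combination: the term is required, the span match only adds to the score.
            System.out.println("------ BooleanQuery: TermQuery (MUST) + SpanFirstQuery (SHOULD) --------");
            BooleanQuery booleanQuery = new BooleanQuery();
            booleanQuery.add(new BooleanClause(termQuery, BooleanClause.Occur.MUST));
            booleanQuery.add(new BooleanClause(spanFirstQuery, BooleanClause.Occur.SHOULD));
            search(indexSearcher, booleanQuery);
        }

        // Indexes the two test documents, storing the title with term positions.
        private static void index(Directory directory) throws Exception {
            IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_41, new StandardAnalyzer(Version.LUCENE_41));
            IndexWriter writer = new IndexWriter(directory, config);

            FieldType titleFieldType = new FieldType();
            titleFieldType.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
            titleFieldType.setIndexed(true);
            titleFieldType.setStored(true);

            Document document = new Document();
            document.add(new Field("title", "I have a question about lucene", titleFieldType));
            writer.addDocument(document);

            document = new Document();
            document.add(new Field("title", "Lucene: I have a really hard question about it", titleFieldType));
            writer.addDocument(document);

            writer.close();
        }

        // Runs the query and prints each hit's stored fields together with its score.
        private static void search(IndexSearcher indexSearcher, Query query) throws Exception {
            TopDocs topDocs = indexSearcher.search(query, 10);
            System.out.println("Total hits: " + topDocs.totalHits);
            for (ScoreDoc hit : topDocs.scoreDocs) {
                Document result = indexSearcher.doc(hit.doc);
                for (IndexableField field : result) {
                    System.out.println(field.name() + ": " + field.stringValue() + " - score: " + hit.score);
                }
            }
        }
    }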
