This solution requires preprocessing your corpus. But once that is done, each query is a very fast dictionary lookup.

    from collections import defaultdict
    from stemming.porter2 import stem

    # Load the corpus, one word per line
    with open('/usr/share/dict/words') as f:
        words = f.read().splitlines()

    # Group every word under its stem
    stems = defaultdict(list)
    for word in words:
        word_stem = stem(word)
        stems[word_stem].append(word)

    if __name__ == '__main__':
        word = 'leukocyte'
        word_stem = stem(word)
        print(stems[word_stem])

For the /usr/share/dict/words corpus, this produces the result
['leukocyte', "leukocyte's", 'leukocytes']
It uses the stemming module, which can be installed with
pip install stemming
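
If you want to try the indexing idea without installing anything or relying on /usr/share/dict/words, here is a minimal sketch that takes the stemmer as a parameter. The names build_stem_index, related_words, and naive_stem are hypothetical helpers made up for this illustration, and naive_stem is only a toy stand-in (it strips a possessive and a plural "s"); it is not the Porter2 algorithm.

```python
from collections import defaultdict


def naive_stem(word):
    """Toy stand-in for a real stemmer (assumption, NOT Porter2)."""
    word = word.lower()
    if word.endswith("'s"):
        word = word[:-2]          # drop a possessive suffix
    if word.endswith("s") and len(word) > 3:
        word = word[:-1]          # drop a simple plural "s"
    return word


def build_stem_index(words, stem_fn):
    """One-time preprocessing: map each stem to the words that share it."""
    index = defaultdict(list)
    for word in words:
        index[stem_fn(word)].append(word)
    return index


def related_words(word, index, stem_fn):
    """Query: stem the word, then do a single dictionary lookup."""
    # .get avoids inserting an empty list into the defaultdict for unknown stems
    return index.get(stem_fn(word), [])


# Small in-memory corpus used only for this demonstration
corpus = ["leukocyte", "leukocyte's", "leukocytes", "cat", "cats"]
index = build_stem_index(corpus, naive_stem)
print(related_words("leukocytes", index, naive_stem))
```

Passing stem from stemming.porter2 in place of naive_stem should give the same behavior as the code in this answer, since the index and the query only need to agree on which stemming function they use.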



