First, this is how I would generate the cnt that you are building (it reads the file lazily to reduce memory overhead):
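
    import collections
    import re

    def findWords(filepath):
        # Stream the file line by line so the whole text is never held in memory.
        with open(filepath) as infile:
            for line in infile:
                # \w+ matches runs of letters, digits, and underscores.
                yield from re.findall(r'\w+', line.lower())

    cnt = collections.Counter(findWords('02.2003.BenBernanke.txt'))

Now, on to your question about phrases: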
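
    from itertools import tee

    phrases = {'central bank', 'high inflation'}
    # Two independent iterators over the same word stream.
    fw1, fw2 = tee(findWords('02.2003.BenBernanke.txt'))
    next(fw2, None)  # advance the second one so zip pairs each word with its successor
    for w1, w2 in zip(fw1, fw2):
        phrase = ' '.join([w1, w2])
        if phrase in phrases:
            cnt[phrase] += 1

Hope this helps.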
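If you want to try the pairing idea without the speech file, here is a minimal sketch that runs on an in-memory string. The helper name `count_words_and_phrases` and the sample sentence are made up for illustration and are not part of your code; the logic is the same `tee`/`zip` approach as above.

```python
import collections
import re
from itertools import tee


def count_words_and_phrases(text, phrases):
    """Count single words plus any two-word phrases listed in `phrases`.

    Hypothetical helper for illustration only.
    """
    words = re.findall(r'\w+', text.lower())
    cnt = collections.Counter(words)
    first, second = tee(words)
    next(second, None)  # shift the second iterator one word ahead
    for w1, w2 in zip(first, second):
        phrase = f'{w1} {w2}'
        if phrase in phrases:
            cnt[phrase] += 1
    return cnt


# Made-up sample text, not taken from the speech.
sample = "The central bank fears high inflation. The Central Bank acts."
result = count_words_and_phrases(sample, {'central bank', 'high inflation'})
print(result['central bank'], result['high inflation'], result['bank'])
# prints: 2 1 2
```

Because the text is lowercased before matching, "Central Bank" and "central bank" are counted as the same phrase, and the punctuation between sentences is ignored by the `\w+` pattern.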



