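Don't sort 10 million lines in memory. Split the work into batches instead:

Run 100 sorts of 100k lines each (use the file as an iterator, combined with islice() or similar, to pick a batch). Write each sorted batch to a separate file elsewhere.

Merge the sorted files. Here is a merge generator that you can pass the 100 open files to; it yields lines in sorted order. Write them to a new file line by line: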
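    import operator

    def mergeiter(*iterables, **kwargs):
        """Given a set of sorted iterables, yield the next value in merged order.

        Takes an optional `key` callable to compare values by.
        """
        # Map each input's index to [current value, index, iterator].
        heads = {}
        for i, it in enumerate(map(iter, iterables)):
            try:
                heads[i] = [next(it), i, it]
            except StopIteration:
                pass  # skip inputs that are empty from the start

        if 'key' not in kwargs:
            key = operator.itemgetter(0)
        else:
            key = lambda item, key=kwargs['key']: key(item[0])

        while heads:
            # Pick the smallest current value across the remaining inputs.
            value, i, it = min(heads.values(), key=key)
            yield value
            try:
                heads[i][0] = next(it)
            except StopIteration:
                # This input is exhausted; the loop ends once none remain.
                # (Re-raising StopIteration inside a generator is a
                # RuntimeError on Python 3.7+, so just drop the entry.)
                del heads[i]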
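The two steps above can be sketched end to end as follows. This is only an illustration under some assumptions: the function names (`write_sorted_batches`, `merge_batches`, `external_sort`) and the batch file naming are made up for the example, every input line is assumed to end with a newline, and the standard library's `heapq.merge` is swapped in for the merge generator so the snippet runs on its own.

```python
import heapq
import os
import tempfile
from itertools import islice


def write_sorted_batches(src_path, batch_dir, batch_size=100_000):
    """Split src_path into sorted batch files and return their paths."""
    paths = []
    with open(src_path) as src:
        while True:
            # islice pulls at most batch_size lines from the file iterator,
            # so only one batch is held in memory at a time.
            batch = sorted(islice(src, batch_size))
            if not batch:
                break
            path = os.path.join(batch_dir, "batch_%05d.txt" % len(paths))
            with open(path, "w") as out:
                out.writelines(batch)
            paths.append(path)
    return paths


def merge_batches(paths, dest_path):
    """Merge already-sorted batch files into dest_path, line by line."""
    files = [open(p) for p in paths]
    try:
        with open(dest_path, "w") as dest:
            # heapq.merge lazily yields lines from the open files in order.
            dest.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()


def external_sort(src_path, dest_path, batch_size=100_000):
    """Sort a large text file using temporary sorted batch files."""
    with tempfile.TemporaryDirectory() as batch_dir:
        paths = write_sorted_batches(src_path, batch_dir, batch_size)
        merge_batches(paths, dest_path)
```

Calling `external_sort("big.txt", "big_sorted.txt")` (hypothetical file names) would then keep at most one 100k-line batch in memory during the first step, and one line per batch file during the merge.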



