Reading a huge .csv file

You are reading all rows into a list, then processing that list. Don't do that.

Process your rows as you produce them. If you need to filter the data first, use a generator function:

    import csv

    def getstuff(filename, criterion):
        # Python 3: open in text mode with newline="".
        # Python 2: use open(filename, "rb") instead.
        with open(filename, newline="") as csvfile:
            datareader = csv.reader(csvfile)
            yield next(datareader)  # yield the header row
            count = 0
            for row in datareader:
                if row[3] == criterion:
                    yield row
                    count += 1
                elif count:
                    # done when having read a consecutive series of rows
                    return

I also simplified your filter test; the logic is the same but more concise.

Because you are only matching a single consecutive sequence of rows that meet the criterion, you could also use:

    import csv
    from itertools import dropwhile, takewhile

    def getstuff(filename, criterion):
        # Python 3: open in text mode with newline="".
        # Python 2: use open(filename, "rb") instead.
        with open(filename, newline="") as csvfile:
            datareader = csv.reader(csvfile)
            yield next(datareader)  # yield the header row
            # first row, plus any subsequent rows that match, then stop
            # reading altogether
            # Python 2: use `for row in takewhile(...): yield row`
            # instead of `yield from takewhile(...)`.
            yield from takewhile(
                lambda r: r[3] == criterion,
                dropwhile(lambda r: r[3] != criterion, datareader))
            return
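To see what the dropwhile/takewhile combination selects, here is a minimal sketch on an in-memory list rather than a file. The sample rows and the helper name first_run are made up for illustration; only the two itertools calls come from the answer above.

```python
from itertools import dropwhile, takewhile

# Hypothetical sample rows; column index 3 holds the value being matched.
rows = [
    ["1", "a", "x", "red"],
    ["2", "b", "y", "blue"],
    ["3", "c", "z", "blue"],
    ["4", "d", "w", "green"],
    ["5", "e", "v", "blue"],
]

def first_run(rows, criterion):
    """Return the first consecutive run of rows whose 4th column equals criterion."""
    # Skip rows until the first match ...
    skipped = dropwhile(lambda r: r[3] != criterion, rows)
    # ... then keep rows only while they keep matching.
    return list(takewhile(lambda r: r[3] == criterion, skipped))

matched = first_run(rows, "blue")
print(matched)
```

Note that the last "blue" row is not returned: takewhile stops at the first non-matching row, so only one consecutive series is produced, which is the same behavior as the count-based loop.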

You can now loop over getstuff() directly. Do the same in getdata():

    def getdata(filename, criteria):
        for criterion in criteria:
            for row in getstuff(filename, criterion):
                yield row

Now loop directly over getdata() in your code:

    for row in getdata(somefilename, sequence_of_criteria):
        # process row

You now hold only one row in memory at a time, instead of thousands of rows per criterion.
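The whole pipeline can be tried end to end on a small file. The following is a sketch under stated assumptions: it targets Python 3, the sample data, the column layout, and the names sample_path and collected are hypothetical, and the rows are gathered into a list only so the tiny result can be inspected. With a genuinely huge file you would process each row inside the loop instead.

```python
import csv
import os
import tempfile

def getstuff(filename, criterion):
    # Python 3 text mode; newline="" is what the csv module expects.
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # header row
        seen = False
        for row in datareader:
            if row[3] == criterion:
                yield row
                seen = True
            elif seen:
                # the consecutive series of matching rows has ended
                return

def getdata(filename, criteria):
    for criterion in criteria:
        yield from getstuff(filename, criterion)

# Write a small, made-up CSV file to a temporary location.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
    csv.writer(tmp).writerows([
        ["id", "name", "code", "color"],
        ["1", "a", "x", "red"],
        ["2", "b", "y", "blue"],
        ["3", "c", "z", "blue"],
        ["4", "d", "w", "green"],
    ])
    sample_path = tmp.name

try:
    # list() is used only because the sample is tiny.
    collected = list(getdata(sample_path, ["blue", "green"]))
finally:
    os.remove(sample_path)

for row in collected:
    print(row)
```

Two things are worth noticing in the output: the file is reopened and scanned from the top once per criterion, and the header row is yielded once per criterion as well, because getstuff() yields it on every call.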

yield makes a function a generator function, which means it won't do any work until you start looping over it.
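That laziness is easy to observe. In this small sketch (the function numbered_rows and the log list are hypothetical names used only for the demonstration), calling the generator function runs none of its body; the body starts only when the first value is requested.

```python
def numbered_rows(log):
    """Generator that records in `log` when its body actually runs."""
    log.append("started")
    for i in range(3):
        log.append(f"producing {i}")
        yield i

log = []
gen = numbered_rows(log)         # creates the generator; the body has not run yet
state_after_call = list(log)

first = next(gen)                # runs the body up to the first yield
state_after_first = list(log)

remaining = list(gen)            # drives the generator to completion

print(state_after_call)
print(state_after_first)
print(first, remaining)
```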
