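If you look at the source code of `np.savetxt`, you'll see that, while there is quite a bit of code to handle the arguments and the differences between Python 2 and Python 3, it is ultimately a simple Python loop over the rows, in which each row is formatted and written to the file. So you won't lose any performance if you write your own. For example, here's a pared-down function that writes compact zeros: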

    def savetxt_compact(fname, x, fmt="%.6g", delimiter=','):
        with open(fname, 'w') as fh:
            for row in x:
                # Write exact zeros as a bare "0"; format everything else with fmt.
                line = delimiter.join("0" if value == 0 else fmt % value
                                      for value in row)
                fh.write(line + '\n')

For example:

    In [70]: x
    Out[70]:
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

    In [71]: savetxt_compact('foo.csv', x, fmt='%.4f')

    In [72]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0

Then, as long as you are writing your own `savetxt` function, you can make it handle sparse matrices, so you don't have to convert the data to a (dense) numpy array before saving it. (I assume the sparse array is implemented using one of the sparse representations from `scipy.sparse`.) In the following function, the only change is from `... for value in row` to `... for value in row.A[0]`.

    def savetxt_sparse_compact(fname, x, fmt="%.6g", delimiter=','):
        with open(fname, 'w') as fh:
            for row in x:
                # row is a 1 x n sparse matrix; row.A[0] is its dense 1-D form.
                line = delimiter.join("0" if value == 0 else fmt % value
                                      for value in row.A[0])
                fh.write(line + '\n')

Example:

    In [112]: a
    Out[112]:
    <6x5 sparse matrix of type '<type 'numpy.float64'>'
        with 3 stored elements in Compressed Sparse Row format>

    In [113]: a.A
    Out[113]:
    array([[ 0.        ,  0.        ,  0.        ,  0.        ,  1.2345    ],
           [ 0.        ,  9.87654321,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  3.14159265,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])

    In [114]: savetxt_sparse_compact('foo.csv', a, fmt='%.4f')

    In [115]: !cat foo.csv
    0,0,0,0,1.2345
    0,9.8765,0,0,0
    0,3.1416,0,0,0
    0,0,0,0,0
    0,0,0,0,0
    0,0,0,0,0
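
If the `.A` shorthand is not available in your SciPy version (as far as I know, newer releases deprecate it in favour of `toarray()`), the same idea can be sketched with explicit row slicing. This is only a rough variant under that assumption, not a drop-in replacement: the function name `savetxt_sparse_rows` and the file name `foo_sparse.csv` are made up for illustration, and the sample matrix just reproduces the values shown above.

```python
import numpy as np
from scipy import sparse


def savetxt_sparse_rows(fname, x, fmt="%.6g", delimiter=","):
    """Write a SciPy sparse matrix as text, densifying one row at a time.

    Hypothetical helper: same output format as the compact writer,
    but it uses toarray() instead of the .A attribute.
    """
    x = x.tocsr()  # CSR supports efficient row slicing
    with open(fname, "w") as fh:
        for i in range(x.shape[0]):
            # Only this single row is converted to a dense 1-D array.
            row = x[i:i + 1].toarray().ravel()
            line = delimiter.join("0" if value == 0 else fmt % value
                                  for value in row)
            fh.write(line + "\n")


# Rebuild the 6x5 example matrix with three stored elements.
dense = np.zeros((6, 5))
dense[0, 4] = 1.2345
dense[1, 1] = 9.87654321
dense[2, 1] = 3.14159265
a = sparse.csr_matrix(dense)

savetxt_sparse_rows("foo_sparse.csv", a, fmt="%.4f")
with open("foo_sparse.csv") as fh:
    print(fh.read())
```

At no point is the whole matrix held in dense form; only one row at a time is expanded, which is the property that matters when the matrix is large and mostly zeros.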


