You can improve on @Bill's solution by reducing the intermediate storage to just the diagonal elements:

    import numpy as np
    from numpy.core.umath_tests import inner1d

    m, n = 1000, 500

    a = np.random.rand(m, n)
    b = np.random.rand(n, m)

    # They all should give the same result
    print(np.trace(a.dot(b)))
    print(np.sum(a*b.T))
    print(np.sum(inner1d(a, b.T)))

    %timeit np.trace(a.dot(b))
    10 loops, best of 3: 34.7 ms per loop

    %timeit np.sum(a*b.T)
    100 loops, best of 3: 4.85 ms per loop

    %timeit np.sum(inner1d(a, b.T))
    1000 loops, best of 3: 1.83 ms per loop

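The cheaper expressions agree with the trace because diagonal entry `i` of `a.dot(b)` is simply the inner product of row `i` of `a` with column `i` of `b`. As a minimal sketch, assuming `inner1d` cannot be imported in your NumPy build (it appears to live in an internal testing module), the same row-wise inner products can be written with `np.einsum`. The helper name `rowwise_inner` is made up for illustration and is not part of NumPy. Note that this still stores the `m` diagonal values before summing them.

```python
import numpy as np

def rowwise_inner(x, y):
    """Inner product of matching rows of x and y (hypothetical stand-in for inner1d)."""
    return np.einsum('ij,ij->i', x, y)

rng = np.random.default_rng(0)
a = rng.random((6, 4))
b = rng.random((4, 6))

# Diagonal of a.dot(b), computed without forming the full 6x6 product
diag = rowwise_inner(a, b.T)

assert np.allclose(diag, np.diag(a.dot(b)))
assert np.isclose(diag.sum(), np.trace(a.dot(b)))
```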
Another option is to use `np.einsum`, which needs no explicit intermediate storage at all:

    # Will print the same as the others:
    print(np.einsum('ij,ji->', a, b))

On my system it runs slightly slower than using `inner1d`, but that may not hold on all systems; see this question:

    %timeit np.einsum('ij,ji->', a, b)
    100 loops, best of 3: 1.91 ms per loop
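Since the relative speed may differ from machine to machine, it can be worth measuring on your own system. The following is a rough sketch for doing so outside IPython, using the standard-library `timeit` module instead of the `%timeit` magic. The `candidates` dictionary and its labels are illustrative choices, `inner1d` is left out on the assumption that it may not be importable, and the numbers you get will depend on your hardware and NumPy build.

```python
import timeit
import numpy as np

m, n = 1000, 500
rng = np.random.default_rng(0)
a = rng.random((m, n))
b = rng.random((n, m))

# Each entry computes trace(a.dot(b)) in a different way
candidates = {
    "trace(a.dot(b))": lambda: np.trace(a.dot(b)),
    "sum(a * b.T)": lambda: np.sum(a * b.T),
    "einsum('ij,ji->')": lambda: np.einsum('ij,ji->', a, b),
}

reference = candidates["trace(a.dot(b))"]()
timings = {}
for label, fn in candidates.items():
    # All variants should agree up to floating-point rounding
    assert np.isclose(fn(), reference)
    # Best of 3 repeats of 5 calls each, converted to milliseconds per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    timings[label] = best * 1e3
    print("%-20s %.2f ms per call" % (label, timings[label]))
```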


