栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 面试经验 > 面试问答

使用Numpy和Cython加速距离矩阵计算

面试问答 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

使用Numpy和Cython加速距离矩阵计算

Cython的关键是避免使用Python对象和函数调用,包括对numpy数组进行矢量化操作。这通常意味着手动写出所有循环并一次对单个数组元素进行操作。

这里有一个非常有用的教程,涵盖了将numpy代码转换为Cython并对其进行优化的过程。

这是您距离功能的更优化Cython版本的快速入门:

import numpy as npcimport numpy as npcimport cython# don't use np.sqrt - the sqrt function from the C standard library is much# fasterfrom libc.math cimport sqrt# disable checks that ensure that array indices don't go out of bounds. this is# faster, but you'll get a segfault if you mess up your indexing.@cython.boundscheck(False)# this disables 'wraparound' indexing from the end of the array using negative# indices.@cython.wraparound(False)def dist(double [:, :] A):    # declare C types for as many of our variables as possible. note that we    # don't necessarily need to assign a value to them at declaration time.    cdef:        # Py_ssize_t is just a special platform-specific type for indices        Py_ssize_t nrow = A.shape[0]        Py_ssize_t ncol = A.shape[1]        Py_ssize_t ii, jj, kk        # this line is particularly expensive, since creating a numpy array        # involves unavoidable Python API overhead        np.ndarray[np.float64_t, ndim=2] D = np.zeros((nrow, nrow), np.double)        double tmpss, diff    # another advantage of using Cython rather than broadcasting is that we can    # exploit the symmetry of D by only looping over its upper triangle    for ii in range(nrow):        for jj in range(ii + 1, nrow): # we use tmpss to accumulate the SSD over each pair of rows tmpss = 0 for kk in range(ncol):     diff = A[ii, kk] - A[jj, kk]     tmpss += diff * diff tmpss = sqrt(tmpss) D[ii, jj] = tmpss D[jj, ii] = tmpss  # because D is symmetric    return D

我将其保存在名为的文件中

fastdist.pyx
。我们可以
pyximport
用来简化构建过程:

import pyximportpyximport.install()import fastdistimport numpy as npA = np.random.randn(100, 200)D1 = np.sqrt(np.square(A[np.newaxis,:,:]-A[:,np.newaxis,:]).sum(2))D2 = fastdist.dist(A)print np.allclose(D1, D2)# True

因此,至少有效。让我们使用

%timeit
魔术进行一些基准测试:

%timeit np.sqrt(np.square(A[np.newaxis,:,:]-A[:,np.newaxis,:]).sum(2))# 100 loops, best of 3: 10.6 ms per loop%timeit fastdist.dist(A)# 100 loops, best of 3: 1.21 ms per loop

约9倍的加速比不错,但不能真正改变游戏规则。正如您所说,广播方法的最大问题是构造中间阵列的内存需求。

A2 = np.random.randn(1000, 2000)%timeit fastdist.dist(A2)# 1 loops, best of 3: 1.36 s per loop

我不建议尝试使用广播…

我们可以做的另一件事是使用以下

prange
函数在最外层的循环中将其并行化:

from cython.parallel cimport prange...for ii in prange(nrow, nogil=True, schedule='guided'):...

为了编译并行版本,您需要告诉编译器启用OpenMP。我还没有弄清楚如何使用来执行此操作

pyximport
,但是如果您正在使用
gcc
,则可以像这样手动进行编译:

$ cython fastdist.pyx$ gcc -shared -pthread -fPIC -fwrapv -fopenmp -O3    -Wall -fno-strict-aliasing  -I/usr/include/python2.7 -o fastdist.so fastdist.c

具有并行性,使用8个线程:

%timeit D2 = fastdist.dist_parallel(A2)1 loops, best of 3: 509 ms per loop


转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/623727.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号