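Suppose we have a matrix a of shape (4, 99999) and a matrix b of shape (32, 99999). We compute the cosine similarity between every row of a and every row of b, which yields a cosine similarity matrix of shape (4, 32).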
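As a reference for what each library is asked to compute, here is a minimal pure-Python sketch of the row-by-row definition, cos(u, v) = u·v / (|u| |v|). The function name cosine_similarity_rows and the tiny inputs are illustrative assumptions, not part of the benchmark.

```python
import math

def cosine_similarity_rows(rows_a, rows_b):
    """Return a len(rows_a) x len(rows_b) matrix of cosine similarities."""
    def norm(v):
        # Euclidean length of a vector
        return math.sqrt(sum(x * x for x in v))

    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        return dot / (norm(u) * norm(v))

    # One output row per row of rows_a, one column per row of rows_b
    return [[cos(u, v) for v in rows_b] for u in rows_a]

# Tiny illustrative inputs (not the benchmark data): 2 rows against 3 rows
small_a = [[1.0, 0.0], [1.0, 1.0]]
small_b = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
sim = cosine_similarity_rows(small_a, small_b)
print(len(sim), len(sim[0]))  # 2 3
```

With inputs of 4 and 32 rows, the same function would return the 4 x 32 layout described above.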
import time
import numpy as np
a = np.random.random(size=(4, 99999)).tolist()
b = np.random.random(size=(32, 99999)).tolist()
1. Scipy
from scipy import spatial
t1 = time.time()
cos_mat_sci = list(map(lambda x:list(map(lambda y:1-spatial.distance.cosine(x,y), b)),a))
t2 = time.time()
print('scipy time cost',t2-t1,'s')
#time cost 1.325453519821167 s
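The nested map calls spatial.distance.cosine once per pair. As a possible alternative, scipy can compute all pairs in one call with cdist. This is only a sketch and was not part of the timings here; a_small and b_small are smaller stand-in arrays assumed for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
# Smaller stand-ins for a and b, to keep the example quick
a_small = rng.random((4, 1000))
b_small = rng.random((32, 1000))

# cdist returns cosine *distances*; similarity is 1 - distance
cos_mat_cdist = 1.0 - cdist(a_small, b_small, metric="cosine")
print(cos_mat_cdist.shape)  # (4, 32)
```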
2. Sklearn
from sklearn.metrics.pairwise import cosine_similarity
time_start=time.time()
cos_mat_sk = list(map(lambda x:list(map(lambda y:cosine_similarity([x,y])[0][1], b)),a))
time_end=time.time()
print('sklearn time cost',time_end-time_start,'s')
#time cost 1.7672739028930664 s
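Calling cosine_similarity([x, y]) builds a 2 x 2 matrix for every pair and keeps one entry. The function also accepts two matrices and returns every pairwise similarity at once. The sketch below shows that usage; it was not timed in this comparison, and the smaller random inputs are assumed for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
# Smaller stand-ins for a and b, to keep the example quick
a_small = rng.random((4, 1000))
b_small = rng.random((32, 1000))

# Rows of the first argument against rows of the second
cos_mat_sk_batch = cosine_similarity(a_small, b_small)
print(cos_mat_sk_batch.shape)  # (4, 32)
```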
3. NumPy (hand-written)
def cos_sim(a, b):
    # Cosine similarity from the definition: dot product over the product of norms
    a_norm = np.linalg.norm(a)
    b_norm = np.linalg.norm(b)
    cos = np.dot(a, b) / (a_norm * b_norm)
    return cos
time_start=time.time()
cos_mat_np = list(map(lambda x:list(map(lambda y:cos_sim(x,y), b)),a))
time_end=time.time()
print('numpy time cost',time_end-time_start,'s')
#time cost 3.1022114753723145 s
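Part of the cost above likely comes from a and b being Python lists, so each cos_sim call converts them to arrays again. A vectorized sketch that converts once, normalizes each row, and uses a single matrix product might look like this. The name cos_sim_matrix and the smaller inputs are assumptions for illustration, and it assumes no row is all zeros.

```python
import numpy as np

def cos_sim_matrix(x, y):
    """Cosine similarity of every row of x against every row of y."""
    x = np.asarray(x, dtype=np.float64)  # convert list input once
    y = np.asarray(y, dtype=np.float64)
    # Scale each row to unit length (assumes no all-zero rows)
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    # Dot products of unit vectors are cosine similarities
    return x_unit @ y_unit.T

rng = np.random.default_rng(0)
a_small = rng.random((4, 1000)).tolist()
b_small = rng.random((32, 1000)).tolist()
cos_mat_vec = cos_sim_matrix(a_small, b_small)
print(cos_mat_vec.shape)  # (4, 32)
```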
4. torch
import torch
import torch.nn.functional as F
a_tf = torch.FloatTensor(a)
b_tf = torch.FloatTensor(b)
time_start=time.time()
cos_mat_torch = list(map(lambda x:list(map(lambda y:F.cosine_similarity(x,y, dim=0), b_tf)),a_tf))
time_end=time.time()
print('torch time cost',time_end-time_start,'s')
# time cost 0.028922557830810547 s
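The torch version can also be written without the nested map, and in 64-bit precision if desired. The sketch below normalizes the rows with F.normalize and multiplies the two matrices. It uses smaller random tensors assumed for illustration and was not part of the timings.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Smaller stand-ins for a_tf and b_tf, created directly as float64
a_t = torch.rand(4, 1000, dtype=torch.float64)
b_t = torch.rand(32, 1000, dtype=torch.float64)

# Unit-length rows, then one matrix product for all pairs
cos_mat_batch = F.normalize(a_t, dim=1) @ F.normalize(b_t, dim=1).T
print(cos_mat_batch.shape)  # torch.Size([4, 32])

# .item() returns a Python float; printing a tensor rounds the display to 4 decimals by default
print(cos_mat_batch[0, 0].item())
```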
Output
scipy time cost 1.325453519821167 s
sklearn time cost 1.7672739028930664 s
numpy time cost 3.1022114753723145 s
torch time cost 0.028922557830810547 s
>>> cos_mat_sci[0][:3]
[0.7521347867015734, 0.7507867569813598, 0.7505274811256897]
>>> cos_mat_sk[0][:3]
[0.7521347867015696, 0.7507867569813551, 0.7505274811256867]
>>> cos_mat_np[0][:3]
[0.7521347867015734, 0.7507867569813594, 0.7505274811256898]
>>> cos_mat_torch[0][:3]
[tensor(0.7521), tensor(0.7508), tensor(0.7505)]
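Overall, torch has the shortest run time, but with this setup it computes at lower precision: torch.FloatTensor stores 32-bit floats, while the other three methods work in 64-bit floats. Note also that the torch timing excludes the list-to-tensor conversion, which happens before the timer starts, whereas the other three convert the Python lists to arrays inside the timed loop. When precision matters, scipy is the better choice among these. The hand-written version based on np.linalg, which follows the definition directly, is the slowest here.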
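The precision remark can be illustrated without torch by computing the same cosine similarity in 32-bit and 64-bit floats with NumPy. This is a rough sketch on freshly generated random vectors, not the data used above; cos_sim_1d is a helper name assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
u64 = rng.random(99999)
v64 = rng.random(99999)
# Same values, rounded to 32-bit floats
u32, v32 = u64.astype(np.float32), v64.astype(np.float32)

def cos_sim_1d(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

cos64 = cos_sim_1d(u64, v64)
cos32 = cos_sim_1d(u32, v32)
# The two results agree to only a limited number of digits
print(cos64, cos32, abs(cos64 - float(cos32)))
```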



