You can compute pairwise cosine similarity on the rows of a sparse matrix directly with sklearn. As of version 0.17 it also supports sparse output:
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

A = np.array([[0, 1, 0, 0, 1], [0, 0, 1, 1, 1], [1, 1, 0, 1, 0]])
A_sparse = sparse.csr_matrix(A)

# pairwise similarities between rows, returned as a dense array
similarities = cosine_similarity(A_sparse)
print('pairwise dense output:\n {}\n'.format(similarities))

# can also output sparse matrices
similarities_sparse = cosine_similarity(A_sparse, dense_output=False)
print('pairwise sparse output:\n {}\n'.format(similarities_sparse))

Result:
pairwise dense output:
[[ 1.          0.40824829  0.40824829]
 [ 0.40824829  1.          0.33333333]
 [ 0.40824829  0.33333333  1.        ]]

pairwise sparse output:
  (0, 1)  0.408248290464
  (0, 2)  0.408248290464
  (0, 0)  1.0
  (1, 0)  0.408248290464
  (1, 2)  0.333333333333
  (1, 1)  1.0
  (2, 1)  0.333333333333
  (2, 0)  0.408248290464
  (2, 2)  1.0
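The values in this output can be checked by hand, since cosine similarity is the dot product of two rows divided by the product of their Euclidean norms. The following is a minimal NumPy-only sketch of that check on the same matrix; the names `norms`, `A_normalized`, and `manual_similarities` are illustrative and not part of sklearn.

```python
import numpy as np

# Same example matrix as above, as floats so the division is exact floating point.
A = np.array([[0, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [1, 1, 0, 1, 0]], dtype=float)

# Euclidean norm of each row, kept as a column vector for broadcasting.
norms = np.linalg.norm(A, axis=1, keepdims=True)

# Scale every row to unit length; dot products of unit rows are cosine similarities.
A_normalized = A / norms
manual_similarities = A_normalized @ A_normalized.T

print(np.round(manual_similarities, 8))
```

For example, rows 0 and 1 share one nonzero column and have norms sqrt(2) and sqrt(3), so their similarity is 1/sqrt(6), roughly 0.40824829. This sketch assumes no all-zero rows, which would cause a division by zero.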
If you want cosine similarities between columns instead, simply transpose the input matrix beforehand:
A_sparse_transposed = A_sparse.transpose()
similarities_columns = cosine_similarity(A_sparse_transposed)
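As a self-contained illustration of the column-wise case, here is a short sketch using the same example matrix. The variable names `column_similarities` and `column_similarities_sparse` are made up for this example; the only sklearn call used is `cosine_similarity` with its `dense_output` parameter, as in the code above.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

A = np.array([[0, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [1, 1, 0, 1, 0]])
A_sparse = sparse.csr_matrix(A)

# Transposing turns the 5 columns into 5 rows, so the result is 5 x 5.
column_similarities = cosine_similarity(A_sparse.transpose())
print(column_similarities.shape)

# The sparse-output option works the same way on the transposed matrix.
column_similarities_sparse = cosine_similarity(A_sparse.transpose(),
                                               dense_output=False)
print(sparse.issparse(column_similarities_sparse))
```

With 3 rows and 5 columns in `A`, the row-wise result is 3 x 3 while the column-wise result is 5 x 5, which is a quick way to confirm which axis was compared.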



