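This is a neat trick I learned from Adirio. You can use itertools.product to generate every pair of names, then compute the edit distance for each pair in a loop.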

    from itertools import product

    import editdistance
    import numpy as np
    import pandas as pd

    # One slot per ordered pair of names (n * n entries in total).
    dist = np.empty(df.shape[0]**2, dtype=int)
    for i, x in enumerate(product(df.Name, repeat=2)):
        dist[i] = editdistance.eval(*x)

    # Reshape the flat array into an n x n distance matrix.
    dist_df = pd.DataFrame(dist.reshape(-1, df.shape[0]))

    dist_df

           0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
    0      0   8   6   4   5   7   5   5   5   6   4   5   6   5   6
    1      8   0   7   7   7   6   8   8   7   8   7   7   8   8   8
    2      6   7   0   3   4   5   5   6   6   6   6   5   5   5   4
    3      4   7   3   0   4   6   5   5   5   6   4   4   6   4   5
    4      5   7   4   4   0   6   5   5   5   6   5   3   5   4   4
    5      7   6   5   6   6   0   6   6   6   7   6   5   7   7   6
    6      5   8   5   5   5   6   0   2   6   6   5   5   3   6   5
    7      5   8   6   5   5   6   2   0   6   6   5   5   4   6   6
    8      5   7   6   5   5   6   6   6   0   1   1   5   5   5   6
    9      6   8   6   6   6   7   6   6   1   0   2   5   6   6   6
    10     4   7   6   4   5   6   5   5   1   2   0   4   5   4   5
    11     5   7   5   4   3   5   5   5   5   5   4   0   4   4   3
    12     6   8   5   6   5   7   3   4   5   6   5   4   0   4   4
    13     5   8   5   4   4   7   6   6   5   6   4   4   4   0   1
    14     6   8   4   5   4   6   5   6   6   6   5   3   4   1   0

np.empty allocates an uninitialized array, which the loop then fills in, one element per call to editdistance.eval.
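
The loop-and-fill idea can also be sketched with the standard library alone. This is a minimal illustration, not the code from the answer: `levenshtein` is a hypothetical pure-Python stand-in for editdistance.eval, `pairwise_distances` is a hypothetical helper, and the short `names` list is sample data taken from the names used in this answer.

```python
from itertools import product


def levenshtein(a, b):
    """Pure-Python edit distance (hypothetical stand-in for editdistance.eval)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def pairwise_distances(names):
    """Return an n x n nested list of distances over all ordered pairs."""
    n = len(names)
    # product(names, repeat=2) yields pairs in row-major order.
    flat = [levenshtein(x, y) for x, y in product(names, repeat=2)]
    return [flat[r * n:(r + 1) * n] for r in range(n)]


names = ["chetna", "chetan", "mansi", "mansvi"]
matrix = pairwise_distances(names)
```

Because product iterates in row-major order, entry `matrix[r][c]` is the distance between `names[r]` and `names[c]`, which is the same layout that the reshape produces in the NumPy version.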
Borrowing senderle's cartesian_product, we can get some speedup:

    def cartesian_product(*arrays):
        la = len(arrays)
        dtype = np.result_type(*arrays)
        arr = np.empty([len(a) for a in arrays] + [la], dtype=dtype)
        for i, a in enumerate(np.ix_(*arrays)):
            arr[..., i] = a
        return arr.reshape(-1, la)

    # Apply editdistance.eval to every (name, name) row, then reshape to n x n.
    v = np.apply_along_axis(
        func1d=lambda x: editdistance.eval(*x),
        arr=cartesian_product(df.Name, df.Name),
        axis=1,
    ).reshape(-1, df.shape[0])

    dist_df = pd.DataFrame(v)

Alternatively, you can define a function that computes the edit distance and vectorize it:

    def f(x, y):
        return editdistance.eval(x, y)

    v = np.vectorize(f)

    # Transpose so that row 0 holds the first names and row 1 the second names.
    arr = cartesian_product(df.Name, df.Name).T
    arr = v(arr[0, :], arr[1, :])

    dist_df = pd.DataFrame(arr.reshape(-1, df.shape[0]))

If you need a labeled index and columns, you can add them when constructing dist_df:

    dist_df = pd.DataFrame(..., index=df.Name, columns=df.Name)

    dist_df

    Name       John  Mrinmayee  rituja  ritz  divya  priyanka  chetna  chetan
    Name
    John          0          8       6     4      5         7       5       5
    Mrinmayee     8          0       7     7      7         6       8       8
    rituja        6          7       0     3      4         5       5       6
    ritz          4          7       3     0      4         6       5       5
    divya         5          7       4     4      0         6       5       5
    priyanka      7          6       5     6      6         0       6       6
    chetna        5          8       5     5      5         6       0       2
    chetan        5          8       6     5      5         6       2       0
    mansi         5          7       6     5      5         6       6       6
    mansvi        6          8       6     6      6         7       6       6
    mani          4          7       6     4      5         6       5       5
    aliya         5          7       5     4      3         5       5       5
    shelia        6          8       5     6      5         7       3       4
    Dilip         5          8       5     4      4         7       6       6
    Dilipa        6          8       4     5      4         6       5       6

    Name       mansi  mansvi  mani  aliya  shelia  Dilip  Dilipa
    Name
    John           5       6     4      5       6      5       6
    Mrinmayee      7       8     7      7       8      8       8
    rituja         6       6     6      5       5      5       4
    ritz           5       6     4      4       6      4       5
    divya          5       6     5      3       5      4       4
    priyanka       6       7     6      5       7      7       6
    chetna         6       6     5      5       3      6       5
    chetan         6       6     5      5       4      6       6
    mansi          0       1     1      5       5      5       6
    mansvi         1       0     2      5       6      6       6
    mani           1       2     0      4       5      4       5
    aliya          5       5     4      0       4      4       3
    shelia         5       6     5      4       0      4       4
    Dilip          5       6     4      4       4      0       1
    Dilipa         6       6     5      3       4      1       0

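
For completeness, here is a small end-to-end sketch of the labeled variant. It rests on several assumptions that are not part of the answer: `levenshtein` is a hypothetical pure-Python stand-in for editdistance.eval, the three-name Series is sample data, and NumPy broadcasting is swapped in for cartesian_product (np.vectorize broadcasts its inputs, so a column vector against a row vector yields the full n x n grid directly).

```python
import numpy as np
import pandas as pd


def levenshtein(a, b):
    """Pure-Python edit distance (hypothetical stand-in for editdistance.eval)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


# Sample data (assumption): a few of the names from the answer.
names = pd.Series(["John", "ritz", "rituja"], name="Name")
values = names.to_numpy()

# otypes=[int] fixes the output dtype instead of inferring it from the first call.
v = np.vectorize(levenshtein, otypes=[int])

# Broadcasting (n, 1) against (1, n) evaluates every ordered pair.
dist = v(values[:, None], values[None, :])

# Label rows and columns with the names themselves.
dist_df = pd.DataFrame(dist, index=names, columns=names)
```

With the labels in place, a single distance can be looked up by name, for example `dist_df.loc["ritz", "rituja"]`.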



