栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 面试经验 > 面试问答

sklearn聚集聚类链接矩阵

面试问答 更新时间: 发布时间: IT归档 最新发布 模块sitemap 名妆网 法律咨询 聚返吧 英语巴士网 伯小乐 网商动力

sklearn聚集聚类链接矩阵

我做了一个步骤,无需修改sklearn和递归函数。使用前请注意:

  • 合并距离有时会相对于子级合并距离减小。我添加了三种处理这些情况的方法:以最大,什么都不做或以l2范数增加。l2规范逻辑尚未验证。请检查自己最适合您的。

导入软件包:

from sklearn.cluster import AgglomerativeClusteringimport numpy as npimport matplotlib.pyplot as pltfrom scipy.cluster.hierarchy import dendrogram

计算重量和距离的功能:

def get_distances(X,model,mode='l2'):    distances = []    weights = []    children=model.children_    dims = (X.shape[1],1)    distCache = {}    weightCache = {}    for childs in children:        c1 = X[childs[0]].reshape(dims)        c2 = X[childs[1]].reshape(dims)        c1Dist = 0        c1W = 1        c2Dist = 0        c2W = 1        if childs[0] in distCache.keys(): c1Dist = distCache[childs[0]] c1W = weightCache[childs[0]]        if childs[1] in distCache.keys(): c2Dist = distCache[childs[1]] c2W = weightCache[childs[1]]        d = np.linalg.norm(c1-c2)        cc = ((c1W*c1)+(c2W*c2))/(c1W+c2W)        X = np.vstack((X,cc.T))        newChild_id = X.shape[0]-1        # How to deal with a higher level cluster merge with lower distance:        if mode=='l2':  # Increase the higher level cluster size suing an l2 norm added_dist = (c1Dist**2+c2Dist**2)**0.5  dNew = (d**2 + added_dist**2)**0.5        elif mode == 'max':  # If the previrous clusters had higher distance, use that one dNew = max(d,c1Dist,c2Dist)        elif mode == 'actual':  # Plot the actual distance. dNew = d        wNew = (c1W + c2W)        distCache[newChild_id] = dNew        weightCache[newChild_id] = wNew        distances.append(dNew)        weights.append( wNew)    return distances, weights

使用2个子群集制作2个群集的样本数据:

# Make 4 distributions, two of which form a bigger clusterX1_1 = np.random.randn(25,2)+[8,1.5]X1_2 = np.random.randn(25,2)+[8,-1.5]X2_1 = np.random.randn(25,2)-[8,3]X2_2 = np.random.randn(25,2)-[8,-3]# Merge the four distributionsX = np.vstack([X1_1,X1_2,X2_1,X2_2])# Plot the clusterscolors = ['r']*25 + ['b']*25 + ['g']*25 + ['y']*25plt.scatter(X[:,0],X[:,1],c=colors)

样本数据:

拟合聚类模型

model = AgglomerativeClustering(n_clusters=2,linkage="ward")model.fit(X)

调用该函数以查找距离,并将其传递给树状图

distance, weight = get_distances(X,model)linkage_matrix = np.column_stack([model.children_, distance, weight]).astype(float)plt.figure(figsize=(20,10))dendrogram(linkage_matrix)plt.show()

Ouput树状图:



转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/646960.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号