
Cluster Analysis with Iris Dataset



Data Science Day 19:

In supervised learning, we specify the possible categorical values and train models for pattern recognition. However, *what if we don't have existing labeled data to learn from?*


When we model data in order to discover how it clusters, based on certain attributes, that is unsupervised learning.

Cluster analysis is one of those unsupervised techniques: rather than learning by example, it learns by observation.

In general, there are three types of clustering methods: partitioning, hierarchical, and density-based.

1. Partitioning: n objects are grouped into k ≤ n disjoint clusters.
   Partitioning methods are based on a distance measure; they apply iterative relocation until some distance-based error metric is minimized.

2. Hierarchical: clusters are either combined (agglomerative) or split (divisive) in a stepwise fashion, based on some measure (distance, density, or continuity).

Agglomerative clustering starts with each point in its own cluster and merges them step by step; divisive clustering starts with all the data in one cluster and divides it up.

3. Density-based: clusters are grown from dense regions of points; density is what measures a cluster's "goodness".

Example with Iris Dataset
  1. Partitioning: K-Means with k=3
# Imports assumed by the snippet
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

# Iris dataset
iris = datasets.load_iris()
x = iris.data
y = iris.target

# Fit K-Means with k=3; the cluster labels color the points below
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x).labels_

# Plotting
fig = plt.figure(1, figsize=(7, 7))
ax = fig.add_subplot(projection="3d", elev=48, azim=134)
ax.scatter(x[:, 3], x[:, 0], x[:, 2],
           c=labels.astype(float), edgecolor="k", s=50)
ax.set_xlabel("Petal width")
ax.set_ylabel("Sepal length")
ax.set_zlabel("Petal length")
plt.title("Iris Clustering K Means=3", fontsize=14)
plt.show()
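The distance-based error that K-Means iteratively minimizes is exposed by scikit-learn as inertia_, the within-cluster sum of squared distances. A minimal sketch (the k values tried here are illustrative, not from the original post) showing that the minimized error shrinks as k grows:

```python
from sklearn import datasets
from sklearn.cluster import KMeans

x = datasets.load_iris().data

# Minimized within-cluster SSE (inertia) for a few choices of k
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(x).inertia_
            for k in (1, 2, 3, 4)]
print(inertias)  # strictly decreasing as k grows
```

Plotting inertia against k and looking for the "elbow" is a common way to pick k when, unlike with Iris, the number of groups is not known in advance.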

2. **Hierarchical**

# Imports assumed by the snippet
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hierarchical clustering with Ward linkage
hier = linkage(x, "ward")
max_d = 7.08  # distance at which to cut the dendrogram
plt.figure(figsize=(25, 10))
plt.title('Iris Hierarchical Clustering Dendrogram')
plt.xlabel('Species')
plt.ylabel('distance')
dendrogram(
    hier,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=50,
    leaf_rotation=90.,
    leaf_font_size=8.,
)
plt.axhline(y=max_d, c='k')  # horizontal cut line
plt.show()
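The dendrogram only visualizes the cut; to get flat cluster labels from the same linkage matrix, scipy's fcluster can be used. A minimal sketch (recomputing the linkage so it runs on its own):

```python
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn import datasets

x = datasets.load_iris().data
hier = linkage(x, "ward")

# Ask for exactly 3 flat clusters; criterion="distance" with t=7.08
# would instead cut the tree at a fixed height
labels = fcluster(hier, t=3, criterion="maxclust")
print(len(set(labels)))  # 3
```

Note that fcluster numbers its flat clusters starting at 1, not 0.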

3. Density-based: DBSCAN

# Imports assumed by the snippet
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Fit DBSCAN with default parameters, then project to 2D with PCA for plotting
dbscan = DBSCAN()
dbscan.fit(x)
pca = PCA(n_components=2).fit(x)
pca_2d = pca.transform(x)

for i in range(0, pca_2d.shape[0]):
    if dbscan.labels_[i] == 0:
        c1 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='r', marker='+')
    elif dbscan.labels_[i] == 1:
        c2 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='g', marker='o')
    elif dbscan.labels_[i] == -1:  # DBSCAN labels noise points as -1
        c3 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='b', marker='*')

plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
plt.title('DBSCAN finds 2 clusters and Noise')
plt.show()
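DBSCAN's behavior is controlled by eps (the neighborhood radius) and min_samples (the number of neighbors needed to form a dense core); the run above uses scikit-learn's defaults (eps=0.5, min_samples=5). A minimal sketch, with eps values chosen here only as illustrative assumptions, of how changing eps changes the result:

```python
from sklearn import datasets
from sklearn.cluster import DBSCAN

x = datasets.load_iris().data

results = {}
for eps in (0.3, 0.5, 0.8):
    labels = DBSCAN(eps=eps, min_samples=5).fit(x).labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # -1 is noise
    results[eps] = n_clusters
    print(f"eps={eps}: {n_clusters} clusters, "
          f"{list(labels).count(-1)} noise points")
```

A smaller eps demands tighter density (more points become noise); a larger eps merges nearby groups, which is why the versicolor and virginica points end up in a single cluster at the defaults.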

Thanks very much to Dr. Rumbaugh's clustering analysis notes!

Happy studying! 
