栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

sparkling(sparkler是什么意思)

sparkling(sparkler是什么意思)

spark_ML_聚类

1,K均值聚类2,高斯混合模型 3, 二分K均值 Bisecting k-means
Mllib支持的聚类模型较少,主要有K均值聚类,高斯混合模型GMM,以及二分的K均值,隐含狄利克雷分布LDA模型等。

1,K均值聚类
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import Clusteringevaluator

# 载入数据
dfdata = spark.read.format("libsvm").load("data/sample_kmeans_data.txt")

# 训练Kmeans模型
kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(dfdata)

# 进行预测
dfpredictions = model.transform(dfdata)

# 评估模型
evaluator = Clusteringevaluator()
silhouette = evaluator.evaluate(dfpredictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))

# 打印中心点
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)
Silhouette with squared euclidean distance = 0.9997530305375207
Cluster Centers: 
[9.1 9.1 9.1]
[0.1 0.1 0.1]
2,高斯混合模型
from pyspark.ml.clustering import GaussianMixture

dfdata = spark.read.format("libsvm").load("data/sample_kmeans_data.txt")

gmm = GaussianMixture().setK(2).setSeed(538009335)
model = gmm.fit(dfdata)

print("Gaussians shown as a Dataframe: ")
model.gaussiansDF.show(truncate=True)


aussians shown as a Dataframe: 
+--------------------+--------------------+
|                mean|                 cov|
+--------------------+--------------------+
|[0.10000000000001...|0.006666666666806...|
|[9.09999999999998...|0.006666666666812...|
+--------------------+--------------------+
3, 二分K均值 Bisecting k-means

Bisecting k-means是一种自上而下的层次聚类算法。所有的样本点开始时属于一个cluster,然后不断通过K均值二分裂得到多个cluster。

from pyspark.ml.clustering import BisectingKMeans


dfdata = spark.read.format("libsvm").load("data/sample_kmeans_data.txt")

bkm = BisectingKMeans().setK(2).setSeed(1)
model = bkm.fit(dfdata)

cost = model.computeCost(dfdata)
print("Within Set Sum of Squared Errors = " + str(cost))


print("Cluster Centers: ")
centers = model.clusterCenters()
for center in centers:
    print(center)
    
Within Set Sum of Squared Errors = 0.11999999999994547
Cluster Centers: 
[0.1 0.1 0.1]
[9.1 9.1 9.1]

资源链接下载

sample_kmeans_data.txt

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/772225.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号