Author: 放ch养奶牛 | Source: Internet | 2023-09-14 09:49
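I am using k-means to cluster my data.
I then found a better value of k for the clustering, using the Elbow method and the silhouette score to validate the choice.
So how do I now cluster the data and draw a dist plot?
Can you help me?
Here is my code.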
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.metrics import silhouette_score
%matplotlib inline  # Jupyter/IPython magic; remove this line when running as a plain script
df_diabetes = pd.read_csv('diabetes.csv')
# Drop the "Classe" (class label) column
df_diabetes_noclass = df_diabetes.drop('Classe', axis=1)
df_diabetes_noclass.head()
nomes = df_diabetes_noclass.columns
valores = df_diabetes_noclass.values
# Scale every feature to the [0, 1] range
escala_min_max = preprocessing.MinMaxScaler()
valores_normalizados = escala_min_max.fit_transform(valores)
df_diabetes_normalizado = pd.DataFrame(valores_normalizados)
df_diabetes_normalizado.columns = nomes
df_diabetes_normalizado.head(5)
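# Elbow method: record the SSE (inertia) for each candidate k
sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, max_iter=1000).fit(df_diabetes_normalizado)
    # Labels are not stored here: writing a "clusters" column inside the loop
    # would feed it back in as a feature on the next iteration
    sse[k] = kmeans.inertia_
plt.figure(figsize=(14, 9))
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of clusters")
plt.ylabel("SSE")
plt.show()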
# Use only the feature columns; drop "clusters" in case it was added earlier
X = df_diabetes_normalizado.drop(columns="clusters", errors="ignore")
for n_cluster in range(2, 11):
    kmeans = KMeans(n_clusters=n_cluster).fit(X)
    label = kmeans.labels_
    sil_coeff = silhouette_score(X, label, metric='euclidean')
    print("For n_clusters={}, the silhouette coefficient is {}".format(n_cluster, sil_coeff))
I now need to cluster the data and create a plot like the one shown in the figure below.
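One possible way to do this, as a rough sketch only: fit KMeans once with the chosen k, store the labels in a "clusters" column, and draw one histogram per feature with the clusters overlaid. Since diabetes.csv is not available here, the sketch uses synthetic data from make_blobs. The column names f0..f3 and the value best_k = 3 are assumptions for illustration, not results from the data in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the normalized features (assumption: with the real
# data, df_diabetes_normalizado would be used instead).
raw, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
df = pd.DataFrame(MinMaxScaler().fit_transform(raw),
                  columns=["f0", "f1", "f2", "f3"])  # hypothetical column names

best_k = 3  # assumption: the k suggested by the elbow and silhouette results
feature_cols = list(df.columns)

# Fit once with the chosen k and store each row's cluster label.
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(df[feature_cols])
df["clusters"] = kmeans.labels_

# One panel per feature, with one overlaid histogram per cluster.
fig, axes = plt.subplots(1, len(feature_cols), figsize=(16, 4))
for ax, col in zip(axes, feature_cols):
    for cluster_id, group in df.groupby("clusters"):
        ax.hist(group[col], bins=15, alpha=0.5,
                label="cluster {}".format(cluster_id))
    ax.set_title(col)
axes[0].legend()
plt.show()
```

Since seaborn is already imported in the question, sns.histplot(data=df, x=col, hue="clusters") would likely give a similar per-cluster distribution plot with less code.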
If you want to predict which cluster new data belongs to, use the predict method:
kmeans.predict(newData)
Here is the link to the documentation for the predict method:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans.predict
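As a minimal sketch of the predict call above, using made-up data rather than the diabetes set: the training points, new_data, and the choice of two clusters are all hypothetical. The one point worth noting is that new observations should probably go through the same fitted scaler as the training data before being passed to predict.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical training data: two well-separated groups of 2-feature points.
train = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
                   rng.normal(5.0, 0.5, size=(50, 2))])

# Fit the scaler and the model on the training data only.
scaler = MinMaxScaler().fit(train)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaler.transform(train))

# New observations are scaled with the already fitted scaler, then assigned
# to the nearest learned centroid by predict.
new_data = np.array([[0.1, -0.2], [5.2, 4.9]])
new_labels = kmeans.predict(scaler.transform(new_data))
print(new_labels)
```

The label numbers themselves are arbitrary, so only whether two points share a label is meaningful.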