作者:久居我心入我怀 | 来源:互联网 | 2024-11-02 21:07
通过对收支数据进行聚类分析,研究发现聚类结果的解释和验证是关键步骤。为了确保分群的合理性和有效性,需要结合业务背景和实际需求,灵活选择合适的聚类数量。该研究利用Python中的Pandas和Matplotlib库对数据进行了预处理和可视化,为决策提供了科学依据。
聚类方法仍需要对分群结果进行解读,通过业务合理性来选择分群的数量
import pandas as pd
import matplotlib.pyplot as plt
dataset=pd.read_csv("customers.csv")
dataset.head()CustomerID Genre Age Annual Income (k$) Spending Score (1-100)
0 1 Male 19 15 39
1 2 Male 21 15 81
2 3 Female 20 16 6
3 4 Female 23 16 77
4 5 Female 31 17 40
dataset.describe()CustomerID Age Annual Income (k$) Spending Score (1-100)
count 200.000000 200.000000 200.000000 200.000000
mean 100.500000 38.850000 60.560000 50.200000
std 57.879185 13.969007 26.264721 25.823522
min 1.000000 18.000000 15.000000 1.000000
25% 50.750000 28.750000 41.500000 34.750000
50% 100.500000 36.000000 61.500000 50.000000
75% 150.250000 49.000000 78.000000 73.000000
max 200.000000 70.000000 137.000000 99.000000
X = dataset.iloc[:, [3, 4]].values
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)
plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Standard')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Traditional')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Normal')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Youth')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'TA')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'black', label = 'Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()
import matplotlib.pyplot as pltwcss = []for i in range(1, 11): kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)kmeans.fit(X)wcss.append(kmeans.inertia_) plt.plot(range(1, 11), wcss)plt.title('The Elbow Method')plt.xlabel('Number of clusters')plt.ylabel('WCSS')plt.show()
结论: 5个分群比较好