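The last post in this machine learning series got as far as ridge regression. After that I read through unsupervised clustering and moved straight on to deep learning, so a few machine learning chapters never got written up. Our instructor covered K-means in class these past few days, so I decided to write the algorithm by hand to deepen my understanding of clustering.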
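1. How K-means works
The idea behind K-means:
- First, randomly pick K points from the dataset as the initial centers.
- Then compute the distance from every point to each of the K centers, and assign each point to the class of its nearest center.
- Recompute the mean of all points in each class and take it as the new center of that class.
- Compute the distance from every point to the new centers again and reassign the points.
- Repeat; the algorithm stops once the centers move very little or not at all.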
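The five steps above can be sketched compactly with NumPy broadcasting. This is only an illustrative sketch, not the implementation developed in this post: the name `kmeans_sketch` and the parameters `max_iter`, `tol`, and `seed` are assumptions introduced here, and keeping the old center when a cluster ends up empty is one possible choice among several.

```python
import numpy as np


def kmeans_sketch(points, k, max_iter=200, tol=1e-6, seed=0):
    """Illustrative K-means loop; names and defaults are assumptions."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)

    def assign(centers):
        # Distance from every point to every center, shape (n, k),
        # then the index of the nearest center for each point.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    # Step 1: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Steps 2 and 4: assign each point to its nearest center.
        labels = assign(centers)
        # Step 3: new center = mean of the assigned points.
        # Assumption: an empty cluster simply keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 5: stop once no center moves more than tol.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift <= tol:
            break
    # Final assignment so the labels match the returned centers.
    return centers, assign(centers)
```

Calling `kmeans_sketch(data, 2)` on an `(n, 2)` array returns the two centers and one integer label per point. Stopping on center movement mirrors the last step of the list; comparing the labels between two rounds is an equally common stopping rule.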
2. Implementation
- Initializing the starting centers, the distance function, and the assignment function
def distance(a, b):
    # Euclidean distance between two 2-D points
    dis = np.sqrt(((a[0] - b[0]) ** 2) + ((a[1] - b[1]) ** 2))
    return dis


def initCenterpoint(k):
    # k random starting centers, each coordinate drawn from [0, 1)
    return np.random.random(k * 2).reshape(k, 2)


def owner(dot, centerpoint):
    # index of the center closest to dot
    kclass = 0
    min_distance = np.inf
    for i in range(len(centerpoint)):
        d = distance(dot, centerpoint[i])
        if d < min_distance:
            kclass = i
            min_distance = d
    return kclass


def newCenter2(point, new_kclass):
    # recompute each center as the mean of the points assigned to it
    k = max(new_kclass) + 1
    newCenter = np.zeros((k, 2))
    point = pd.DataFrame(point, columns=["x", "y"])
    new_kclass = pd.DataFrame(new_kclass, columns=["kclass"])
    point2 = point.join(new_kclass)
    for i in range(k):
        point3 = point2[point2["kclass"] == i]
        # note: a class with no points gets a NaN center here
        newCenter[i] = point3[["x", "y"]].mean(axis=0)
    return newCenter


def update_kcalss(point, centerpoint):
    # assign every point to its nearest center
    n = len(point)
    new_kclass = np.zeros((n))
    for i, p in enumerate(point):
        new_kclass[i] = owner(p, centerpoint)
    new_kclass = new_kclass.astype(int)
    return new_kclass


def kmeans(point):
    # two classes, at most 200 rounds; stop when the assignment no longer changes
    centerpoint = initCenterpoint(2)
    kclass = update_kcalss(point, centerpoint)
    old_kclsss = kclass
    for i in range(200):
        centerpoint = newCenter2(point, kclass)
        kclass = update_kcalss(point, centerpoint)
        if np.array_equal(old_kclsss, kclass):
            print("Stopped at iteration:", i)
            break
        else:
            old_kclsss = kclass
    return centerpoint, kclass
c = np.array([[1,2],[1,1],[2,2],[5,5],[-0.10,-2.10],[-0.8,-1.8],[-2.9,-0.9],[-3.1,-2.2],[2,6],[7,10]])
a, b = kmeans(c)
c2 = pd.DataFrame(c)
plt.plot(c2[0], c2[1], "or")
print(b)
a1 = pd.DataFrame(a)
print(a1)
plt.plot(a1[0], a1[1], "xb")
plt.show()
- Result
3. Full code
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt


def distance(a, b):
    # Euclidean distance between two 2-D points
    dis = np.sqrt(((a[0] - b[0]) ** 2) + ((a[1] - b[1]) ** 2))
    return dis


def initCenterpoint(k):
    # k random starting centers, each coordinate drawn from [0, 1)
    return np.random.random(k * 2).reshape(k, 2)


def owner(dot, centerpoint):
    # index of the center closest to dot
    kclass = 0
    min_distance = np.inf
    for i in range(len(centerpoint)):
        d = distance(dot, centerpoint[i])
        if d < min_distance:
            kclass = i
            min_distance = d
    return kclass


def newCenter2(point, new_kclass):
    # recompute each center as the mean of the points assigned to it
    k = max(new_kclass) + 1
    newCenter = np.zeros((k, 2))
    point = pd.DataFrame(point, columns=["x", "y"])
    new_kclass = pd.DataFrame(new_kclass, columns=["kclass"])
    point2 = point.join(new_kclass)
    for i in range(k):
        point3 = point2[point2["kclass"] == i]
        # note: a class with no points gets a NaN center here
        newCenter[i] = point3[["x", "y"]].mean(axis=0)
    return newCenter


def update_kcalss(point, centerpoint):
    # assign every point to its nearest center
    n = len(point)
    new_kclass = np.zeros((n))
    for i, p in enumerate(point):
        new_kclass[i] = owner(p, centerpoint)
    new_kclass = new_kclass.astype(int)
    return new_kclass


def kmeans(point):
    # two classes, at most 200 rounds; stop when the assignment no longer changes
    centerpoint = initCenterpoint(2)
    kclass = update_kcalss(point, centerpoint)
    old_kclsss = kclass
    for i in range(200):
        centerpoint = newCenter2(point, kclass)
        kclass = update_kcalss(point, centerpoint)
        if np.array_equal(old_kclsss, kclass):
            print("Stopped at iteration:", i)
            break
        else:
            old_kclsss = kclass
    return centerpoint, kclass


c = np.array([[1,2],[1,1],[2,2],[5,5],[-0.10,-2.10],[-0.8,-1.8],[-2.9,-0.9],[-3.1,-2.2],[2,6],[7,10]])
a, b = kmeans(c)
c2 = pd.DataFrame(c)
plt.plot(c2[0], c2[1], "or")
print(b)
a1 = pd.DataFrame(a)
print(a1)
plt.plot(a1[0], a1[1], "xb")
plt.show()
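
One gap between the description in section 1 and the code: the first step picks the initial centers from the dataset, while `initCenterpoint` draws them from the unit square, which can leave a class without any points when the data lies far from that square. A possible variant is sketched below, under the assumption that sampling rows of the data is acceptable; `init_from_data` and its `seed` parameter are hypothetical names, not part of the code above.

```python
import numpy as np


def init_from_data(point, k, seed=None):
    # Hypothetical alternative to initCenterpoint: sample k distinct rows
    # of the data, so every starting center coincides with a real point.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(point), size=k, replace=False)
    return np.asarray(point, dtype=float)[idx]


c = np.array([[1,2],[1,1],[2,2],[5,5],[-0.10,-2.10],[-0.8,-1.8],[-2.9,-0.9],[-3.1,-2.2],[2,6],[7,10]])
start = init_from_data(c, 2, seed=0)
print(start)  # two rows taken from c
```

As long as the sampled rows are distinct points, each center starts with at least one point assigned to it (itself), so the first round of means is well defined.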