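Algorithm Procedure
- Select the training samples that belong to one class X
- Compute the center (mean vector) of that class
- Use the Euclidean distance to measure how far the sample under test is from each class center
- Assign the sample to the class whose center is closest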
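The steps above can be sketched in vectorized NumPy as follows. This is only an illustrative sketch, not the implementation used later in this post: the function name `nearest_center` and the toy data are assumptions made for the example.

```python
import numpy as np

def nearest_center(x_train, y_train, sample):
    """Return the label whose class mean is closest to `sample` (hypothetical helper)."""
    labels = np.unique(y_train)
    # One row per class: the mean of the training samples carrying that label.
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    # Euclidean distance from the sample to every class center.
    dists = np.linalg.norm(centers - sample, axis=1)
    return labels[np.argmin(dists)]

# Toy data (made up for illustration): two well-separated 2-D classes.
x_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(nearest_center(x_train, y_train, np.array([4.5, 5.2])))
```

Computing all class centers first and taking a single `argmin` avoids tracking a running minimum by hand, at the cost of building a small array of centers.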
Algorithm Implementation
Computing the distance
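import numpy as np

def euclid(x_train, y_train, sample):
    """
    :function: template matching based on class centers
    :param x_train: training set, M*N (M samples, N features)
    :param y_train: training labels, 1*M
    :param sample: sample to classify
    :return: predicted class label
    """
    disMin = np.inf
    label = 0
    target = np.unique(y_train)
    for i in target:
        # Indices of the training samples that belong to class i
        trainId = [j for j, y in enumerate(y_train) if y == i]
        train = x_train[trainId, :]
        # Class center: per-feature mean over the samples of class i
        trainMean = np.mean(train, axis=0)
        # Squared Euclidean distance; the square root is omitted because
        # it does not change which center is closest
        dis = np.dot((sample - trainMean), (sample - trainMean).T)
        if dis < disMin:
            disMin = dis
            label = i
    return label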
Splitting the dataset
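import random
import numpy as np

def train_test_split(x, y, ratio=3):
    """
    :function: split a dataset into a training set and a test set
    :param x: m*n array (m samples, n features)
    :param y: labels
    :param ratio: split ratio, train:test = ratio:1 (default 3:1)
    :return: x_train y_train x_test y_test
    """
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # Draw the training indices at random, without replacement
    train_id = random.sample(range(0, n_samples), n_train)
    x_train = x[train_id, :]
    y_train = y[train_id]
    # Everything that was not drawn becomes the test set
    x_test = np.delete(x, train_id, axis=0)
    y_test = np.delete(y, train_id, axis=0)
    return x_train, y_train, x_test, y_test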
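Because the split is random, every run produces a different training set. If repeatable results are needed, one option is to drive the split from a seeded generator. The sketch below is an assumption-laden variant, not part of the original code: the name `split_with_seed` and its `seed` parameter are hypothetical, and it shuffles with `np.random.default_rng` instead of `random.sample`.

```python
import numpy as np

def split_with_seed(x, y, ratio=3, seed=0):
    """Reproducible train:test = ratio:1 split (hypothetical variant)."""
    n_samples = x.shape[0]
    n_train = int(n_samples * ratio / (1 + ratio))
    # A seeded generator returns the same permutation on every run.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    train_id, test_id = perm[:n_train], perm[n_train:]
    return x[train_id], y[train_id], x[test_id], y[test_id]

# Toy data (made up): 20 samples with 2 features each.
x = np.arange(40).reshape(20, 2)
y = np.arange(20)
x_train, y_train, x_test, y_test = split_with_seed(x, y)
print(x_train.shape, x_test.shape)  # (15, 2) (5, 2)
```

Unlike the `np.delete` approach, the test samples here come out in shuffled order rather than in their original order, which makes no difference to the classifier.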
Testing
from sklearn import datasets
from Include.chapter3 import function
import numpy as np
digits = datasets.load_digits()
x , y = digits.data,digits.target
x_train, y_train, x_test, y_test = function.train_test_split(x,y)
testId = np.random.randint(0, x_test.shape[0])
sample = x_test[testId, :]
ans = function.euclid(x_train,y_train,sample)
print(ans==y_test[testId])
Result
True
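A single `True` only shows that one randomly chosen test sample was classified correctly. To get a fuller picture, the same nearest-center rule can be applied to every test sample and summarized as an accuracy. The sketch below is self-contained and rests on assumptions: `predict_all` is a hypothetical helper, and the three Gaussian blobs are synthetic stand-in data rather than the digits dataset, so the number it prints says nothing about performance on digits.

```python
import numpy as np

def predict_all(x_train, y_train, x_test):
    """Classify every row of x_test by its nearest class center (hypothetical helper)."""
    labels = np.unique(y_train)
    centers = np.stack([x_train[y_train == c].mean(axis=0) for c in labels])
    # Squared distances of shape (n_test, n_classes), computed by broadcasting.
    d2 = ((x_test[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return labels[d2.argmin(axis=1)]

rng = np.random.default_rng(0)
# Synthetic stand-in data (assumption): three well-separated 2-D Gaussian blobs.
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
x = np.concatenate([rng.normal(m, 1.0, size=(50, 2)) for m in means])
y = np.repeat(np.arange(3), 50)

# Roughly a 3:1 train/test split over a random permutation.
perm = rng.permutation(len(y))
train, test = perm[:112], perm[112:]

pred = predict_all(x[train], y[train], x[test])
accuracy = np.mean(pred == y[test])
print(f"accuracy: {accuracy:.3f}")
```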