Author: 孤独秀风_328 | Source: Internet | 2023-08-06 10:19
For the theory behind SVM, see https://zhuanlan.zhihu.com/p/24638007
For the mutual derivation of the KKT conditions and strong duality, see my earlier post: https://blog.csdn.net/qq_35985044/article/details/85324714
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('breast_cancer_01/data.csv')
print(data.columns)
print(data.head(5))
print(data.describe())
Partial output:
Index(['id', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean',
       'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean',
       'concave points_mean', 'symmetry_mean', 'fractal_dimension_mean',
       'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se',
       'compactness_se', 'concavity_se', 'concave points_se', 'symmetry_se',
       'fractal_dimension_se', 'radius_worst', 'texture_worst',
       'perimeter_worst', 'area_worst', 'smoothness_worst',
       'compactness_worst', 'concavity_worst', 'concave points_worst',
       'symmetry_worst', 'fractal_dimension_worst'],
      dtype='object')

         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean \
0 842302 M 17.99 10.38 122.80 1001.0
1 842517 M 20.57 17.77 132.90 1326.0
2 84300903 M 19.69 21.25 130.00 1203.0
3 84348301 M 11.42 20.38 77.58 386.1
4 84358402 M 20.29 14.34 135.10 1297.0
# columns gives the column names; index gives the row labels
features_mean = list(data.columns[2:12])
features_se = list(data.columns[12:22])
features_worst = list(data.columns[22:32])
features_worst
['radius_worst', 'texture_worst', 'perimeter_worst', 'area_worst', 'smoothness_worst', 'compactness_worst', 'concavity_worst', 'concave points_worst', 'symmetry_worst', 'fractal_dimension_worst']
# Drop the id column
data.drop("id", axis=1, inplace=True)
# Map benign (B) to 0 and malignant (M) to 1
data['diagnosis'] = data['diagnosis'].map({'B':0, 'M':1})
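As a quick sanity check on the label mapping, a minimal sketch on a toy Series (rather than the actual data.csv, which is assumed unavailable here): Series.map replaces each value via the dictionary, and value_counts then shows the class balance.

```python
import pandas as pd

# Toy diagnosis column mirroring the B/M labels in the dataset
s = pd.Series(['B', 'M', 'M', 'B', 'B'])
mapped = s.map({'B': 0, 'M': 1})
print(mapped.tolist())        # [0, 1, 1, 0, 0]
print(mapped.value_counts())  # benign (0) vs malignant (1) counts
```

Note that any label not present in the dictionary would be mapped to NaN, which is an easy way to catch unexpected values in the diagnosis column.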
corr = data[features_mean].corr()
plt.figure(figsize=(10,10))
sns.heatmap(corr, annot=True)
plt.show()
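The heatmap motivates the feature selection used later: radius, perimeter, and area are almost perfectly correlated, so keeping just one of them loses little information. A sketch of finding those highly correlated pairs programmatically, using scikit-learn's built-in copy of the same Wisconsin breast-cancer data as a stand-in for data.csv (an assumption; sklearn names the columns 'mean radius' etc. rather than 'radius_mean'):

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
# The "mean" features correspond to data.columns[2:12] in this post
mean_cols = [c for c in df.columns if c.startswith('mean')]
corr = df[mean_cols].corr()

# List feature pairs whose absolute correlation exceeds 0.9
pairs = [(a, b, corr.loc[a, b])
         for i, a in enumerate(mean_cols)
         for b in mean_cols[i + 1:]
         if abs(corr.loc[a, b]) > 0.9]
for a, b, c in pairs:
    print(f'{a} ~ {b}: {c:.3f}')
```

The radius/perimeter/area triple shows up with correlations near 1, which is why only radius_mean survives the feature selection below.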
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import metrics
from sklearn.preprocessing import StandardScaler

# Feature selection
features_remain = ['radius_mean', 'texture_mean', 'smoothness_mean', 'compactness_mean', 'symmetry_mean', 'fractal_dimension_mean']
# Hold out 30% of the data as the test set; the rest becomes the training set
train, test = train_test_split(data, test_size=0.3)
# Use only the selected features for training and testing
train_X = train[features_remain]
train_y = train['diagnosis']
test_X = test[features_remain]
test_y = test['diagnosis']

# Z-score standardization: give every feature zero mean and unit variance
ss = StandardScaler()
# Fit the scaler on the training set (learning its mean and variance) and standardize it
train_X = ss.fit_transform(train_X)
# Standardize the test set with the training set's mean and variance
test_X = ss.transform(test_X)

model = svm.SVC()
model.fit(train_X,train_y)
prediction = model.predict(test_X)
print('Accuracy: ', metrics.accuracy_score(test_y, prediction))
Accuracy:  0.9415204678362573
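Because the train_test_split above has no fixed random_state, the printed accuracy will differ from run to run; cross-validation gives a more stable estimate. A sketch using scikit-learn's built-in copy of the dataset (an assumption; this post reads data.csv) with the counterparts of features_remain, scaling inside a Pipeline so the scaler is refit on each training fold and never sees the test fold:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
# sklearn's names for the six selected features
cols = ['mean radius', 'mean texture', 'mean smoothness',
        'mean compactness', 'mean symmetry', 'mean fractal dimension']
# Note: in sklearn's copy, 0 = malignant and 1 = benign (opposite of
# this post's mapping); accuracy is unaffected by the label convention.
y = data.target

pipe = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(pipe, X[cols], y, cv=5)
print(f'5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```

Averaging over five folds typically lands in the same range as the single-split result above, but without the run-to-run jitter.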
There is also a variant that uses PCA for dimensionality reduction; see https://blog.csdn.net/Vincent_Chu/article/details/90046985
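For the PCA variant, a minimal sketch (again on scikit-learn's built-in copy of the dataset, as an assumed stand-in for data.csv): standardize, project the 30 features onto a few principal components, then fit the same SVC.

```python
from sklearn import metrics
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scale first, then reduce 30 features to 5 principal components
pipe = make_pipeline(StandardScaler(), PCA(n_components=5), SVC())
pipe.fit(X_train, y_train)
pred = pipe.predict(X_test)
print('Accuracy:', metrics.accuracy_score(y_test, pred))
```

The number of components (5 here) is a hyperparameter worth tuning; pipe.named_steps['pca'].explained_variance_ratio_ shows how much variance each retained component captures.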