Machine Learning: Heart Disease Prediction with sklearn

Author: mzyzzyk | Source: Internet | 2023-07-16 14:29


Dataset: https://pan.baidu.com/s/1KVRkkRp-E-W0tS4Q9qU7Ag (extraction code: a9wl)

Addendum: quickly displaying comparison plots

Discrete variables

# death_df / living_df are the two subsets being compared (their definition is not shown)
ax1 = plt.subplot(121)
ax2 = plt.subplot(122)
death_df.thal.value_counts().sort_index().plot(kind="bar", ax=ax1)
living_df.thal.value_counts().sort_index().plot(kind="bar", ax=ax2)
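As a minimal sketch of the same comparison in table form (the small `living_df` and `death_df` below are hypothetical stand-ins, since how the two subsets are built is not shown), the two `value_counts()` results can be aligned side by side:

```python
import pandas as pd

# Hypothetical toy data standing in for the two subsets compared above
living_df = pd.DataFrame({"thal": [2, 2, 3, 1, 2]})
death_df = pd.DataFrame({"thal": [3, 3, 2, 3]})

# One column per subset, one row per category; categories missing from a subset become 0
comparison = (
    pd.concat(
        [living_df.thal.value_counts(), death_df.thal.value_counts()],
        axis=1,
        keys=["living", "death"],
    )
    .fillna(0)
    .astype(int)
    .sort_index()
)
print(comparison)
```

The table carries the same numbers as the two bar charts, which is handy when the bars are too close to compare by eye.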

Continuous variables

plt.figure(figsize=(20, 10))
ax1 = plt.subplot(221)
ax2 = plt.subplot(222)

ejectionFraction_groups = pd.cut(living_df["ejectionFraction"], bins=[0, 0.35, 0.5, 0.7, 0.8])
ejectionFraction_target_df = pd.concat([ejectionFraction_groups, living_df.target], axis=1)
sns.countplot(x="ejectionFraction", hue='target', data=ejectionFraction_target_df, ax=ax1)

ejectionFraction_groups = pd.cut(death_df["ejectionFraction"], bins=[0, 0.35, 0.5, 0.7, 0.8])
ejectionFraction_target_df = pd.concat([ejectionFraction_groups, death_df.target], axis=1)
sns.countplot(x="ejectionFraction", hue='target', data=ejectionFraction_target_df, ax=ax2)
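The countplot counts rows per (bin, target) pair. Assuming a toy frame with the same column names (the values are made up for illustration), `pd.crosstab` shows those counts directly; note that `pd.cut` uses right-closed intervals by default:

```python
import pandas as pd

# Hypothetical ejection-fraction values and outcomes
df = pd.DataFrame({
    "ejectionFraction": [0.20, 0.30, 0.40, 0.45, 0.60, 0.75],
    "target": [1, 1, 0, 1, 0, 0],
})

# Same bin edges as above; intervals are (0, 0.35], (0.35, 0.5], (0.5, 0.7], (0.7, 0.8]
groups = pd.cut(df["ejectionFraction"], bins=[0, 0.35, 0.5, 0.7, 0.8])

# Rows: bins, columns: target value, cells: number of rows in that combination
table = pd.crosstab(groups, df["target"])
print(table)
```

Values above the last edge (0.8 here) fall outside every bin and become NaN, so they silently drop out of the plot.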

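56. Heart disease prediction - dataset introduction
57. Heart disease prediction - sex vs. disease analysis
58.1. Heart disease prediction - feature correlation analysis
58. Heart disease prediction - feature preprocessing
59. Heart disease prediction - k-nearest neighbors prediction
60. Heart disease prediction - precision, recall and ROC curves
61. Heart disease prediction - decision tree evaluation
62. Heart disease prediction - random forest evaluation
63. Heart disease prediction - logistic regression evaluation
64. Heart disease prediction - SGD classifier evaluation
65. Heart disease prediction - feature importance analysis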

Heart disease prediction - dataset introduction

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Use a font that can render Chinese characters (only needed for Chinese plot labels)
plt.rcParams['font.sans-serif'] = ['SimHei']

heart_df = pd.read_csv("./data/heart.csv")
heart_df.head()
# heart_df.info()


age - age
sex - sex (1 = male; 0 = female)
cp - chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
trestbps - resting blood pressure (in mm Hg on admission to the hospital)
chol - cholesterol in mg/dl
fbs - fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg - resting electrocardiogram result (0: normal, 1: ST-T wave abnormality, 2: probable left ventricular hypertrophy)
thalach - maximum heart rate achieved
exang - exercise-induced angina (1 = yes; 0 = no)
oldpeak - ST depression induced by exercise relative to rest
slope - slope of the peak exercise ST segment (1: upsloping, 2: flat, 3: downsloping)
ca - number of major vessels (0-4)
thal - thalassemia, a blood disorder (3 = normal; 6 = fixed defect; 7 = reversible defect)
target - whether the patient has heart disease (1 = yes, 0 = no)
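Before mapping these codes to strings in the preprocessing step, it can help to confirm that the file really uses them. A minimal sketch, where `unexpected_codes` is a hypothetical helper and the sample rows are made up (in practice pass `heart_df`):

```python
import pandas as pd

# Documented codes from the field list above (a subset of the columns)
expected_codes = {
    "cp": {1, 2, 3, 4},
    "slope": {1, 2, 3},
    "thal": {3, 6, 7},
}

def unexpected_codes(df, expected):
    """Hypothetical helper: for each column, list the values outside the documented set."""
    result = {}
    for col, allowed in expected.items():
        extra = {int(v) for v in df[col].unique()} - allowed
        if extra:
            result[col] = sorted(extra)
    return result

# Hypothetical rows; with the real data call unexpected_codes(heart_df, expected_codes)
sample = pd.DataFrame({"cp": [0, 1, 2], "slope": [1, 2, 2], "thal": [2, 3, 7]})
print(unexpected_codes(sample, expected_codes))  # {'cp': [0], 'thal': [2]}
```

An empty dict means every value matches the description; anything else means the mapping needs adjusting before the strings are assigned.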

Heart disease prediction - sex vs. disease analysis

# Distribution of the target
# sort_index() fixes the order (0, then 1) so the pie labels line up with the counts
counts = heart_df.target.value_counts().sort_index()
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ax = counts.plot(kind="bar", ax=axes[0])
ax.set_title("Target distribution")
ax.set_xlabel("1: disease, 0: no disease")
counts.plot(kind="pie", autopct="%.2f%%", labels=['No disease', 'Disease'], ax=axes[1])


# Sex vs. target: countplot on the left, one pie per target value on the right
ax1 = plt.subplot(121)
ax = sns.countplot(x="sex", hue='target', data=heart_df, ax=ax1)
ax.set_xlabel("0: female, 1: male")

# sort_index() puts sex=0 (female) first, matching the labels
ax2 = plt.subplot(222)
heart_df[heart_df['target'] == 0].sex.value_counts().sort_index().plot(kind="pie", autopct="%.2f%%", labels=['Female', 'Male'], ax=ax2)
ax2.set_title("Sex ratio (no disease)")

ax3 = plt.subplot(224)
heart_df[heart_df['target'] == 1].sex.value_counts().sort_index().plot(kind="pie", autopct="%.2f%%", labels=['Female', 'Male'], ax=ax3)
ax3.set_title("Sex ratio (disease)")


fig, axes = plt.subplots(2, 1, figsize=(20, 10))
sns.countplot(x="age", hue="target", data=heart_df, ax=axes[0])

# Age groups: [0, 45): young, [45, 60): middle-aged, [60, 100): elderly
age_type = pd.cut(heart_df.age, bins=[0, 45, 60, 100], include_lowest=True, right=False, labels=['young', 'middle-aged', 'elderly'])
age_target_df = pd.concat([age_type, heart_df.target], axis=1)
sns.countplot(x="age", hue='target', data=age_target_df, ax=axes[1])
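With `right=False`, each interval includes its left edge and excludes its right edge, so a 45-year-old counts as middle-aged and a 60-year-old as elderly. A quick check on a few made-up ages:

```python
import pandas as pd

ages = pd.Series([29, 44, 45, 59, 60, 77])

# right=False makes the intervals left-closed: [0, 45), [45, 60), [60, 100)
age_type = pd.cut(ages, bins=[0, 45, 60, 100], right=False,
                  labels=["young", "middle-aged", "elderly"])
print(age_type.tolist())
# ['young', 'young', 'middle-aged', 'middle-aged', 'elderly', 'elderly']
```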


Heart disease prediction - feature correlation analysis

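# Look at the distribution of every column (14 columns -> 7 x 2 grid)
fig, axes = plt.subplots(7, 2, figsize=(10, 20))
for ax, col in zip(axes.flat, heart_df.columns):
    # histplot(kde=True) replaces sns.distplot, which is deprecated since seaborn 0.11
    sns.histplot(heart_df[col], kde=True, ax=ax)
plt.tight_layout()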


plt.figure(figsize=(8, 5))
sns.heatmap(heart_df.corr(), cmap="Blues", annot=True)
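The heatmap is easier to read when the row for `target` is pulled out and ranked. A sketch on a hypothetical toy frame (with the real data, replace `df` with `heart_df`):

```python
import pandas as pd

# Hypothetical toy frame with two features and a binary target
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [5, 3, 4, 1, 2],
    "target": [0, 0, 1, 1, 1],
})

# Pearson correlation of each feature with the target
corr_with_target = df.corr()["target"].drop("target")

# Order by absolute strength, keeping the sign for interpretation
ranked = corr_with_target.reindex(corr_with_target.abs().sort_values(ascending=False).index)
print(ranked)
```

Here `a` correlates at about 0.87 and `b` at about -0.58; a negative sign means larger values go with `target = 0`.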


Heart disease prediction - feature preprocessing

# Data preprocessing: separate the features from the label
features = heart_df.drop(columns=['target'])
targets = heart_df['target']

# Replace the integer codes of the discrete columns with readable strings,
# so that pd.get_dummies() below one-hot encodes them.
# Check the actual codes first (e.g. features['cp'].value_counts()):
# values without an entry in a mapping are left unchanged.
code_maps = {
    'sex': {0: 'female', 1: 'male'},
    'cp': {1: 'typical', 2: 'atypical', 3: 'non-anginal', 4: 'asymptomatic'},
    'fbs': {1: 'true', 0: 'false'},
    'exang': {1: 'true', 0: 'false'},
    'slope': {1: 'upsloping', 2: 'flat', 3: 'downsloping'},
    'thal': {3: 'normal', 6: 'fixed', 7: 'reversable'},
    # restecg: 0: normal, 1: ST-T wave abnormality, 2: probable left ventricular hypertrophy
    'restecg': {0: 'normal', 1: 'ST-T abnormal', 2: 'Left ventricular hypertrophy'},
}
for col, mapping in code_maps.items():
    features[col] = features[col].replace(mapping)

# ca (and any unmapped thal values) are still numbers: cast to object so get_dummies
# encodes them too. astype() returns a new Series, so the result must be assigned back.
features['ca'] = features['ca'].astype("object")
features['thal'] = features['thal'].astype("object")
features.head()


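from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# One-hot encode the object columns (dtype=int gives 0/1 columns instead of True/False)
features = pd.get_dummies(features, dtype=int)
# Standardize every column to zero mean and unit variance
features_temp = StandardScaler().fit_transform(features)
# features_temp = StandardScaler().fit_transform(pd.get_dummies(features))

# Hold out 25% of the rows for testing (no random_state, so the split changes between runs)
X_train, X_test, y_train, y_test = train_test_split(features_temp, targets, test_size=0.25)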
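The code above fits `StandardScaler` on all rows before splitting, so the test rows contribute to the mean and standard deviation used for scaling. One standard way to avoid this leakage is a scikit-learn `Pipeline`, sketched here on synthetic data from `make_classification` (an assumption standing in for the real feature matrix):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the one-hot-encoded feature matrix and targets
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fit() learns the scaler's mean/std from X_train only;
# score() then scales X_test with those training statistics
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```

The same pipeline object can be passed to `cross_val_score`, which then refits the scaler inside each fold.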

Models evaluated: K-nearest neighbors, decision tree, random forest, logistic regression, SGD classifier

Heart disease prediction - k-nearest neighbors prediction

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.metrics import precision_recall_curve, roc_curve, average_precision_score, auc
# https://www.jianshu.com/p/c61ae11cc5f6

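def plotting(estimator, y_test):
    """Plot the precision-recall curve and the ROC curve of a fitted classifier.
    Note: uses the global X_test defined above."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 5))
    # Predicted probability of the positive class (target == 1)
    y_predict_proba = estimator.predict_proba(X_test)[:, 1]

    # Precision-recall curve: recall on the x axis, precision on the y axis
    precisions, recalls, thresholds = precision_recall_curve(y_test, y_predict_proba)
    axes[0].plot(recalls, precisions)
    axes[0].set_title("Average precision: %.2f" % average_precision_score(y_test, y_predict_proba))
    axes[0].set_xlabel("Recall")
    axes[0].set_ylabel("Precision")

    # ROC curve and the area under it
    fpr, tpr, thresholds = roc_curve(y_test, y_predict_proba)
    axes[1].plot(fpr, tpr)
    axes[1].set_title("AUC: %.2f" % auc(fpr, tpr))
    axes[1].set_xlabel("FPR")
    axes[1].set_ylabel("TPR")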

# 1. K-nearest neighbors
knn = KNeighborsClassifier(n_neighbors=5)
# Mean accuracy over 5-fold cross-validation
scores = cross_val_score(knn, features_temp, targets, cv=5)
print("Accuracy:", scores.mean())

knn.fit(X_train, y_train)
y_predict = knn.predict(X_test)
# Precision
print("Precision:", precision_score(y_test, y_predict))
# Recall
print("Recall:", recall_score(y_test, y_predict))
# F1 score
print("F1 score:", f1_score(y_test, y_predict))

plotting(knn, y_test)
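`n_neighbors=5` is fixed above. As a sketch, the value of k could instead be chosen by cross-validated grid search; the candidate list and the synthetic data are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the scaled feature matrix and targets
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Evaluate each candidate k with 5-fold cross-validated accuracy
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` is a k-NN model refitted on all of `X` with the winning k.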


Heart disease prediction - precision, recall and ROC curves
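The curves drawn by `plotting()` are built from two quantities: precision = TP / (TP + FP), the share of predicted positives that are truly positive, and recall = TP / (TP + FN), the share of actual positives that are found. The ROC curve plots TPR (which equals recall) against FPR = FP / (FP + TN), and with no tied scores its AUC equals the probability that a random positive is scored above a random negative. A hand-checkable example on made-up labels and scores:

```python
from itertools import product

from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]                     # hard predictions
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]    # predicted probability of class 1

# For labels [0, 1], ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(precision, recall)                                              # 0.75 0.75
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred))  # 0.75 0.75

# AUC as the share of (positive, negative) pairs ranked correctly
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
auc_manual = sum(p > n for p, n in product(pos, neg)) / (len(pos) * len(neg))
print(auc_manual, roc_auc_score(y_true, y_score))                     # 0.875 0.875
```

Only the positive scored 0.3 is ranked below two negatives (0.6 and 0.4), so 14 of the 16 pairs are correct.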

Heart disease prediction - decision tree evaluation

# Decision tree
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier(max_depth=10)
tree.fit(X_train, y_train)
plotting(tree, y_test)
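`max_depth=10` limits how far the tree can grow. A sketch for seeing the effect of this parameter, comparing training and test accuracy across depths on synthetic data (the depth list is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for depth in [2, 4, 6, 10, None]:  # None lets the tree grow until every leaf is pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (model.score(X_train, y_train), model.score(X_test, y_test))
    # A large gap between training and test accuracy suggests overfitting
    print(depth, scores[depth])
```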


Heart disease prediction - random forest evaluation

# Random forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
plotting(rf, y_test)
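Because each tree in a random forest is trained on a bootstrap sample, the rows it did not see can serve as a built-in validation set. A sketch using `oob_score=True` on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# oob_score=True scores every row using only the trees whose bootstrap sample excluded it
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0).fit(X, y)
print("OOB accuracy:", round(rf.oob_score_, 3))
```

This gives an accuracy estimate without holding out a separate test split, which can matter for a dataset as small as this one.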


Heart disease prediction - logistic regression evaluation

# Logistic regression
from sklearn.linear_model import LogisticRegression
logic = LogisticRegression(tol=1e-10)
logic.fit(X_train, y_train)
plotting(logic, y_test)
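A fitted logistic regression can also be inspected directly: for a binary problem `coef_` holds one weight per feature, and `exp(weight)` is the factor by which the odds of class 1 change per one-unit increase of that feature (per one standard deviation here, since the features were standardized). A sketch on synthetic data with hypothetical feature names:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical feature names

logic = LogisticRegression().fit(X, y)

# coef_ has shape (1, n_features) for a binary problem; exp() turns weights into odds ratios
odds_ratios = pd.Series(np.exp(logic.coef_[0]), index=names).sort_values(ascending=False)
print(odds_ratios)
```

With the real data, `index=features.columns` would label each ratio with its one-hot column name.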


Heart disease prediction - SGD classifier evaluation

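# SGD classifier
from sklearn.linear_model import SGDClassifier
# loss="log_loss" (logistic loss) enables predict_proba, which plotting() needs;
# older scikit-learn versions spelled it loss="log"
sgd = SGDClassifier(loss="log_loss")
sgd.fit(X_train, y_train)
plotting(sgd, y_test)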


Heart disease prediction - feature importance analysis

# Impurity-based importances of the random forest, largest first
importances = pd.Series(data=rf.feature_importances_, index=features.columns).sort_values(ascending=False)
sns.barplot(y=importances.index, x=importances.values, orient='h')
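`feature_importances_` is impurity-based and computed from the training data; the scikit-learn documentation notes it can favor features with many distinct values. Permutation importance on held-out data is a commonly used cross-check, sketched here on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: 2 informative columns out of 6
X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Mean drop in test accuracy when each column is shuffled, over 10 shuffles
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean.round(3))
```

Columns whose shuffling barely changes the score contribute little to the model's held-out predictions.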
