Python Data Visualization with seaborn (Part 4): Visualizing Categorical Data

Author: diy2099_d94639 | Source: Internet | 2023-10-12 14:56

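The previous articles dealt with the case where both variables are numeric. When one of the variables is categorical, we need other kinds of plots to show and analyze the data. seaborn offers several plot types for this, and they are easy to pick up.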

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

sns.set(style="whitegrid", font_scale=1.4, context="paper")
# set the style, font scale and context
import warnings
warnings.filterwarnings('ignore')
# suppress warnings

In seaborn, categorical plots fall into three groups:

Categorical scatter plots:
- stripplot (the default, kind="strip")
- swarmplot (kind="swarm")
Categorical distribution plots:
- boxplot (kind="box")
- violinplot (kind="violin")
- boxenplot (kind="boxen")
Categorical estimate plots:
- pointplot (kind="point")
- barplot (kind="bar")
- countplot (kind="count")

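These three families show the data at different levels of granularity. In practice there is no need to memorize all of them, because the categorical family shares one figure-level interface, catplot(). This single function gives access to every categorical plot type.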

Categorical scatter plots

seaborn.stripplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=True, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0, ax=None, **kwargs)

jitter : whether to jitter the points; True, False, or a float
dodge : when hue is used, whether to separate the hue levels along the categorical axis
orient : plot orientation, vertical ("v") or horizontal ("h")
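To make the idea behind jitter concrete, here is a conceptual sketch in plain NumPy: every observation sits at the integer position of its category and is then shifted by a small random offset so that overlapping points separate. This illustrates the concept only and is not seaborn's implementation; `jitter_width` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(42)

categories = np.array(["Thur", "Fri", "Sat", "Sun"])
# Category of each observation, encoded as an integer position on the axis
codes = rng.integers(0, len(categories), size=200)

jitter_width = 0.1  # hypothetical: maximum offset on either side of the position
offsets = rng.uniform(-jitter_width, jitter_width, size=codes.size)

# Coordinates along the categorical axis after jittering
positions = codes + offsets
```

A larger width spreads the points further apart, at the cost of blurring the boundary between neighbouring categories.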

# 1. catplot() uses kind='strip' by default
# Draw a scatter plot of the sample distribution for each category
tips = sns.load_dataset("tips")
print(tips.head())
# load the data
sns.catplot(x="day",         # x → grouping field
            y="total_bill",  # y → field whose distribution is plotted
            # swapping x and y lays the scatter out horizontally
            data=tips,       # data → the dataset
            jitter=True, height=6,
            # when many points overlap, jitter spreads them out; a width can also be given, e.g. jitter=0.1
            s=6, edgecolor='w', linewidth=1, marker='o',
            # point size, edge color and width, marker style
            )

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

# 2. swarmplot()
# Use kind='swarm' to adjust the points so that they do not overlap
sns.catplot(x="day", y="total_bill", kind='swarm', hue='sex', data=tips, height=5, s=5.5)
# Overlap is avoided by spreading the points along the categorical axis; this only suits relatively small datasets

# 1. stripplot()
# Set the palette
sns.catplot(x="sex", y="total_bill", hue="day", data=tips,
            jitter=True,
            palette="Set2",  # palette
            dodge=True,      # whether to separate the hue levels
            )

# Ordering
print(tips['day'].value_counts())
# number of rows for each value of the day field
sns.catplot(x="day", y="total_bill", data=tips,
            order=['Sun', 'Sat'])
# order → selects the categories to show and controls their order

Sat     87
Sun     76
Thur    62
Fri     19
Name: day, dtype: int64

Categorical distribution plots

Box plot boxplot()

seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, notch=False, ax=None, **kwargs)

saturation : float, color saturation
fliersize : size of the markers used for outliers
whis : float, how far beyond the quartiles, as a multiple of the IQR, a point must lie to be treated as an outlier; default 1.5
notch : whether to draw a notch in the box at the median
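The effect of whis can be worked through by hand. The sketch below applies the usual 1.5 x IQR rule to a small made-up array; the variable names are illustrative and the numbers are not taken from the tips data.

```python
import numpy as np

values = np.array([3.0, 5.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 14.0, 40.0])
whis = 1.5

q1, q3 = np.percentile(values, [25, 75])  # first and third quartiles
iqr = q3 - q1                             # interquartile range

# Fences: the furthest a whisker may reach from the box
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Points beyond the fences are the ones drawn as individual outlier markers
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```

Raising whis widens the fences, so fewer points are flagged as outliers.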

# Box plot: catplot(kind='box')
sns.catplot(x='day', y='total_bill', data=tips, kind='box',
            linewidth=2,    # line width
            width=0.6,      # box width as a fraction of the space per category
            fliersize=5,    # outlier marker size
            palette='hls',  # palette
            whis=1.5,       # whisker reach as a multiple of the IQR
            notch=True,     # whether to notch the box at the median
            order=['Thur', 'Fri', 'Sat', 'Sun'],  # select categories
            )

# Split further with the hue parameter
# Mixing several plot types
# Draw the box plot
sns.catplot(x="day", y="total_bill", data=tips, kind='box', hue='smoker', height=6)
# Draw the scatter plot
sns.swarmplot(x="day", y="total_bill", data=tips, color='k', size=3, alpha=0.8)
# The categorical scatter plot is added with its own axes-level function, swarmplot();
# calling the figure-level catplot() again would produce two separate figures
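The overlay above relies on swarmplot() drawing into the current axes. A slightly more explicit variant, sketched here on synthetic data, keeps the grid returned by catplot() and passes its axes through the `ax` argument. The table `demo` and its column names are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 30),
    "value": rng.normal(loc=10, scale=2, size=90),
})

# Figure-level call: creates the figure and returns a FacetGrid
g = sns.catplot(x="group", y="value", data=demo, kind="box")
n_before = len(g.ax.collections)

# Axes-level call: draws the points onto the same axes instead of a new figure
sns.swarmplot(x="group", y="value", data=demo, color="k", size=3, alpha=0.8, ax=g.ax)
n_after = len(g.ax.collections)
```

Passing `ax` explicitly avoids surprises when another figure has been created between the two calls.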

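For larger datasets a scatter plot becomes crowded. In that case we can use boxenplot(). It resembles a box plot: it shows the shape of the distribution while still giving the summary statistics a box plot provides.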

diamonds = sns.load_dataset("diamonds")
print(diamonds.head(3))
sns.catplot(x='color', y='price', kind='boxen',
            data=diamonds.sort_values("color"), height=6)

   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31

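Violin plot violinplot()

A violin plot combines a kernel density estimate with a box plot.

seaborn.violinplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw='scott', cut=2, scale='area', scale_hue=True, gridsize=100, width=0.8, inner='box', split=False, dodge=True, orient=None, linewidth=None, color=None, palette=None, saturation=0.75, ax=None, **kwargs)

bw : ("scott", "silverman", float), scale factor for the kernel bandwidth; larger values give a smoother curve.
cut : float, distance, in units of bandwidth, by which the density is extended past the extreme data points; set it to 0 to limit the violin to the range of the observed data.
scale : how the violin widths are scaled: "area" gives every violin the same area, "count" scales the width by the number of observations, "width" gives every violin the same width
scale_hue : bool; when hue is used, whether the scaling is computed within each group or across all violins in the plot
gridsize : number of points in the discrete grid used to evaluate the kernel density estimate; higher is smoother
inner : ("box", "quartile", "point", "stick", None), what is drawn inside the violin
split : when hue nesting is used, whether each level is drawn on its own side of a split violin.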
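How bw, cut and gridsize interact can be seen in a hand-written Gaussian kernel density estimate. This is a simplified sketch under stated assumptions, not the code seaborn runs: the helper `gaussian_kde_1d` is hypothetical, and the bandwidth is taken as Scott's factor times the sample standard deviation.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bw_factor):
    """Evaluate a Gaussian kernel density estimate on `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    h = bw_factor * samples.std(ddof=1)  # kernel width = factor x sample std
    z = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)          # average of one kernel per sample

rng = np.random.default_rng(1)
samples = rng.normal(size=300)

scott = samples.size ** (-1 / 5)  # Scott's rule factor for 1-D data
cut = 2                           # extend the grid `cut` bandwidths past the extremes
gridsize = 100                    # number of evaluation points
h = scott * samples.std(ddof=1)
grid = np.linspace(samples.min() - cut * h, samples.max() + cut * h, gridsize)

density_default = gaussian_kde_1d(samples, grid, scott)
density_smooth = gaussian_kde_1d(samples, grid, 3 * scott)  # larger factor, smoother curve

# Riemann-sum check that the default estimate integrates to roughly 1
area = float(np.sum(density_default) * (grid[1] - grid[0]))
```

With cut=0 the grid would stop at the smallest and largest observation, which is what limits a violin to the observed data range.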

# 2. violinplot()
# Violin plot
sns.catplot(x="day", y="total_bill", data=tips, kind='violin',
            linewidth=2,    # line width
            width=0.8,      # violin width as a fraction of the space per category
            height=6,
            palette='hls',  # palette
            order=['Thur', 'Fri', 'Sat', 'Sun'],  # select categories
            scale='area',   # how the violin widths are scaled:
            # area: equal area; count: width by number of observations; width: equal width
            gridsize=30,    # smoothness of the violin outline; higher is smoother
            inner='box',
            bw=.5,          # controls how closely the density follows the data; can usually be left unset
            )

# 2. violinplot()
# Split further with the hue parameter
sns.catplot(x="day", y="total_bill", data=tips, kind='violin',
            hue='smoker', palette="muted",
            split=True,  # whether to draw split violins
            inner="quartile", height=6)

# 2. violinplot()
# Combined with a scatter plot
sns.catplot(x="day", y="total_bill", data=tips, kind='violin',
            palette='hls', inner=None, height=6,
            cut=0,  # 0 limits the violin to the range of the observed data
            )
# Overlay the scatter plot
sns.swarmplot(x="day", y="total_bill", data=tips, color="k", alpha=.5)

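Categorical estimate plots

seaborn.barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)

estimator : statistical function applied to the values within each category
ci : (float, "sd", None), size of the confidence interval drawn around the estimate; "sd" draws the standard deviation of the observations instead, and None draws no error bars
units : name of a variable that identifies sampling units, for data with repeated measurements
errwidth : line width of the error bars
capsize : width of the caps on the error bars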
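The roles of estimator, ci and n_boot can be sketched with a plain bootstrap in NumPy. This is an illustration of the general technique, assuming a percentile interval around the mean; it is not a copy of seaborn's internals, and `sample` is made-up data standing in for the values of one category.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=20, scale=8, size=60)  # stand-in for one category's values

n_boot = 1000
ci = 95

# Resample with replacement and apply the estimator (here the mean) to each resample
idx = rng.integers(0, sample.size, size=(n_boot, sample.size))
boot_means = sample[idx].mean(axis=1)

# Percentile interval: the middle `ci` percent of the bootstrap distribution
low, high = np.percentile(boot_means, [(100 - ci) / 2, 100 - (100 - ci) / 2])

# ci="sd" describes the spread of the observations rather than of the mean
sd = sample.std(ddof=1)  # sample standard deviation, as pandas computes it
sd_low, sd_high = sample.mean() - sd, sample.mean() + sd
```

The standard deviation band is much wider than the bootstrap interval, because it measures how much individual observations vary and not how precisely the mean is known.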

# 1. barplot()
# Confidence interval: sample mean ± sampling error
titanic = sns.load_dataset("titanic")
# print(titanic.head())
# load the data
sns.catplot(x="sex", y="survived", data=titanic, kind='bar',
            palette='hls', hue="class",
            order=['male', 'female'],     # select categories
            capsize=0.05,                 # width of the error bar caps
            saturation=.8,                # color saturation
            errcolor='gray', errwidth=2,  # error bar color and width
            height=6,
            ci='sd',  # error bar size → a value in 0-100, 'sd', or None
            )
# Select the column before aggregating so that non-numeric columns are not averaged
print(titanic.groupby(['sex', 'class'])['survived'].mean())
print(titanic.groupby(['sex', 'class'])['survived'].std())
# compute the underlying numbers

sex     class
female  First     0.968085
        Second    0.921053
        Third     0.500000
male    First     0.368852
        Second    0.157407
        Third     0.135447
Name: survived, dtype: float64
sex     class
female  First     0.176716
        Second    0.271448
        Third     0.501745
male    First     0.484484
        Second    0.365882
        Third     0.342694
Name: survived, dtype: float64

# 1. barplot()
# Bar plot with confidence interval estimates
# The style can be changed like this
sns.catplot(x="day", y="total_bill", data=tips,
            linewidth=2.5, facecolor=(1, 1, 1, 0),
            kind='bar', edgecolor='k')

# 1. barplot()
crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
print(crashes.head())
# load the data
f, ax = plt.subplots(figsize=(10, 15))
# create the figure
# sns.set_color_codes("pastel")
sns.barplot(x="total", y="abbrev", data=crashes,
            label="Total", color="b", edgecolor='w')
# first bar plot
# sns.set_color_codes("muted")
sns.barplot(x="alcohol", y="abbrev", data=crashes,
            label="Alcohol-involved", color="y", edgecolor='w')
# second bar plot, drawn over the first
ax.legend(ncol=2, loc="lower right")
sns.despine(left=True, bottom=True)

    total  speeding  alcohol  not_distracted  no_previous  ins_premium  \
40   23.9     9.082    9.799          22.944       19.359       858.97
34   23.9     5.497   10.038          23.661       20.554       688.75
48   23.8     8.092    6.664          23.086       20.706       992.61
3    22.4     4.032    5.824          21.056       21.280       827.34
17   21.4     4.066    4.922          16.692       16.264       872.51

    ins_losses abbrev
40      116.29     SC
34      109.72     ND
48      152.56     WV
3       142.39     AR
17      137.13     KY

# 2. countplot()
# Bar plot of counts
sns.catplot(x="class", hue="who", data=titanic, kind='count', palette='magma')
sns.catplot(y="class", hue="who", data=titanic, kind='count', palette='magma')
# x/y → which axis carries the categories (x gives vertical bars, y gives horizontal bars)
# usage is similar to barplot
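The numbers behind a count plot with hue are a simple cross-tabulation. The sketch below reproduces that table with pandas on a tiny made-up DataFrame; the rows are invented for illustration and only borrow the column names class and who from the titanic data.

```python
import pandas as pd

# Tiny made-up table standing in for the two titanic columns used above
demo = pd.DataFrame({
    "class": ["First", "First", "Second", "Third", "Third", "Third"],
    "who":   ["man",   "woman", "man",    "man",   "child", "man"],
})

# One row per category, one column per hue level, each cell = number of rows
counts = pd.crosstab(demo["class"], demo["who"])
print(counts)
```

Each cell of `counts` corresponds to the height of one bar in the count plot.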

# 3. pointplot()
sns.catplot(x="time", y="total_bill", hue='smoker', data=tips, kind='point',
            palette='hls', height=7,
            dodge=True,  # whether to separate the points of different hue levels
            join=True,   # whether to connect the points with lines
            markers=["o", "x"], linestyles=["-", "--"],  # marker styles and line styles
            )
# usage is similar to barplot
