热门标签 | HotTags
当前位置:  开发笔记 > 编程语言 > 正文

财务数据分析sqlpython_Python数据分析:让你像写Sql语句一样,使用Pandas做数据分析...

Python数据分析:让你像写Sql语句一样,使用Pandas做数据分析一、加载数据importpandasaspdimportnumpyasnp

Python 数据分析:让你像写 Sql 语句一样,使用 Pandas 做数据分析

一、加载数据

import pandas as pd

import numpy as np

url = ('https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv')

tips = pd.read_csv(url)

output = tips.head()

Output:

total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2

1 10.34 1.66 Male No Sun Dinner 3

2 21.01 3.50 Male No Sun Dinner 3

3 23.68 3.31 Male No Sun Dinner 2

4 24.59 3.61 Female No Sun Dinner 4

二、SELECT 的使用方式

sql 语句: SELECT total_bill, tip, smoker, time FROM tips LIMIT 5;。

output = tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Output:

total_bill tip smoker time

0 16.99 1.01 No Dinner

1 10.34 1.66 No Dinner

2 21.01 3.50 No Dinner

3 23.68 3.31 No Dinner

4 24.59 3.61 No Dinner

三、WHERE 的使用方式

1. 举个栗子

sql 语句: SELECT * FROM tips WHERE time = ‘Dinner‘ LIMIT 5;

output = tips[tips['time'] == 'Dinner'].head(5)

# 或者

output = tips.query("time == 'Dinner'").head(5)

Output:

total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2

1 10.34 1.66 Male No Sun Dinner 3

2 21.01 3.50 Male No Sun Dinner 3

3 23.68 3.31 Male No Sun Dinner 2

4 24.59 3.61 Female No Sun Dinner 4

2. 比较运算符&#xff1a;等于 &#61;&#61;、 大于 >、 大于等于 >&#61;、小于等于 <&#61;、不等于 !&#61;

2.1 等于 &#61;&#61;

sql 语句&#xff1a;SELECT * FROM tips WHERE time &#61; ‘Dinner‘;。

output &#61; tips[(tips[&#39;time&#39;] &#61;&#61; &#39;Dinner&#39;)]

2.2 大于 >

sql 语句&#xff1a;SELECT * FROM tips WHERE tip > 5.00;。

output &#61; tips[(tips[&#39;tip&#39;] > 5.00)]

2.3 大于等于 >&#61;

sql 语句&#xff1a;SELECT * FROM tips WHERE tip >&#61; 5.00;。

output &#61; tips[(tips[&#39;size&#39;] >&#61; 5)]

2.4 小于等于 <&#61;

sql 语句&#xff1a;SELECT * FROM tips WHERE tip <&#61; 5.00;。

output &#61; tips[(tips[&#39;size&#39;] <&#61; 5)]

2.5 不等于 !&#61;

sql 语句&#xff1a;SELECT * FROM tips WHERE tip <> 5.00;。

output &#61; tips[(tips[&#39;size&#39;] !&#61; 5)]

3. 逻辑运算符&#xff1a;且 &、或 |、非 -

3.1 且 &

sql 语句&#xff1a;SELECT * FROM tips WHERE time &#61; ‘Dinner‘ AND tip > 5.00;

output &#61; tips[(tips[&#39;time&#39;] &#61;&#61; &#39;Dinner&#39;) & (tips[&#39;tip&#39;] > 5.00)]

3.2 或 |

sql 语句&#xff1a;SELECT * FROM tips WHERE size >&#61; 5 OR total_bill > 45;。

output &#61; tips[(tips[&#39;size&#39;] >&#61; 5) | (tips[&#39;total_bill&#39;] > 45)]

3.3 非 -

sql 语句&#xff1a;SELECT * FROM tips WHERE not (size <> 5 AND size > 4);

output &#61; df[-((df[&#39;size&#39;] !&#61; 5) & (df[&#39;size&#39;] > 4))]

4. Null 的判断

这里重新定义一个包含 NaN 数据的 DataFrame。

frame &#61; pd.DataFrame({

&#39;col1&#39;: [&#39;A&#39;, &#39;B&#39;, np.NaN, &#39;C&#39;, &#39;D&#39;],

&#39;col2&#39;: [&#39;F&#39;, np.NaN, &#39;G&#39;, &#39;H&#39;, &#39;I&#39;]

})

output &#61; frame

Output:

col1 col2

0 A F

1 B NaN

2 NaN G

3 C H

4 D I

4.1 判断列是 Null

sql 语句&#xff1a;SELECT * FROM frame WHERE col2 IS NULL;。

output &#61; frame[frame[&#39;col2&#39;].isna()]

Output:

col1 col2

1 B NaN

4.2 判断列不是 Null

sql 语句&#xff1a;SELECT * FROM frame WHERE col1 IS NOT NULL;。

output &#61; frame[frame[&#39;col1&#39;].notna()]

Output:

col1 col2

0 A F

1 B NaN

3 C H

4 D I

5. In、Like 操作

5.1 In

sql 语句&#xff1a;SELECT * FROM tips WHERE siez in (5, 6);。

output &#61; tips[tips[&#39;size&#39;].isin([2, 5])]

5.2 Like

sql 语句&#xff1a;SELECT * FROM tips WHERE time like ‘Din%‘;。

output &#61; tips[tips.time.str.contains(&#39;Din*&#39;)]

四、GROUP BY 的使用方式

sql 语句&#xff1a;SELECT sex, count(*) FROM tips GROUP BY sex;

output &#61; tips.groupby(&#39;sex&#39;).size()

# 获取相应的结果

output[&#39;Male&#39;]

output[&#39;Female&#39;]

output &#61; tips.groupby(&#39;sex&#39;).count()

# 获取相应的结果

output[&#39;tip&#39;][&#39;Female&#39;]

output &#61; tips.groupby(&#39;sex&#39;)[&#39;total_bill&#39;].count()

# 获取相应的结果

output[&#39;Male&#39;]

output[&#39;Female&#39;]

sql 语句&#xff1a;SELECT day, AVG(tip), COUNT(*) FROM tips GROUP BY day;

output &#61; tips.groupby(&#39;day&#39;).agg({&#39;tip&#39;: np.mean, &#39;day&#39;: np.size})

# 获取相应的结果

output[&#39;day&#39;][&#39;Fri&#39;]

output[&#39;tip&#39;][&#39;Fri&#39;]

sql 语句&#xff1a;SELECT smoker, day, COUNT(*), AVG(tip) FROM tips GROUP BY smoker, day;

output &#61; tips.groupby([&#39;smoker&#39;, &#39;day&#39;]).agg({&#39;tip&#39;: [np.size, np.mean]})

# 获取相应的结果

output[&#39;tip&#39;][&#39;size&#39;][&#39;No&#39;][&#39;Fri&#39;]

sql 语句&#xff1a;SELECT tip, count(distinct sex) FROM tips GROUP BY tip;

output &#61; tips.groupby(&#39;tip&#39;).agg({&#39;sex&#39;: pd.Series.nunique})

五、JOIN 连接的使用方式

定义两个 DataFrame。

df1 &#61; pd.DataFrame({&#39;key&#39;: [&#39;A&#39;, &#39;B&#39;, &#39;C&#39;, &#39;D&#39;], &#39;value&#39;: np.random.randn(4)})

df2 &#61; pd.DataFrame({&#39;key&#39;: [&#39;B&#39;, &#39;D&#39;, &#39;D&#39;, &#39;E&#39;], &#39;value&#39;: np.random.randn(4)})

1. 内连接 Inner Join

sql 语句&#xff1a;SELECT * FROM df1 INNER JOIN df2 ON df1.key &#61; df2.key;

output &#61; pd.merge(df1, df2, on&#61;&#39;key&#39;)

# 或

indexed_df2 &#61; df2.set_index(&#39;key&#39;)

pd.merge(df1, indexed_df2, left_on&#61;&#39;key&#39;, right_index&#61;True)

2. 左连接 Left Outer Join

sql 语句&#xff1a;SELECT * FROM df1 LEFT OUTER JOIN df2 ON df1.key &#61; df2.key;

output &#61; pd.merge(df1, df2, on&#61;&#39;key&#39;, how&#61;&#39;left&#39;)

# 或

output &#61; df1.join(df2, on&#61;&#39;key&#39;, how&#61;&#39;left&#39;)

3. 右连接 Right Join

sql 语句&#xff1a;SELECT * FROM df1 RIGHT OUTER JOIN df2 ON df1.key &#61; df2.key;

output &#61; pd.merge(df1, df2, on&#61;&#39;key&#39;, how&#61;&#39;right&#39;)

4. 全连接 Full Join

sql 语句&#xff1a;SELECT * FROM df1 FULL OUTER JOIN df2 ON df1.key &#61; df2.key;

output &#61; pd.merge(df1, df2, on&#61;&#39;key&#39;, how&#61;&#39;outer&#39;)

五、UNION 的使用方式

df1 &#61; pd.DataFrame({&#39;city&#39;: [&#39;Chicago&#39;, &#39;San Francisco&#39;, &#39;New York City&#39;], &#39;rank&#39;: range(1, 4)})

df2 &#61; pd.DataFrame({&#39;city&#39;: [&#39;Chicago&#39;, &#39;Boston&#39;, &#39;Los Angeles&#39;], &#39;rank&#39;: [1, 4, 5]})

sql 语句&#xff1a;SELECT city, rank FROM df1 UNION ALL SELECT city, rank FROM df2;

output &#61; pd.concat([df1, df2])

sql 语句&#xff1a;SELECT city, rank FROM df1 UNION SELECT city, rank FROM df2;

output &#61; pd.concat([df1, df2]).drop_duplicates()

六、与 SQL 等价的其他语法

1. 去重 Distinct

sql 语句&#xff1a;SELECT DISTINCT sex FROM tips;

output &#61; tips.drop_duplicates(subset&#61;[&#39;sex&#39;], keep&#61;&#39;first&#39;, inplace&#61;False)

2. 修改列别名 As

sql 语句&#xff1a;SELECT total_bill AS total, sex AS xes FROM tips;

output &#61; tips.rename(columns&#61;{&#39;total_bill&#39;: &#39;total&#39;, &#39;sex&#39;: &#39;xes&#39;}, inplace&#61;False)

3. Limit 与 Offset

sql 语句&#xff1a;SELECT * FROM tips ORDER BY tip DESC LIMIT 10 OFFSET 5;

output &#61; tips.nlargest(10 &#43; 5, columns&#61;&#39;tip&#39;).tail(10)

4. 每个 Group 的前几行

sql 语句&#xff1a;

SELECT * FROM (

SELECT

t.*,

ROW_NUMBER() OVER(PARTITION BY day ORDER BY total_bill DESC) AS rn

FROM tips t

)

WHERE rn <3

ORDER BY day, rn;

output &#61; tips.assign(rn&#61;tips.sort_values([&#39;total_bill&#39;], ascending&#61;False). groupby([&#39;day&#39;]).cumcount() &#43; 1). query(&#39;rn <3&#39;). sort_values([&#39;day&#39;, &#39;rn&#39;])

七、Update 的使用方式

sql 语句&#xff1a;UPDATE tips SET tip &#61; tip*2 WHERE tip <2;

output &#61; tips.loc[tips[&#39;tip&#39;] <2, &#39;tip&#39;] *&#61; 2

八、Delete 的使用方式

sql 语句&#xff1a;DELETE FROM tips WHERE tip > 9;

output &#61; tips &#61; tips.loc[tips[&#39;tip&#39;] <&#61; 9]

九、参考文章

原文&#xff1a;https://www.cnblogs.com/yxhblogs/p/11026575.html



推荐阅读
author-avatar
mobiledu2502870587
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有