
Python Concurrent Programming: Futures

Author: 手机用户2602901285 | Source: Internet | 2023-09-18 20:47

Contents

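1. Parallelism and concurrency

2. Concurrent programming with Futures

2.1 Single-threaded vs. multithreaded performance

2.2 What exactly are Futures?

2.3 Why can only one thread run at a time?

2.4 With Futures, how do you check whether a task is done and get its result?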

In any language, concurrent programming is a common and important technique. Used correctly, it can improve a program's performance substantially.

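1. Parallelism and concurrency

Concurrency and parallelism are often mentioned together, so many people assume they mean the same thing. They do not.

In Python, concurrency does not mean that several operations (threads or tasks) run at the same instant. At any given moment only one operation makes progress; threads or tasks take turns, switching back and forth until all of them finish.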

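Python offers two forms of concurrency, threading and asyncio, and they differ in how the switching happens: between threads in the first case, between tasks in the second. (The original post illustrated the two switching orders with a figure.)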

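With threading, the operating system knows about every thread and decides when to switch between them. The advantage is that the code is easy to write, because the programmer does not handle switching at all. The drawback is that a switch can happen in the middle of a single statement (for example x += 1), which makes race conditions easy to introduce.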
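To make the x += 1 hazard concrete, here is a minimal sketch of the usual remedy: guarding the shared update with a threading.Lock. The names counter, add_many, and run are illustrative and not from the original post.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        # counter += 1 is a read, an add, and a write; the lock keeps another
        # thread from slipping in between those steps and losing an update.
        with counter_lock:
            counter += 1


def run(num_threads=4, per_thread=10_000):
    """Reset the counter, run the worker threads to completion, return the total."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(per_thread,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

With the lock in place the total is always num_threads * per_thread. Removing the `with counter_lock:` line may or may not produce a wrong total on a given run, which is exactly what makes race conditions hard to debug.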

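With asyncio, the main program can switch away from a task only when the task signals that it may be switched out (in practice, at an await), which avoids the kind of race condition just described.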
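The cooperative switching described here can be sketched with two small tasks. This is an illustration only; worker and demo are made-up names, and asyncio.sleep(0) is used purely as a way to yield control once.

```python
import asyncio


async def worker(name, log):
    """Record a step, yield control once, then record a second step."""
    log.append(f"{name} step 1")
    await asyncio.sleep(0)  # the only point where this task can be switched out
    log.append(f"{name} step 2")


async def demo():
    """Run two workers concurrently and return the order in which steps ran."""
    log = []
    await asyncio.gather(worker("A", log), worker("B", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # prints ['A step 1', 'B step 1', 'A step 2', 'B step 2']
```

Each task runs uninterrupted until it reaches its await, so the code between two awaits never has to worry about another task touching log at the same time.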

Parallelism, by contrast, does mean happening at the same instant. Python's multiprocessing works this way. Roughly speaking, if your machine has a 6-core processor, you can have Python start 6 processes that run simultaneously to speed the program up.

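A brief comparison of concurrency and parallelism:

Concurrency is usually applied to I/O-heavy scenarios. For example, when you download many files from websites, the time spent on I/O may be far longer than the time the CPU spends processing.
Parallelism is applied more to CPU-heavy scenarios, such as parallel computation in MapReduce, where multiple machines and multiple processors are typically used to speed things up.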

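2. Concurrent programming with Futures

2.1 Single-threaded vs. multithreaded performance

Let us start with a concrete example to understand Futures from the code's point of view, and then compare its performance against a single-threaded version. Suppose the task is to download the content of a set of web pages and print how many bytes each one returned. A basic single-threaded implementation follows.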

The flow is simple:

First, iterate over the list of sites;
Then download the current site;
Once that download finishes, do the same for the next site, and so on until the end.

    import time

    import requests


    def download_one(url):
        # Fetch one page and report how many bytes came back.
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))


    def download_all(sites):
        # Download the sites one after another.
        for site in sites:
            download_one(site)


    def main():
        sites = [
            'https://www.baidu.com/s?wd=Portal:Arts',
            'https://www.baidu.com/s?wd=Portal:History',
            'https://www.baidu.com/s?wd=Portal:Society',
            'https://www.baidu.com/s?wd=Portal:Biography',
            'https://www.baidu.com/s?wd=Portal:Mathematics',
            'https://www.baidu.com/s?wd=Portal:Technology',
            'https://www.baidu.com/s?wd=Portal:Geography',
            'https://www.baidu.com/s?wd=Portal:Science',
            'https://www.baidu.com/s?wd=Computer_science',
            'https://www.baidu.com/s?wd=Python_(programming_language)',
            'https://www.baidu.com/s?wd=Java_(programming_language)',
            'https://www.baidu.com/s?wd=PHP',
            'https://www.baidu.com/s?wd=Node.js',
            'https://www.baidu.com/s?wd=The_C_Programming_Language',
            'https://www.baidu.com/s?wd=Go_(programming_language)',
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


    if __name__ == '__main__':
        main()

Output:

    Read 227 from https://www.baidu.com/s?wd=Portal:Arts
    Read 227 from https://www.baidu.com/s?wd=Portal:History
    Read 227 from https://www.baidu.com/s?wd=Portal:Society
    Read 227 from https://www.baidu.com/s?wd=Portal:Biography
    Read 227 from https://www.baidu.com/s?wd=Portal:Mathematics
    Read 227 from https://www.baidu.com/s?wd=Portal:Technology
    Read 227 from https://www.baidu.com/s?wd=Portal:Geography
    Read 227 from https://www.baidu.com/s?wd=Portal:Science
    Read 227 from https://www.baidu.com/s?wd=Computer_science
    Read 227 from https://www.baidu.com/s?wd=Python_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=Java_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=PHP
    Read 227 from https://www.baidu.com/s?wd=Node.js
    Read 227 from https://www.baidu.com/s?wd=The_C_Programming_Language
    Read 227 from https://www.baidu.com/s?wd=Go_(programming_language)
    Download 15 sites in 1.053478813 seconds

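The whole run takes a little over 1 s. The single-threaded version is simple and clear, but plainly inefficient, because most of the program's time is spent waiting on I/O: each download cannot start until the previous one has finished. In a real production setting, where the number of pages to download is at least in the tens of thousands, this approach simply does not work.

Now let us look at the multithreaded version: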

    import concurrent.futures
    import time

    import requests


    def download_one(url):
        # Fetch one page and report how many bytes came back.
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))


    def download_all(sites):
        # Hand the downloads to a pool of 5 worker threads.
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            executor.map(download_one, sites)


    def main():
        sites = [
            'https://www.baidu.com/s?wd=Portal:Arts',
            'https://www.baidu.com/s?wd=Portal:History',
            'https://www.baidu.com/s?wd=Portal:Society',
            'https://www.baidu.com/s?wd=Portal:Biography',
            'https://www.baidu.com/s?wd=Portal:Mathematics',
            'https://www.baidu.com/s?wd=Portal:Technology',
            'https://www.baidu.com/s?wd=Portal:Geography',
            'https://www.baidu.com/s?wd=Portal:Science',
            'https://www.baidu.com/s?wd=Computer_science',
            'https://www.baidu.com/s?wd=Python_(programming_language)',
            'https://www.baidu.com/s?wd=Java_(programming_language)',
            'https://www.baidu.com/s?wd=PHP',
            'https://www.baidu.com/s?wd=Node.js',
            'https://www.baidu.com/s?wd=The_C_Programming_Language',
            'https://www.baidu.com/s?wd=Go_(programming_language)',
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


    if __name__ == '__main__':
        main()

Output:

    Read 227 from https://www.baidu.com/s?wd=Portal:History
    Read 227 from https://www.baidu.com/s?wd=Portal:Mathematics
    Read 227 from https://www.baidu.com/s?wd=Portal:Society
    Read 227 from https://www.baidu.com/s?wd=Portal:Arts
    Read 227 from https://www.baidu.com/s?wd=Portal:Biography
    Read 227 from https://www.baidu.com/s?wd=Computer_science
    Read 227 from https://www.baidu.com/s?wd=Python_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=Portal:Science
    Read 227 from https://www.baidu.com/s?wd=Portal:Geography
    Read 227 from https://www.baidu.com/s?wd=Portal:Technology
    Read 227 from https://www.baidu.com/s?wd=Java_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=PHP
    Read 227 from https://www.baidu.com/s?wd=Node.js
    Read 227 from https://www.baidu.com/s?wd=The_C_Programming_Language
    Read 227 from https://www.baidu.com/s?wd=Go_(programming_language)
    Download 15 sites in 0.25518121699999996 seconds

The total time is now about 0.25 s, a little over 4 times faster. The main difference between the multithreaded and single-threaded versions is this piece of code:

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

Here we create a thread pool with 5 threads available. executor.map() is similar to Python's built-in map(): it calls download_one() on every element of sites, but does so concurrently.
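One property worth seeing in code is that executor.map() yields results in the order of its inputs, not in the order the calls happen to finish. The sketch below uses a made-up slow_square function with time.sleep standing in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    """Sleep a little (longer for smaller n), then return n squared."""
    time.sleep(0.05 * max(0, 3 - n))
    return n * n


def squares_with_pool(values):
    """Square each value on a thread pool and return the results as a list."""
    with ThreadPoolExecutor(max_workers=3) as executor:
        # executor.map returns a lazy iterator; list() collects the results
        # in input order, even though the call for 2 finishes before the call for 0.
        return list(executor.map(slow_square, values))


if __name__ == "__main__":
    print(squares_with_pool([0, 1, 2]))  # prints [0, 1, 4]
```

Iterating over the results is also where an exception raised inside a worker surfaces. The download example above never iterates, so a failed request there would pass silently.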

Incidentally, the requests.get() call used in download_one() is thread-safe, so it can be used in a multithreaded environment without causing race conditions.

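Also, although you choose the number of threads yourself, more is not always better, because creating, maintaining, and tearing down threads has a cost. If you set the number too high, the program may actually get slower. In practice you usually need to run some tests against your real workload to find the best thread count.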
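A small timing harness is one way to run such a test. The sketch below is an assumption-laden stand-in: fake_download simulates a network call with time.sleep, and the candidate worker counts are arbitrary. For a real measurement, replace fake_download with your actual request function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(i):
    """Stand-in for a network call: sleep briefly and return the input."""
    time.sleep(0.02)
    return i


def time_pool(max_workers, n_tasks=20):
    """Run n_tasks fake downloads on a pool; return (elapsed_seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_download, range(n_tasks)))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for workers in (1, 2, 5, 10, 20):
        elapsed, _ = time_pool(workers)
        print(f"max_workers={workers:>2}: {elapsed:.3f} s")
```

The numbers depend on the machine and the workload, so no expected output is shown. With real requests, remote rate limits and bandwidth usually cap the useful thread count well before thread overhead does.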

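Of course, we can also use parallelism to make the program faster. You only need to make the following change in download_all():

    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    =>
    with concurrent.futures.ProcessPoolExecutor() as executor:

In the changed code, ProcessPoolExecutor() creates a process pool, so the work is executed in parallel by multiple processes. We usually omit the max_workers argument here, because it defaults to the number of CPUs on the machine.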

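Parallelism is generally used for CPU-heavy work. For I/O-heavy operations, most of the time is spent waiting, so using multiple processes offers no advantage over multiple threads. In fact, because the number of processes is limited by the number of CPUs, it often performs worse than the multithreaded version.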
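For contrast, here is a sketch of the kind of CPU-heavy job where a process pool does pay off. The prime-counting function and the chosen limits are illustrative only and do not come from the original post.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-heavy)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits):
    """Run count_primes for each limit in a pool of worker processes."""
    # max_workers is omitted, so the pool size defaults to the CPU count.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The __main__ guard matters here: on platforms that start workers by
    # re-importing this module, it keeps them from re-running this block.
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

The worker function has to be defined at module level so that it can be pickled and sent to the worker processes; a lambda or nested function would fail.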

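2.2 What exactly are Futures?

In Python, Futures are provided by both concurrent.futures and asyncio, and in both cases they represent deferred operations. A Future wraps a pending operation and places it in a queue. The state of the operation can be queried at any time, and its result or exception can be retrieved once it completes.

As users, we usually do not need to think about how Futures are created; the library takes care of that. What we actually do is schedule them for execution.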

Take the Executor class in concurrent.futures: when we call executor.submit(func), it schedules func() for execution and returns the newly created future instance, which you can query later.
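A minimal sketch of that call, using a trivial add function as the hypothetical workload:

```python
from concurrent.futures import Future, ThreadPoolExecutor


def add(a, b):
    return a + b


def submit_and_wait():
    """Submit one call to a pool and return (is it a Future?, its result)."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        # submit() schedules add(2, 3) and returns a Future immediately.
        future = executor.submit(add, 2, 3)
        is_future = isinstance(future, Future)
        value = future.result()  # blocks until add(2, 3) has run
    return is_future, value


if __name__ == "__main__":
    print(submit_and_wait())  # prints (True, 5)
```

Extra positional and keyword arguments to submit() are passed straight through to the function, so no lambda wrapper is needed.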

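Here are a few commonly used methods. A future's done() method reports whether the corresponding operation has completed: True if it has, False if not. Note that done() is non-blocking and returns immediately. The related add_done_callback(fn) arranges for the function fn to be called, with the future as its argument, once the future completes.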
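Both methods can be seen together in a short sketch. A threading.Event (an addition for this illustration, not part of the original post) holds the task open so that done() can be observed before and after completion without relying on timing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_done_and_callback():
    """Return (done before release, done after shutdown, values seen by callback)."""
    gate = threading.Event()  # keeps the task unfinished until we release it
    seen = []                 # filled in by the callback

    def task():
        gate.wait()           # block until the caller sets the gate
        return "finished"

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(task)
        # The callback receives the completed future as its only argument.
        future.add_done_callback(lambda f: seen.append(f.result()))
        done_before = future.done()  # returns at once; the task is still waiting
        gate.set()
    # Leaving the with block waits for the worker, so the callback has run by now.
    done_after = future.done()
    return done_before, done_after, seen


if __name__ == "__main__":
    print(demo_done_and_callback())  # prints (False, True, ['finished'])
```

The callback runs in the worker thread that completed the future, so it should stay short and must guard any shared state it touches.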

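Another important method is result(), which waits for the future to complete and then returns its result or raises its exception. The module-level function as_completed(fs) takes an iterable of futures fs and returns an iterator that yields each future as it completes.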
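The "result or exception" behaviour of result() is easy to miss, so here is a small sketch. parse_int and collect are hypothetical helpers: one input parses cleanly and one raises inside the worker.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_int(text):
    """Convert text to int; raises ValueError for non-numeric input."""
    return int(text)


def collect(texts):
    """Submit one parse per input and sort the outcomes into values and errors."""
    values, errors = {}, {}
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = {text: executor.submit(parse_int, text) for text in texts}
    for text, future in futures.items():
        try:
            values[text] = future.result()  # returns the task's value ...
        except ValueError as exc:           # ... or re-raises the task's exception
            errors[text] = type(exc).__name__
    return values, errors


if __name__ == "__main__":
    print(collect(["1", "x", "3"]))
    # prints ({'1': 1, '3': 3}, {'x': 'ValueError'})
```

An exception raised in a worker is stored on the future and only surfaces when result() is called, which is why a task whose result is never read can fail without any visible error.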

So the example above can also be written as follows:

    import concurrent.futures
    import time

    import requests


    def download_one(url):
        # Fetch one page and report how many bytes came back.
        resp = requests.get(url)
        print('Read {} from {}'.format(len(resp.content), url))


    def download_all(sites):
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            # Submit every download and keep the returned futures.
            to_do = []
            for site in sites:
                future = executor.submit(download_one, site)
                to_do.append(future)

            # Handle each future as soon as it completes.
            for future in concurrent.futures.as_completed(to_do):
                future.result()


    def main():
        sites = [
            'https://www.baidu.com/s?wd=Portal:Arts',
            'https://www.baidu.com/s?wd=Portal:History',
            'https://www.baidu.com/s?wd=Portal:Society',
            'https://www.baidu.com/s?wd=Portal:Biography',
            'https://www.baidu.com/s?wd=Portal:Mathematics',
            'https://www.baidu.com/s?wd=Portal:Technology',
            'https://www.baidu.com/s?wd=Portal:Geography',
            'https://www.baidu.com/s?wd=Portal:Science',
            'https://www.baidu.com/s?wd=Computer_science',
            'https://www.baidu.com/s?wd=Python_(programming_language)',
            'https://www.baidu.com/s?wd=Java_(programming_language)',
            'https://www.baidu.com/s?wd=PHP',
            'https://www.baidu.com/s?wd=Node.js',
            'https://www.baidu.com/s?wd=The_C_Programming_Language',
            'https://www.baidu.com/s?wd=Go_(programming_language)',
        ]
        start_time = time.perf_counter()
        download_all(sites)
        end_time = time.perf_counter()
        print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


    if __name__ == '__main__':
        main()

Output:

    Read 227 from https://www.baidu.com/s?wd=Portal:History
    Read 227 from https://www.baidu.com/s?wd=Portal:Arts
    Read 227 from https://www.baidu.com/s?wd=Portal:Society
    Read 227 from https://www.baidu.com/s?wd=Portal:Biography
    Read 227 from https://www.baidu.com/s?wd=Portal:Mathematics
    Read 227 from https://www.baidu.com/s?wd=Python_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=Portal:Geography
    Read 227 from https://www.baidu.com/s?wd=Portal:Technology
    Read 227 from https://www.baidu.com/s?wd=Portal:Science
    Read 227 from https://www.baidu.com/s?wd=Computer_science
    Read 227 from https://www.baidu.com/s?wd=Go_(programming_language)
    Read 227 from https://www.baidu.com/s?wd=PHP
    Read 227 from https://www.baidu.com/s?wd=Node.js
    Read 227 from https://www.baidu.com/s?wd=The_C_Programming_Language
    Read 227 from https://www.baidu.com/s?wd=Java_(programming_language)
    Download 15 sites in 0.6074131429999999 seconds

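We first call executor.submit() for each site and collect the returned futures in the list to_do, where they wait to be executed. Then as_completed() yields each future as it finishes, and we retrieve its result. Note that the order in which futures complete is not necessarily their order in the list. Which one finishes first depends on system scheduling and on how long each future takes to run.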
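The difference between submission order and completion order can be made visible with tasks of known, widely spaced durations. In this sketch wait_then_return is a made-up task that simply sleeps, and the delays are far enough apart that the completion order is stable in practice.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def wait_then_return(delay):
    """Sleep for delay seconds, then return the delay."""
    time.sleep(delay)
    return delay


def completion_order(delays):
    """Submit one task per delay and return results in the order they complete."""
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(wait_then_return, d) for d in delays]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    # Submitted slowest first; the shortest sleep is expected to come back first.
    print(completion_order([0.5, 0.05, 0.25]))
```

When you need to know which input a completed future belongs to, a common pattern is to keep a dict that maps each future back to its argument.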

2.3 Why can only one thread run at a time?

At any given moment, a Python program allows only one thread to execute, so concurrency in Python is achieved by switching between threads. You may wonder why.

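The reason is that the standard Python interpreter (CPython) is not thread-safe. To avoid the race conditions this would cause, Python introduced the global interpreter lock (GIL), which allows only one thread to execute at a time. When a thread blocks on an I/O operation, however, the GIL is released so that another thread can run.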
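One way to observe this is to time pure-Python CPU work run sequentially and on a thread pool. The sketch below assumes a standard CPython build with the GIL enabled; busy_sum and the job sizes are illustrative, and the timings vary by machine, so none are shown.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python CPU work with no I/O: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run each job one after another in the calling thread."""
    return [busy_sum(n) for n in jobs]


def run_threaded(jobs):
    """Run each job on its own worker thread."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as executor:
        return list(executor.map(busy_sum, jobs))


def timed(label, fn):
    """Call fn(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


if __name__ == "__main__":
    jobs = [2_000_000] * 4
    timed("sequential", lambda: run_sequential(jobs))
    timed("4 threads ", lambda: run_threaded(jobs))
```

Under the GIL the two timings are expected to be close, since the threads take turns rather than run side by side. The download examples earlier behave differently because their threads spend most of their time blocked on I/O with the GIL released.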

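2.4 With Futures, how do you check whether a task is done and get its result?

Here are two supplementary demos. The second is recommended. The first waits for a fixed, explicitly chosen time before reading the result, which is clumsy when you do not know how long the task will run (admittedly, this is how I did it at first). The second is much more elegant: it uses future.done() to check whether the task has finished and future.result() to get the function's return value. Here is the code:

Method 1: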

    import time

    from logzero import logger
    from concurrent.futures import ThreadPoolExecutor  # thread pool module

    # Global variable: temporary storage for the thread pool result
    _thread_pool_executor_result = []


    def demo1():
        logger.info("start demo1 ...")
        time.sleep(5)
        logger.info("end demo1 ...")
        return ["demo1 result"]


    def callback(data):
        """
        Callback run by the thread pool when the task finishes.
        :param data: the completed future
        :return:
        """
        # Modifying a global variable requires the global keyword
        global _thread_pool_executor_result
        _thread_pool_executor_result = data.result()


    def main():
        pool = ThreadPoolExecutor(1)
        logger.info("debug 1")
        # Attach a callback that runs once the task has finished
        pool.submit(demo1, ).add_done_callback(callback)
        logger.info("debug 2")
        logger.info(_thread_pool_executor_result)
        logger.info("debug 3")
        time.sleep(6)
        logger.info(_thread_pool_executor_result)


    if __name__ == '__main__':
        # Entry point
        main()

Output:

    [I 211115 09:16:26 demo222:30] debug 1
    [I 211115 09:16:26 demo222:10] start demo1 ...
    [I 211115 09:16:26 demo222:33] debug 2
    [I 211115 09:16:26 demo222:34] []
    [I 211115 09:16:26 demo222:35] debug 3
    [I 211115 09:16:31 demo222:12] end demo1 ...
    [I 211115 09:16:32 demo222:37] ['demo1 result']

Method 2:

    import time

    from logzero import logger
    from concurrent.futures import ThreadPoolExecutor  # thread pool module

    # Global variable: temporary storage for the thread pool result
    _thread_pool_executor_result = []


    def demo1():
        logger.info("start demo1 ...")
        time.sleep(5)
        logger.info("end demo1 ...")
        return ["demo1 result"]


    def callback(data):
        """
        Callback run by the thread pool when the task finishes.
        :param data: the completed future
        :return:
        """
        # Modifying a global variable requires the global keyword
        global _thread_pool_executor_result
        _thread_pool_executor_result = data.result()


    def main():
        pool = ThreadPoolExecutor(1)
        logger.info("debug 1")
        # done() reports whether the operation has completed:
        # True means finished, False means not yet finished
        future = pool.submit(demo1,)
        logger.info("debug 2")
        # Note: a timeout could be added here; otherwise this may loop forever
        while True:
            if future.done():
                logger.info("future job done.")
                logger.info(future.result())
                break
            else:
                logger.info("future job not done, will sleep 1s.")
                time.sleep(1)


    if __name__ == '__main__':
        # Entry point
        main()

Output:

    [I 211115 09:15:15 demo222:30] debug 1
    [I 211115 09:15:15 demo222:10] start demo1 ...
    [I 211115 09:15:15 demo222:41] debug 2
    [I 211115 09:15:15 demo222:49] future job not done, will sleep 1s.
    [I 211115 09:15:16 demo222:49] future job not done, will sleep 1s.
    [I 211115 09:15:17 demo222:49] future job not done, will sleep 1s.
    [I 211115 09:15:18 demo222:49] future job not done, will sleep 1s.
    [I 211115 09:15:19 demo222:49] future job not done, will sleep 1s.
    [I 211115 09:15:20 demo222:12] end demo1 ...
    [I 211115 09:15:20 demo222:45] future job done.
    [I 211115 09:15:20 demo222:46] ['demo1 result']
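The comment in Method 2 notes that the polling loop needs a timeout to avoid spinning forever. One possible way to get that, sketched here as an alternative rather than as the author's approach, is to drop the loop and pass a timeout to future.result(). slow_job and get_with_timeout are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_job(seconds):
    """Stand-in for demo1(): sleep, then return a result list."""
    time.sleep(seconds)
    return ["demo1 result"]


def get_with_timeout(job_seconds, timeout):
    """Return the job's result, or None if it is not ready within timeout seconds."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(slow_job, job_seconds)
        try:
            # Blocks for at most timeout seconds instead of polling done().
            return future.result(timeout=timeout)
        except FutureTimeout:
            return None


if __name__ == "__main__":
    print(get_with_timeout(0.05, timeout=2))   # prints ['demo1 result']
    print(get_with_timeout(0.3, timeout=0.05))  # prints None
```

A timeout only stops the waiting; it does not stop the task. In this sketch, leaving the with block still waits for slow_job to finish, because a running thread cannot be cancelled through its future.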