
Python multiprocess output without line breaks: Python multiprocessing

Author: | Source: Internet | 2023-09-23 00:14


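To make full use of a machine's multiple CPU cores, Python programs in most cases need to use multiple processes. The package for this is multiprocessing.

With it, you can easily move from a single process to concurrent execution. multiprocessing supports spawning child processes, communicating and sharing data between them, and several forms of synchronization, and it provides components such as Process, Queue, Pipe, and Lock.

This section covers: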

Process

Lock

Semaphore

Queue

Pipe

Pool

Process

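Basic usage

In multiprocessing, every process is represented by an instance of the Process class. Its API looks like this:

Process([group [, target [, name [, args [, kwargs]]]]])

target is the callable to run; pass the function object itself, that is, the function's name without parentheses.

args is the tuple of positional arguments for the callable. For example, if target is a function a that takes two parameters m and n, pass args=(m, n).

kwargs is a dictionary of keyword arguments for the callable.

name is an alias: a name you give to the process.

group exists only for compatibility and is not actually used.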

import multiprocessing

def process(num):
    print('Process:', num)

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=process, args=(i,))
        p.start()

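This is the simplest way to create a Process: target receives the function and args receives its arguments as a tuple. If there is only one argument, args is a tuple of length 1, such as (i,).

Calling start() on each Process object then launches the processes.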
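The example above uses only target and args. As a minimal sketch of the remaining parameters, the block below also passes name and kwargs and reads the result back through a queue. The function greet, the process name "greeter-1", and the helper run_named_process are hypothetical names chosen for illustration.

```python
import multiprocessing


def greet(greeting, who="world", results=None):
    """Hypothetical worker: builds a message and reports it with the process name."""
    name = multiprocessing.current_process().name
    message = f"{greeting}, {who}"
    if results is not None:
        results.put((name, message))


def run_named_process():
    results = multiprocessing.Queue()
    p = multiprocessing.Process(
        target=greet,
        name="greeter-1",            # a label only; it does not change behaviour
        args=("hello",),             # positional arguments, as a tuple
        kwargs={"who": "multiprocessing", "results": results},
    )
    p.start()
    item = results.get(timeout=10)   # read before join so the child can flush its data
    p.join()
    return item, p.exitcode


if __name__ == "__main__":
    print(run_named_process())
```

Passing the queue as an argument, rather than relying on a module-level object, keeps the sketch independent of the process start method.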

You can also call cpu_count() to get the number of CPU cores on the current machine and active_children() to get all child processes that are currently running.

Here is an example:

import multiprocessing
import time

def process(num):
    time.sleep(num)
    print('Process:', num)

if __name__ == '__main__':
    for i in range(5):
        p = multiprocessing.Process(target=process, args=(i,))
        p.start()
    print('CPU number:' + str(multiprocessing.cpu_count()))
    for p in multiprocessing.active_children():
        print('Child process name: ' + p.name + ' id: ' + str(p.pid))
    print('Process Ended')

Output:

Process: 0
CPU number:8
Child process name: Process-2 id: 9641
Child process name: Process-4 id: 9643
Child process name: Process-5 id: 9644
Child process name: Process-3 id: 9642
Process Ended
Process: 1
Process: 2
Process: 3
Process: 4

Custom classes

You can also subclass Process to define your own process class; all you need to do is implement the run method.

Here is an example:

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self, loop):
        Process.__init__(self)
        self.loop = loop

    def run(self):
        for count in range(self.loop):
            time.sleep(1)
            print('Pid: ' + str(self.pid) + ' LoopCount: ' + str(count))

if __name__ == '__main__':
    for i in range(2, 5):
        p = MyProcess(i)
        p.start()

In the example above we subclass Process and implement the run method, which prints the process ID and the loop counter.

Output:

Pid: 28116 LoopCount: 0
Pid: 28117 LoopCount: 0
Pid: 28118 LoopCount: 0
Pid: 28116 LoopCount: 1
Pid: 28117 LoopCount: 1
Pid: 28118 LoopCount: 1
Pid: 28117 LoopCount: 2
Pid: 28118 LoopCount: 2
Pid: 28118 LoopCount: 3

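As you can see, the three processes printed 2, 3, and 4 lines respectively.

This lets us encapsulate related methods in a class of their own and, when they are needed, simply instantiate the class and run it.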
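To illustrate the idea of packaging work into a process class, here is a minimal sketch in which each instance computes one value and hands it back to the parent through a queue. The class SquareWorker and the helper square_all are hypothetical names, not part of multiprocessing.

```python
from multiprocessing import Process, Queue


class SquareWorker(Process):
    """Hypothetical worker class: squares one number in a child process."""

    def __init__(self, number, results):
        super().__init__()
        self.number = number
        self.results = results      # a multiprocessing.Queue shared with the parent

    def run(self):
        # run() executes in the child process once start() has been called.
        self.results.put((self.number, self.number ** 2))


def square_all(numbers):
    results = Queue()
    workers = [SquareWorker(n, results) for n in numbers]
    for w in workers:
        w.start()
    # Drain the queue before joining so no child blocks while flushing its data.
    pairs = [results.get(timeout=10) for _ in workers]
    for w in workers:
        w.join()
    return dict(pairs)


if __name__ == "__main__":
    print(square_all([2, 3, 4]))
```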

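daemon

Next comes an attribute called daemon. It can be set on each process individually; if it is set to True, the child process is terminated automatically when the parent process ends.

Here is the same example as before with the daemon attribute added: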

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self, loop):
        Process.__init__(self)
        self.loop = loop

    def run(self):
        for count in range(self.loop):
            time.sleep(1)
            print('Pid: ' + str(self.pid) + ' LoopCount: ' + str(count))

if __name__ == '__main__':
    for i in range(2, 5):
        p = MyProcess(i)
        p.daemon = True   # must be set before start()
        p.start()
    print('Main process Ended!')

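Here the daemon attribute is set before each process is started, and the main (parent) process prints one line at the end.

Output:

Main process Ended!

The result is short because the main process does nothing else: it prints one line and exits, and at that point the child processes are terminated as well.

This is an effective way to prevent child processes from being spawned without control. Written this way, you do not have to worry about whether the child processes have been shut down when you stop the main program.

That is not the effect we are after here, though. Can the main process wait until all child processes have finished before it ends? It can: just add a call to join().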

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self, loop):
        Process.__init__(self)
        self.loop = loop

    def run(self):
        for count in range(self.loop):
            time.sleep(1)
            print('Pid: ' + str(self.pid) + ' LoopCount: ' + str(count))

if __name__ == '__main__':
    for i in range(2, 5):
        p = MyProcess(i)
        p.daemon = True
        p.start()
        p.join()   # wait for this child before starting the next one
    print('Main process Ended!')

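Here join() is called on every child process, so the parent (main) process waits for it to finish. Because join() is called inside the loop right after start(), each child runs to completion before the next one is started.

Output:

Pid: 29902 LoopCount: 0
Pid: 29902 LoopCount: 1
Pid: 29905 LoopCount: 0
Pid: 29905 LoopCount: 1
Pid: 29905 LoopCount: 2
Pid: 29912 LoopCount: 0
Pid: 29912 LoopCount: 1
Pid: 29912 LoopCount: 2
Pid: 29912 LoopCount: 3
Main process Ended!

The parent process prints its final message only after all the child processes have finished.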
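If the children should run at the same time and the parent should still wait for all of them, one common arrangement is to start every process first and join them afterwards. The sketch below assumes a hypothetical worker nap that merely sleeps; run_concurrently is likewise an illustrative name.

```python
import time
from multiprocessing import Process


def nap(seconds):
    """Hypothetical stand-in for real work."""
    time.sleep(seconds)


def run_concurrently(durations):
    procs = [Process(target=nap, args=(d,)) for d in durations]
    for p in procs:      # first start every child...
        p.start()
    for p in procs:      # ...then wait for all of them
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    start = time.time()
    codes = run_concurrently([0.2, 0.2, 0.2])
    print(codes, "elapsed: %.2fs" % (time.time() - start))
```

With this arrangement the elapsed time should be close to the longest single task rather than the sum of all tasks, although the exact figure depends on the machine.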

Lock

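In the small examples above, you may have seen output in which lines from different processes run together.

What went wrong? Some of the output is misplaced. This is caused by parallel execution: two processes wrote output at the same time, and the second process printed its result before the first had written its line break, which produces this kind of layout problem.

The root cause is that several processes are using the same resource (the output) at the same time.

How can this be avoided? By making sure that only one process can write at any moment while the others wait; once that process has finished, another may write. This is called mutual exclusion.

We can achieve this with a Lock: a process acquires the lock before it writes, and the others wait. When it has finished it releases the lock, and another process can write.

Here is an example: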

from multiprocessing import Process, Lock
import time

class MyProcess(Process):
    def __init__(self, loop, lock):
        Process.__init__(self)
        self.loop = loop
        self.lock = lock

    def run(self):
        for count in range(self.loop):
            time.sleep(0.1)
            # self.lock.acquire()
            print('Pid: ' + str(self.pid) + ' LoopCount: ' + str(count))
            # self.lock.release()

if __name__ == '__main__':
    lock = Lock()
    for i in range(10, 15):
        p = MyProcess(i, lock)
        p.start()

First, the output without the lock:

Pid: 45755 LoopCount: 0
Pid: 45756 LoopCount: 0
Pid: 45757 LoopCount: 0
Pid: 45758 LoopCount: 0
Pid: 45759 LoopCount: 0
Pid: 45755 LoopCount: 1
Pid: 45756 LoopCount: 1
Pid: 45757 LoopCount: 1
Pid: 45758 LoopCount: 1
Pid: 45759 LoopCount: 1
Pid: 45755 LoopCount: 2
Pid: 45756 LoopCount: 2
Pid: 45757 LoopCount: 2
Pid: 45758 LoopCount: 2
Pid: 45759 LoopCount: 2
Pid: 45756 LoopCount: 3
Pid: 45755 LoopCount: 3
Pid: 45757 LoopCount: 3
Pid: 45758 LoopCount: 3
Pid: 45759 LoopCount: 3
Pid: 45755 LoopCount: 4
Pid: 45756 LoopCount: 4
Pid: 45757 LoopCount: 4
Pid: 45759 LoopCount: 4
Pid: 45758 LoopCount: 4
Pid: 45756 LoopCount: 5
Pid: 45755 LoopCount: 5
Pid: 45757 LoopCount: 5
Pid: 45759 LoopCount: 5
Pid: 45758 LoopCount: 5
Pid: 45756 LoopCount: 6
Pid: 45755 LoopCount: 6
Pid: 45757 LoopCount: 6
Pid: 45759 LoopCount: 6
Pid: 45758 LoopCount: 6
Pid: 45755 LoopCount: 7
Pid: 45756 LoopCount: 7
Pid: 45757 LoopCount: 7
Pid: 45758 LoopCount: 7
Pid: 45759 LoopCount: 7
Pid: 45756 LoopCount: 8
Pid: 45755 LoopCount: 8
Pid: 45757 LoopCount: 8
Pid: 45758 LoopCount: 8
Pid: 45759 LoopCount: 8
Pid: 45755 LoopCount: 9
Pid: 45756 LoopCount: 9
Pid: 45757 LoopCount: 9
Pid: 45758 LoopCount: 9
Pid: 45759 LoopCount: 9
Pid: 45756 LoopCount: 10
Pid: 45757 LoopCount: 10
Pid: 45758 LoopCount: 10
Pid: 45759 LoopCount: 10
Pid: 45757 LoopCount: 11
Pid: 45758 LoopCount: 11
Pid: 45759 LoopCount: 11
Pid: 45758 LoopCount: 12
Pid: 45759 LoopCount: 12
Pid: 45759 LoopCount: 13

Without the lock, lines written by different processes at the same moment can run together, so some of the output is affected.

Now add the lock:

from multiprocessing import Process, Lock
import time

class MyProcess(Process):
    def __init__(self, loop, lock):
        Process.__init__(self)
        self.loop = loop
        self.lock = lock

    def run(self):
        for count in range(self.loop):
            time.sleep(0.1)
            self.lock.acquire()
            print('Pid: ' + str(self.pid) + ' LoopCount: ' + str(count))
            self.lock.release()

if __name__ == '__main__':
    lock = Lock()
    for i in range(10, 15):
        p = MyProcess(i, lock)
        p.start()

We acquire the lock before the print call and release it afterwards. This guarantees that only one print operation runs at any given time.

Output:

Pid: 45889 LoopCount: 0
Pid: 45890 LoopCount: 0
Pid: 45891 LoopCount: 0
Pid: 45892 LoopCount: 0
Pid: 45893 LoopCount: 0
Pid: 45889 LoopCount: 1
Pid: 45890 LoopCount: 1
Pid: 45891 LoopCount: 1
Pid: 45892 LoopCount: 1
Pid: 45893 LoopCount: 1
Pid: 45889 LoopCount: 2
Pid: 45890 LoopCount: 2
Pid: 45891 LoopCount: 2
Pid: 45892 LoopCount: 2
Pid: 45893 LoopCount: 2
Pid: 45889 LoopCount: 3
Pid: 45890 LoopCount: 3
Pid: 45891 LoopCount: 3
Pid: 45892 LoopCount: 3
Pid: 45893 LoopCount: 3
Pid: 45889 LoopCount: 4
Pid: 45890 LoopCount: 4
Pid: 45891 LoopCount: 4
Pid: 45892 LoopCount: 4
Pid: 45893 LoopCount: 4
Pid: 45889 LoopCount: 5
Pid: 45890 LoopCount: 5
Pid: 45891 LoopCount: 5
Pid: 45892 LoopCount: 5
Pid: 45893 LoopCount: 5
Pid: 45889 LoopCount: 6
Pid: 45890 LoopCount: 6
Pid: 45891 LoopCount: 6
Pid: 45893 LoopCount: 6
Pid: 45892 LoopCount: 6
Pid: 45889 LoopCount: 7
Pid: 45890 LoopCount: 7
Pid: 45891 LoopCount: 7
Pid: 45892 LoopCount: 7
Pid: 45893 LoopCount: 7
Pid: 45889 LoopCount: 8
Pid: 45890 LoopCount: 8
Pid: 45891 LoopCount: 8
Pid: 45892 LoopCount: 8
Pid: 45893 LoopCount: 8
Pid: 45889 LoopCount: 9
Pid: 45890 LoopCount: 9
Pid: 45891 LoopCount: 9
Pid: 45892 LoopCount: 9
Pid: 45893 LoopCount: 9
Pid: 45890 LoopCount: 10
Pid: 45891 LoopCount: 10
Pid: 45892 LoopCount: 10
Pid: 45893 LoopCount: 10
Pid: 45891 LoopCount: 11
Pid: 45892 LoopCount: 11
Pid: 45893 LoopCount: 11
Pid: 45893 LoopCount: 12
Pid: 45892 LoopCount: 12
Pid: 45893 LoopCount: 13

Now everything is fine.

So when accessing a critical resource, using a Lock avoids the problems caused by several processes using the resource at the same time.
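The same mutual exclusion applies to shared data, not only to output. The sketch below is a minimal illustration, assuming a shared integer created with multiprocessing.Value and hypothetical helper names add_many and count_with_lock. It also uses the lock as a context manager, which releases the lock even if the protected code raises an exception.

```python
from multiprocessing import Process, Lock, Value


def add_many(counter, lock, times):
    for _ in range(times):
        # "with lock" acquires the lock and always releases it afterwards.
        with lock:
            counter.value += 1


def count_with_lock(workers=4, times=1000):
    lock = Lock()
    # lock=False gives a raw shared integer, so our own Lock is the only protection.
    counter = Value("i", 0, lock=False)
    procs = [Process(target=add_many, args=(counter, lock, times))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_lock())
```

Because every increment happens while the lock is held, the final count equals workers multiplied by times; without the lock, increments from different processes could overwrite each other.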

Semaphore

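A semaphore plays an important role in process synchronization. It can control how many processes may use a critical resource at once and ensures mutual exclusion and synchronization between processes.

If you have studied operating systems you will already know this topic well. If you are not yet familiar with semaphores, it is worth reading an introduction to them before continuing.

The following example shows how processes use Semaphore for synchronization and mutual exclusion and for controlling the number of available resource slots: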

from multiprocessing import Process, Semaphore, Lock, Queue
import time

# These module-level objects reach the child processes only under the 'fork'
# start method; with 'spawn' or 'forkserver', pass them to the processes instead.
buffer = Queue(10)
empty = Semaphore(2)   # free slots in the buffer
full = Semaphore(0)    # occupied slots in the buffer
lock = Lock()

class Consumer(Process):
    def run(self):
        global buffer, empty, full, lock
        while True:
            full.acquire()
            lock.acquire()
            buffer.get()
            print('Consumer pop an element')
            time.sleep(1)
            lock.release()
            empty.release()

class Producer(Process):
    def run(self):
        global buffer, empty, full, lock
        while True:
            empty.acquire()
            lock.acquire()
            buffer.put(1)
            print('Producer append an element')
            time.sleep(1)
            lock.release()
            full.release()

if __name__ == '__main__':
    p = Producer()
    c = Consumer()
    p.daemon = c.daemon = True
    p.start()
    c.start()
    p.join()
    c.join()
    print('Ended!')

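The code above implements the well-known producer-consumer problem. It defines two process classes, a consumer and a producer.

A shared queue is defined using the Queue data structure, along with two semaphores: one counts the free slots in the buffer and the other counts the occupied slots.

The Producer calls empty.acquire() to claim a buffer slot, which reduces the number of free slots by 1. It then acquires the lock, operates on the buffer, releases the lock, and increases the number of occupied slots by 1 with full.release(). The Consumer does the reverse.

Output: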

Producer append an element

Consumer pop an element

Producer append an element

Consumer pop an element

Producer append an element

Consumer pop an element

Producer append an element

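The two processes run alternately: the producer puts an item into the buffer, the consumer takes it out, and the cycle repeats.

The example above gives a feel for how semaphores are used.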
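A semaphore's role in limiting how many processes use a resource at once can also be checked directly. The following sketch is an illustration under assumed names (use_resource, peak_concurrency): it starts more workers than there are slots and records the highest number of workers that were inside the protected section at the same time.

```python
import time
from multiprocessing import Process, Semaphore, Lock, Value


def use_resource(slots, lock, active, peak):
    with slots:                     # blocks while every slot is taken
        with lock:
            active.value += 1
            peak.value = max(peak.value, active.value)
        time.sleep(0.1)             # pretend to use the limited resource
        with lock:
            active.value -= 1


def peak_concurrency(workers=5, limit=2):
    slots = Semaphore(limit)
    lock = Lock()
    active = Value("i", 0, lock=False)   # workers currently inside the section
    peak = Value("i", 0, lock=False)     # highest value "active" ever reached
    procs = [Process(target=use_resource, args=(slots, lock, active, peak))
             for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return peak.value


if __name__ == "__main__":
    print("peak concurrent workers:", peak_concurrency())
```

The recorded peak never exceeds the semaphore's initial value, whatever the number of workers.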

Queue

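The example above used Queue, which can serve as a shared queue for communication between processes.

If you replaced the Queue in that program with an ordinary list, it would not work at all: even if one process changed the list, the other process would not see the change.

So for communication between processes the queue must be a Queue, and here that means multiprocessing.Queue.

Continuing with the same example, one process puts data into the queue and another takes it out.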

from multiprocessing import Process, Semaphore, Lock, Queue
import time
from random import random

# These module-level objects reach the child processes only under the 'fork'
# start method; with 'spawn' or 'forkserver', pass them to the processes instead.
buffer = Queue(10)
empty = Semaphore(2)
full = Semaphore(0)
lock = Lock()

class Consumer(Process):
    def run(self):
        global buffer, empty, full, lock
        while True:
            full.acquire()
            lock.acquire()
            print('Consumer get', buffer.get())
            time.sleep(1)
            lock.release()
            empty.release()

class Producer(Process):
    def run(self):
        global buffer, empty, full, lock
        while True:
            empty.acquire()
            lock.acquire()
            num = random()
            print('Producer put', num)
            buffer.put(num)
            time.sleep(1)
            lock.release()
            full.release()

if __name__ == '__main__':
    p = Producer()
    c = Consumer()
    p.daemon = c.daemon = True
    p.start()
    c.start()
    p.join()
    c.join()
    print('Ended!')

Output:

Producer put 0.719213647437
Producer put 0.44287326683
Consumer get 0.719213647437
Consumer get 0.44287326683
Producer put 0.722859424381
Producer put 0.525321338921
Consumer get 0.722859424381
Consumer get 0.525321338921

The producer puts data into the queue and the consumer takes it out.

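Both get and put take two optional parameters, block and timeout. block defaults to True, which means the call blocks.

Calling get on an empty queue blocks until an item arrives. To avoid waiting forever, either pass a timeout, in which case queue.Empty (Queue.Empty in Python 2) is raised if no item arrives within that many seconds, or set block to False, which is what get_nowait() does: the call returns an item if one is immediately available and otherwise raises queue.Empty at once, ignoring any timeout.

Calling put on a full queue blocks in the same way. With a timeout, queue.Full (Queue.Full in Python 2) is raised if no free slot appears in time; with block set to False, which is what put_nowait() does, queue.Full is raised immediately.

Other commonly used queue methods: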

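Queue.qsize() returns the approximate size of the queue. It does not work on macOS, however, where it raises NotImplementedError.

The reason is visible in the implementation:

def qsize(self):
    # Raises NotImplementedError on Mac OSX because of broken sem_getvalue()
    return self._maxsize - self._sem._semlock._get_value()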

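Queue.empty() returns True if the queue is empty, otherwise False.

Queue.full() returns True if the queue is full, otherwise False.

Queue.get([block[, timeout]]) removes and returns an item; timeout is the maximum time to wait.

Queue.get_nowait() is equivalent to Queue.get(False).

Queue.put(item[, block[, timeout]]) puts an item into the queue, blocking by default; timeout is the maximum time to wait.

Queue.put_nowait(item) is equivalent to Queue.put(item, False).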
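The blocking and non-blocking behaviour of get and put can be tried out in a single process. This is a minimal sketch for Python 3, where the Empty and Full exceptions live in the standard queue module; demo_queue_limits and the event strings are hypothetical names used only for illustration.

```python
import queue                      # queue.Empty and queue.Full live here
from multiprocessing import Queue


def demo_queue_limits():
    events = []
    q = Queue(maxsize=1)

    try:
        q.get(block=True, timeout=0.1)   # empty queue: waits 0.1 s, then raises
    except queue.Empty:
        events.append("empty after timeout")

    try:
        q.get_nowait()                   # same as q.get(False): raises at once
    except queue.Empty:
        events.append("empty without waiting")

    q.put("item")                        # fills the only slot
    try:
        q.put_nowait("another")          # same as q.put("another", False)
    except queue.Full:
        events.append("full without waiting")

    # A timeout here gives the queue's background feeder thread time to deliver.
    events.append(q.get(timeout=5))
    return events


if __name__ == "__main__":
    print(demo_queue_limits())
```

The final get uses a timeout rather than get_nowait() because multiprocessing.Queue hands items to a background thread, so an item that was just put may not be readable for a brief moment.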

Pipe

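A pipe, as the name suggests, has one end that sends and one end that receives.

A Pipe can be one-way or two-way (duplex). Calling multiprocessing.Pipe(duplex=False) creates a one-way pipe; the default is two-way. One process puts an object in at one end of the pipe and the process at the other end receives it. In a one-way pipe the first connection returned can only receive and the second can only send, while a two-way pipe allows sending from both ends.

Here is an example: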

from multiprocessing import Process, Pipe

class Consumer(Process):
    def __init__(self, pipe):
        Process.__init__(self)
        self.pipe = pipe

    def run(self):
        self.pipe.send('Consumer Words')
        print('Consumer Received:', self.pipe.recv())

class Producer(Process):
    def __init__(self, pipe):
        Process.__init__(self)
        self.pipe = pipe

    def run(self):
        print('Producer Received:', self.pipe.recv())
        self.pipe.send('Producer Words')

if __name__ == '__main__':
    pipe = Pipe()
    p = Producer(pipe[0])
    c = Consumer(pipe[1])
    p.daemon = c.daemon = True
    p.start()
    c.start()
    p.join()
    c.join()
    print('Ended!')

Here a pipe is created, two-way by default, and its two ends are passed to the two processes, which then send to and receive from each other. The result:

Producer Received: Consumer Words

Consumer Received: Producer Words

Ended!
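For comparison, here is a minimal sketch of a one-way pipe created with duplex=False, in which a child process sends a single result back to its parent. The functions report and sum_in_child are hypothetical names for illustration.

```python
from multiprocessing import Process, Pipe


def report(conn, values):
    """Child side: send one result, then close its end of the pipe."""
    conn.send(sum(values))
    conn.close()


def sum_in_child(values):
    # With duplex=False the first connection can only receive
    # and the second can only send.
    receiver, sender = Pipe(duplex=False)
    p = Process(target=report, args=(sender, values))
    p.start()
    total = receiver.recv()       # blocks until the child has sent something
    p.join()
    return total


if __name__ == "__main__":
    print(sum_in_child([1, 2, 3]))
```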

Pool

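When using Python for system administration, especially when operating on many files and directories at once or controlling many remote hosts, running operations in parallel can save a great deal of time. When the number of targets is small, you can create processes on the fly with Process from multiprocessing. A dozen or so is fine, but with hundreds or thousands of targets, limiting the number of processes by hand becomes tedious. This is where a process pool comes in.

A Pool provides a fixed number of worker processes for the caller to use. When a new request is submitted to the pool and a worker is free, the request runs on that worker. If every worker is busy, the request waits until one of them finishes its current task and picks it up.

To use it you need to understand blocking and non-blocking calls.

Blocking and non-blocking describe the state of the program while it waits for the result of a call (a message or a return value).

A blocking call waits until the result is available; until then the calling process is suspended.

Pool can be used in both blocking and non-blocking ways. Non-blocking means that after submitting a task you do not have to wait for it to finish before submitting the next one; blocking is the opposite.

First, an example of non-blocking use: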

from multiprocessing import Pool
import time

def function(index):
    print('Start process:', index)
    time.sleep(3)
    print('End process', index)

if __name__ == '__main__':
    pool = Pool(processes=3)
    for i in range(4):
        pool.apply_async(function, (i,))
    print("Started processes")
    pool.close()
    pool.join()
    print("Subprocess done.")

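This uses the apply_async method, which is non-blocking.

Output:

Started processes
Start process: Start process: 0
1
Start process: 2
End processEnd process 0
1
Start process: 3
End process 2
End process 3
Subprocess done.

The tasks start running as soon as they are submitted; there is no need to wait for one to finish before submitting the next. The pool has three workers, so the fourth task starts only once a worker becomes free, and because the workers print at the same time, some of their lines are interleaved.

Now the blocking version: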

from multiprocessing import Pool
import time

def function(index):
    print('Start process:', index)
    time.sleep(3)
    print('End process', index)

if __name__ == '__main__':
    pool = Pool(processes=3)
    for i in range(4):
        pool.apply(function, (i,))
    print("Started processes")
    pool.close()
    pool.join()
    print("Subprocess done.")

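The only change is replacing apply_async with apply.

Output:

Start process: 0
End process 0
Start process: 1
End process 1
Start process: 2
End process 2
Start process: 3
End process 3
Started processes
Subprocess done.

That should make the difference clear: each task finishes before the next one starts.

The functions involved are explained below: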

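apply_async(func[, args[, kwds[, callback]]]) is non-blocking, while apply(func[, args[, kwds]]) is blocking.

close() closes the pool so that it no longer accepts new tasks.

terminate() stops the worker processes immediately without completing outstanding tasks.

join() blocks the main process until the worker processes exit. It must be called after close() or terminate().

Each task can also return a result from its function. apply returns that result directly, and apply_async returns an object whose get() method retrieves it, so the result can be processed further.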

from multiprocessing import Pool
import time

def function(index):
    print('Start process:', index)
    time.sleep(3)
    print('End process', index)
    return index

if __name__ == '__main__':
    pool = Pool(processes=3)
    for i in range(4):
        result = pool.apply_async(function, (i,))
        print(result.get())   # get() waits for this task to finish
    print("Started processes")
    pool.close()
    pool.join()
    print("Subprocess done.")

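Output:

Start process: 0
End process 0
0
Start process: 1
End process 1
1
Start process: 2
End process 2
2
Start process: 3
End process 3
3
Started processes
Subprocess done.

Note that calling get() immediately after each apply_async waits for that task to complete, so in this example the tasks run one after another.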
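To keep the tasks running in parallel while still collecting their return values, one option is to submit every task first and call get() only afterwards. The sketch below assumes a hypothetical task function square and helper squares_async.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def squares_async(numbers, processes=3):
    with Pool(processes=processes) as pool:
        # Submit everything first; apply_async returns an AsyncResult immediately.
        pending = [pool.apply_async(square, (n,)) for n in numbers]
        # Only now wait for the results, in submission order.
        return [r.get(timeout=30) for r in pending]


if __name__ == "__main__":
    print(squares_async(range(4)))
```

The results are read inside the with block because leaving the block terminates the pool.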

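There is also a very handy map method.

If you have a collection of data in which every item needs to be processed by the same function, map is a good fit.

For example, suppose you have a list containing all the URLs and a function that fetches and parses the content of one URL. You can pass the function as the first argument to map and the list of URLs as the second.

Here is an example: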

from multiprocessing import Pool
import requests
from requests.exceptions import ConnectionError

def scrape(url):
    try:
        print(requests.get(url))
    except ConnectionError:
        print('Error Occured', url)
    finally:
        print('URL', url, 'Scraped')

if __name__ == '__main__':
    pool = Pool(processes=3)
    urls = [
        'https://www.baidu.com',
        'http://www.meituan.com/',
        'http://blog.csdn.net/',
        'http://xxxyxxx.net',
    ]
    pool.map(scrape, urls)

Here we create a Pool with 3 processes. If the number is not given, it defaults to the number of CPUs on the machine.

map then goes through the list of links and runs scrape on each URL.

Output:

URL http://blog.csdn.net/ Scraped
URL https://www.baidu.com Scraped
Error Occured http://xxxyxxx.net
URL http://xxxyxxx.net Scraped
URL http://www.meituan.com/ Scraped
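map also returns the values produced by the function, in the same order as the input. The following sketch avoids network access by using a hypothetical per-item function word_count in place of a scraper; count_all is likewise an illustrative name.

```python
from multiprocessing import Pool


def word_count(text):
    """Hypothetical stand-in for a per-item task such as fetching and parsing a URL."""
    return len(text.split())


def count_all(texts, processes=3):
    with Pool(processes=processes) as pool:
        # map blocks until every item is processed and keeps the input order.
        return pool.map(word_count, texts)


if __name__ == "__main__":
    print(count_all(["a b c", "hello world", ""]))
```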

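For CPU-bound work, multiprocessing is far more capable than multithreading, because each process runs in its own interpreter and can use a separate CPU core.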