
Analyzing the requests Source Code: How requests Calls urllib3





Table of Contents


      • 1. How does requests implement persistent connections?
      • 2. What does requests' Session do?
      • 3. Where does requests call urllib3?
      • 4. What does Session's mount method do?
      • 5. The HTTPAdapter object
      • 6. How Session.send invokes the adapters
      • 7. Related articles





1. How does requests implement persistent connections?

While reading some code today, a question popped into my head: how does requests implement persistent (keep-alive) connections?

After some digging, I found that requests relies on the default request headers of its Session class (custom headers work just as well):

class Session(SessionRedirectMixin):
    __attrs__ = [
        'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
        'cert', 'prefetch', 'adapters', 'stream', 'trust_env',
        'max_redirects',
    ]

    def __init__(self):
        self.headers = default_headers()
        ...


def default_headers():
    """
    :rtype: requests.structures.CaseInsensitiveDict
    """
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })
As you can see, the default headers already ask for a persistent connection: Connection: keep-alive.
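To make that case-insensitive header lookup concrete, here is a minimal sketch. This is not the real requests implementation (requests.structures.CaseInsensitiveDict is more thorough, e.g. it preserves the original key casing for iteration); the User-Agent value is a placeholder.

```python
# Minimal sketch of a case-insensitive header mapping, mimicking
# what default_headers() returns.
class CaseInsensitiveDict(dict):
    def __init__(self, data=None):
        super().__init__()
        for k, v in (data or {}).items():
            self[k] = v  # route through __setitem__ so keys are lowercased

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())


def default_headers():
    return CaseInsensitiveDict({
        'User-Agent': 'python-requests/x.y.z',  # placeholder value
        'Accept-Encoding': 'gzip, deflate',
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })


headers = default_headers()
print(headers['CONNECTION'])  # lookups ignore case -> keep-alive
```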

2. What does requests' Session do?

So what exactly does requests' Session do? After more digging, I found this in the requests documentation:


Session object: the Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and uses urllib3's connection pooling under the hood.


In short, it maintains a session. What really caught my interest, and prompted this article, was the last part: "uses urllib3's connection pooling".
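The cookie-persistence half of that description can be sketched with a toy session object. This is purely illustrative, not the real Session code: a real Session uses a cookie jar and urllib3's pooling, but the idea is the same, so cookies set by one response ride along on every later request through the same object.

```python
# Toy model of session-level cookie persistence.
class ToySession:
    def __init__(self):
        self.cookies = {}

    def request(self, url, set_cookies=None):
        sent = dict(self.cookies)      # cookies attached to this request
        if set_cookies:                # pretend the response set cookies
            self.cookies.update(set_cookies)
        return sent


s = ToySession()
s.request('http://example.com/login', set_cookies={'sessionid': 'abc123'})
sent = s.request('http://example.com/profile')
print(sent)  # {'sessionid': 'abc123'} -- carried over automatically
```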


3. Where does requests call urllib3?

So where does the requests code actually call urllib3? And what exactly does connection pooling implement? We'll trace the first question through the source below; the second is left for a future post.

Let's start with requests' Session class:

class Session(SessionRedirectMixin):
    __attrs__ = [
        'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
        'cert', 'prefetch', 'adapters', 'stream', 'trust_env',
        'max_redirects',
    ]

    def __init__(self):
        ...
        self.adapters = OrderedDict()
        self.mount('https://', HTTPAdapter())
        self.mount('http://', HTTPAdapter())
        ...

In Session's __init__ there is a call to self.mount(), which registers an HTTPAdapter() instance. Let's first see what the mount method does.


4. What does Session's mount method do?

def mount(self, prefix, adapter):
    """Registers a connection adapter to a prefix.

    Adapters are sorted in descending order by prefix length.
    """
    self.adapters[prefix] = adapter
    keys_to_move = [k for k in self.adapters if len(k) < len(prefix)]

    for key in keys_to_move:
        self.adapters[key] = self.adapters.pop(key)

def __getstate__(self):
    state = {attr: getattr(self, attr, None) for attr in self.__attrs__}
    return state

def __setstate__(self, state):
    for attr, value in state.items():
        setattr(self, attr, value)

Roughly speaking, mount builds up an ordered dict of adapters, keyed by URL prefix (http:// / https://), with HTTPAdapter objects as values, kept sorted so that longer (more specific) prefixes are matched first.


After this point the adapters are essentially only used inside Session's send method, which doesn't itself involve the pool concept; the key is the HTTPAdapter object that was passed in.
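The bookkeeping that mount performs can be replayed in isolation. The sketch below uses a plain dict (which preserves insertion order in Python 3.7+) in place of the OrderedDict, and strings in place of HTTPAdapter objects, to make the ordering visible:

```python
# Re-implementation of the mount / get_adapter logic shown above.
adapters = {}

def mount(prefix, adapter):
    adapters[prefix] = adapter
    # move every shorter (less specific) prefix behind the new one
    keys_to_move = [k for k in adapters if len(k) < len(prefix)]
    for key in keys_to_move:
        adapters[key] = adapters.pop(key)

def get_adapter(url):
    # the first prefix match wins, so longer prefixes must come first
    for prefix, adapter in adapters.items():
        if url.lower().startswith(prefix.lower()):
            return adapter

mount('http://', 'generic-adapter')
mount('http://example.com/', 'specific-adapter')

print(list(adapters))                         # longest prefix first
print(get_adapter('http://example.com/api'))  # specific-adapter
print(get_adapter('http://other.com/'))       # generic-adapter
```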


5. The HTTPAdapter object

First, look at the usage example in HTTPAdapter's docstring:

Usage::

    >>> import requests
    >>> s = requests.Session()
    >>> a = requests.adapters.HTTPAdapter(max_retries=3)
    >>> s.mount('http://', a)

This spells out the basic usage: construct an adapter and mount it by hand. As the second-to-last line shows, HTTPAdapter can be constructed with arguments; the class accepts the following parameters:

pool_connections=DEFAULT_POOLSIZE,  # number of connection pools to cache (one per host)
pool_maxsize=DEFAULT_POOLSIZE,      # maximum number of connections to keep in each pool
max_retries=DEFAULT_RETRIES,        # number of retries
pool_block=DEFAULT_POOLBLOCK        # whether the pool should block for connections


class HTTPAdapter(BaseAdapter):
    __attrs__ = ['max_retries', 'config', '_pool_connections', '_pool_maxsize',
                 '_pool_block']

    def __init__(self, pool_connections=DEFAULT_POOLSIZE,
                 pool_maxsize=DEFAULT_POOLSIZE, max_retries=DEFAULT_RETRIES,
                 pool_block=DEFAULT_POOLBLOCK):
        if max_retries == DEFAULT_RETRIES:
            self.max_retries = Retry(0, read=False)
        else:
            self.max_retries = Retry.from_int(max_retries)
        self.config = {}
        self.proxy_manager = {}

        super(HTTPAdapter, self).__init__()

        self._pool_connections = pool_connections
        self._pool_maxsize = pool_maxsize
        self._pool_block = pool_block

        self.init_poolmanager(pool_connections, pool_maxsize, block=pool_block)

As you can see, this class defines the pool_connections-related attributes, and not only those: a whole series of requests' configuration is handled in this class (proxy_headers / add_headers / request_url), and it has two further key methods, get_connection and build_response. HTTPAdapter is clearly one of requests' core classes.
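Putting the docstring usage and the constructor parameters together, tuning the pool looks roughly like this (assuming requests is installed; the numbers are arbitrary examples, not recommendations):

```python
import requests
from requests.adapters import HTTPAdapter

session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=20,  # how many per-host pools urllib3 caches
    pool_maxsize=50,      # connections kept alive within each host pool
    max_retries=3,        # becomes a urllib3 Retry(total=3) internally
)
# Replace the default adapters mounted in Session.__init__
session.mount('http://', adapter)
session.mount('https://', adapter)

print(session.get_adapter('http://example.com/') is adapter)  # True
```

No request is sent here; the adapter only configures the pool that will back future requests made through this session.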

Now let's walk the requests source from the top (to save space, only the relevant code is shown). Say we send a POST request: requests.post('http://127.0.0.1:12345', {'data': 'hello world'})

# enter requests.api
def post(url, data=None, json=None, **kwargs):
    return request('post', url, data=data, json=json, **kwargs)

def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

# a Session object is created and its request method is called; enter requests.sessions
class Session(SessionRedirectMixin):
    ...
    def request(self, method, url,
                params=None, data=None, headers=None, cookies=None, files=None,
                auth=None, timeout=None, allow_redirects=True, proxies=None,
                hooks=None, stream=None, verify=None, cert=None, json=None):
        ...
        resp = self.send(prep, **send_kwargs)  # here we enter send; as noted above, send is where the adapters are used -- the concrete steps are listed below
        return resp
# This ties back to the earlier discussion: the adapters are HTTPAdapter objects

6. How Session.send invokes the adapters

def send(self, request, **kwargs):
    ...
    # Get the appropriate adapter to use
    adapter = self.get_adapter(url=request.url)  # function shown below

    # Start time (approximately) of the request
    start = preferred_clock()

    # Send the request
    r = adapter.send(request, **kwargs)  # calls HTTPAdapter's send method
    ...


def get_adapter(self, url):  # get_adapter looks up the matching HTTPAdapter object
    for (prefix, adapter) in self.adapters.items():
        if url.lower().startswith(prefix.lower()):
            return adapter

Next, let's see what HTTPAdapter's send implements -- this is the main event. Below is HTTPAdapter.send; be careful not to confuse it with Session's send above.

def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
    try:
        conn = self.get_connection(request.url, proxies)  # function shown below
    except LocationValueError as e:
        raise InvalidURL(e, request=request)
    ...  # a block of configuration and checks, skipped

        # Receive the response from the server
        try:
            # For Python 2.7, use buffering of HTTP responses
            r = low_conn.getresponse(buffering=True)
        except TypeError:
            # For compatibility with Python 3.3+
            r = low_conn.getresponse()

        resp = HTTPResponse.from_httplib(
            r,
            pool=conn,
            connection=low_conn,
            preload_content=False,
            decode_content=False
        )
    except:
        # If we hit any problems here, clean up the connection.
        # Then, reraise so that we can handle the actual exception.
        low_conn.close()
        raise

    ...  # the error-raising branches for various cases, also skipped

    return self.build_response(request, resp)


# get_connection
# The docstring is kept this time on purpose: as it says, get_connection (called from send) returns a urllib3 connection pool. Here we finally jump from requests into urllib3 -- proxy_manager.connection_from_url and self.poolmanager.connection_from_url below are calls into the urllib3 module.
def get_connection(self, url, proxies=None):
    """Returns a urllib3 connection for the given URL. This should not be
    called from user code, and is only exposed for use when subclassing the
    :class:`HTTPAdapter `.

    :param url: The URL to connect to.
    :param proxies: (optional) A Requests-style dictionary of proxies used on this request.
    :rtype: urllib3.ConnectionPool
    """
    proxy = select_proxy(url, proxies)

    if proxy:
        proxy = prepend_scheme_if_needed(proxy, 'http')
        proxy_url = parse_url(proxy)
        if not proxy_url.host:
            raise InvalidProxyURL("Please check proxy URL. It is malformed"
                                  " and could be missing the host.")
        proxy_manager = self.proxy_manager_for(proxy)
        conn = proxy_manager.connection_from_url(url)
    else:
        # Only scheme should be lower case
        parsed = urlparse(url)
        url = parsed.geturl()
        conn = self.poolmanager.connection_from_url(url)

    return conn
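To see what connection_from_url actually hands back, urllib3's PoolManager can be used directly (assuming urllib3 is installed -- it is a dependency of requests). No socket is opened here; the pool is created lazily and connects only when a request is sent:

```python
from urllib3 import PoolManager

# HTTPAdapter.init_poolmanager builds a PoolManager much like this one
pm = PoolManager(num_pools=10, maxsize=10, block=False)

# Returns an HTTPConnectionPool for the URL's host without connecting
pool = pm.connection_from_url('http://example.com/api')
print(type(pool).__name__, pool.host, pool.port)

# The same scheme/host/port maps back to the same cached pool object
same = pm.connection_from_url('http://example.com/other')
print(same is pool)  # True
```

That cached-pool behavior is exactly the connection pooling the requests documentation mentioned: repeated requests to the same host reuse one pool (and its kept-alive connections) instead of reconnecting each time.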

After chasing the source for a while, we finally reached the call site. The requests codebase isn't large and its logic is clear. I haven't dug into how each feature is implemented here -- that felt too complex to cover properly -- so this was just a quick look at how requests uses urllib3. If you're interested, read through the rest yourself; next time I'll try digging into the urllib3 source.


7. Related articles


  • Basic usage of the requests library

  • The difference and relationship between requests.get() and requests.session.get()

  • The requests timeout parameter explained






