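Advanced requests features: authentication and redirects

Authentication in requests


Credentials can be passed through the auth parameter of Session.request. The auth argument can be a (username, password) tuple, or an instance of a class such as HTTPBasicAuth; any callable is accepted.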

:param auth: (optional) Auth tuple or callable to enable
Basic/Digest/Custom HTTP Auth.
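To make the three accepted forms concrete, here is a minimal sketch that prepares a request offline (nothing is sent) and looks at the headers each form produces. The TokenAuth class, the X-Example-Token header and the example.invalid host are made up for illustration; they are not part of requests.

```python
import base64

import requests
from requests.auth import AuthBase, HTTPBasicAuth


class TokenAuth(AuthBase):
    """Hypothetical custom scheme: attach a token in a made-up header."""

    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # requests calls the auth object with the prepared request.
        request.headers["X-Example-Token"] = self.token
        return request


def headers_after_auth(auth):
    """Prepare a request without sending it and return its headers."""
    request = requests.Request("GET", "http://example.invalid/", auth=auth)
    return dict(request.prepare().headers)


# What HTTP Basic auth is expected to put in the Authorization header.
expected_basic = "Basic " + base64.b64encode(b"username:passwd").decode("ascii")

tuple_headers = headers_after_auth(("username", "passwd"))
basic_headers = headers_after_auth(HTTPBasicAuth("username", "passwd"))
custom_headers = headers_after_auth(TokenAuth("abc123"))

print(tuple_headers["Authorization"] == expected_basic)
print(basic_headers["Authorization"] == expected_basic)
print(custom_headers["X-Example-Token"])
```

The tuple form and the HTTPBasicAuth instance should end up with the same Authorization header, since the tuple is only shorthand for Basic auth.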

Credentials are looked up from several sources:

  • The auth parameter, if given
  • Otherwise, the URI (http://username:passwd@www.sina.com)
  • Otherwise, Session.auth (set with session.auth = (username, passwd))
  • Otherwise, the .netrc file
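As an illustration of the URI case, the sketch below pulls the username and password out of a URL using only the standard library. requests has its own helper for this; credentials_from_url is a hypothetical name used here to show the idea, not the library's code.

```python
from urllib.parse import unquote, urlsplit


def credentials_from_url(url):
    """Return (username, password) embedded in the URL, or None if absent."""
    parts = urlsplit(url)
    if parts.username is None and parts.password is None:
        return None
    # Userinfo may be percent-encoded, so decode both halves.
    return (unquote(parts.username or ""), unquote(parts.password or ""))


with_credentials = credentials_from_url("http://username:passwd@www.sina.com")
without_credentials = credentials_from_url("http://www.sina.com")

print(with_credentials)
print(without_credentials)
```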

.netrc

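requests also supports .netrc. The .netrc file stores the credentials used to log in to remote hosts. The full syntax is covered in the linked reference; in outline it looks like this.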

machine definitions

Credentials

machine ftp.freebsd.org
login anonymous
password edwin@mavetju.org
machine myownmachine
login myusername
password mypassword
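The machine entries above can be read back with Python's standard netrc module, which is a convenient way to check that a file parses as intended. This is a minimal sketch: it writes the sample entries to a temporary file instead of touching a real ~/.netrc, and load_credentials is a hypothetical helper name.

```python
import netrc
import os
import tempfile

NETRC_TEXT = """\
machine ftp.freebsd.org
login anonymous
password edwin@mavetju.org
machine myownmachine
login myusername
password mypassword
"""


def load_credentials(text, host):
    """Parse netrc-formatted text and return (login, password) for host."""
    fd, path = tempfile.mkstemp(suffix=".netrc")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        entry = netrc.netrc(path).authenticators(host)
    finally:
        os.remove(path)
    if entry is None:
        # No matching machine entry and no default entry.
        return None
    login, _account, password = entry
    return login, password


own_machine = load_credentials(NETRC_TEXT, "myownmachine")
unknown_host = load_credentials(NETRC_TEXT, "unknown.example")

print(own_machine)
print(unknown_host)
```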

macro definitions

Macros define commands to run in the ftp shell after logging in.

macdef uploadtest
cd /pub/tests
bin
put filename.tar.gz
quit
macdef dailyupload
cd /pub/tests
bin
put daily-$1.tar.gz
quit

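Redirects in requests


When you access www.sina.com, requests ends up caching two addresses, www.sina.com and www.sina.com.cn, because the first address redirects to the second. Requesting the address directly with curl shows that it returns 301 Moved Permanently together with Location: http://www.sina.com.cn. requests then automatically issues a new request to the redirect target.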

$ curl -i -X GET 'http://www.sina.com'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 178 100 178 0 0 712 0 --:--:-- --:--:-- --:--:-- 712
HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Tue, 02 Aug 2016 05:20:52 GMT
Content-Type: text/html
Location: http://www.sina.com.cn/ # the redirect target
Expires: Tue, 02 Aug 2016 05:22:52 GMT
Cache-Control: max-age=120
Age: 96
Content-Length: 178
X-Cache: HIT from xd33-78.sina.com.cn
<html>
<head><title>301 Moved Permanently</title></head> # the status code and the reason phrase
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
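The same observation can be made from Python without following the redirect. The sketch below is self-contained: it starts a throwaway local server that answers /old with a 301, then reads the raw status and Location header with http.client, which (like curl without -L) does not follow redirects. The handler, the /old and /new paths and the helper name are all made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer /old with a 301 pointing at /new; serve everything else."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"final page"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet: skip the default request logging.
        pass


def fetch_without_following(path):
    """Return (status, reason, Location) for one request, no redirects."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.reason, response.getheader("Location"))
        response.read()
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, reason, location = fetch_without_following("/old")
print(status, reason, location)
```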

The logic for following a redirect lives mainly in SessionRedirectMixin (the request flow as a whole is analyzed in the linked post).

import requests

resp = requests.get('http://www.sina.com')
resp  # the response to the final request, after following the redirect
Out[10]: <Response [200]>
resp.url
Out[11]: u'http://www.sina.com.cn/'
resp.history
Out[12]: [<Response [301]>]  # the responses of each intermediate redirect
resp.history[0].url
Out[13]: u'http://www.sina.com/'  # the URL originally passed in

class SessionRedirectMixin(object):
    def resolve_redirects(self, resp, req, stream=False, timeout=None,
                          verify=True, cert=None, proxies=None, **adapter_kwargs):
        """Receives a Response. Returns a generator of Responses."""
        i = 0
        hist = []  # keep track of history
        # is_redirect is true when the status code is one of
        # 301, 302, 303, 307, 308 and the headers contain Location.
        while resp.is_redirect:
            prepared_request = req.copy()
            if i > 0:
                # Update history and keep track of redirects.
                hist.append(resp)
                new_hist = list(hist)
                resp.history = new_hist
            try:
                resp.content  # Consume socket so it can be released
            except (ChunkedEncodingError, ContentDecodingError, RuntimeError):
                resp.raw.read(decode_content=False)
            if i >= self.max_redirects:  # defaults to 30
                raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
            # Release the connection back into the pool.
            resp.close()
            url = resp.headers['location']  # the Location header
            # Handle redirection without scheme (see: RFC 1808 Section 4)
            if url.startswith('//'):
                parsed_rurl = urlparse(resp.url)
                url = '%s:%s' % (parsed_rurl.scheme, url)
            # The scheme should be lower case...
            parsed = urlparse(url)
            url = parsed.geturl()
            # Facilitate relative 'location' headers, as allowed by RFC 7231.
            # (e.g. '/path/to/resource' instead of 'http://domain.tld/path/to/resource')
            # Compliant with RFC3986, we percent encode the url.
            if not parsed.netloc:
                url = urljoin(resp.url, requote_uri(url))
            else:
                url = requote_uri(url)
            prepared_request.url = to_native_string(url)
            # Cache the url, unless it redirects to itself.
            if resp.is_permanent_redirect and req.url != prepared_request.url:
                self.redirect_cache[req.url] = prepared_request.url
            self.rebuild_method(prepared_request, resp)
            # https://github.com/kennethreitz/requests/issues/1084
            if resp.status_code not in (codes.temporary_redirect, codes.permanent_redirect):
                if 'Content-Length' in prepared_request.headers:
                    del prepared_request.headers['Content-Length']
                prepared_request.body = None
            headers = prepared_request.headers
            try:
                del headers['Cookie']
            except KeyError:
                pass
            # Extract any cookies sent on the response to the cookiejar
            # in the new request. Because we've mutated our copied prepared
            # request, use the old one that we haven't yet touched.
            extract_cookies_to_jar(prepared_request._cookies, req, resp.raw)
            prepared_request._cookies.update(self.cookies)
            prepared_request.prepare_cookies(prepared_request._cookies)
            # Rebuild auth and proxy information.
            proxies = self.rebuild_proxies(prepared_request, proxies)
            self.rebuild_auth(prepared_request, resp)
            # Override the original request.
            req = prepared_request
            # Send the rebuilt request.
            resp = self.send(
                req,
                stream=stream,
                timeout=timeout,
                verify=verify,
                cert=cert,
                proxies=proxies,
                allow_redirects=False,
                **adapter_kwargs
            )
            extract_cookies_to_jar(self.cookies, prepared_request, resp.raw)
            i += 1
            yield resp
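Stripped of cookies, auth and connection handling, the loop above comes down to: read Location, resolve it against the current URL, request again, and stop after too many hops. The following is a simplified sketch of that idea using only the standard library, not the library's implementation. The fetch callable, the FAKE_SITE table and its hosts are hypothetical stand-ins for a real server, and percent-encoding of the URL is left out.

```python
from urllib.parse import urljoin, urlparse

REDIRECT_CODES = {301, 302, 303, 307, 308}


class TooManyRedirects(Exception):
    """Raised when the redirect chain is longer than the allowed limit."""


def follow_redirects(fetch, url, max_redirects=30):
    """Follow Location headers until a non-redirect response is reached.

    fetch is a hypothetical callable: fetch(url) -> (status_code, headers),
    with lower-case header names. Returns (final_url, final_status, history),
    where history holds the (url, status_code) of each redirect response.
    """
    history = []
    status, headers = fetch(url)
    while status in REDIRECT_CODES and "location" in headers:
        if len(history) >= max_redirects:
            raise TooManyRedirects("Exceeded %s redirects." % max_redirects)
        history.append((url, status))
        location = headers["location"]
        if location.startswith("//"):
            # Scheme-relative Location: reuse the scheme of the current URL.
            location = "%s:%s" % (urlparse(url).scheme, location)
        # urljoin keeps absolute URLs as they are and resolves relative ones.
        url = urljoin(url, location)
        status, headers = fetch(url)
    return url, status, history


# A made-up two-hop site: scheme-relative redirect, then a relative path.
FAKE_SITE = {
    "http://old.example/": (301, {"location": "//new.example/"}),
    "http://new.example/": (302, {"location": "/index.html"}),
    "http://new.example/index.html": (200, {}),
}

final_url, final_status, history = follow_redirects(
    FAKE_SITE.__getitem__, "http://old.example/"
)
print(final_url, final_status)
print(history)
```

The history list plays the role of resp.history in the session shown earlier: it holds the intermediate redirect responses, while the returned URL and status describe the final one.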

In general, only certain methods are redirected:

Redirect allowed:     GET OPTIONS HEAD
Redirect not allowed: POST PUT PATCH DELETE

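Depending on the status code of the redirect response and the original method, the request to the redirect target may have to use a different method.

Original method   Status code   Status name   New method
GET/OPTIONS       303           see_other     GET
GET/OPTIONS       302           found         GET
POST              301           moved         GET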
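The table can be written as a small function. This is a sketch of the rule as I read it from rebuild_method, which is called in resolve_redirects. Two points go beyond what the table lists and should be treated as assumptions: 302 and 303 are applied to every method except HEAD, and all other combinations keep the original method. redirected_method is a hypothetical name.

```python
def redirected_method(method, status_code):
    """Return the method to use for the request to the redirect target."""
    method = method.upper()
    # 303 See Other and 302 Found: switch to GET (assumed: HEAD is kept).
    if status_code in (302, 303) and method != "HEAD":
        return "GET"
    # 301 Moved Permanently: only a POST is turned into a GET.
    if status_code == 301 and method == "POST":
        return "GET"
    # Everything else (for example 307 and 308) keeps the original method.
    return method


for original, code in [("GET", 303), ("OPTIONS", 302), ("POST", 301), ("POST", 307)]:
    print(original, code, redirected_method(original, code))
```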