requests的高级功能-重试机制

requests中的重试


通过之前的流程图知道max_retries参数在HTTPAdapter初始化时设置。可以传递整数或者传递urllib3.util.Retry实例。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
class HTTPAdapter(BaseAdapter):
"""The built-in HTTP Adapter for urllib3.
:param max_retries: The maximum number of retries each connection
should attempt. Note, this applies only to failed DNS lookups, socket
connections and connection timeouts, never to requests where data has
made it to the server. By default, Requests does not retry failed
connections. If you need granular control over the conditions under
which we retry a request, import urllib3's ``Retry`` class and pass
that instead.
Usage::
>>> import requests
>>> s = requests.Session()
>>> a = requests.adapters.HTTPAdapter(max_retries=3)
>>> s.mount('http://', a)
"""
__attrs__ = ['max_retries', 'config', '_pool_connections', '_pool_maxsize',
'_pool_block']
def __init__(self, pool_connections=DEFAULT_POOLSIZE,
pool_maxsize=DEFAULT_POOLSIZE, max_retries=DEFAULT_RETRIES,
pool_block=DEFAULT_POOLBLOCK):
if max_retries == DEFAULT_RETRIES:
self.max_retries = Retry(0, read=False)
else:
self.max_retries = Retry.from_int(max_retries)

Retry的设计比较简单,在HTTPConnectionPool中根据返回的异常和访问方法,区分是那种链接失败(connect? read?),然后减少对应的值即可。然后再判断是否所有的操作重试都归零,归零则报MaxRetries异常即可。不过对于每次重试之间的间隔使用了一个简单的backoff算法。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
class Retry(object):
""" Retry configuration.
Each retry attempt will create a new Retry object with updated values, so
they can be safely reused.
Retries can be defined as a default for a pool::
retries = Retry(connect=5, read=2, redirect=5)
http = PoolManager(retries=retries)
response = http.request('GET', 'http://example.com/')
Or per-request (which overrides the default for the pool)::
response = http.request('GET', 'http://example.com/', retries=Retry(10))
Retries can be disabled by passing ``False``::
response = http.request('GET', 'http://example.com/', retries=False)
Errors will be wrapped in :class:`~urllib3.exceptions.MaxRetryError` unless
retries are disabled, in which case the causing exception will be raised.
:param int total:
Total number of retries to allow. Takes precedence over other counts.
Set to ``None`` to remove this constraint and fall back on other
counts. It's a good idea to set this to some sensibly-high value to
account for unexpected edge cases and avoid infinite retry loops.
Set to ``0`` to fail on the first retry.
Set to ``False`` to disable and imply ``raise_on_redirect=False``.
....
:param iterable method_whitelist:
Set of uppercased HTTP method verbs that we should retry on.
By default, we only retry on methods which are considered to be
indempotent (multiple requests with the same parameters end with the
same state). See :attr:`Retry.DEFAULT_METHOD_WHITELIST`.
:param iterable status_forcelist:
A set of HTTP status codes that we should force a retry on.
By default, this is disabled with ``None``.
:param float backoff_factor:
A backoff factor to apply between attempts. urllib3 will sleep for::
{backoff factor} * (2 ^ ({number of total retries} - 1))
seconds. If the backoff_factor is 0.1, then :func:`.sleep` will sleep
for [0.1s, 0.2s, 0.4s, ...] between retries. It will never be longer
than :attr:`Retry.BACKOFF_MAX`.
By default, backoff is disabled (set to 0).
"""
DEFAULT_METHOD_WHITELIST = frozenset([
'HEAD', 'GET', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'])
#: Maximum backoff time.
BACKOFF_MAX = 120
def __init__(self, total=10, connect=None, read=None, redirect=None,
method_whitelist=DEFAULT_METHOD_WHITELIST, status_forcelist=None,
backoff_factor=0, raise_on_redirect=True, raise_on_status=True,
_observed_errors=0):
self.total = total
self.connect = connect
self.read = read
if redirect is False or total is False:
redirect = 0
raise_on_redirect = False
self.redirect = redirect
self.status_forcelist = status_forcelist or set()
self.method_whitelist = method_whitelist
self.backoff_factor = backoff_factor
self.raise_on_redirect = raise_on_redirect
self.raise_on_status = raise_on_status
self._observed_errors = _observed_errors # TODO: use .history instead?
def get_backoff_time(self):
""" Formula for computing the current backoff
:rtype: float
"""
if self._observed_errors <= 1:
return 0
# 重试算法, _observed_erros就是第几次重试,每次失败这个值就+1.
# backoff_factor = 0.1, 重试的间隔为[0.1, 0.2, 0.4, 0.8, ..., BACKOFF_MAX(120)]
backoff_value = self.backoff_factor * (2 ** (self._observed_errors - 1))
return min(self.BACKOFF_MAX, backoff_value)
def sleep(self):
""" Sleep between retry attempts using an exponential backoff.
By default, the backoff factor is 0 and this method will return
immediately.
"""
backoff = self.get_backoff_time()
if backoff <= 0:
return
time.sleep(backoff)
def is_forced_retry(self, method, status_code):
""" Is this method/status code retryable? (Based on method/codes whitelists)
"""
if self.method_whitelist and method.upper() not in self.method_whitelist:
return False
return self.status_forcelist and status_code in self.status_forcelist
# For backwards compatibility (equivalent to pre-v1.9):
Retry.DEFAULT = Retry(3)

使用重试有几点需要注意:

  • 如果使用requests.get等简单形式,默认会重试3次
  • 重试只有在DNS解析错误、链接错误、链接超时等异常是才重试。在比如读取超时、写超时、HTTP协议错误等不会重试
  • 使用重试会导致返回的错误为MaxRetriesError,而不是确切的异常。