Skip to content

Commit d0d4d30

Browse files
orsenthilmerwokFidget-SpinnerAdamGold
authored
[3.7] bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24531)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <[email protected]> Co-authored-by: Ken Jin <[email protected]> Co-authored-by: Adam Goldschmidt <[email protected]> (cherry picked from commit fcbe0cb)
1 parent d9b8f13 commit d0d4d30

File tree

9 files changed

+152
-46
lines changed

9 files changed

+152
-46
lines changed

Doc/library/cgi.rst

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -277,10 +277,10 @@ These are useful if you want more control, or if you want to employ some of the
277277
algorithms implemented in this module in other circumstances.
278278

279279

280-
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
280+
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281281

282282
Parse a query in the environment or from a file (the file defaults to
283-
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
283+
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284284
passed to :func:`urllib.parse.parse_qs` unchanged.
285285

286286

@@ -296,7 +296,7 @@ algorithms implemented in this module in other circumstances.
296296
instead. It is maintained here only for backward compatibility.
297297

298298

299-
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
299+
.. function:: parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&")
300300

301301
Parse input of type :mimetype:`multipart/form-data` (for file uploads).
302302
Arguments are *fp* for the input file, *pdict* for a dictionary containing
@@ -315,6 +315,9 @@ algorithms implemented in this module in other circumstances.
315315
Added the *encoding* and *errors* parameters. For non-file fields, the
316316
value is now a list of strings, not bytes.
317317

318+
.. versionchanged:: 3.7.10
319+
Added the *separator* parameter.
320+
318321

319322
.. function:: parse_header(string)
320323

Doc/library/urllib.parse.rst

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -165,7 +165,7 @@ or on combining URL components into a URL string.
165165
now raise :exc:`ValueError`.
166166

167167

168-
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
168+
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
169169

170170
Parse a query string given as a string argument (data of type
171171
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
@@ -190,6 +190,9 @@ or on combining URL components into a URL string.
190190
read. If set, then throws a :exc:`ValueError` if there are more than
191191
*max_num_fields* fields read.
192192

193+
The optional argument *separator* is the symbol to use for separating the
194+
query arguments. It defaults to ``&``.
195+
193196
Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
194197
parameter set to ``True``) to convert such dictionaries into query
195198
strings.
@@ -200,8 +203,14 @@ or on combining URL components into a URL string.
200203
.. versionchanged:: 3.7.2
201204
Added *max_num_fields* parameter.
202205

206+
.. versionchanged:: 3.7.10
207+
Added *separator* parameter with the default value of ``&``. Python
208+
versions earlier than Python 3.7.10 allowed using both ``;`` and ``&`` as
209+
query parameter separator. This has been changed to allow only a single
210+
separator key, with ``&`` as the default separator.
211+
203212

204-
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
213+
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
205214

206215
Parse a query string given as a string argument (data of type
207216
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
@@ -225,6 +234,9 @@ or on combining URL components into a URL string.
225234
read. If set, then throws a :exc:`ValueError` if there are more than
226235
*max_num_fields* fields read.
227236

237+
The optional argument *separator* is the symbol to use for separating the
238+
query arguments. It defaults to ``&``.
239+
228240
Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
229241
query strings.
230242

@@ -234,6 +246,13 @@ or on combining URL components into a URL string.
234246
.. versionchanged:: 3.7.2
235247
Added *max_num_fields* parameter.
236248

249+
.. versionchanged:: 3.7.10
250+
Added *separator* parameter with the default value of ``&``. Python
251+
versions earlier than Python 3.7.10 allowed using both ``;`` and ``&`` as
252+
query parameter separator. This has been changed to allow only a single
253+
separator key, with ``&`` as the default separator.
254+
255+
237256
.. function:: urlunparse(parts)
238257

239258
Construct a URL from a tuple as returned by ``urlparse()``. The *parts*

Doc/whatsnew/3.6.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2443,3 +2443,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
24432443
details, see the documentation for ``loop.create_datagram_endpoint()``.
24442444
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
24452445
:issue:`37228`.)
2446+
2447+
Notable changes in Python 3.6.13
2448+
================================
2449+
2450+
Earlier Python versions allowed using both ``;`` and ``&`` as
2451+
query parameter separators in :func:`urllib.parse.parse_qs` and
2452+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2453+
newer W3C recommendations, this has been changed to allow only a single
2454+
separator key, with ``&`` as the default. This change also affects
2455+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2456+
functions internally. For more details, please see their respective
2457+
documentation.
2458+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Doc/whatsnew/3.7.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2572,3 +2572,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
25722572
details, see the documentation for ``loop.create_datagram_endpoint()``.
25732573
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
25742574
:issue:`37228`.)
2575+
2576+
Notable changes in Python 3.7.10
2577+
================================
2578+
2579+
Earlier Python versions allowed using both ``;`` and ``&`` as
2580+
query parameter separators in :func:`urllib.parse.parse_qs` and
2581+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2582+
newer W3C recommendations, this has been changed to allow only a single
2583+
separator key, with ``&`` as the default. This change also affects
2584+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2585+
functions internally. For more details, please see their respective
2586+
documentation.
2587+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Lib/cgi.py

Lines changed: 14 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -117,7 +117,8 @@ def closelog():
117117
# 0 ==> unlimited input
118118
maxlen = 0
119119

120-
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
120+
def parse(fp=None, environ=os.environ, keep_blank_values=0,
121+
strict_parsing=0, separator='&'):
121122
"""Parse a query in the environment or from a file (default stdin)
122123
123124
Arguments, all optional:
@@ -136,6 +137,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
136137
strict_parsing: flag indicating what to do with parsing errors.
137138
If false (the default), errors are silently ignored.
138139
If true, errors raise a ValueError exception.
140+
141+
separator: str. The symbol to use for separating the query arguments.
142+
Defaults to &.
139143
"""
140144
if fp is None:
141145
fp = sys.stdin
@@ -156,7 +160,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
156160
if environ['REQUEST_METHOD'] == 'POST':
157161
ctype, pdict = parse_header(environ['CONTENT_TYPE'])
158162
if ctype == 'multipart/form-data':
159-
return parse_multipart(fp, pdict)
163+
return parse_multipart(fp, pdict, separator=separator)
160164
elif ctype == 'application/x-www-form-urlencoded':
161165
clength = int(environ['CONTENT_LENGTH'])
162166
if maxlen and clength > maxlen:
@@ -180,7 +184,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
180184
qs = ""
181185
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
182186
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
183-
encoding=encoding)
187+
encoding=encoding, separator=separator)
184188

185189

186190
# parse query string function called from urlparse,
@@ -198,7 +202,7 @@ def parse_qsl(qs, keep_blank_values=0, strict_parsing=0):
198202
DeprecationWarning, 2)
199203
return urllib.parse.parse_qsl(qs, keep_blank_values, strict_parsing)
200204

201-
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
205+
def parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator='&'):
202206
"""Parse multipart input.
203207
204208
Arguments:
@@ -222,7 +226,7 @@ def parse_multipart(fp, pdict, encoding="utf-8", errors="replace"):
222226
except KeyError:
223227
pass
224228
fs = FieldStorage(fp, headers=headers, encoding=encoding, errors=errors,
225-
environ={'REQUEST_METHOD': 'POST'})
229+
environ={'REQUEST_METHOD': 'POST'}, separator=separator)
226230
return {k: fs.getlist(k) for k in fs}
227231

228232
def _parseparam(s):
@@ -332,7 +336,7 @@ class FieldStorage:
332336
def __init__(self, fp=None, headers=None, outerboundary=b'',
333337
environ=os.environ, keep_blank_values=0, strict_parsing=0,
334338
limit=None, encoding='utf-8', errors='replace',
335-
max_num_fields=None):
339+
max_num_fields=None, separator='&'):
336340
"""Constructor. Read multipart/* until last part.
337341
338342
Arguments, all optional:
@@ -380,6 +384,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
380384
self.keep_blank_values = keep_blank_values
381385
self.strict_parsing = strict_parsing
382386
self.max_num_fields = max_num_fields
387+
self.separator = separator
383388
if 'REQUEST_METHOD' in environ:
384389
method = environ['REQUEST_METHOD'].upper()
385390
self.qs_on_post = None
@@ -606,7 +611,7 @@ def read_urlencoded(self):
606611
query = urllib.parse.parse_qsl(
607612
qs, self.keep_blank_values, self.strict_parsing,
608613
encoding=self.encoding, errors=self.errors,
609-
max_num_fields=self.max_num_fields)
614+
max_num_fields=self.max_num_fields, separator=self.separator)
610615
self.list = [MiniFieldStorage(key, value) for key, value in query]
611616
self.skip_lines()
612617

@@ -622,7 +627,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
622627
query = urllib.parse.parse_qsl(
623628
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
624629
encoding=self.encoding, errors=self.errors,
625-
max_num_fields=self.max_num_fields)
630+
max_num_fields=self.max_num_fields, separator=self.separator)
626631
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
627632

628633
klass = self.FieldStorageClass or self.__class__
@@ -666,7 +671,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
666671
else self.limit - self.bytes_read
667672
part = klass(self.fp, headers, ib, environ, keep_blank_values,
668673
strict_parsing, limit,
669-
self.encoding, self.errors, max_num_fields)
674+
self.encoding, self.errors, max_num_fields, self.separator)
670675

671676
if max_num_fields is not None:
672677
max_num_fields -= 1

Lib/test/test_cgi.py

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -55,12 +55,9 @@ def do_test(buf, method):
5555
("", ValueError("bad query field: ''")),
5656
("&", ValueError("bad query field: ''")),
5757
("&&", ValueError("bad query field: ''")),
58-
(";", ValueError("bad query field: ''")),
59-
(";&;", ValueError("bad query field: ''")),
6058
# Should the next few really be valid?
6159
("=", {}),
6260
("=&=", {}),
63-
("=;=", {}),
6461
# This rest seem to make sense
6562
("=a", {'': ['a']}),
6663
("&=a", ValueError("bad query field: ''")),
@@ -75,8 +72,6 @@ def do_test(buf, method):
7572
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
7673
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
7774
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
78-
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
79-
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
8075
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
8176
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
8277
'cuyer': ['r'],
@@ -212,6 +207,30 @@ def test_strict(self):
212207
else:
213208
self.assertEqual(fs.getvalue(key), expect_val[0])
214209

210+
def test_separator(self):
211+
parse_semicolon = [
212+
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
213+
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
214+
(";", ValueError("bad query field: ''")),
215+
(";;", ValueError("bad query field: ''")),
216+
("=;a", ValueError("bad query field: 'a'")),
217+
(";b=a", ValueError("bad query field: ''")),
218+
("b;=a", ValueError("bad query field: 'b'")),
219+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
220+
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
221+
]
222+
for orig, expect in parse_semicolon:
223+
env = {'QUERY_STRING': orig}
224+
fs = cgi.FieldStorage(separator=';', environ=env)
225+
if isinstance(expect, dict):
226+
for key in expect.keys():
227+
expect_val = expect[key]
228+
self.assertIn(key, fs)
229+
if len(expect_val) > 1:
230+
self.assertEqual(fs.getvalue(key), expect_val)
231+
else:
232+
self.assertEqual(fs.getvalue(key), expect_val[0])
233+
215234
def test_log(self):
216235
cgi.log("Testing")
217236

Lib/test/test_urlparse.py

Lines changed: 46 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -32,16 +32,10 @@
3232
(b"&a=b", [(b'a', b'b')]),
3333
(b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
3434
(b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
35-
(";", []),
36-
(";;", []),
37-
(";a=b", [('a', 'b')]),
38-
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
39-
("a=1;a=2", [('a', '1'), ('a', '2')]),
40-
(b";", []),
41-
(b";;", []),
42-
(b";a=b", [(b'a', b'b')]),
43-
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
44-
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
35+
(";a=b", [(';a', 'b')]),
36+
("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
37+
(b";a=b", [(b';a', b'b')]),
38+
(b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
4539
]
4640

4741
# Each parse_qs testcase is a two-tuple that contains
@@ -68,16 +62,10 @@
6862
(b"&a=b", {b'a': [b'b']}),
6963
(b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
7064
(b"a=1&a=2", {b'a': [b'1', b'2']}),
71-
(";", {}),
72-
(";;", {}),
73-
(";a=b", {'a': ['b']}),
74-
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
75-
("a=1;a=2", {'a': ['1', '2']}),
76-
(b";", {}),
77-
(b";;", {}),
78-
(b";a=b", {b'a': [b'b']}),
79-
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
80-
(b"a=1;a=2", {b'a': [b'1', b'2']}),
65+
(";a=b", {';a': ['b']}),
66+
("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
67+
(b";a=b", {b';a': [b'b']}),
68+
(b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
8169
]
8270

8371
class UrlParseTestCase(unittest.TestCase):
@@ -884,10 +872,46 @@ def test_parse_qsl_encoding(self):
884872
def test_parse_qsl_max_num_fields(self):
885873
with self.assertRaises(ValueError):
886874
urllib.parse.parse_qs('&'.join(['a=a']*11), max_num_fields=10)
887-
with self.assertRaises(ValueError):
888-
urllib.parse.parse_qs(';'.join(['a=a']*11), max_num_fields=10)
889875
urllib.parse.parse_qs('&'.join(['a=a']*10), max_num_fields=10)
890876

877+
def test_parse_qs_separator(self):
878+
parse_qs_semicolon_cases = [
879+
(";", {}),
880+
(";;", {}),
881+
(";a=b", {'a': ['b']}),
882+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
883+
("a=1;a=2", {'a': ['1', '2']}),
884+
(b";", {}),
885+
(b";;", {}),
886+
(b";a=b", {b'a': [b'b']}),
887+
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
888+
(b"a=1;a=2", {b'a': [b'1', b'2']}),
889+
]
890+
for orig, expect in parse_qs_semicolon_cases:
891+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
892+
result = urllib.parse.parse_qs(orig, separator=';')
893+
self.assertEqual(result, expect, "Error parsing %r" % orig)
894+
895+
896+
def test_parse_qsl_separator(self):
897+
parse_qsl_semicolon_cases = [
898+
(";", []),
899+
(";;", []),
900+
(";a=b", [('a', 'b')]),
901+
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
902+
("a=1;a=2", [('a', '1'), ('a', '2')]),
903+
(b";", []),
904+
(b";;", []),
905+
(b";a=b", [(b'a', b'b')]),
906+
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
907+
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
908+
]
909+
for orig, expect in parse_qsl_semicolon_cases:
910+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
911+
result = urllib.parse.parse_qsl(orig, separator=';')
912+
self.assertEqual(result, expect, "Error parsing %r" % orig)
913+
914+
891915
def test_urlencode_sequences(self):
892916
# Other tests incidentally urlencode things; test non-covered cases:
893917
# Sequence and object values.

0 commit comments

Comments
 (0)