Opened 6 years ago

Last modified 6 years ago

#29323 closed Bug

HTTPRequest QueryDict wrongly decodes binary encoded values — at Version 3

Reported by: Thomas Riccardi Owned by: nobody
Component: HTTP handling Version: 1.11
Severity: Normal Keywords:
Cc: Claude Paroz Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Thomas Riccardi)

application/x-www-form-urlencoded bodies allow binary values: just percent-escape (most of) the bytes.

It doesn't work with python2 and django.

In python2, QueryDict returns an unicode string for values, which makes it impossible to represent binary data.
Worse: it returns an unicode string whose unicode code points are the raw expected bytes (caused by lax behavior of str.decode('iso-8859-1')) so it silently fails.

Example:

q = QueryDict('foo=%00%7f%80%ff')
q['foo']
# actual: 
# u'\x00\x7f\x80\xff'
# expected:
# '\x00\x7f\x80\xff'

Relevant code:
https://github.com/django/django/blob/f89b11b879b83aa505dc8231da5f06ca4b1b062e/django/http/request.py#L397-L401

This was introduced by https://github.com/django/django/commit/fa02120d360387bebbbe735e86686bb4c7c43db2 while trying to accept broken user agents that send non-ascii as urlencoded raw values:

  • the python 3 version seems fine: it tries to decode before calling the query string parser (from urllib)
  • the python 2 version made a mistake: it decodes after calling the query string parser (and only on the value, not the key strangely)

I'm not sure how to fix this, as there are many locations where the key and value are re-encoded (appendlist does it too..)

In python 3 it fails too, but probably only because of the forced post-query-string-parser decoding into unicode string: to support binary values we don't want decoding at this point.

A test for python3:

    def test_binary_input(self):
        """
        QueryDicts must be able to handle binary.
        """
        q = QueryDict(b'foo=%00%7f%80%ff')
        self.assertListEqual(list(map(ord, q['foo'])), [0x00, 0x7F, 0x80, 0xFF])
AssertionError: Lists differ: [0, 127, 65533, 65533] != [0, 127, 128, 255]

First differing element 2:
65533
128

- [0, 127, 65533, 65533]
+ [0, 127, 128, 255]

Change History (3)

comment:1 by Simon Charette, 6 years ago

At this point the only version of Django still supporting Python 2 is only receiving security fixes so unless you can also reproduce on Python 3 I'm afraid this will not get backported.

comment:2 by Tim Graham, 6 years ago

Resolution: wontfix
Status: newclosed
Type: UncategorizedBug

comment:3 by Thomas Riccardi, 6 years ago

Description: modified (diff)
Resolution: wontfix
Status: closednew
Summary: HTTPRequest QueryDict wrongly decodes binary encoded values in python2HTTPRequest QueryDict wrongly decodes binary encoded values

OK.

I tested with python3 and it breaks too, but only because the code explicitly tries to generates an unicode string (bytes_to_text in appendlist).

I updated the description to also add the python3 test.

To support binary we would need to add an explicit option to get the bytes (maybe an encoding that asks to keep bytes?); not sure how to expose it in HTTPRequest.

Note: See TracTickets for help on using tickets.
Back to Top