Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#22996 closed Bug (fixed)

UnicodeDecodeError on accessing `request.GET`

Reported by: jorgecarleitao Owned by: nobody
Component: HTTP handling Version: 1.6
Severity: Normal Keywords:
Cc: jorgecarleitao Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Aymeric Augustin)

I'm getting a non-deterministic error while running Django 1.6.5 in production, when I try to access request.GET:

    Traceback (most recent call last):
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    
      File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
        context = build_costumer_list_context(context, request.GET)
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
        raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

from a request of the form:

    'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3'

I'm using one middleware, 'django.middleware.locale.LocaleMiddleware', and página is a translation.

This error occurs roughly once every 200 page views (estimated), and it seems to occur only on GET requests whose query string has the form ?página=....

I will gladly help with this, although I'm not familiar with HTTP handling, so I would need some guidance on what the cause could be and where I should start looking.

Attachments (2)

22996-1.6.diff (2.1 KB ) - added by Claude Paroz 10 years ago.
22996-master.diff (2.2 KB ) - added by Claude Paroz 10 years ago.


Change History (17)

comment:1 by Aymeric Augustin, 10 years ago

Description: modified (diff)

Could you provide the full stack trace please?

comment:2 by jorgecarleitao, 10 years ago

Thanks for formatting it Aymeric.

This is all I have in the email I receive with the traceback (I removed two pieces of information that could help identify the user):

Traceback (most recent call last):

  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)

  File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
    context = build_costumer_list_context(context, request.GET)

  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
    raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte


<WSGIRequest
path:/categoria/8696/contratados,
GET:<could not parse>,
POST:<QueryDict: {}>,
COOKIES:{'_ga': 'GA1.2.235397185.1404980740'},
META:{'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*',
 'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
 'HTTP_ACCEPT_LANGUAGE': 'pt',
 'HTTP_CACHE_CONTROL': 'max-age=0',
 'HTTP_CONNECTION': 'close',
 'HTTP_COOKIE': '_ga=GA1.2.235397185.1404980740',
 'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3',
 'HTTP_HOST': 'publicos.pt',
 'HTTP_HTTPS': 'off',
 'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_USER_AGENT': ------------------------
 'HTTP_VIA': '1.1 cmaGD.cma.local (squid/3.3.8)',
 'HTTP_X_FORWARDED_FOR': ---------------------
 'HTTP_X_FORWARDED_HOST': 'publicos.pt',
 'HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_X_FORWARDED_SERVER': 'publicos.pt',
 'HTTP_X_FORWARDED_SSL': 'off',
 'PATH_INFO': '/categoria/8696/contratados',
 'PATH_TRANSLATED': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py/categoria/8696/contratados',
 'QUERY_STRING': 'página=3',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': '33908',
 'REQUEST_METHOD': 'GET',
 'REQUEST_URI': '/categoria/8696/contratados?página=3',
 'SCRIPT_FILENAME': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py',
 'SCRIPT_NAME': '',
 'SERVER_ADDR': '127.0.0.1',
 'SERVER_ADMIN': '[no address given]',
 'SERVER_NAME': 'publicos.pt',
 'SERVER_PORT': '80',
 'SERVER_PROTOCOL': 'HTTP/1.0',
 'SERVER_SIGNATURE': '',
 'SERVER_SOFTWARE': 'Apache/2.2.25 (Unix) mod_wsgi/3.4 Python/3.3.2',
 'mod_wsgi.application_group': 'web306.webfaction.com|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.enable_sendfile': '0',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '10392',
 'mod_wsgi.process_group': 'publics',
 'mod_wsgi.queue_start': '1405000563669244',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 4),
 'wsgi.errors': <_io.TextIOWrapper encoding='utf-8'>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7fc0566d5828>,
 'wsgi.input': <mod_wsgi.Input object at 0x7fc05c0586b0>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>

comment:3 by Aymeric Augustin, 10 years ago

Clearly the exception happens because the query string contains an á encoded as latin-1, while Django expects non-ASCII characters in the URL to be encoded in UTF-8.

>>> b'\xe1'.decode('latin-1')
'á'

If I understand correctly, you cannot reproduce this reliably? Maybe it's an old and buggy browser? If you have the web server's log, can you look for the 500 error and check the user agent?
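To make the failure concrete, here is a minimal sketch that reproduces it outside Django. It assumes, as PEP 3333 specifies for Python 3, that the WSGI server hands over `QUERY_STRING` as a native string decoded as ISO-8859-1; `decode_query_string` is a hypothetical stand-in for the line in `wsgi.py` shown in the traceback, not Django's actual function.

```python
# Bytes a browser sends for "página=3" when it uses Latin-1 instead of UTF-8.
raw = "página=3".encode("iso-8859-1")  # b'p\xe1gina=3'

# On Python 3, a WSGI server exposes header/query bytes as a str
# decoded as ISO-8859-1 (PEP 3333), so the environ value looks like this.
environ_value = raw.decode("iso-8859-1")


def decode_query_string(value):
    """Hypothetical stand-in for the failing line in the traceback."""
    return value.encode("iso-8859-1").decode("utf-8")


error = None
try:
    decode_query_string(environ_value)
except UnicodeDecodeError as exc:
    # 0xe1 starts a multi-byte UTF-8 sequence, but the next byte is "g".
    error = exc
    print(exc)

# The same round trip succeeds when the browser sent UTF-8 bytes.
utf8_value = "página=3".encode("utf-8").decode("iso-8859-1")
utf8_roundtrip = decode_query_string(utf8_value)
print(utf8_roundtrip)
```

The first call should fail at byte 0xe1 in position 1, like the error in the report; the second shows that a UTF-8 query string survives the same code path.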

comment:4 by jorgecarleitao, 10 years ago

I still had old error emails, so here are some examples of user agents for which this happened:

'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MATMJS)' (4 times)
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',

For the most recent error, the server access log shows the first user agent above on the request that returned status 500.

comment:5 by Claude Paroz, 10 years ago

This seems to be typical of IE, which messes up URL encoding.
From http://blogs.msdn.com/b/ieinternals/archive/2012/07/13/internet-explorer-and-international-text-encoding-unicode-punycode-ansi-oh-my.aspx:

"URLs in IE may use up to three (!!) different encodings at once: punycode in the hostname, %-escaped UTF-8 for the path, and raw codepaged-ANSI for the query and fragment components. This is clearly a mess, but fixing it to match the IRI specification incurs compatibility costs. (Trust me, we’ve tried!)"

While it's unfortunate, we should at least not crash.
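One way to avoid the crash is to try the expected encoding first and fall back to ISO-8859-1 when that fails. The following is only a sketch of that idea under stated assumptions, not the attached patch: `decode_query_bytes` is a hypothetical helper, and it assumes the raw query string is available as bytes.

```python
def decode_query_bytes(raw, encoding="utf-8"):
    """Hypothetical helper: decode raw query-string bytes without raising.

    Tries the expected encoding first. If the bytes are not valid in that
    encoding, falls back to ISO-8859-1, which maps every byte to a code
    point and therefore cannot fail.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Well-behaved client: UTF-8 bytes decode as usual.
print(decode_query_bytes("página=3".encode("utf-8")))

# Misbehaving client: Latin-1 bytes no longer raise.
print(decode_query_bytes(b"p\xe1gina=3"))
```

The fallback is still a guess: bytes sent in another code page would decode to the wrong characters, but the request would be handled instead of returning a 500.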

comment:6 by Tim Graham, 10 years ago

Triage Stage: Unreviewed → Accepted

by Claude Paroz, 10 years ago

Attachment: 22996-1.6.diff added

by Claude Paroz, 10 years ago

Attachment: 22996-master.diff added

comment:7 by Claude Paroz, 10 years ago

Has patch: set

comment:8 by Aymeric Augustin, 10 years ago

Yeah, we have no choice but shoving "in the face of ambiguity, refuse to guess" up on asses. Thank you, IE.

Patches look pretty good. Can you add a comment explaining why the results are different on Python 2 and 3 -- if you know why? (I'm pretty sure I've seen that before but I can't remember the reason.) Can you also add a reference to this ticket (#22996) in the test's docstring?


comment:9 by Claude Paroz, 10 years ago

Here is the pull request for master: https://github.com/django/django/pull/2910
For the backport to 1.6, I might limit the changes to the Python 3 issue.

comment:10 by Tim Graham, 10 years ago

Patch needs improvement: set

According to the PR, there is a failing test on Python 2.

comment:11 by Claude Paroz, 10 years ago

Patch needs improvement: unset

Patch updated, including a note in the 1.7 release notes.

comment:12 by Tim Graham, 10 years ago

Triage Stage: Accepted → Ready for checkin

comment:13 by Claude Paroz <claude@…>, 10 years ago

Resolution: fixed
Status: new → closed

In fa02120d360387bebbbe735e86686bb4c7c43db2:

Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.

comment:14 by Claude Paroz <claude@…>, 10 years ago

In 72ad014b6aee3e8d996af4646b97228e82fc4cc1:

[1.7.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.

comment:15 by Claude Paroz <claude@…>, 10 years ago

In 9f9fdc4b0a33abfe3255302300ea1e3d1c33a3a0:

[1.6.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.
