#22996 closed Bug (fixed)
UnicodeDecodeError on accessing `request.GET`
Reported by: | jorgecarleitao | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.6 |
Severity: | Normal | Keywords: | |
Cc: | jorgecarleitao | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I'm getting a non-deterministic error when I access request.GET while running Django 1.6.5 in production:
Traceback (most recent call last):
  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
    context = build_costumer_list_context(context, request.GET)
  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
    raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')
from a request of the form:
'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3'
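The failing line in wsgi.py can be reproduced in isolation. Under PEP 3333, a WSGI server on Python 3 hands the application QUERY_STRING as a native str obtained by decoding the raw request bytes as ISO-8859-1, which is why the logged request still shows página. This is a minimal sketch, assuming the browser sent the query string as raw Latin-1 bytes; environ_query_string and django_16_get_step are hypothetical helpers that simulate the server and the Django 1.6 step, not real APIs.

```python
def environ_query_string(raw: bytes) -> str:
    """Simulate a PEP 3333 server on Python 3: raw bytes become a latin-1 'native string'."""
    return raw.decode("iso-8859-1")


def django_16_get_step(query_string: str) -> str:
    """The round trip shown at django/core/handlers/wsgi.py, line 137, in the traceback."""
    return query_string.encode("iso-8859-1").decode("utf-8")


# A browser that sends the query string as UTF-8 bytes round-trips fine.
utf8_qs = environ_query_string("página=3".encode("utf-8"))
print(repr(utf8_qs))                # the mojibake that actually sits in environ
print(django_16_get_step(utf8_qs))  # página=3

# A browser that sends the query string as raw Latin-1 bytes triggers the crash.
latin1_qs = environ_query_string("página=3".encode("latin-1"))
try:
    django_16_get_step(latin1_qs)
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte
```

The error message matches the one in the report byte for byte, including the position of 0xe1.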
I'm using one middleware, 'django.middleware.locale.LocaleMiddleware', and página (Portuguese for "page") is a translated parameter name.
This error occurs roughly once every 200 page views (estimated), and it seems to occur only on requests whose query string contains página (e.g. ?página=3).
I'd gladly help with this, although I'm not familiar with HTTP handling, so I would need some guidance on what the cause could be and where to start looking.
Attachments (2)
Change History (17)
comment:1 by , 11 years ago
Description: | modified (diff) |
---|
comment:2 by , 11 years ago
Thanks for formatting it, Aymeric.
This is everything in the error email I received with the traceback (I removed two pieces of information that could identify the user):
Traceback (most recent call last):
  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
    context = build_costumer_list_context(context, request.GET)
  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
    raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte

<WSGIRequest
path:/categoria/8696/contratados,
GET:<could not parse>,
POST:<QueryDict: {}>,
COOKIES:{'_ga': 'GA1.2.235397185.1404980740'},
META:{'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*',
 'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
 'HTTP_ACCEPT_LANGUAGE': 'pt',
 'HTTP_CACHE_CONTROL': 'max-age=0',
 'HTTP_CONNECTION': 'close',
 'HTTP_COOKIE': '_ga=GA1.2.235397185.1404980740',
 'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3',
 'HTTP_HOST': 'publicos.pt',
 'HTTP_HTTPS': 'off',
 'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_USER_AGENT': ------------------------
 'HTTP_VIA': '1.1 cmaGD.cma.local (squid/3.3.8)',
 'HTTP_X_FORWARDED_FOR': ---------------------
 'HTTP_X_FORWARDED_HOST': 'publicos.pt',
 'HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_X_FORWARDED_SERVER': 'publicos.pt',
 'HTTP_X_FORWARDED_SSL': 'off',
 'PATH_INFO': '/categoria/8696/contratados',
 'PATH_TRANSLATED': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py/categoria/8696/contratados',
 'QUERY_STRING': 'página=3',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': '33908',
 'REQUEST_METHOD': 'GET',
 'REQUEST_URI': '/categoria/8696/contratados?página=3',
 'SCRIPT_FILENAME': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py',
 'SCRIPT_NAME': '',
 'SERVER_ADDR': '127.0.0.1',
 'SERVER_ADMIN': '[no address given]',
 'SERVER_NAME': 'publicos.pt',
 'SERVER_PORT': '80',
 'SERVER_PROTOCOL': 'HTTP/1.0',
 'SERVER_SIGNATURE': '',
 'SERVER_SOFTWARE': 'Apache/2.2.25 (Unix) mod_wsgi/3.4 Python/3.3.2',
 'mod_wsgi.application_group': 'web306.webfaction.com|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.enable_sendfile': '0',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '10392',
 'mod_wsgi.process_group': 'publics',
 'mod_wsgi.queue_start': '1405000563669244',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 4),
 'wsgi.errors': <_io.TextIOWrapper encoding='utf-8'>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7fc0566d5828>,
 'wsgi.input': <mod_wsgi.Input object at 0x7fc05c0586b0>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>
comment:3 by , 11 years ago
Clearly the exception happens because the query string contains an á encoded as Latin-1, while Django expects non-ASCII characters in the URL to be encoded as UTF-8.
>>> b'\xe1'.decode('latin-1')
'á'
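The wording "invalid continuation byte" follows from UTF-8's byte layout: 0xE1 is a lead byte announcing a three-byte sequence, but the byte after it in p\xe1gina is g (0x67), which does not have the continuation pattern 10xxxxxx. A small sketch comparing the two encodings of á; the helper names utf8_sequence_length and is_continuation are illustrative, not part of Django.

```python
def utf8_sequence_length(lead: int) -> int:
    """Expected length of a UTF-8 sequence starting with `lead`; 0 if `lead` cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation bytes (0x80-0xBF) and bytes that are never valid leads


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return byte & 0xC0 == 0x80


for label, raw in [("utf-8", "página".encode("utf-8")), ("latin-1", "página".encode("latin-1"))]:
    lead, following = raw[1], raw[2]  # the byte(s) encoding 'á' start at index 1
    print(label, raw, utf8_sequence_length(lead), is_continuation(following))
# utf-8 b'p\xc3\xa1gina' 2 True
# latin-1 b'p\xe1gina' 3 False
```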
If I understand correctly, you cannot reproduce this reliably? Maybe it's an old and buggy browser? If you have the web server's log, can you look for the 500 error and check the user agent?
comment:4 by , 11 years ago
From old error emails, here are some of the user agents for which this happened:
'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MATMJS)' (4 times)
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',
For the most recent error, the server's access log shows the first user agent above on the request that returned status 500.
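Every user agent reported in this comment carries a Trident/ token, the Internet Explorer rendering engine (IE 8 through 11), which is what points to IE in the next comment. A quick sketch for checking further log entries the same way; looks_like_internet_explorer is a hypothetical helper, and a substring test is a rough heuristic rather than a robust user-agent parser.

```python
reported_agents = [
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; "
    ".NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MATMJS)",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko",
]


def looks_like_internet_explorer(user_agent: str) -> bool:
    """Rough heuristic: IE 8-10 advertise 'MSIE'; IE 11 dropped it and only shows 'Trident/'."""
    return "MSIE" in user_agent or "Trident/" in user_agent


print(all(looks_like_internet_explorer(ua) for ua in reported_agents))  # True
```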
comment:5 by , 11 years ago
This seems to be typical of IE, which mangles URL encoding.
From http://blogs.msdn.com/b/ieinternals/archive/2012/07/13/internet-explorer-and-international-text-encoding-unicode-punycode-ansi-oh-my.aspx:
"URLs in IE may use up to three (!!) different encodings at once: punycode in the hostname, %-escaped UTF-8 for the path, and raw codepaged-ANSI for the query and fragment components. This is clearly a mess, but fixing it to match the IRI specification incurs compatibility costs. (Trust me, we’ve tried!)"
While it's unfortunate, we should at least not crash.
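The quoted behaviour can be illustrated with Python's standard codecs. In this sketch the hostname publicações.pt is hypothetical (the real host, publicos.pt, is plain ASCII, so punycode plays no part in this ticket), and cp1252 is assumed to be the "ANSI" code page, as on a Western European Windows install.

```python
from urllib.parse import quote

# 1. Punycode in the hostname (hypothetical internationalized host).
host = "publicações.pt".encode("idna")
# 2. %-escaped UTF-8 for the path (this particular path is ASCII, so it is unchanged).
path = quote("/categoria/8696/contratados")
# 3. Raw code-page bytes for the query, with no escaping: what IE sends per the quote.
ie_query = "página=3".encode("cp1252")
# What Django expects instead: the query as %-escaped UTF-8.
utf8_query = quote("página=3", safe="=")

print(host)        # an ASCII 'xn--...' label followed by '.pt'
print(ie_query)    # b'p\xe1gina=3'
print(utf8_query)  # p%C3%A1gina=3
```

For á, cp1252 and Latin-1 produce the same byte, 0xE1, which is exactly the byte in the traceback.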
comment:6 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
by , 11 years ago
Attachment: | 22996-1.6.diff added |
---|
by , 11 years ago
Attachment: | 22996-master.diff added |
---|
comment:7 by , 11 years ago
Has patch: | set |
---|
comment:8 by , 11 years ago
Yeah, we have no choice but shoving "in the face of ambiguity, refuse to guess" up our asses. Thank you, IE.
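Because the raw bytes carry no charset information, any fix has to guess. One common non-crashing strategy, sketched here as an assumption rather than as the content of the attached patches, is to try UTF-8 first and fall back to ISO-8859-1, which can decode any byte sequence. decode_query_string is a hypothetical helper; urllib.parse.parse_qsl is the standard-library parser.

```python
from urllib.parse import parse_qsl


def decode_query_string(raw: bytes, encoding: str = "utf-8") -> list[tuple[str, str]]:
    """Decode and parse a raw query string without crashing on misbehaving user agents."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this decode cannot fail.
        # For IE's raw cp1252 query strings it also recovers characters such as 'á'.
        text = raw.decode("iso-8859-1")
    # parse_qsl splits on '&' and '=' and decodes %-escapes using `encoding`.
    return parse_qsl(text, keep_blank_values=True, encoding=encoding)


print(decode_query_string(b"p%C3%A1gina=3"))  # [('página', '3')]  compliant browser
print(decode_query_string(b"p\xe1gina=3"))    # [('página', '3')]  raw bytes from IE
```

ISO-8859-1 and cp1252 disagree only in the range 0x80-0x9F (cp1252 puts € at 0x80, for example), so a few characters would still come out wrong; the fallback only guarantees that decoding never raises.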
Patches look pretty good. Can you add a comment explaining why the results are different on Python 2 and 3 -- if you know why? (I'm pretty sure I've seen that before but I can't remember the reason.) Can you also add a reference to this ticket (#22996) in the test's docstring?
comment:9 by , 11 years ago
Here is the pull request for master: https://github.com/django/django/pull/2910
For the backport to 1.6, I might limit the changes to the Python 3 issue.
comment:10 by , 10 years ago
Patch needs improvement: | set |
---|
According to the PR, there is a failing test on Python 2.
comment:11 by , 10 years ago
Patch needs improvement: | unset |
---|
Patch updated, including a note in the 1.7 release notes.
comment:12 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:13 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Could you provide the full stack trace please?