Opened 8 years ago
Closed 8 years ago
#26971 closed Bug (fixed)
UnicodeDecodeError with non-ASCII string in quoted URL
Reported by: | Oleg Blinov | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.8 |
Severity: | Normal | Keywords: | UnicodeDecodeError UTF-8 windows-1251 URL wsgi |
Cc: | loic84 | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django raises UnicodeDecodeError if the URL contains percent-encoded bytes that are not valid UTF-8. The failing line is in get_path_info() (https://github.com/django/django/blob/master/django/core/handlers/wsgi.py#L190):
return path_info.decode(UTF_8)
It fails when a parameter in the URL is not encoded as UTF-8, for example /tag/%E7%E0%EA%EB%E0%E4%EA%E0/:
GET /tag/%E7%E0%EA%EB%E0%E4%EA%E0/ => generated 0 bytes in 1 msecs (HTTP/1.1 400) 1 headers in 68 bytes (1 switches on core 0)
Bad Request (UnicodeDecodeError)
Traceback (most recent call last):
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 167, in __call__
    request = self.request_class(environ)
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 80, in __init__
    path_info = get_path_info(environ)
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 197, in get_path_info
    return path_info.decode(UTF_8)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5: invalid continuation byte
With a UTF-8 URL-quoted parameter, /tag/%D0%B7%D0%B0%D0%BA%D0%BB%D0%B0%D0%B4%D0%BA%D0%B0, there are no errors, but the old site used windows-1251 encoding and I need to support old links. So I use this dirty hack:
try:
    return path_info.decode(UTF_8)
except UnicodeDecodeError:
    # Fall back to the legacy site encoding for old links.
    return path_info.decode('windows-1251')
The problem occurs only in the WSGI handler; manage.py runserver handles non-UTF-8 URLs without errors.
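The failure and the fallback described above can be reproduced outside Django with the standard library alone. This is a minimal sketch, not Django code: decode_path and its legacy_encoding parameter are hypothetical names chosen for illustration, and it assumes the server hands over the path as already percent-decoded bytes.

```python
from urllib.parse import unquote_to_bytes


def decode_path(raw, legacy_encoding="windows-1251"):
    """Hypothetical helper: try UTF-8 first, then a legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is an old
        # windows-1251 link. This is a guess, not a detection.
        return raw.decode(legacy_encoding)


# The two URLs from this ticket, percent-decoded to raw bytes.
legacy_bytes = unquote_to_bytes("/tag/%E7%E0%EA%EB%E0%E4%EA%E0/")
utf8_bytes = unquote_to_bytes(
    "/tag/%D0%B7%D0%B0%D0%BA%D0%BB%D0%B0%D0%B4%D0%BA%D0%B0"
)

legacy_path = decode_path(legacy_bytes)
utf8_path = decode_path(utf8_bytes)
```

Both URLs carry the same tag, so the two decoded paths differ only in the trailing slash. The fallback is a heuristic: a windows-1251 byte sequence that happens to be valid UTF-8 would be decoded as UTF-8 and come out wrong.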
Change History (6)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
Cc: | added |
---|
comment:3 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Not sure about the appropriate resolution, but I could reproduce this crash by trying to fetch a URL like /tag/%E7%E0%EA%EB%E0%E4%EA%E0/ using gunicorn as the server.
comment:5 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
This was supposed to be fixed by #19508 (hence runserver not failing). However, I suspect that in your production deployment, the received URI is already percent-decoded higher in the stack (Apache, mod_wsgi, ...), so Django is receiving /tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/ instead of /tag/%E7%E0%EA%EB%E0%E4%EA%E0/. In that case, we may try to "repercent" the URI in case of UnicodeDecodeError. Loïc, could you advise?
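One way to read the "repercent" idea is sketched below. It is only an illustration under stated assumptions, not the patch attached to this ticket: the function name repercent_invalid_utf8 is hypothetical, and the sketch assumes the path arrives as raw bytes. Each time decoding fails, the offending bytes are percent-encoded again and decoding is retried, so valid UTF-8 is left untouched.

```python
from urllib.parse import quote


def repercent_invalid_utf8(path):
    """Hypothetical helper: re-quote bytes that are not valid UTF-8."""
    while True:
        try:
            return path.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start and exc.end delimit the bytes that failed to decode.
            # Percent-encode only those bytes and try again.
            requoted = quote(path[exc.start:exc.end], safe="")
            path = path[:exc.start] + requoted.encode("ascii") + path[exc.end:]


received = b"/tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/"
repaired = repercent_invalid_utf8(received)
```

The loop terminates because every pass replaces at least one undecodable byte with pure ASCII. The view then sees the percent-encoded form and can decide for itself how to interpret the legacy bytes.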