Opened 16 years ago
Closed 12 years ago
#9370 closed Bug (fixed)
Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware
Reported by: | kikko | Owned by: | Kevin Kubasik |
---|---|---|---|
Component: | Core (Other) | Version: | 1.0 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When serving binary static files (for example, images) using django.views.static.serve with GZipMiddleware enabled, a traceback is returned instead of the image:
Traceback (most recent call last):
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 277, in run
    self.result = application(self.environ, self.start_response)
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 634, in __call__
    return self.application(environ, start_response)
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/handlers/wsgi.py", line 243, in __call__
    response = middleware_method(request, response)
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/http/__init__.py", line 359, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/utils/encoding.py", line 97, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
Change History (13)
comment:1 by , 16 years ago
milestone: | → 1.1 |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 16 years ago
Owner: | changed from | to
---|
comment:3 by , 16 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I cannot reproduce this against current trunk. Please reopen if you can create a test case which is reliably reproducible.
comment:4 by , 16 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
I have this reproduced with latest TRUNK:
[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found
Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):
File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
self.result = application(self.environ, self.start_response)
File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in call
return self.application(environ, start_response)
File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in call
response = middleware_method(request, response)
File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response
if response.status_code != 200 or len(response.content) < 200:
File "/home/niksite/lib/site-python/django/http/init.py", line 365, in _get_content
return smart_str(.join(self._container), self._charset)
File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
return s.decode('utf-8', errors).encode(encoding, errors)
File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162
By the way, the following error is produced if gzip-middleware is disabled:
Traceback (most recent call last):
File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
self.result = application(self.environ, self.start_response)
File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in call
return self.application(environ, start_response)
File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in call
response = middleware_method(request, response)
File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response
patch_response_headers(response, timeout)
File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers
responseETag = '"%s"' % md5_constructor(response.content).hexdigest()
File "/home/niksite/lib/site-python/django/http/init.py", line 365, in _get_content
return smart_str(.join(self._container), self._charset)
File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
return s.decode('utf-8', errors).encode(encoding, errors)
File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319
With cache-middleware disabled I see no errors:
[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233
comment:5 by , 16 years ago
Sorry, the formatting was lost in my last message. This is a repost with some additions.
I have this reproduced with latest TRUNK:
[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found

Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162
By the way, the following error is produced if gzip-middleware is disabled:
Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response
    patch_response_headers(response, timeout)
  File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers
    response['ETag'] = '"%s"' % md5_constructor(response.content).hexdigest()
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319
With cache-middleware disabled I see no errors:
[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233
I have the following settings:
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',
    #'django.middleware.http.ConditionalGetMiddleware',
    'django.middleware.gzip.GZipMiddleware',
    # 'debug_toolbar.middleware.DebugToolbarMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.locale.LocaleMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
    'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)
CACHE_MIDDLEWARE_SECONDS = 0
CACHE_BACKEND = "dummy://"
MEDIA_URL = "http://127.0.0.1:8000/media/"
And the following lines in urls.py:
urlpatterns += patterns('',
    (r'^media/(?P<path>.*)$', 'django.views.static.serve',
        {'document_root': settings.MEDIA_ROOT}),
)
comment:6 by , 16 years ago
Resolution: | → worksforme |
---|---|
Status: | reopened → closed |
Can't reproduce with provided instructions.
comment:7 by , 16 years ago
Resolution: | worksforme |
---|---|
Status: | closed → reopened |
Summary: | UnicodeDecodeError when serving binary static files through GZipMiddleWare → Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware |
The bit of info missing from the descriptions that is key to hitting the problem is to have DEFAULT_CHARSET set to something other than utf-8 in settings.py. That's how you reach this line in the traceback:
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
I tried setting DEFAULT_CHARSET to 'latin1' (and adding the gzip middleware) for one of my projects that generates and serves .png files, and sure enough started hitting this traceback. It's not specific to django.views.static.serve, nor really to the gzip middleware; it's anything that causes this routine:
def _get_content(self):
    if self.has_header('Content-Encoding'):
        return ''.join(self._container)
    return smart_str(''.join(self._container), self._charset)
in django/http/__init__.py to be called and take the path of calling smart_str with binary (and thus unlikely to be successfully decoded using utf-8) data and self._charset set to something other than utf-8.
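The failure mode described in this comment can be reproduced outside Django. The following is a minimal sketch in Python 3 terms, not Django's code: `reencode` and `try_reencode` are hypothetical helpers that mimic the decode-then-encode step shown in the traceback, and the sample bytes are the standard PNG file signature, whose first byte is 0x89.

```python
# Hypothetical stand-in for the decode/re-encode step seen in the traceback.
# This is an illustration only, not Django's smart_str.
def reencode(data: bytes, charset: str) -> bytes:
    """Treat data as UTF-8 text and re-encode it to charset."""
    return data.decode("utf-8").encode(charset)


# The 8-byte PNG signature; its first byte, 0x89, is not a valid UTF-8 start byte.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def try_reencode(data: bytes, charset: str) -> str:
    """Return 'ok', or a short description of the decode error raised."""
    try:
        reencode(data, charset)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError at position %d" % exc.start
    return "ok"
```

Real UTF-8 text survives the round trip (`try_reencode("héllo".encode("utf-8"), "latin1")` gives `"ok"`), while binary data such as `PNG_SIGNATURE` fails at its first byte, matching the "byte 0x89 in position 0" message above.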
Given the constraints needed to recreate this, it may be appropriate to defer this past 1.1. But it's also pretty late here right now so perhaps there is some easy fix that escapes me at the moment, so I'll leave that decision to someone else.
comment:8 by , 16 years ago
You are right. My settings.py has a DEFAULT_CHARSET = 'UTF-8' record. The error disappears when I change it to DEFAULT_CHARSET = 'utf-8'. Thank you!
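This comment suggests that the charset was compared against the literal lowercase string, so 'UTF-8' was treated as a non-UTF-8 charset. That reading is inferred from the comment, not confirmed by code shown in this ticket. As a hedged sketch of a case-insensitive and alias-aware check, the standard library's `codecs.lookup` can normalize the name first; `is_utf8` is a hypothetical helper, not a Django function.

```python
import codecs


def is_utf8(charset: str) -> bool:
    """Return True if charset names UTF-8 under any alias Python recognizes."""
    try:
        # codecs.lookup normalizes case and aliases ('UTF-8', 'utf8', ...).
        return codecs.lookup(charset).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False
```

With a check like this, 'UTF-8' and 'utf-8' would take the same code path, while 'latin1' would still be recognized as a different charset.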
comment:9 by , 16 years ago
milestone: | 1.1 |
---|
I'm happy to push this to post v1.1. There's no data loss involved, and the reproduction condition is an edge case.
comment:10 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:13 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
This was fixed in da56e1bac6449daef9aeab8d076d2594d9fd5b44, where I took care *not* to call force_bytes on bytes objects, in order not to trigger re-encoding. See #18796.
I'm going to look into this.
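The idea behind the fix in comment 13, leaving bytes untouched and only encoding text, can be sketched as follows. This is an assumption-laden illustration in Python 3, not the code from the referenced commit: `to_bytes` is a hypothetical name and does not reproduce Django's force_bytes.

```python
def to_bytes(value, charset="utf-8"):
    """Encode text to bytes using charset; pass existing bytes through unchanged."""
    if isinstance(value, bytes):
        # Binary content (for example, an image) is never decoded or re-encoded,
        # so the configured charset cannot cause a UnicodeDecodeError here.
        return value
    return str(value).encode(charset)
```

Under this sketch, `to_bytes(b"\x89PNG\r\n\x1a\n", "latin1")` returns the input unchanged, and only text such as `"é"` is encoded with the configured charset.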