Opened 16 years ago

Closed 12 years ago

#9370 closed Bug (fixed)

Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware

Reported by: kikko Owned by: Kevin Kubasik
Component: Core (Other) Version: 1.0
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When serving binary static files (for example, images) using django.views.static.serve with GZipMiddleware enabled, a traceback is returned instead of the image:

Traceback (most recent call last):

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 277, in run
    self.result = application(self.environ, self.start_response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/servers/basehttp.py", line 634, in __call__
    return self.application(environ, start_response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/core/handlers/wsgi.py", line 243, in __call__
    response = middleware_method(request, response)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/http/__init__.py", line 359, in _get_content
    return smart_str(''.join(self._container), self._charset)

  File "/lib/python2.5/site-packages/Django-1.0_final-py2.5.egg/django/utils/encoding.py", line 97, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)

  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte

Change History (13)

comment:1 by Jacob, 16 years ago

milestone: 1.1
Triage Stage: Unreviewed → Accepted

comment:2 by Kevin Kubasik, 16 years ago

Owner: changed from nobody to Kevin Kubasik

I'm going to look into this.

comment:3 by Kevin Kubasik, 16 years ago

Resolution: invalid
Status: new → closed

I cannot reproduce this against current trunk. Please reopen if you can create a test case which is reliably reproducible.

comment:4 by Nikolay, 16 years ago

Resolution: invalid
Status: closed → reopened

I have this reproduced with latest TRUNK:

[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found

Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162

By the way, the following error is produced if gzip-middleware is disabled:

Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response
    patch_response_headers(response, timeout)
  File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers
    response['ETag'] = '"%s"' % md5_constructor(response.content).hexdigest()
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319

With cache-middleware disabled I see no errors:
[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233

comment:5 by Nikolay, 16 years ago

Sorry, the formatting was lost in my last message. This is a repost with some additions.

I have this reproduced with latest TRUNK:

[16:25 /home/niksite/webapps/django/datamining]$ ./manage.py runserver
Validating models...
0 errors found

Django version 1.1 beta 1 SVN-10982, using settings 'datamining.settings'
Development server is running at http://127.0.0.1:8000/
Quit the server with CONTROL-C.
Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/gzip.py", line 16, in process_response
    if response.status_code != 200 or len(response.content) < 200:
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:26:27] "GET /media/images/white.png HTTP/1.0" 500 1162

By the way, the following error is produced if gzip-middleware is disabled:

Traceback (most recent call last):
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/home/niksite/lib/site-python/django/core/servers/basehttp.py", line 636, in __call__
    return self.application(environ, start_response)
  File "/home/niksite/lib/site-python/django/core/handlers/wsgi.py", line 245, in __call__
    response = middleware_method(request, response)
  File "/home/niksite/lib/site-python/django/middleware/cache.py", line 91, in process_response
    patch_response_headers(response, timeout)
  File "/home/niksite/lib/site-python/django/utils/cache.py", line 108, in patch_response_headers
    response['ETag'] = '"%s"' % md5_constructor(response.content).hexdigest()
  File "/home/niksite/lib/site-python/django/http/__init__.py", line 365, in _get_content
    return smart_str(''.join(self._container), self._charset)
  File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: unexpected code byte
[12/Jun/2009 16:28:37] "GET /media/images/white.png HTTP/1.0" 500 1319

With cache-middleware disabled I see no errors:

[12/Jun/2009 16:29:31] "GET /media/images/white.png HTTP/1.0" 200 4233

I have the following settings:

MIDDLEWARE_CLASSES = (
	'django.middleware.cache.UpdateCacheMiddleware',
	#'django.middleware.http.ConditionalGetMiddleware',
	'django.middleware.gzip.GZipMiddleware',
	# 'debug_toolbar.middleware.DebugToolbarMiddleware',
	'django.contrib.sessions.middleware.SessionMiddleware',
	'django.middleware.locale.LocaleMiddleware',
	'django.middleware.common.CommonMiddleware',
	'django.contrib.auth.middleware.AuthenticationMiddleware',
	'django.middleware.doc.XViewMiddleware',
	'django.contrib.flatpages.middleware.FlatpageFallbackMiddleware',
	'django.middleware.cache.FetchFromCacheMiddleware',
)
CACHE_MIDDLEWARE_SECONDS = 0
CACHE_BACKEND = "dummy://"
MEDIA_URL = "http://127.0.0.1:8000/media/"

And the following lines in urls.py:

urlpatterns += patterns('',
	(r'^media/(?P<path>.*)$', 'django.views.static.serve', {'document_root': settings.MEDIA_ROOT}),
)

comment:6 by Russell Keith-Magee, 16 years ago

Resolution: worksforme
Status: reopened → closed

Can't reproduce with provided instructions.

comment:7 by Karen Tracey, 16 years ago

Resolution: worksforme
Status: closed → reopened
Summary: UnicodeDecodeError when serving binary static files through GZipMiddleWare → Non utf-8 DEFAULT_CHARSET causes UnicodeDecodeError when serving binary data through middleware

The bit of info missing from the descriptions that is key to hitting the problem is to have DEFAULT_CHARSET set to something other than utf-8 in settings.py. That's how you reach this line in the traceback:

File "/home/niksite/lib/site-python/django/utils/encoding.py", line 119, in smart_str
    return s.decode('utf-8', errors).encode(encoding, errors)

I tried setting DEFAULT_CHARSET to 'latin1' (and adding the gzip middleware) for one of my projects that generates and serves .png files and, sure enough, started hitting this traceback. It's not specific to django.views.static.serve, nor really to the gzip middleware; it's anything that causes this routine:

    def _get_content(self):
        if self.has_header('Content-Encoding'):
            return ''.join(self._container)
        return smart_str(''.join(self._container), self._charset)

in django/http/__init__.py to be called and to reach the smart_str call with binary data (which is unlikely to decode successfully as utf-8) while self._charset is set to something other than utf-8.
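
The failure path described above can be reproduced outside Django with a small sketch. This is written for Python 3 rather than the Python 2.5 in the tracebacks, and smart_str_sketch and get_content_sketch are hypothetical stand-ins, not Django's code. It assumes the bytestring branch of smart_str skips re-encoding only when the charset compares equal to the literal string 'utf-8':

```python
# First eight bytes of every PNG file; 0x89 is not a valid UTF-8 start byte.
PNG_HEADER = b"\x89PNG\r\n\x1a\n"


def smart_str_sketch(data, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in for the bytestring branch of smart_str."""
    if encoding != "utf-8":
        # Assumes the bytes are UTF-8 text and re-encodes them.
        # Binary data breaks that assumption.
        return data.decode("utf-8", errors).encode(encoding, errors)
    return data


def get_content_sketch(container, charset, content_encoding=None):
    """Hypothetical stand-in for the _get_content routine quoted above."""
    if content_encoding is not None:
        # Already-encoded bodies (e.g. gzip) are returned untouched.
        return b"".join(container)
    return smart_str_sketch(b"".join(container), charset)


def fails_to_decode(charset):
    """Return True if serving the PNG header with this charset raises."""
    try:
        get_content_sketch([PNG_HEADER], charset)
    except UnicodeDecodeError:
        return True
    return False
```

Under these assumptions, fails_to_decode('latin1') reports a failure while fails_to_decode('utf-8') does not, which matches the observation that the bug only appears with a non-default DEFAULT_CHARSET.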

Given the constraints needed to recreate this, it may be appropriate to defer this past 1.1. But it's also pretty late here right now so perhaps there is some easy fix that escapes me at the moment, so I'll leave that decision to someone else.

comment:8 by Nikolay, 16 years ago

You are right. My settings.py has DEFAULT_CHARSET = 'UTF-8'. The error disappears when I change it to DEFAULT_CHARSET = 'utf-8'. Thank you!
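
The observation above suggests that the charset was being compared as a plain, case-sensitive string. As an illustration only (this is not what Django does, and is_utf8 is a hypothetical helper), the standard library's codecs.lookup can normalize the different spellings of a charset name before comparing:

```python
import codecs


def is_utf8(charset):
    """Return True if charset is any recognized spelling of UTF-8."""
    # codecs.lookup resolves aliases and case differences to one canonical name.
    return codecs.lookup(charset).name == "utf-8"
```

With a check like this, 'UTF-8', 'utf-8' and 'utf8' would all be treated the same, while 'latin1' would still be reported as a different charset.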

comment:9 by Russell Keith-Magee, 16 years ago

milestone: 1.1

I'm happy to push this to post v1.1. There's no data loss involved, and the reproduction condition is an edge case.

comment:10 by Luke Plant, 14 years ago

Severity: Normal
Type: Bug

comment:11 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:12 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:13 by Aymeric Augustin, 12 years ago

Resolution: fixed
Status: reopened → closed

This was fixed in da56e1bac6449daef9aeab8d076d2594d9fd5b44, where I took care *not* to call force_bytes on bytes objects, in order not to trigger re-encoding. See #18796.
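
A minimal sketch of the idea behind that fix, assuming Python 3 semantics: bytes are passed through untouched and only text is encoded with the response charset. The name make_bytes_sketch is hypothetical and the code is not copied from the commit:

```python
def make_bytes_sketch(value, charset="utf-8"):
    """Turn a chunk of response content into bytes without re-encoding bytes."""
    if isinstance(value, bytes):
        # Binary data (images, gzip output, ...) is returned as-is, so it is
        # never decoded as UTF-8 regardless of the configured charset.
        return value
    # Text and other objects are converted to text, then encoded once.
    return str(value).encode(charset)
```

Because the bytes branch never decodes, a PNG body survives unchanged even with a non-UTF-8 charset, while text content is still encoded with the configured charset.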
