#15152 closed Bug (fixed)
Common middleware raises UnicodeDecodeError if receives non-ASCII QUERY_STRING from buggy web server
Reported by: | Loststylus | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Core (Other) | Version: | 1.2 |
Severity: | Normal | Keywords: | common middleware |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
FlaxCrawler seems to like my site and visits it very often, always getting a 500 error.
Here's the common traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py", line 80, in get_response
    response = middleware_method(request)
  File "/usr/local/lib/python2.6/dist-packages/django/middleware/common.py", line 79, in process_request
    newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)

<WSGIRequest
GET:<QueryDict: {u'q': [u'\u0427\u0430\u0439\u043a\u0430']}>,
POST:<QueryDict: {}>,
COOKIES:{},
META:{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': 'text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2',
 'HTTP_ACCEPT_ENCODING': 'gzip,defalte',
 'HTTP_ACCEPT_LANGUAGE': 'ru,en-us;q=0.7,en;q=0.3',
 'HTTP_CACHE_CONTROL': 'no-cache',
 'HTTP_CONNECTION': 'close',
 'HTTP_HOST': '{sorry, i've got that censored out}',
 'HTTP_PRAGMA': 'no-cache',
 'HTTP_USER_AGENT': 'FlaxCrawler/1.0',
 'PATH_INFO': u'/articles/ajaxsearch',
 'QUERY_STRING': 'q=\xd0\xa7\xd0\xb0\xd0\xb9\xd0\xba\xd0\xb0',
 'REMOTE_ADDR': '92.241.173.132',
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': u'',
 'SERVER_NAME': '{sorry, i've got that censored out}',
 'SERVER_PORT': '80',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'wsgi.errors': <flup.server.fcgi_base.TeeOutputStream object at 0x2634850>,
 'wsgi.input': <flup.server.fcgi_base.InputStream object at 0x2634610>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>

The major problem I see here is that the developer cannot do anything to catch the error :(
Change History (18)
comment:2 by , 14 years ago
comment:3 by , 14 years ago
Oh, thank you for your response, I'll try to catch it via middleware.
I think the WSGIRequest constructor seems like the proper place to check for that.
comment:4 by , 14 years ago
Temporary workaround (this middleware should be added before the common middleware):

from django.http import HttpResponse

class RequestCheckMiddleware(object):
    def process_request(self, request):
        try:
            # Force the implicit ASCII decode that CommonMiddleware would trigger.
            u'%s' % request.META.get('QUERY_STRING', '')
        except UnicodeDecodeError:
            response = HttpResponse()
            response.status_code = 400  # Bad Request
            return response
        return None
comment:5 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Accepted on the basis that we could do something here, but I agree with Luke - we don't want to pay a big price because a handful of servers can't implement the spec correctly.
comment:6 by , 14 years ago
Summary: | Common middleware raises UnicodeDecodeError if receives specially formatted unicode-like query string → Common middleware raises UnicodeDecodeError if receives non-ASCII QUERY_STRING from buggy web server |
---|
comment:7 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:13 by , 13 years ago
I got the same error.
My server is Apache with mod_wsgi. The client seems to be Internet Explorer 9.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Django-1.3-py2.7.egg/django/core/handlers/base.py", line 89, in get_response
    response = middleware_method(request)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.3-py2.7.egg/django/middleware/common.py", line 89, in process_request
    newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128)

<WSGIRequest
GET:<QueryDict: {u'datetime': [u'--------------'], u'time_group_id': [u'---'], u'speciality': [u'Psicologia (Dist\xfarbios emocionais e de personalidade)'], u'office': [u'--------------------------------------------'], u'health_insurance': [u'----------']}>,
POST:<QueryDict: {}>,
META:{'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'text/html, application/xhtml+xml, */*',
 'HTTP_ACCEPT_LANGUAGE': 'pt-BR',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
 'HTTP_VIA': '1.1 SVGWF02',
 'QUERY_STRING': 'health_insurance=----------&speciality=Psicologia%20(Dist\xc3\xbarbios%20emocionais%20e%20de%20personalidade)&office=--------------------------------------------&time_group_id=---&datetime=----------------',
 'REQUEST_METHOD': 'GET',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Apache',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.process_group': '',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 3),
 'wsgi.errors': <mod_wsgi.Log object at 0xba62c0c0>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0xba4b2608>,
 'wsgi.input': <mod_wsgi.Input object at 0xba41cd90>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 1)}>
comment:14 by , 13 years ago
I have the same error:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py", line 89, in get_response
    response = middleware_method(request)
  File "/usr/lib/python2.6/site-packages/django/middleware/common.py", line 89, in process_request
    newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 4: ordinal not in range(128)

Some additional data:

'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)',
'HTTP_VIA': '1.0 niiri.kharkov.com (squid/3.0.STABLE7), 1.0 wwwniiri (squid/3.2.0.12)',
'HTTP_X_FORWARDED_FOR': 'unknown, 172.16.0.4, 82.117.230.71',
'PATH_INFO': u'/ru/products/tag/N',
'QUERY_STRING': 'N??\xb0???\xbb??????/fancybox/fancy_loading.png',
'REMOTE_PORT': '27370',
'REQUEST_METHOD': 'GET',
'REQUEST_URI': '/ru/products/tag/N?N??\xb0???\xbb??????/fancybox/fancy_loading.png',
comment:15 by , 13 years ago
Same recurring error here, just after updating to 1.4:

Traceback (most recent call last):
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/contrib/staticfiles/handlers.py", line 67, in __call__
    return self.application(environ, start_response)
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/handlers/base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/views/debug.py", line 432, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/utils/encoding.py", line 116, in smart_str
    return str(s)
  File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/urlresolvers.py", line 185, in __repr__
    return smart_str(u'<%s %s %s>' % (self.__class__.__name__, self.name, self.regex.pattern))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I tried adding a utf8 coding spec at the top of the urlresolver module and a utf8 DEFAULT setting in settings.py; neither had any effect.
comment:16 by , 12 years ago
Owner: | changed from | to
---|
comment:17 by , 12 years ago
Apache + mod_wsgi and the Bingbot is causing this error on one of my servers.
comment:18 by , 12 years ago
This happens because the URL produced by reversing is a unicode string and the query string is a bytestring.
(Interestingly, this bug doesn't exist under Python 3.)
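As a rough illustration of the mechanism described in this comment, the sketch below makes Python 2's implicit ASCII decode explicit so that it can be run on Python 3. The helper name `join_like_python2` is hypothetical and is not part of Django; the path and query bytes are taken from the first traceback in this ticket.

```python
# Sketch only: Python 2 implicitly decoded bytestrings as ASCII when they were
# concatenated with unicode strings. Here that decode is written out explicitly.

path = u'/articles/ajaxsearch'  # unicode URL, as produced by reversing
query_string = b'q=\xd0\xa7\xd0\xb0\xd0\xb9\xd0\xba\xd0\xb0'  # raw bytes from the ticket


def join_like_python2(url, raw_query):
    """Hypothetical helper mimicking `newurl += '?' + QUERY_STRING` on Python 2."""
    # '?' + raw_query is a plain bytestring; adding it to a unicode URL
    # triggered an ASCII decode of the whole bytestring.
    return url + (b'?' + raw_query).decode('ascii')


try:
    join_like_python2(path, query_string)
    error = None
except UnicodeDecodeError as exc:
    error = exc  # same exception type as in the reported tracebacks

# The bytes are valid UTF-8, so decoding with the right codec succeeds.
decoded = query_string.decode('utf-8')
```

On Python 3 the WSGI environ holds native strings rather than bytestrings, so no implicit ASCII decode takes place, which is presumably why the bug does not show up there.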
comment:19 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
There is no "specially formatted unicode-like" query string here - it is a straightforward UTF-8 encoded string.
The strange thing here is that non-ASCII characters are ending up in META['QUERY_STRING']. With browsers, non-ASCII characters get percent encoded. So the request is simply wrong - this is definitely a bug in the crawler (but that is irrelevant).
The next question is whether this is a bug in the web server, which appears to be flup. Looking at the spec for QUERY_STRING in CGI (http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt), which is the basis of the WSGI spec (http://www.python.org/dev/peps/pep-0333/#environ-variables), the value of QUERY_STRING should not contain these values.
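To make the percent-encoding point concrete, here is a small sketch of what a well-behaved client would be expected to send for the search term in this ticket. It uses the Python 3 standard library (`urllib.parse`), which is an assumption about the reader's environment rather than anything used in the ticket.

```python
from urllib.parse import quote, unquote

# The search term from the ticket's GET QueryDict.
term = u'\u0427\u0430\u0439\u043a\u0430'

# quote() encodes the text as UTF-8 and percent-escapes every non-ASCII byte,
# so the resulting query string is pure ASCII.
encoded_query = 'q=' + quote(term)

# Decoding the percent-escapes recovers the original text.
round_tripped = unquote(encoded_query)
```

The escaped bytes are the same `\xd0\xa7...` sequence that appears raw in the reported QUERY_STRING, which is what makes the crawler's request malformed.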
So AFAICS, this is a bug in flup, because it should never be passing on values like these. That doesn't mean we shouldn't fix it in Django to stop 500 errors being produced. The best behaviour would be to return a '400 Malformed request' error if QUERY_STRING has any non-ASCII chars. We probably don't want to do that in the bit of code that is raising this exception, but rather somewhere like WSGIRequest.__init__ or BaseHandler.get_response. But this will add overhead to every request, so I'm not sure what to do.
There is a way to catch this at the developer level - install an exception middleware. You could also install a request middleware that checked that no invalid chars were in QUERY_STRING.
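One possible shape for such a check, sketched as a plain WSGI wrapper rather than a Django middleware so that it runs without Django installed. The names `reject_non_ascii_query`, `demo_app` and `call` are hypothetical, and this is not the fix that was committed for this ticket; it only illustrates the "400 on non-ASCII QUERY_STRING" idea and the per-request cost it implies.

```python
def reject_non_ascii_query(app):
    """Wrap a WSGI app; answer 400 when QUERY_STRING contains non-ASCII characters."""
    def wrapper(environ, start_response):
        query = environ.get('QUERY_STRING', '')
        # This scan runs on every request, which is the overhead noted above.
        if any(ord(ch) > 127 for ch in query):
            body = b'Bad Request'
            start_response('400 Bad Request', [
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    """Hypothetical downstream application, used only for this example."""
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'ok']


def call(app, query):
    """Invoke a WSGI app with a minimal environ and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    body = b''.join(app({'QUERY_STRING': query}, start_response))
    return captured['status'], body


guarded = reject_non_ascii_query(demo_app)
ok_result = call(guarded, 'q=%D0%A7')      # properly percent-encoded
bad_result = call(guarded, 'q=\xd0\xa7')   # raw high bytes, as sent by the crawler
```

Placing the check in front of the application keeps it out of the code path that currently raises, at the cost of scanning the query string on every request.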