Opened 17 years ago
Closed 17 years ago
#5956 closed (fixed)
unicode character in response headers
Reported by: | Owned by: | Jeroen Vloothuis | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Keywords: | ||
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
when you put a non-7-bit character to a header, django dies in a bad way.
when you have DEBUG=True, you even don't see a django exception.
Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 278, in run self.result = application(self.environ, self.start_response) File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 620, in __call__ return self.application(environ, start_response) File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 218, in __call__ response_headers = [(str(k), str(v)) for k, v in response.items()] UnicodeEncodeError: 'ascii' codec can't encode character u'\u011f' in position 9: ordinal not in range(128)
we need to display a better error desription or just ignore unicode objects for the headers
Attachments (3)
Change History (14)
comment:1 by , 17 years ago
Component: | Uncategorized → HTTP handling |
---|
comment:2 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
by , 17 years ago
Attachment: | unicode_http_headers.diff added |
---|
comment:3 by , 17 years ago
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Fixed the problem by converting both keys and values to str objects based on the charset of the HTTPResponse. The patch to this issue includes both a test and the patches to HTTPResponse to make this work.
comment:4 by , 17 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Probably better (for consistency) to call smart_str(value, self._encoding)
instead of writing a new function here. That's what smart_str()
is for.
by , 17 years ago
Attachment: | unicode_http_headers.2.diff added |
---|
comment:5 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
Summary: | unicode character in response headars → unicode character in response headers |
Triage Stage: | Accepted → Ready for checkin |
Using smart_str is better indeed. A response object does not have the _encoding. That is why the _charset is used (which boils down to the same thing).
comment:6 by , 17 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
This ticket was originally asking for better error handling, but the attached patches actually try to encode the headers. I'm more in favour of better error handling, but if somebody is trying to encode things, the current code doesn't work. We MUST comply with sections 4.2 and 2.2 of RFC 2616 (the HTTP/1.1 spec) with respect to encoding. You can't just arbitrarily encode things using the character set given in the HttpResponse, since that's an encoding for the content, whereas headers need to be understood and manipulated by encoding-unaware machinery. So somebody needs to read the appropriate parts of RFC 2616 very carefully (and possibly portions of RFC 2047 as well) and implement the handling required by the specs here if you want to try and handle all data (without a huge performance impact on the common "things that are strings" case).
Or just raise better errors and ask the user to encode the headers appropriately and insert a string type, which is quite reasonable, since non-ASCII headers are very rare (cookies aside).
comment:7 by , 17 years ago
After reading the RFC's I agree with you in that my solution is the wrong one. From what I have read all headers should be printable ASCII (including cookies). I will create a new patch which will at least verify that all headers are ASCII (like you suggested).
by , 17 years ago
Attachment: | better_header_encoding_error.diff added |
---|
comment:8 by , 17 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
The new patch (better_header_encoding_error.diff) tells the developer about the reasons for the unicode error while still keeping the traceback.
comment:9 by , 17 years ago
I suspect we're still not implementing RFC 2616 correctly, but we can fix that in another ticket. For the record, though: section 4.2 says header values are text, tokens or quoted strings. Section 2.2 says text (and hence, quoted-strings) can contain octets, with some low-value exclusions, which means 8-bit values outside the ASCII range are permitted. Not sure of any case where this is used in practice, but it appears to be permissible.
I'm going to apply this patch and close this ticket, though, since it only reinforces our current constraints, so if we'll be no more wrong after this patch is applied, but the errors will be clearer.
comment:11 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Patch for unicode support for response headers