#15237 closed Bug (fixed)
Django generated Atom/RSS feeds don't specify charset=utf8 in their Content-Type
Reported by: | simon | Owned by: | Jason Kotenko |
---|---|---|---|
Component: | contrib.syndication | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | Jason Kotenko, shadow, techtonik@… | Triage Stage: | Ready for checkin |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8". At the moment Django's default behaviour is to serve them without the charset bit, and it's not particularly easy to over-ride this behaviour:
http://code.djangoproject.com/browser/django/trunk/django/utils/feedgenerator.py#L290
The workaround I'm using at the moment is to wrap the feed in a view function which over-rides the content-type on the generated response object, but it's a bit of a hack:
    def feed(request):
        response = MyFeed()(request)
        response['Content-Type'] = 'application/atom+xml; charset=utf-8'
        return response
Attachments (3)
Change History (22)
comment:1 by , 14 years ago
milestone: | → 1.3 |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 14 years ago
Since the syndication framework actually writes everything in utf-8 in the view (see http://code.djangoproject.com/browser/django/trunk/django/contrib/syndication/views.py#L40) this should be a case of just adding "; charset=utf8" to the line simon is referring to?
comment:3 by , 14 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 14 years ago
Cc: | added |
---|---|
Has patch: | set |
Assertion "Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8"." verified here: http://tools.ietf.org/html/rfc2045#section-5.1 (mime-type syntax) and here: http://tools.ietf.org/html/rfc3023#section-3.2 (recommendation to always set the charset).
Although it does appear that if the charset is set in the XML declaration, i.e. <?xml version="1.0" encoding="utf-8"?> , then the charset in the Content-Type is not required, since everything within that XML document is supposed to be treated as UTF8. However, it is still recommended so will proceed.
Looks like the code in /django/contrib/syndication/views.py does not set the mime-type, it only uses it. It is the util code in /django/utils/feedgenerator.py that sets it, as mentioned above.
Can't find anything in the docs that requires changing due to this small change.
Added regression test to verify the MIME type is still set with encoding in the future.
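The relationship between a MIME type and its charset parameter discussed in this comment can be sketched with the standard library alone. This is only an illustration, not the code from the attached patch: the helper names `charset_of` and `with_utf8_charset` are hypothetical, and the sketch assumes the feed body is always UTF-8, as noted in comment:2.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    # Hypothetical helper: reuse the stdlib header parser instead of
    # splitting the string by hand.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def with_utf8_charset(mime_type):
    """Append '; charset=utf-8' unless a charset is already declared."""
    # Assumption: the body is UTF-8, so 'utf-8' is the right value to add.
    if charset_of(mime_type) is not None:
        return mime_type
    return "%s; charset=utf-8" % mime_type
```

A check of this kind is roughly what a regression test would assert about the response header: the charset parameter is present and spelled "utf-8".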
comment:7 by , 14 years ago
Another workaround for this issue is to extend the feed generator class:
    from django.contrib.syndication.views import Feed
    from django.utils import feedgenerator

    class FeedUTF8(feedgenerator.DefaultFeed):
        def __init__(self, *args, **kwargs):
            super(FeedUTF8, self).__init__(*args, **kwargs)
            self.mime_type = '%s; charset=utf-8' % self.mime_type
And then specify the feed_type:
    class LatestEntriesFeed(Feed):
        ...
        feed_type = FeedUTF8
...
    $ curl -I http://localhost:8000/feed
    HTTP/1.0 200 OK
    Date: Thu, 31 Mar 2011 16:40:26 GMT
    Server: WSGIServer/0.1 Python/2.6.1
    Content-Type: application/rss+xml; charset=utf-8
comment:8 by , 14 years ago
Resolution: | fixed |
---|---|
Severity: | → Normal |
Status: | closed → reopened |
Type: | → Uncategorized |
The charset should be “utf-8” rather than “utf8”, since the latter isn't what's registered with IANA. See: http://www.w3.org/International/O-HTTP-charset.
comment:9 by , 14 years ago
Type: | Uncategorized → Bug |
---|
comment:10 by , 13 years ago
Easy pickings: | set |
---|---|
UI/UX: | unset |
Bug confirmed: http://tools.ietf.org/html/rfc3023#section-3.2 (link given in a previous comment) says 'utf-8' and not 'utf8'.
While investigating this problem, I noticed that the codebase consistently uses <unicode>.encode('utf-8'), except one instance in tests/regressiontests/signing/tests.py, where the dash is missing. The codecs module defines utf8 as an alias of utf-8, so the code works, but there's no reason to keep this exception. I included that fix in the patch too; feel free to commit it separately or not commit it at all.
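The alias behaviour mentioned above can be checked with a short sketch. It only illustrates Python's codec lookup, where both spellings resolve to the same codec; it says nothing about HTTP, where the charset label should be the IANA-registered "utf-8". The helper name `canonical_name` is hypothetical.

```python
import codecs


def canonical_name(label):
    """Return the name Python reports for an encoding label."""
    return codecs.lookup(label).name


text = "country\u2019s"

# Both labels select the same codec, so the encoded bytes are identical.
same_bytes = text.encode("utf8") == text.encode("utf-8")
```

This is why the missing dash in the test file was harmless in Python, even though the same spelling in a Content-Type header is not the registered name.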
PS: you could have opened a new ticket instead of reopening this one because, strictly speaking, it's a different issue.
by , 13 years ago
Attachment: | 15237-reopened.patch added |
---|
comment:11 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Acting like another set of eyes. Seems pretty straightforward — RFC.
comment:14 by , 13 years ago
Cc: | added |
---|---|
Has patch: | unset |
Resolution: | fixed |
Status: | closed → reopened |
Triage Stage: | Ready for checkin → Unreviewed |
Version: | 1.2 → SVN |
This fix only seems to have been applied to Atom feeds, and not RSS feeds.
Is there a reason for this? If not, could it please also be applied to RSS feeds?
One use case is: debugging feeds with Google Chrome, which displays them in text/plain, and therefore doesn't parse the document level encoding attribute (<?xml version="1.0" encoding="utf-8"?>). The result is it uses an incorrect encoding (e.g. country’s, instead of country's).
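The kind of garbling described in this use case can be reproduced with a minimal sketch. It assumes the client falls back to Windows-1252 when no charset is given in the Content-Type header; the fallback a particular browser actually uses may differ. The helper name `misdecode` is hypothetical.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    # Assumption: the client guesses cp1252 because the header has no charset.
    return text.encode("utf-8").decode(wrong_encoding)


# U+2019 (right single quotation mark) is three bytes in UTF-8, so it turns
# into three unrelated characters when read as cp1252.
garbled = misdecode("country\u2019s")
```

Decoding the same bytes as UTF-8 returns the original string, which is what an explicit charset=utf-8 in the header lets the client do.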
comment:15 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Given the previous argument that Feed always writes the content in UTF-8, it sounds reasonable to me. And as the original ticket mentions both Atom and RSS, I think it's OK to reopen this ticket.
comment:16 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:17 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
In [17494]:
(The changeset message doesn't reference this ticket)
follow-up: 19 comment:18 by , 12 years ago
Cc: | added |
---|---|
Version: | master → 1.3 |
Any chance for it to be backported to 1.3?
comment:19 by , 12 years ago
Replying to techtonik:
Any chance for it to be backported to 1.3?
Not at all, sorry. Only security-related issues might have a chance to be backported to 1.3.
This seems like a reasonable request. I'm not an expert on the feeds framework, but it doesn't look like it ever produces things which are NOT UTF-8, so hopefully the fix is trivial.