Opened 10 years ago
Closed 10 years ago
#24985 closed Cleanup/optimization (fixed)
Warn about invalid RSS characters in syndication docs
Reported by: | Michael Wood | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | 1.7 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Ready for checkin |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I have some data from log files that I'd like to put into an RSS feed. Unfortunately, due to the nature of this data, it sometimes contains control characters, e.g. \0001 \0003. This causes the feed to fail RSS feed reader validation because these characters, although valid UTF-8, are not allowed (1).
I'm not sure whether this is something that should be fixed in this module, or perhaps in sax/saxutils, or somewhere like force_text in django.utils.encoding.
At the moment I'm working around this issue with a regex which replaces this range of chars.
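For anyone hitting the same problem, here is a minimal sketch of that kind of regex workaround. It is not the reporter's actual code: the name strip_invalid_xml_chars and the exact character range are assumptions, based on the XML 1.0 rule that the C0 control characters other than tab, newline and carriage return (and also U+FFFE and U+FFFF) are not allowed in character data.

```python
import re

# Assumed range: C0 controls except \t, \n, \r, plus U+FFFE and U+FFFF,
# which XML 1.0 does not permit in character data.
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def strip_invalid_xml_chars(text, replacement=""):
    """Return text with XML-invalid control characters replaced.

    Hypothetical helper for illustration; it is not part of Django.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # Tab and newline survive, \x01 and \x03 are dropped.
    print(repr(strip_invalid_xml_chars("log\x01 entry\x03\tok\n")))
```

The helper would be applied to titles and descriptions before they are handed to the feed, so the generator never sees the offending characters.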
Change History (7)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
comment:2 by , 10 years ago
Summary: | Rss201rev2Feed invalid characters in character data for RSS → Provide a way to santize invalid characters from Rss201rev2Feed |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Bug → New feature |
comment:3 by , 10 years ago
Summary: | Provide a way to santize invalid characters from Rss201rev2Feed → Provide a way to sanitize invalid characters from Rss201rev2Feed |
---|
comment:4 by , 10 years ago
#20197 is similar but targets XML serialization with dumpdata. I just added a patch in that ticket to fail loudly instead of silently producing invalid XML. Automatic sanitization is tricky because, depending on the use case, you might want to remove the offending characters, replace them with some alternative encoding, or simply fix the source.
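To illustrate why a single automatic policy is hard to pick, here is a rough sketch of the three behaviours side by side. Everything in it is hypothetical (the function name handle_control_chars, the mode names, and the choice of a backslash-escaped form as the "alternative coding"); it does not reproduce the #20197 patch or any Django API.

```python
import re

# Assumed set of characters that XML 1.0 rejects in character data.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def handle_control_chars(text, mode="raise"):
    """Apply one of three illustrative policies to control characters.

    mode="remove": drop the characters.
    mode="escape": replace each one with a visible form such as \\x01.
    mode="raise":  fail loudly so the source data can be fixed.
    """
    if mode == "remove":
        return _CONTROL_CHARS.sub("", text)
    if mode == "escape":
        return _CONTROL_CHARS.sub(lambda m: "\\x%02x" % ord(m.group()), text)
    if mode == "raise":
        match = _CONTROL_CHARS.search(text)
        if match:
            raise ValueError(
                "Control character %r at position %d cannot be serialized to XML"
                % (match.group(), match.start())
            )
        return text
    raise ValueError("Unknown mode: %r" % mode)
```

Which mode is right depends on the data: log excerpts may be better escaped so nothing is silently lost, while user-facing titles are probably better cleaned at the source.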
The patch for #20197 also affects RSS production, as the same django.utils.xmlutils.SimplerXMLGenerator is used. If it gets committed, we might want to add a similar admonition in the syndication docs.
comment:5 by , 10 years ago
Proposal for a documentation addition:
diff --git a/docs/ref/contrib/syndication.txt b/docs/ref/contrib/syndication.txt
index 6c86be0..940123c 100644
--- a/docs/ref/contrib/syndication.txt
+++ b/docs/ref/contrib/syndication.txt
@@ -919,7 +919,10 @@ They share this interface:
 ``self.feed`` for use with `custom feed generators`_.
 
 All parameters should be Unicode objects, except ``categories``, which
-should be a sequence of Unicode objects.
+should be a sequence of Unicode objects. Beware that some control characters
+are `not allowed <http://www.w3.org/International/questions/qa-controls>`_
+in XML documents. If your content has some of them, you might encounter a
+:exp:`ValueError` when producing the feed.
 
 :meth:`.SyndicationFeed.add_item`
 Add an item to the feed with the given parameters.
comment:6 by , 10 years ago
Component: | contrib.syndication → Documentation |
---|---|
Summary: | Provide a way to sanitize invalid characters from Rss201rev2Feed → Warn about invalid RSS characters in syndication docs |
Triage Stage: | Accepted → Ready for checkin |
Type: | New feature → Cleanup/optimization |
exp -> exc, otherwise looks good.
We could check whether other web frameworks perform sanitization or make alternative recommendations. If we don't make a change in Django, we could at least update the docs to note the requirement to sanitize your own input and recommend how to do so.