#15936 closed New feature (invalid)
Syndication: Turning off autoescape (content:encoded)
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | contrib.syndication | Version: | 1.3 |
Severity: | Normal | Keywords: | syndication, content:encoded |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
A very common element to pull into an RSS feed is the content:encoded element. In a blog, for example, it allows you to put a full entry into your RSS feed, including the HTML in that entry (headings, paragraphs, lists, whatnot).
I couldn't find a way to get this element to work, given the current contrib.syndication module. I would do this:
    class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
        """
        Create a type of RSS feed that has content:encoded elements.
        """
        def root_attributes(self):
            attrs = super(ExtendedRSSFeed, self).root_attributes()
            attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
            return attrs

        def add_item_elements(self, handler, item):
            super(ExtendedRSSFeed, self).add_item_elements(handler, item)
            handler.addQuickElement(u'content:encoded', item['content_encoded'])

    ...

    class TheFeed(Feed):
        feed_type = ExtendedRSSFeed
        ....

        def item_extra_kwargs(self, item):
            return {'content_encoded': self.item_content_encoded(item)}

        def item_content_encoded(self, item):
            return "<![CDATA[%s]]>" % item.content
But that would generate a feed with all of the HTML bits autoescaped... even if I put an {% autoescape off %} block in the template where the content:encoded was being pulled from. So, instead of being able to stick HTML tags inside the CDATA, I would just end up with a lot of &lt;h1&gt; stuff.
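The escaping described above can be reproduced outside Django. This is a minimal sketch, assuming that `addQuickElement` boils down to `startElement`, `characters` and `endElement` on the standard library's `XMLGenerator` (which Django's `SimplerXMLGenerator` subclasses). The helper name `quick_element` is hypothetical and only mimics that sequence.

```python
import io
from xml.sax.saxutils import XMLGenerator


def quick_element(name, contents):
    """Hypothetical stand-in for addQuickElement: emit one element with text."""
    out = io.StringIO()
    gen = XMLGenerator(out, "utf-8")
    gen.startElement(name, {})
    # characters() always entity-escapes &, < and > in its argument.
    gen.characters(contents)
    gen.endElement(name)
    return out.getvalue()


# A hand-built CDATA wrapper is treated as ordinary text, so its own
# angle brackets are escaped along with the HTML inside it.
result = quick_element("content:encoded", "<![CDATA[<h1>Title</h1>]]>")
print(result)
```

Because the generator treats its input as character data, no template-level setting such as {% autoescape off %} has any effect at this stage.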
After drilling in and finding the SimplerXMLGenerator, it seemed like the ability to turn off autoescaping could be done at this point, without breaking anyone's current implementations (see patch). Thus, the only change to the above example becomes this:
    class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
        """
        Create a type of RSS feed that has content:encoded elements.
        """
        def root_attributes(self):
            attrs = super(ExtendedRSSFeed, self).root_attributes()
            attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
            return attrs

        def add_item_elements(self, handler, item):
            super(ExtendedRSSFeed, self).add_item_elements(handler, item)
            handler.addQuickElement(u'content:encoded', item['content_encoded'], escape=False)
And then content:encoded can be handled through the normal syndication process.
Attachments (4)
Change History (10)
by , 14 years ago
Attachment: | patch.diff added |
---|
comment:1 by , 14 years ago
I should mention, in item_content_encoded, item.content has a bunch of HTML in it...
comment:2 by , 14 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Resolution: | → invalid |
Status: | new → closed |
Based on http://web.resource.org/rss/1.0/modules/content/, content:encoded is:
An element whose contents are the entity-encoded or CDATA-escaped version of the content of the item.
I see that you are trying to force CDATA-escaping:
    def item_content_encoded(self, item):
        return "<![CDATA[%s]]>" % item.content
Why don't you just let Django perform the equivalent entity-encoding?
As far as I can tell, the following solution works perfectly (see screenshot):
    from django.contrib.syndication.views import feedgenerator, Feed

    ### not touched from your example

    class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
        """
        Create a type of RSS feed that has content:encoded elements.
        """
        def root_attributes(self):
            attrs = super(ExtendedRSSFeed, self).root_attributes()
            attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
            return attrs

        def add_item_elements(self, handler, item):
            super(ExtendedRSSFeed, self).add_item_elements(handler, item)
            handler.addQuickElement(u'content:encoded', item['content_encoded'])

    ### customized

    class TestFeed(Feed):
        title = "test"
        link = "/"
        description = "test"
        feed_type = ExtendedRSSFeed

        def items(self):
            return range(3)

        def item_title(self, item):
            return "Title of item %d" % item

        def item_description(self, item):
            return "Description of item %d" % item

        def item_link(self, item):
            return "/%d/" % item

        def item_extra_kwargs(self, item):
            return {'content_encoded': '<h1>Item %d</h1><p>lorem ipsum...</p>' % item}
by , 14 years ago
Attachment: | Screen shot 2011-05-01 at 09.37.36.png added |
---|
comment:3 by , 14 years ago
I believe that what is making your example work correctly is actually Safari's parsing. Check it in Firefox (you'll need to view the source; Firefox doesn't show the content:encoded portion on the screen).
I modified my feed to reflect the example above. I'm attaching 2 screenshots of the source (one from Safari, one from Firefox). You'll see that it's autoescaped in Firefox. Also, if I check the same feed in Chrome (which doesn't have a native RSS parser, so it just shows you the source), I see the same result as in the Firefox screenshot.
comment:4 by , 14 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
comment:5 by , 14 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
As far as I can tell, everything is working properly.
In HTML, "<p> 1 < 2 </p>" is invalid; the correct version is "<p> 1 &lt; 2 </p>". In Atom, it is the same: "<content:encoded> foo<br />bar </content:encoded>" is invalid; the correct version is "<content:encoded> foo&lt;br /&gt;bar </content:encoded>".
Your screenshot in Firefox shows correct escaping in the raw, unparsed source of your feed. Your screenshot in Safari shows that Safari has parsed the source, properly extracted the contents of the <content:encoded> tag, and has inserted it in an HTML structure for display.
The name of the tag "content:encoded" itself makes it fairly explicit that its content be encoded, and so does the spec (see my first comment). If you could insert arbitrary unescaped HTML inside "content:encoded", your RSS feed would no longer be valid XML, something RSS parsers clearly do not handle!
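The equivalence of the two encodings can be checked with the standard library alone. This is a minimal sketch, not tied to Django: the sample `html` string and the helper names `wrap` and `extract` are made up for illustration, and the `<item>` wrapper is a stripped-down stand-in for a real feed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
html = "<h1>Item 1</h1><p>1 &lt; 2</p>"  # sample payload (assumption)


def wrap(body):
    """Put an already-serialized body inside a content:encoded element."""
    return (
        '<item xmlns:content="%s"><content:encoded>%s</content:encoded></item>'
        % (CONTENT_NS, body)
    )


def extract(doc):
    """Parse the XML and return the text of content:encoded."""
    return ET.fromstring(doc).find("{%s}encoded" % CONTENT_NS).text


entity_doc = wrap(escape(html))            # entity-encoded form
cdata_doc = wrap("<![CDATA[%s]]>" % html)  # CDATA-escaped form

entity_text = extract(entity_doc)
cdata_text = extract(cdata_doc)
print(entity_text == cdata_text == html)
```

Both documents are well-formed XML, and a conforming parser hands the consumer the same HTML string either way; only the raw source differs. Note that the CDATA form would break if the payload itself contained "]]>", a case the entity-encoded form handles without special treatment.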
comment:6 by , 14 years ago
Interesting... hence why it's always needed to be wrapped in CDATA.
Thanks for clearing that up... I guess it can work either way but leaving it the way it is keeps it more generalized.
Whenever I've looked at RSS feeds, they always do the CDATA inside content:encoded elements... so I figured they were always just a hand-in-hand kind of thing. Which is why I was so surprised when I couldn't figure out how to turn off auto escaping using the feedgenerator.
Thanks again, that makes a lot of sense now.
Patch for SimplerXMLGenerator