Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#15936 closed New feature (invalid)

Syndication: Turning off autoescape (content:encoded)

Reported by: Brant Steen <brant.steen@…> Owned by: nobody
Component: contrib.syndication Version: 1.3
Severity: Normal Keywords: syndication, content:encoded
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

A very common element to pull into an RSS feed is the content:encoded element. In a blog, for example, it allows you to put a full entry into your RSS feed, including the HTML in that entry (headings, paragraphs, lists, whatnot).

I couldn't find a way to get this element to work, given the current contrib.syndication module. I would do this:

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'])

...

class TheFeed(Feed):
    feed_type = ExtendedRSSFeed

    ....

    def item_extra_kwargs(self, item):
        return {'content_encoded': self.item_content_encoded(item)}
    
    def item_content_encoded(self, item):
        return "<![CDATA[%s]]>" % item.content    

But that would generate a feed with all of the HTML escaped, even if I put an {% autoescape off %} block in the template that content:encoded was being pulled from. So, instead of getting HTML tags inside the CDATA section, I would just end up with a lot of &lt;h1&gt; stuff.
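The escaping described above can be reproduced outside Django. This is a minimal sketch using only the standard library's XMLGenerator (the class that Django's SimplerXMLGenerator subclasses); render_element is a hypothetical helper written for illustration. It suggests that the escaping happens when the element text is serialized, which would explain why a template-level {% autoescape off %} has no effect on it.

```python
import io
from xml.sax.saxutils import XMLGenerator


def render_element(name, contents):
    """Serialize a single element the way a SAX-style generator does.

    Hypothetical helper for illustration; not part of Django.
    """
    out = io.StringIO()
    gen = XMLGenerator(out, "utf-8")
    gen.startElement(name, {})
    # characters() always escapes &, < and >, so a hand-written
    # CDATA wrapper is escaped along with the HTML inside it.
    gen.characters(contents)
    gen.endElement(name)
    return out.getvalue()


print(render_element("content:encoded", "<![CDATA[<h1>Title</h1>]]>"))
# <content:encoded>&lt;![CDATA[&lt;h1&gt;Title&lt;/h1&gt;]]&gt;</content:encoded>
```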

After drilling in and finding SimplerXMLGenerator, it seemed like an option to turn off escaping could be added there without breaking anyone's current implementation (see patch). With it, the only change to the above example is the escape=False argument:

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'], escape=False)

And then content:encoded can be handled through the normal syndication process.
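The attached patch is not reproduced on this page, so the following is only a sketch of the idea, not the patch itself. It assumes a stand-in class, QuickXMLGenerator (a hypothetical name), built on the standard library's XMLGenerator, with an addQuickElement method that takes the proposed escape flag. Stock Django does not accept an escape argument here.

```python
import io
from xml.sax.saxutils import XMLGenerator


class QuickXMLGenerator(XMLGenerator):
    """Hypothetical stand-in for SimplerXMLGenerator with an escape flag."""

    def __init__(self, out, encoding="utf-8"):
        super().__init__(out, encoding)
        # Keep a handle on the text stream so raw markup can be written to it.
        self._raw_out = out

    def addQuickElement(self, name, contents=None, attrs=None, escape=True):
        """Write <name>contents</name>, optionally skipping the escaping."""
        self.startElement(name, attrs or {})
        if contents is not None:
            if escape:
                self.characters(contents)      # default: &, <, > are escaped
            else:
                self._raw_out.write(contents)  # caller guarantees valid XML
        self.endElement(name)


buf = io.StringIO()
QuickXMLGenerator(buf).addQuickElement(
    "content:encoded", "<![CDATA[<h1>Title</h1>]]>", escape=False
)
print(buf.getvalue())
# <content:encoded><![CDATA[<h1>Title</h1>]]></content:encoded>
```

With escape=False the caller becomes responsible for well-formedness. For example, content that itself contains "]]>" would terminate the CDATA section early.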

Attachments (4)

patch.diff (814 bytes ) - added by Brant Steen <brant.steen@…> 14 years ago.
Patch for SimplerXMLGenerator
Screen shot 2011-05-01 at 09.37.36.png (81.3 KB ) - added by Aymeric Augustin 14 years ago.
Safari-RSS.jpg (162.8 KB ) - added by brant 14 years ago.
RSS feed in Safari
FireFox-RSS.jpg (140.0 KB ) - added by brant 14 years ago.
RSS in Firefox


Change History (10)

by Brant Steen <brant.steen@…>, 14 years ago

Attachment: patch.diff added

Patch for SimplerXMLGenerator

comment:1 by Brant Steen <brant.steen@…>, 14 years ago

I should mention, in item_content_encoded, item.content has a bunch of HTML in it...

comment:2 by Aymeric Augustin, 14 years ago

Needs documentation: set
Needs tests: set
Resolution: invalid
Status: new → closed

Based on http://web.resource.org/rss/1.0/modules/content/ content:encoded is:

An element whose contents are the entity-encoded or CDATA-escaped version of the content of the item.

I see that you are trying to force CDATA-escaping:

    def item_content_encoded(self, item):
        return "<![CDATA[%s]]>" % item.content  

Why don't you just let Django perform the equivalent entity-encoding?

As far as I can tell, the following solution works perfectly (see screenshot):

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

### not touched from your example

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'])

### customized

class TestFeed(Feed):
    title = "test"
    link = "/"
    description = "test"
    feed_type = ExtendedRSSFeed

    def items(self):
        return range(3)

    def item_title(self, item):
        return "Title of item %d" % item

    def item_description(self, item):
        return "Description of item %d" % item

    def item_link(self, item):
        return "/%d/" % item

    def item_extra_kwargs(self, item):
        return {'content_encoded': '<h1>Item %d</h1><p>lorem ipsum...</p>' % item}

by Aymeric Augustin, 14 years ago

comment:3 by brant, 14 years ago

I believe that what makes your example appear to work is actually Safari's parsing. Check it in Firefox (you'll need to view the source; Firefox doesn't show the content:encoded portion on screen).

I modified my feed to match the example above. I'm attaching two screenshots of the source (one from Safari, one from Firefox). You'll see that it's escaped in Firefox. Also, if I check the same feed in Chrome (which doesn't have a native RSS parser, so it just shows you the source), I see the same result as in the Firefox screenshot.

by brant, 14 years ago

Attachment: Safari-RSS.jpg added

RSS feed in Safari

by brant, 14 years ago

Attachment: FireFox-RSS.jpg added

RSS in Firefox

comment:4 by anonymous, 14 years ago

Resolution: invalid
Status: closed → reopened

comment:5 by Aymeric Augustin, 14 years ago

Resolution: invalid
Status: reopened → closed

As far as I can tell, everything is working properly.

In HTML, "<p> 1 < 2 </p>" is invalid; the correct version is "<p> 1 &lt; 2 </p>". It is the same in RSS: "<content:encoded> foo<br />bar </content:encoded>" is invalid, and the correct version is "<content:encoded> foo&lt;br /&gt;bar </content:encoded>".

Your screenshot in Firefox shows correct escaping in the raw, unparsed source of your feed. Your screenshot in Safari shows that Safari has parsed the source, properly extracted the contents of the <content:encoded> tag, and has inserted it in an HTML structure for display.

The name of the tag, "content:encoded", makes it fairly explicit that its content must be encoded, and so does the spec (see my first comment). If you could insert arbitrary unescaped HTML inside "content:encoded", your RSS feed would no longer be guaranteed to be well-formed XML, and RSS parsers clearly cannot handle that!
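To illustrate the point about the two encodings being equivalent, here is a minimal sketch using only the standard library. The sample HTML string and the extract helper are assumptions made for the example. An XML parser recovers the same text from the entity-encoded and the CDATA-wrapped forms, while the same HTML inserted unescaped makes the document ill-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ITEM = '<item xmlns:content="%s"><content:encoded>%%s</content:encoded></item>' % CONTENT_NS

# Sample HTML (made up for this example); note the unclosed <br>.
html = "<h1>Item 1</h1><p>lorem<br>ipsum</p>"

entity_encoded = ITEM % escape(html)              # &lt;h1&gt;Item 1...
cdata_wrapped = ITEM % ("<![CDATA[%s]]>" % html)  # <![CDATA[<h1>Item 1...]]>
unescaped = ITEM % html                           # raw HTML dropped into XML


def extract(doc):
    """Parse the item and return the text of its content:encoded child."""
    root = ET.fromstring(doc)
    return root.find("{%s}encoded" % CONTENT_NS).text


# Both encodings decode to exactly the original HTML.
assert extract(entity_encoded) == html
assert extract(cdata_wrapped) == html

# The unescaped variant is not well-formed XML and cannot be parsed.
try:
    extract(unescaped)
except ET.ParseError as exc:
    print("unescaped HTML rejected:", exc)
```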

comment:6 by Brant Steen <brant.steen@…>, 14 years ago

Interesting... so that's why it has always needed to be wrapped in CDATA.

Thanks for clearing that up... I guess it can work either way but leaving it the way it is keeps it more generalized.

Whenever I've looked at RSS feeds, they always use CDATA inside content:encoded elements... so I figured the two always went hand in hand. Which is why I was so surprised when I couldn't figure out how to turn off autoescaping using the feedgenerator.

Thanks again, that makes a lot of sense now.
