Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#4430 closed (fixed)

[unicode] Syndication framework cannot handle unicode description

Reported by: bugs@…
Owned by: Malcolm Tredinnick
Component: contrib.syndication
Version: other branch
Severity:
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Malcolm Tredinnick)

I have an object with a content attribute that holds non-ASCII data. In both cases (either using {{ obj.content }} in the description template or adding the method

    def __unicode__(self):
        return smart_unicode(self.content)

), I get a UnicodeDecodeError when trying to display the feed:

UnicodeDecodeError at /feeds/wiki/
'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Request Method: 	GET
Request URL: 	        http://rpgpedia.cz/feeds/wiki/
Exception Type: 	UnicodeDecodeError
Exception Value: 	'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Exception Location: 	/usr/lib/python2.5/codecs.py in write, line 303

The local variables show the object that codecs is trying to decode:

u'\xdasp\u011bch zna\u010d\xed zd\xe1rn\xe9 zavr\u0161en\xed akce, kter\xe1 je p\u0159edm\u011btem testov\xe1n\xed. Je pot\u0159ebn\xfd zejm\xe9na tehdy, kdy\u017e se n\u011bkter\xe1 ((rp postava postava)) nebo jin\xfd element v ((rp rolova_hra rolov\xe9 h\u0159e)) sna\u017e\xed n\u011bco ud\u011blat, n\u011bco zd\xe1rn\u011b zavr\u0161it, nebo n\u011bjak\xfdm zp\u016fsobem zvr\xe1tit situaci ve sv\u016fj prosp\u011bch.\r\n\r\nNakl\xe1d\xe1n\xed s \xfasp\u011bchem z\xe1vis\xed od ((rp pravidla pravidel)) hry. V n\u011bkter\xfdch hr\xe1ch je d\u016fle\u017eit\xfd tak\xe9 po\u010det \xfasp\u011bch\u016f (pokud jich m\u016f\u017ee hr\xe1\u010d v testu dos\xe1hnout v\xedce), v jin\xfdch hr\xe1ch je podstatn\xe9 jenom to, jestli hr\xe1\u010d v ((rp test testu)) usp\u011bje, nebo ne.\r\n\r\nV prvn\xedm p\u0159\xedpad\u011b m\u016f\u017ee nav\xedc p\u0159i v\xfdsledku konfliktn\xed akce mezi dv\u011bma nebo v\xedce postavami (nebo elementy) b\xfdt rozhoduj\xedc\xed i po\u010det \xfasp\u011bch\u016f jednotliv\xfdch postav a ta s nejvy\u0161\u0161\xedm po\u010dtem \xfasp\u011bch\u016f pak v dan\xe9m konfliktu zpravidla v\xedt\u011bz\xed.\r\n\r\nV n\u011bkter\xfdch hr\xe1ch existuje t\xe9\u017e """tot\xe1ln\xed \xfasp\u011bch""" (jak\xe1si zes\xedlen\xe1 varianta \xfasp\u011bchu obvykle s \u0159\xe1dov\u011b ni\u017e\u0161\xed pravd\u011bpodobnost\xed) vedouc\xed zpravida k v\xfdkon\u016fm \u010di ud\xe1lostem, kter\xe9 by za norm\xe1ln\xedch okolnost\xed byly (t\xe9m\u011b\u0159) nemo\u017en\xe9.'

(That is, a normal unicode string, which encodes without any problem using s.encode('utf-8').)

Attachments (1)

rss-unicode.patch (5.9 KB) - added by Almad 18 years ago.
Patch fixing unicode description issues.

Change History (8)

comment:1 by michal@…, 18 years ago

I have a similar problem (not exactly the same, but it is related to the RSS framework and Czech strings encoded as UTF-8).

When I try to fetch the RSS feed, I get this error:

UnicodeDecodeError at /rss/aktualni-zpravy/
'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
Request Method: 	GET
Request URL: 	http://127.0.0.1:8000/rss/aktualni-zpravy/
Exception Type: 	UnicodeDecodeError
Exception Value: 	'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
Exception Location: 	/usr/local/lib/python2.4/codecs.py in write, line 178
Traceback (innermost last)

Traceback (most recent call last):
File "/usr/local/lib/python2.4/site-packages/django/core/handlers/base.py" in get_response
  77. response = callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python2.4/site-packages/django/contrib/syndication/views.py" in feed
  24. feedgen.write(response, 'utf-8')
File "/usr/local/lib/python2.4/site-packages/django/utils/feedgenerator.py" in write
  136. self.write_items(handler)
File "/usr/local/lib/python2.4/site-packages/django/utils/feedgenerator.py" in write_items
  160. handler.addQuickElement(u"title", item['title'])
File "/usr/local/lib/python2.4/site-packages/django/utils/xmlutils.py" in addQuickElement
  13. self.characters(contents)
File "/usr/local/lib/python2.4/site-packages/_xmlplus/sax/saxutils.py" in characters
  309. writetext(self._out, content)
File "/usr/local/lib/python2.4/site-packages/_xmlplus/sax/saxutils.py" in writetext
  188. stream.write(escape(text, entities))
File "/usr/local/lib/python2.4/codecs.py" in write
  178. data, consumed = self.encode(object, self.errors)

  UnicodeDecodeError at /rss/aktualni-zpravy/
  'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

It looks like the RSS framework handles items incorrectly. I looked into the Django source, in django/utils/feedgenerator.py, and changed the code from line 160 onward to:

...

from django.utils.encoding import smart_unicode
handler.addQuickElement(u"title", smart_unicode(item['title']))
handler.addQuickElement(u"link", smart_unicode(item['link']))
if item['description'] is not None:
    handler.addQuickElement(u"description", smart_unicode(item['description']))

# Author information.
if item["author_name"] and item["author_email"]:
    handler.addQuickElement(u"author", u"%s (%s)" % \
        (smart_unicode(item['author_email']), smart_unicode(item['author_name'])))
elif item["author_email"]:
    handler.addQuickElement(u"author", smart_unicode(item["author_email"]))
elif item["author_name"]:
    handler.addQuickElement(u"dc:creator", smart_unicode(item["author_name"]), {"xmlns:dc": u"http://purl.org/dc/elements/1.1/"})

if item['pubdate'] is not None:
    handler.addQuickElement(u"pubDate", rfc2822_date(item['pubdate']).decode('ascii'))
if item['comments'] is not None:
    handler.addQuickElement(u"comments", smart_unicode(item['comments']))
if item['unique_id'] is not None:
    handler.addQuickElement(u"guid", smart_unicode(item['unique_id']))

# Enclosure.
if item['enclosure'] is not None:
    handler.addQuickElement(u"enclosure", '',
        {u"url": item['enclosure'].url, u"length": item['enclosure'].length,
            u"type": item['enclosure'].mime_type})

# Categories.
for cat in item['categories']:
    handler.addQuickElement(u"category", smart_unicode(cat))
...

In every call to handler.addQuickElement I used the smart_unicode function to convert the content to unicode. Now my RSS feed works.

Maybe the RSS framework needs a patch that does something similar?
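
For readers without the Django source at hand, the coercion idea described in this comment can be sketched roughly as follows. This is a Python 3 illustration, not Django's smart_unicode(); the name to_text and the UTF-8 default are assumptions made for the example.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical stand-in for the coercion step: always return text (str)."""
    if isinstance(value, str):
        # Already text, nothing to do.
        return value
    if isinstance(value, (bytes, bytearray)):
        # Bytestrings are decoded explicitly instead of relying on an
        # implicit ASCII decode further down in the XML writer.
        return bytes(value).decode(encoding, errors)
    # Anything else (numbers, model instances, ...) goes through str().
    return str(value)


# A UTF-8 encoded Czech title, as a feed item might supply it.
title_bytes = "Aktuální zprávy".encode("utf-8")
title_text = to_text(title_bytes)
```

Coercing every value once, right before it is handed to the XML generator, means the writer only ever sees text and never has to guess an encoding.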

comment:2 by Almad, 18 years ago

Has patch: set

Fixed as michal pointed out, and fixed the other classes as well.

Adding patch.

by Almad, 18 years ago

Attachment: rss-unicode.patch added

patch fixing unicode description issues.

comment:3 by Malcolm Tredinnick, 18 years ago

Description: modified (diff)
Triage Stage: Unreviewed → Accepted

(fixed description formatting)

The patch goes a bit too far. We should never be applying smart_unicode() to anything that is a URL. If URLs aren't already in ASCII, it's a bug on the client code's side (they should be using things like iri_to_uri() at the appropriate moments).

I'm having a bit of trouble understanding the original report, because smart_unicode() does work on the string you posted and you don't include what's in the traceback leading up to the error.

If the patch fixes it for you, can you just drop in a comment saying so? I'll apply a version of this patch anyway, since it mostly fixes some places that have been overlooked (thanks for testing that, both of you), but I would like some confirmation that it is fixing the original report as well.
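
As a rough illustration of the point about URLs in this comment: the sketch below percent-encodes the non-ASCII parts of an IRI so that the result is plain ASCII. It uses only the standard library's urllib.parse.quote (Python 3); iri_to_uri_sketch is a hypothetical name, the set of characters left unescaped is an assumption, and this is not Django's iri_to_uri() implementation.

```python
from urllib.parse import quote


def iri_to_uri_sketch(iri):
    """Hypothetical helper: percent-encode the non-ASCII parts of an IRI."""
    # quote() encodes the text as UTF-8 and escapes every byte that is not
    # an unreserved character or listed in `safe`. Reserved URL characters
    # and existing "%" escapes are left alone (an assumed safe set).
    return quote(iri, safe="/#%[]=:;$&()+,!?*@'~")


# example.com and the path are placeholders for illustration only.
uri = iri_to_uri_sketch("http://example.com/feeds/úspěch/")
```

Client code would call something like this before passing a link to the feed, so the feed generator only ever receives ASCII URLs.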

comment:4 by Malcolm Tredinnick, 18 years ago

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

Okay, the original bug report does make sense (that is, I can repeat it) if the string passed in is a UTF-8 bytestring that uses non-ASCII characters.

I'll commit a modified patch shortly that takes care of the IRI -> URI mapping as well.
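
A minimal sketch of that failure mode, written for Python 3, where the ASCII decode has to be spelled out (on the Python 2 versions shown in the tracebacks above, it happens implicitly when a bytestring reaches the codecs writer). The sample word is the first word of the string in the original report; the variable names are made up for the example.

```python
# First word of the reported string, as a UTF-8 bytestring.
utf8_bytes = "Úspěch".encode("utf-8")  # b'\xc3\x9asp\xc4\x9bch'

error = None
try:
    # Treating the UTF-8 bytes as ASCII fails on the very first byte.
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the real encoding first avoids the problem entirely.
decoded = utf8_bytes.decode("utf-8")
```

The failing byte and position match the original report (byte 0xc3 at position 0), which is consistent with the description having been a UTF-8 bytestring rather than a unicode object.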

comment:5 by Malcolm Tredinnick, 18 years ago

Resolution: fixed
Status: new → closed

(In [5389]) unicode: Fixed #4430 -- Handle bytestrings and IRIs more robustly in feed
production. Thanks to Almad and Michal@… for some good debugging here.

comment:6 by Malcolm Tredinnick, 18 years ago

(In [5400]) unicode: Reverted [5388] and fixed the problem in a different way. Checked
every occurrence of smart_unicode() and force_unicode() that was not previously
a str() call, so hopefully the problems will not reoccur. Fixed #4447. Refs #4435, #4430.

comment:7 by michal@…, 18 years ago

Works for me, thank you.
