Opened 18 years ago

Closed 18 years ago

Last modified 18 years ago

#3664 closed (fixed)

UnicodeDecodeError in contrib/syndication/feeds.py

Reported by: Ville Säävuori <Ville@…>
Owned by: Jacob
Component: Documentation
Version: dev
Severity:
Keywords: unicode
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: yes
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

I'm using contrib.syndication to make feeds for Flickr photos and Ma.gnolia links, both of which have tags containing non-ASCII characters (tags like 'pärnu' and 'työ'). Django dies with a UnicodeDecodeError when trying to make a feed whose URL contains such characters.

The error message is:

UnicodeDecodeError at /syndicate/tag/pärnu/
'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

...

Exception Location:  	/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/contrib/syndication/feeds.py in add_domain, line 9

The add_domain function is very simple, and the problem seems to be this line:

url = u'http://%s%s' % (domain, url)

I tested this and found that the error goes away when the url is decoded as latin1 (iso-8859-1), like this:

url = u'http://%s%s' % (domain, url.decode('latin1'))

However, I'm not confident that this is a good fix.
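For illustration only, here is a minimal Python 3 sketch of what is probably happening, assuming the url arrives as UTF-8 bytes. The path below is a made-up stand-in based on the error message, and the explicit decode("ascii") stands in for the implicit conversion Python 2 performs when a byte string is interpolated into a unicode template. It also suggests why the latin1 decode silences the error without being correct.

```python
# Hypothetical path, UTF-8 encoded as a browser or server would typically send it.
path_bytes = "/syndicate/tag/pärnu/".encode("utf-8")

# Python 2 decodes byte strings as ASCII when mixing them with unicode;
# the explicit decode below mimics that implicit step.
error = None
try:
    path_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # byte 0xc3 is the first byte of the UTF-8 encoding of 'ä'

# latin1 maps every byte to a character, so it never fails,
# but each UTF-8 byte becomes its own character (mojibake).
as_latin1 = path_bytes.decode("latin1")  # '/syndicate/tag/pÃ¤rnu/'

# Decoding with the encoding the bytes were written in recovers the text.
as_utf8 = path_bytes.decode("utf-8")  # '/syndicate/tag/pärnu/'
```

So the latin1 variant avoids the exception but would put a mangled tag name into the feed URL.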

Attachments (1)

fix.diff (654 bytes) - added by Gary Wilson <gary.wilson@…> 18 years ago.
wording fix


Change History (13)

comment:1 by Simon G. <dev@…>, 18 years ago

Triage Stage: Unreviewed → Accepted

This looks to be another unicode issue that we're going to look into after 0.96 is released.

comment:2 by Ville Säävuori <Ville@…>, 18 years ago

I wrote a workaround for myself for this. Details are at http://www.unessa.net/en/hoyci/2007/03/unicode-and-django-rss-framework/

It would have been better to write a good patch to resolve the problem and not its causes, but I'm still not really sure how this should be fixed "right".

comment:3 by Malcolm Tredinnick, 18 years ago

Component: RSS framework → Documentation
Owner: changed from Adrian Holovaty to Jacob

This is a documentation bug, rather than a code bug.

Anything you pass up as a link, including things returned from item_link() in syndication classes and get_absolute_url() on models, must already be in the character set specified in RFC 1738 (the URL spec). So you must already have done the necessary conversion from non-ASCII characters to ASCII and called urllib.quote() if necessary. In the above example, you are passing non-ASCII characters to something expecting content for a URL, so it is failing.

We cannot perform the conversion to utf-8 and/or url quoting ourselves, because, for example, the standard IRI -> URI conversion process is that you convert first and then quote(), so we don't want to accidentally do it twice (and there are lots of other places where get_absolute_url() needs to already be returning the correctly quoted string).
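As a rough sketch of the convert-then-quote order described above, and of why applying it twice is harmful, the following uses only the Python 3 standard library (urllib.parse.quote is the Python 3 counterpart of urllib.quote). The helper name iri_path_to_uri and the example path are hypothetical and are not part of Django.

```python
from urllib.parse import quote


def iri_path_to_uri(path):
    """Hypothetical helper: encode a path to UTF-8, then percent-quote it."""
    # Step 1: convert the text to UTF-8 bytes.
    # Step 2: percent-quote everything outside the unreserved set, keeping '/'.
    return quote(path.encode("utf-8"), safe="/")


# Quoting once gives an ASCII-only string that is safe to use in a URL.
once = iri_path_to_uri("/syndicate/tag/pärnu/")  # '/syndicate/tag/p%C3%A4rnu/'

# Quoting the result again escapes the '%' signs themselves,
# so the link no longer points at the same resource.
twice = iri_path_to_uri(once)  # '/syndicate/tag/p%25C3%25A4rnu/'
```

Under this reading, a get_absolute_url() or item_link() implementation would return something like the value of once, and the framework would leave it untouched.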

I will update the documentation.

comment:4 by Malcolm Tredinnick, 18 years ago

Resolution: fixed
Status: new → closed

(In [5250]) Fixed #3664 -- Documented that get_absolute_url() and item_link() (in
syndication) links are expected to be strings that can be used in URLs without
further quoting or encoding.

by Gary Wilson <gary.wilson@…>, 18 years ago

Attachment: fix.diff added

wording fix

comment:5 by Gary Wilson <gary.wilson@…>, 18 years ago

Has patch: set
Resolution: fixed
Status: closed → reopened
Triage Stage: Accepted → Ready for checkin

comment:6 by Gary Wilson <gary.wilson@…>, 18 years ago

Has patch: unset
Resolution: fixed
Status: reopened → closed
Triage Stage: Ready for checkin → Accepted

oops, wrong ticket number mentioned in [5250]

comment:7 by Julian, 18 years ago

I can't see how this is fixed now. It still raises errors for me: I have quoted everything correctly, but feeds.py still seems to get into trouble because the request URL contains urlencoded unicode.

Why is it even

url = u'http://%s%s' % (domain, url)

and not

url = 'http://%s%s' % (domain, url)

if the urls shouldn't be unicode?

comment:8 by Malcolm Tredinnick, 18 years ago

It sounds like you haven't fully URL and IRI encoded your "url" fragment. Please ask support questions on the mailing list (django-users), though, rather than in Trac.

comment:9 by anonymous, 18 years ago

Needs tests: set
Patch needs improvement: set

I still have this error, and I think the ticket should be reopened.
From what I can tell, the error has nothing to do with fully encoding your url fragments and so on. The problem seems to be that the feed object somehow receives a feed_url that is not URL-quoted, where it says

    def __init__(self, slug, feed_url):

When I do a print feed_url, it does not show me a URL which is "ASCII and URL-quoted". So the part after

# 'url' must already be ASCII and URL-quoted, so no need for encoding

throws an error. Maybe no one ever discovered the bug because you don't deal with foreign-language sites!?

comment:10 by James Bennett, 18 years ago

Please read the Unicode URI/IRI documentation carefully; if you have Unicode inside URLs, you are responsible for ensuring that you call the proper function to escape it before handing it off to anything else. If you have further questions, please follow Malcolm's suggestion and ask them on the django-users mailing list.

comment:11 by anonymous, 18 years ago

That would mean I can't use the feeds as described in the docs!?
The request URL has encoded and quoted Unicode, so what can I do when it is passed incorrectly to the feed object, which then throws an error?
All my other URLs are completely correct.

comment:12 by Malcolm Tredinnick, 18 years ago

We have asked a number of times in the comments to please ask questions on the django-users list. You can post an example of how your code is generating the URL and what the problem is. The lack of examples you have provided makes it impossible to debug anything and Trac is not a good place to have support and debugging conversations. Certainly the earlier examples in this ticket were cases of bad user code, rather than a bug in Django, and yours may well be similar.

Post to django-users. Give an example of what the URL string is and how you are generating it. Then you will get help with fixing it.
