Opened 15 years ago
Closed 13 years ago
#13704 closed Bug (fixed)
utils.html.urlize mishandles IDN style domain names
Reported by: | Daniel Ryan | Owned by: | nobody |
---|---|---|---|
Component: | Template system | Version: | 1.2 |
Severity: | Normal | Keywords: | IDN, urlize |
Cc: | dougal85@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The urlize function runs urlquote on the URL it is processing, which incorrectly handles domain names that contain Unicode characters (IDN style).
To test, run urlize('http://c✶.ws'). The text of the link is fine, the href of the anchor tag is not.
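As a rough illustration of the difference (a sketch using only the Python 3 standard library; `urllib.parse.quote` is a stand-in assumption here, not Django's own urlquote), percent-quoting the whole URL escapes the host as UTF-8 bytes, whereas the built-in idna codec produces the ASCII form of the domain:

```python
from urllib.parse import quote

# "\u2736" is the six-pointed star used in the ticket's example domain.
url = "http://c\u2736.ws"

# Percent-quoting the entire URL also escapes the host name,
# which is not a valid way to spell an IDN in an href.
quoted = quote(url, safe=":/")

# IDNA-encoding just the host yields the ASCII (Punycode) domain.
host_ascii = "c\u2736.ws".encode("idna").decode("ascii")
```

Here `quoted` comes out as `http://c%E2%9C%B6.ws`, while `host_ascii` is `xn--c-lgq.ws`.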
Attachments (4)
Change History (13)
by , 15 years ago
Attachment: | django.diff added |
---|
comment:1 by , 15 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
comment:2 by , 15 years ago
Component: | Uncategorized → Template system |
---|---|
Has patch: | set |
Triage Stage: | Unreviewed → Accepted |
Still needs tests; also needs PEP8 (i.e., strip all that extra space around the parentheses).
comment:3 by , 14 years ago
I reviewed the ticket while attempting to add tests. I'm not sure how to handle IDN domains. I got the following result from a quick doctest in the defaultfilters tests:
Expected: u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://c✶.ws</a>'
Got: u'<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>'
Should the URL be encoded the same way for both the href and the innerHTML? I wasn't sure whether it needed HTML encoding. When I added '<a href="http://xn--c-lgq.ws" rel="nofollow">http://xn--c-lgq.ws</a>' to an HTML file and opened it in Chrome, it displayed the Punycode characters rather than the expected Unicode result.
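For reference, the two spellings name the same domain. A minimal check with Python 3's built-in idna codec (which implements the older IDNA 2003 rules, so this is only a sketch and may not match browser behaviour for every name) decodes the ASCII form back to the Unicode one and leaves an already-encoded name untouched:

```python
# ASCII (Punycode) form -> Unicode form, i.e. what a reader expects to see.
unicode_form = b"xn--c-lgq.ws".decode("idna")

# Encoding a name that is already ASCII is a no-op, so applying the
# conversion to a Punycode name a second time does not change it.
reencoded = "xn--c-lgq.ws".encode("idna")
```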
comment:5 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:6 by , 14 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
comment:7 by , 14 years ago
Patch needs improvement: | set |
---|
comment:8 by , 13 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
UI/UX: | unset |
New patch handles IDN names in various recognized forms. The visible part is left unconverted, as suggested in comment:3.
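The behaviour described here could be sketched roughly as follows. This is not the attached patch: `idn_anchor` is a hypothetical helper written for Python 3, and it assumes a bare host name (no port or credentials) and a URL that needs no further quoting.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def idn_anchor(url):
    """Hypothetical helper: Punycode host in the href, original text visible."""
    parts = urlsplit(url)
    # Convert only the host to its ASCII form; the rest of the URL is kept.
    ascii_host = parts.netloc.encode("idna").decode("ascii")
    href = urlunsplit(parts._replace(netloc=ascii_host))
    # The visible part is the URL exactly as the user typed it.
    return '<a href="%s" rel="nofollow">%s</a>' % (escape(href), escape(url))


link = idn_anchor("http://c\u2736.ws")
```

With the ticket's example URL, `link` matches the "Expected" string from comment:3.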
by , 13 years ago
Attachment: | 13704-4.patch added |
---|
a) Needs tests.
b) There's almost certainly no cause to catch EVERY single exception coming out of there; it should catch only the applicable errors.
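To illustrate point b), a narrower handler might look like the sketch below. It assumes Python 3's built-in idna codec, whose encoding failures are subclasses of `UnicodeError`; `to_ascii_host` is a hypothetical name, not something taken from the patch.

```python
def to_ascii_host(host):
    """Hypothetical helper: IDNA form of host, or None if it cannot be encoded."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        # Only encoding failures (e.g. an empty or over-long label) are
        # swallowed; unrelated bugs still raise as usual.
        return None


ok = to_ascii_host("c\u2736.ws")
bad = to_ascii_host("a" * 64 + ".ws")  # label longer than 63 characters
```

A caller can then skip linkification when the result is `None`, instead of hiding every possible exception behind a bare `except`.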