Opened 5 months ago
Last modified 4 months ago
#35533 assigned Bug
urlize() makes a bit of a mess of links embedded in Markdown
Reported by: | Simon Willison | Owned by: | DongwookKim0823 |
---|---|---|---|
Component: | Utilities | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Adam Johnson | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I have this input text:
Annotated versions of talks I have given, with extensive notes and additional links. Here's [how I make these](https://simonwillison.net/2023/Aug/6/annotated-presentations/).
After running it through Django's urlize helper function I got this:
... Here's [how I make <a href="http://these](https://simonwillison.net/2023/Aug/6/annotated-presentations/)" rel="nofollow">these](https://simonwillison.net/2023/Aug/6/annotated-presentations/)</a>
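A minimal sketch of the heuristic that appears to fire here: the check for links without a protocol that contain a known TLD such as `.net`, which is discussed later in this ticket. The pattern below is a reconstruction of `Urlizer.simple_url_2_re` from django/utils/html.py. Treat the exact pattern as an assumption and compare it against the linked source. The token is assumed to be what remains of `these](https://...).` after urlize splits on whitespace and trims the trailing period.

```python
import re

# Assumption: reconstruction of Urlizer.simple_url_2_re from
# django/utils/html.py, i.e. "no http prefix, but contains a known TLD".
SIMPLE_URL_2_RE = re.compile(
    r"^www\.|^(?!http)\w[^@]+\.(com|edu|gov|int|mil|net|org)($|/.*)$",
    re.IGNORECASE,
)

# Assumed token after splitting on whitespace and trimming the final ".".
word = "these](https://simonwillison.net/2023/Aug/6/annotated-presentations/)"

for token in ["simonwillison.net/2023/", word, "https://simonwillison.net/"]:
    # Only tokens that do NOT start with "http" can hit this branch.
    print(f"{token!r}: {bool(SIMPLE_URL_2_RE.match(token))}")

# When this branch fires, "http://" is glued onto the whole token,
# which reproduces the broken href from the report.
print("http://" + word)
```

Because `[^@]+` accepts `]`, `(` and `/`, nothing in this pattern stops the Markdown link syntax from being swallowed into what urlize then treats as the hostname.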
Attachments (2)
Change History (12)
by , 5 months ago
Attachment: | screenshot-of-markdown.jpg added |
---|
comment:2 by , 5 months ago
Current URLizer code is here: https://github.com/django/django/blob/2719a7f8c161233f45d34b624a9df9392c86cc1b/django/utils/html.py#L258-L413
comment:3 by , 5 months ago
Triage Stage: | Unreviewed → Accepted |
---|
Thank you for the report! Replicated.
comment:4 by , 5 months ago
Cc: | added |
---|
comment:5 by , 5 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-ups: 7 10 comment:6 by , 5 months ago
Hey DongwookKim0823, are you actively working on this? Do you need any help? I'd love to help.
comment:7 by , 5 months ago
Replying to Vaarun Sinha:
I'm currently working on this. I'll leave a comment if I need any help. Thank you! :)
comment:8 by , 5 months ago
Has patch: | set |
---|
comment:9 by , 5 months ago
Patch needs improvement: | set |
---|
by , 5 months ago
Attachment: | changes.patch added |
---|
comment:10 by , 4 months ago
Replying to Vaarun Sinha:
I made some progress on this issue, but I think collaborating on a better solution would be great. I'd love to work together and find the best approach.
The ideal output for this example would be:
Alternatively, not URLizing at all would be preferable to URLizing in a way that produces broken links.
I think what's happening here is the logic that looks for non-protocol links that include or end in .net may be kicking in, and deciding that the following is the URL that should be linked:
It's hard to suggest a fix for this. Ideally the code would "notice" that `these](https://simonwillison.net/2023` is not a valid URL, but instead the logic is deciding that it's probably valid but should have `http://` glued on the start. Maybe we could have code that notices that `http://these](https://simonwillison.net/2023/Aug/6/annotated-presentations/)` is NOT a valid URL (you cannot have `](` in the middle of the hostname portion) and hence decides not to URLize it at all.

(Background: the reason I'm seeing this is that my Django SQL Dashboard software tries to URLize text it displays, but has no way of knowing if a database column contains Markdown. This broken example came from https://simonwillison.net/dashboard/tags-with-descriptions/; screenshot here: https://code.djangoproject.com/attachment/ticket/35533/screenshot-of-markdown.jpg)
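The hostname check suggested above (refuse to link when characters like `](` end up in the hostname) could look roughly like the following standalone sketch. `has_plausible_hostname` and `LABEL_RE` are hypothetical names, not Django APIs, and this is not a patch against `Urlizer`. The assumption is that urlize would call something like it on the candidate URL and leave the word as plain text when it returns False. Only `urllib.parse.urlsplit` from the standard library is used.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper, not part of Django: one DNS label in the
# RFC 1123 style (ASCII letters, digits, hyphens; no leading or
# trailing hyphen; 1 to 63 characters).
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)", re.IGNORECASE)


def has_plausible_hostname(url: str) -> bool:
    """Return True if url's host consists only of valid DNS labels."""
    try:
        host = urlsplit(url).hostname
    except ValueError:
        # urlsplit rejects some malformed netlocs, e.g. a "]" with no "[".
        return False
    if not host:
        return False
    return all(LABEL_RE.fullmatch(label) for label in host.split("."))


broken = "http://these](https://simonwillison.net/2023/Aug/6/annotated-presentations/)"
clean = "https://simonwillison.net/2023/Aug/6/annotated-presentations/"
print(has_plausible_hostname(broken))  # False
print(has_plausible_hostname(clean))  # True
```

As written, this sketch would also reject IPv6 literals such as `http://[::1]/` and non-ASCII (not yet punycode-encoded) domains, so a real fix would need to account for those cases.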