Opened 13 years ago

Closed 5 years ago

#18517 closed Bug (wontfix)

URLField does not support url with underscore

Reported by: guoqiao Owned by: nobody
Component: Core (URLs) Version: 1.6
Severity: Normal Keywords: URLFiled
Cc: guoqiao Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description (last modified by Claude Paroz)

If your URL has a '_' in it, like this:

the URLField will complain that it is not a valid URL. The problem lies in the '_' character. I found the following in the source code:

class URLValidator(RegexValidator):
    regex = re.compile(
        r'^(?:http|ftp)s?://' # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' #domain...
        r'localhost|' #localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # ...or ip
        r'(?::\d+)?' # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

the related part is this line:

r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'

It's clear that '_' is not included in the pattern.
I don't think this is reasonable, because a lot of URLs have a '_' in them.
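The behaviour described above can be reproduced with the standard `re` module alone. The sketch below copies the pattern quoted from the source; the helper name `is_accepted` and both hostnames are hypothetical and only serve as illustration.

```python
import re

# Pattern copied from the URLValidator source quoted above.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # scheme
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_accepted(url):
    """Return True if the quoted pattern matches the whole URL (hypothetical helper)."""
    return URL_RE.match(url) is not None


# Hypothetical hostnames, chosen only to contrast '-' with '_'.
print(is_accepted("http://my-project.example.com/"))  # hyphen in a host label
print(is_accepted("http://my_project.example.com/"))  # underscore in a host label
```

The first call matches because '-' is part of the label character class; the second does not, because '_' appears in none of the host alternatives.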

Change History (8)

comment:1 by guoqiao, 13 years ago

Cc: guoqiao added
Component: Uncategorized → Core (URLs)
Easy pickings: set
Keywords: URLFiled added
Type: Uncategorized → Bug

comment:2 by Claude Paroz, 13 years ago

Description: modified (diff)

Reformatted, please use preview before posting the ticket.

comment:3 by Claude Paroz, 13 years ago

Resolution: wontfix
Status: new → closed

The Django URLValidator is aiming to check if URLs are valid according to official rules (RFC 1034/1035) which forbid the usage of underscores in hostnames (read also http://www.quora.com/Domain-Name-System-DNS/Why-are-underscores-not-allowed-in-DNS-host-names).

If you want to relax rules for your particular usage, just subclass fields/validators and adapt them. But I don't think we should change the default validator for broken existing URLs.

About the example you provided, see https://github.com/rtfd/readthedocs.org/issues/148
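As a rough illustration of the "relax the rules yourself" suggestion in this comment, the sketch below adapts the pattern quoted in the description so that host labels may contain '_'. It uses only the standard `re` module; the names `LABEL`, `RELAXED_URL_RE` and `is_accepted_relaxed` are hypothetical. In a project, a pattern like this could presumably be assigned to the `regex` attribute of a validator subclass, in the same way the quoted `URLValidator` source does, but that wiring is an assumption and is not shown here.

```python
import re

# Host label that additionally allows '_' (an assumption made for this sketch,
# not the behaviour of the stock validator).
LABEL = r'[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?'

RELAXED_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'
    r'(?:(?:' + LABEL + r'\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
    r'localhost|'
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r'(?::\d+)?'
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def is_accepted_relaxed(url):
    """Return True if the relaxed pattern matches the whole URL (hypothetical helper)."""
    return RELAXED_URL_RE.match(url) is not None


# Hypothetical inputs.
print(is_accepted_relaxed("http://my_project.example.com/"))  # underscore now accepted
print(is_accepted_relaxed("http://my project.example.com/"))  # whitespace still rejected
```

The rest of the pattern is left unchanged, so typos such as embedded whitespace are still caught.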

comment:4 by Aymeric Augustin, 13 years ago

To be honest, the URLValidator doesn't aim for strict conformance to the RFCs. Rather, it attempts to catch typos in URLs entered manually in the admin.

Of course, it follows the RFC whenever possible, but it's also written with real-life use cases in mind.

I agree with Claude -- but please don't write tickets about edge cases where the validator diverges from the RFC :)

Last edited 13 years ago by Aymeric Augustin

comment:5 by slaughninja, 11 years ago

Version: 1.4 → 1.6

If the Django URLValidator is aiming to meet the RFC1034/1035 then it should be named DomainNameValidator. The RFC for URLs would be RFC1738, which kinda, sorta allows for underscores to be used unquoted (it is a reserved character).

Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if it is not standard or recommended, hosts like that are still out there (and often outside the control of the dev), and the current behaviour renders the validator unusable in production.

comment:6 by Shai Berger, 11 years ago (in reply to: comment:5)

Replying to slaughninja:

> If the Django URLValidator is aiming to meet the RFC1034/1035 then it should be named DomainNameValidator. The RFC for URLs would be RFC1738, which kinda, sorta allows for underscores to be used unquoted (it is a reserved character).

RFC1738 may allow underscores in URLs generally, but not in the hostname.

> Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if it is not standard or recommended, hosts like that are still out there (and often outside the control of the dev), and the current behaviour renders the validator unusable in production.

What Claude said; if you disagree, the place to argue about it is the DevelopersMailingList.
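The distinction drawn in this comment, underscores being acceptable in a URL generally but not in its hostname, can be sketched by splitting the URL first and applying the letters-digits-hyphens rule to the host labels only. This is a minimal illustration using `urllib.parse.urlsplit` from the standard library; the label pattern mirrors the one in the quoted validator rather than the exact RFC grammar, and `hostname_labels_ok` and the URLs are hypothetical.

```python
import re
from urllib.parse import urlsplit

# One host label: letters, digits and hyphens, not starting or ending with '-'.
# This approximates the label rule used by the quoted regex (an assumption of
# this sketch, not a full RFC 1034/1035 implementation).
LDH_LABEL = re.compile(r'^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$')


def hostname_labels_ok(url):
    """Check only the hostname part of a URL; path and query are ignored."""
    host = urlsplit(url).hostname or ""
    labels = host.rstrip(".").split(".")
    return all(LDH_LABEL.match(label) is not None for label in labels)


# Hypothetical URLs: the same underscore, once in the path and once in the host.
print(hostname_labels_ok("http://example.com/some_page"))      # underscore in path
print(hostname_labels_ok("http://my_host.example.com/page"))   # underscore in hostname
```

Only the second URL fails the check, because the underscore sits in a host label rather than in the path.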

comment:7 by Ramiro, 5 years ago

Resolution: wontfix
Status: closed → new

Underscores are very much a real thing in hostnames. Please reconsider this case against the latest version of the code. Many sites (especially malicious ones) use underscores. I am trying to use Django's validators to tell me about *actual* malformed URLs, not those that might just be typos. It's a validator after all, and IMO it should follow the standard rather than define its own.

comment:8 by Mariusz Felisiak, 5 years ago

Resolution: wontfix
Status: new → closed