Opened 12 years ago
Closed 5 years ago
#18517 closed Bug (wontfix)
URLField does not support url with underscore
Reported by: | guoqiao | Owned by: | nobody |
---|---|---|---|
Component: | Core (URLs) | Version: | 1.6 |
Severity: | Normal | Keywords: | URLFiled |
Cc: | guoqiao | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description (last modified by )
If your URL contains an underscore ('_'), like this:
the URLField will complain that it is not a valid URL. The problem lies in the '_' character. I found the following in the source code:
```python
import re

from django.core.validators import RegexValidator  # base class of URLValidator in this era


class URLValidator(RegexValidator):
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'localhost|'  # localhost...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
```
The relevant part is this line:
```python
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'
```
The hostname label character classes, `[A-Z0-9]` and `[A-Z0-9-]`, clearly do not include '_'.
I don't think this is reasonable, since a lot of URLs contain '_'.
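The reported behaviour can be reproduced without Django by compiling a standalone copy of the regex quoted above with Python's `re` module. This is a sketch for illustration only; the helper `matches` and the example hostnames are hypothetical. An underscore in the path passes, while one inside a hostname label does not, because the label character classes allow only letters, digits and hyphens.

```python
import re

# Standalone copy of the URLValidator regex quoted in the description,
# compiled here only to reproduce the reported behaviour.
URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # scheme
    r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+'  # hostname labels: letters, digits, hyphens
    r'(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # top-level domain
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)


def matches(url):
    """Return True if url is accepted by the quoted pattern."""
    return URL_RE.match(url) is not None


print(matches("http://foo-bar.example.com/"))  # True
print(matches("http://foo_bar.example.com/"))  # False: '_' inside a hostname label
print(matches("http://example.com/foo_bar/"))  # True: '_' only in the path
```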
Change History (8)
comment:1 by , 12 years ago
Cc: | added |
---|---|
Component: | Uncategorized → Core (URLs) |
Easy pickings: | set |
Keywords: | URLFiled added |
Type: | Uncategorized → Bug |
comment:2 by , 12 years ago
Description: | modified (diff) |
---|
comment:3 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
The Django URLValidator aims to check whether URLs are valid according to the official rules (RFC 1034/1035), which forbid underscores in hostnames (see also http://www.quora.com/Domain-Name-System-DNS/Why-are-underscores-not-allowed-in-DNS-host-names).
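The hostname rule cited here is often summarized as the LDH (letters, digits, hyphen) rule. Below is a minimal Python sketch of a per-label check, assuming RFC 1035's preferred name syntax together with RFC 1123's allowance for a leading digit. The function `is_ldh_hostname` is hypothetical and is not part of Django.

```python
import re

# One DNS label under the "preferred name syntax" of RFC 1035 (section 2.3.1),
# relaxed as in RFC 1123 so that a label may also start with a digit:
# letters, digits and hyphens only, no leading or trailing hyphen, 1-63 chars.
LDH_LABEL = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?')


def is_ldh_hostname(hostname):
    """Return True if every dot-separated label of hostname follows the LDH rule.

    This sketch checks labels only; it ignores the overall length limit
    and internationalized (IDN) names.
    """
    if hostname.endswith('.'):
        hostname = hostname[:-1]  # accept one trailing dot (fully qualified form)
    if not hostname:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split('.'))


print(is_ldh_hostname("foo-bar.example.com"))  # True
print(is_ldh_hostname("foo_bar.example.com"))  # False
```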
If you want to relax the rules for your particular use case, just subclass the fields/validators and adapt them. But I don't think we should change the default validator to accommodate existing broken URLs.
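One way to follow this suggestion is to copy the pattern and add '_' to the hostname label character classes. The sketch below does that with plain `re`, so it runs without Django; `RELAXED_URL_RE` is a hypothetical name, and the rest of the pattern is the one quoted in the description.

```python
import re

# Hypothetical relaxed variant of the URLValidator regex quoted in the
# description: identical except that '_' is allowed inside hostname labels.
RELAXED_URL_RE = re.compile(
    r'^(?:http|ftp)s?://'  # http, https, ftp or ftps scheme
    r'(?:(?:[A-Z0-9_](?:[A-Z0-9_-]{0,61}[A-Z0-9_])?\.)+'  # labels: '_' now allowed
    r'(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # top-level domain, unchanged
    r'localhost|'  # localhost...
    r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or IPv4 address
    r'(?::\d+)?'  # optional port
    r'(?:/?|[/?]\S+)$', re.IGNORECASE)

print(bool(RELAXED_URL_RE.match("http://foo_bar.example.com/")))  # True
print(bool(RELAXED_URL_RE.match("http://foo bar.example.com/")))  # False
```

With a Django version whose `URLValidator` is driven by a single `regex` class attribute, as in the code quoted in the description, such a pattern could be assigned to `regex` on a `URLValidator` subclass, and that validator listed in the `default_validators` of a `URLField` subclass. Validators passed through a field's `validators` argument run in addition to the defaults rather than replacing them. Newer Django releases build the pattern from several parts, so the details depend on the version you target.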
About the example you provided, see https://github.com/rtfd/readthedocs.org/issues/148
comment:4 by , 12 years ago
To be honest, the URLValidator doesn't aim for strict conformance to RFC. Rather, it attempts to catch typos in URLs entered manually in the admin.
Of course, it follows the RFC whenever possible, but it's also written with real-life use cases in mind.
I agree with Claude -- but please don't write tickets about edge cases where the validator diverges from the RFC :)
follow-up: 6 comment:5 by , 11 years ago
Version: | 1.4 → 1.6 |
---|
If the Django URLValidator is aiming to meet the RFC1034/1035 then it should be named DomainNameValidator. The RFC for URLs would be RFC1738, which kinda, sorta allows for underscores to be used unquoted (it is a reserved character).
Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if this is not standard or recommended, such hosts still exist (and are often outside the developer's control), and the current behaviour renders the validator unusable for production.
comment:6 by , 11 years ago
Replying to slaughninja:
If the Django URLValidator is aiming to meet the RFC1034/1035 then it should be named DomainNameValidator. The RFC for URLs would be RFC1738, which kinda, sorta allows for underscores to be used unquoted (it is a reserved character).
RFC1738 may allow underscores in URLs generally, but not in the hostname.
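The distinction drawn in this reply can be made concrete by splitting a URL into its components with Python's standard `urllib.parse.urlsplit`. This is an illustrative sketch; the helper `underscore_locations` and the example URLs are hypothetical. An underscore in the path or query is unremarkable, and a hostname rule (such as the letters-digits-hyphen rule behind RFC 1034/1035) would be applied only to the `hostname` component.

```python
from urllib.parse import urlsplit


def underscore_locations(url):
    """Report which components of url contain an underscore.

    urlsplit() only splits the URL; it does not apply any DNS hostname rules.
    """
    parts = urlsplit(url)
    return {
        'hostname': '_' in (parts.hostname or ''),
        'path': '_' in parts.path,
        'query': '_' in parts.query,
    }


print(underscore_locations("http://example.com/my_page?a_b=1"))
# {'hostname': False, 'path': True, 'query': True}
print(underscore_locations("http://foo_bar.example.com/"))
# {'hostname': True, 'path': False, 'query': False}
```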
Also, the reality is that there are hosts out there that use underscores in their names, especially in subdomains. Even if this is not standard or recommended, such hosts still exist (and are often outside the developer's control), and the current behaviour renders the validator unusable for production.
What Claude said; if you disagree, the place to argue about it is the DevelopersMailingList.
comment:7 by , 5 years ago
Resolution: | wontfix |
---|---|
Status: | closed → new |
Underscores are very much a real thing in hostnames. Please reconsider this ticket against the latest version of the code. Many sites (especially malicious ones) use underscores. I am trying to use Django's validators to tell me about *actual* malformed URLs, not just likely typos. It's a validator after all, and IMO it should follow the standard rather than define its own.
comment:8 by , 5 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Reformatted; please use preview before posting the ticket.