Opened 12 years ago
Closed 10 years ago
#20264 closed Bug (invalid)
URLValidator should allow underscores in local hostname
Reported by: | Arthur Debert | Owned by: | nobody |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | bmispelon@…, Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
Underscores are valid in local hostnames, as per RFC 1178.
The current validator is too strict: a local hostname can include, and even start with, an underscore.
Change History (10)
comment:1 by , 12 years ago
Has patch: | set |
---|
comment:2 by , 12 years ago
Cc: | added |
---|---|
Resolution: | → invalid |
Status: | new → closed |
Hi,
RFC 1178 says "Don't use non-alphanumeric characters in a name" and according to its first paragraph, it's merely a set of guidelines.
For more technical details, the Wikipedia article on hostnames has some good references: https://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names
The valid characters according to RFC 952 are [a-z0-9-], which is what Django uses.
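To make the character set above concrete, here is a minimal Python sketch of a letter-digit-hyphen check. It is an illustration only, not Django's actual regex: the names `LDH_LABEL` and `is_ldh_hostname` are hypothetical, and the pattern assumes the RFC 1123 relaxation that a label may begin with a digit, plus the usual 63-character label limit.

```python
import re

# Hypothetical pattern for illustration, not Django's actual regex.
# One label: letters, digits, and hyphens, with no leading or trailing
# hyphen, and at most 63 characters in total.
LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_ldh_hostname(hostname: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    return all(LDH_LABEL.fullmatch(label) for label in hostname.split("."))
```

Under this rule a name such as `my-host.example.com` passes, while `my_host.example.com` is rejected because of the underscore.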
comment:3 by , 12 years ago
Hi
I ran into this because of URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (underscore in the local part). I wonder whether a de facto standard is in place here?
This Stack Overflow answer gives a different reading, based on RFC 2181: http://stackoverflow.com/a/2183140/65490
Thank you
comment:4 by , 12 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
My apologies, it seems I got confused between host names and domain names, which have different sets of allowed characters (the domain names do allow underscores, as noted in the wikipedia article that I linked to).
I'm reopening this but leaving it as Unreviewed for now.
You seem to have a good case that the current regex is incorrect, but I want to double-check.
Thanks.
comment:5 by , 12 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
An underscore is not valid per RFC 952 (we are talking about hostnames in URLs, not DNS records; I am aware that DNS records such as SRV require or allow underscores). RFC 1123 section 2.1 later relaxed that but still didn't allow an underscore. RFC 2821 increased the length but, again, did not add the underscore.
To answer:
I ran into this because of URLs we are seeing in the wild. A great number of tools (popular browsers, curl, wget, BIND) allow such addresses (underscore in the local part).
Browsers do whatever they think is best; from their point of view it obviously makes no sense to disallow underscores. BIND has no reason to disallow them either, since they are valid in DNS records (yes, most likely even in A records, which still doesn't make them valid with regard to the HTTP RFC and related ones).
comment:6 by , 11 years ago
FWIW
per http://networkadminkb.com/KB/a156/windows-2003-dns-and-the-underscore.aspx
"Is the underscore a supported character for DNS hostnames (my_hostname.domain.com)?
Yes, RFC 2181 added support for the underscore and other non-English characters. Prior to RFC 2181, RFC 1035 explicitly limited the character for use in hostnames to English letters, numbers, and a hyphen."
comment:7 by , 11 years ago
Cc: | added |
---|
comment:8 by , 11 years ago
Did you read RFC 2181? That link appears to be an opportunistic MSFT interpretation of RFC 2181 to address NetBIOS naming. See http://technet.microsoft.com/en-us/library/cc959336.aspx for more of that justification. RFC 2181 doesn't specifically say anything about changing what is valid in a hostname; it just talks about what can be in a DNS entry and completely punts on whether clients will or should accept it. Compare that to RFC 1123, which is very specific on its subject; it doesn't look to me like 2181 can reasonably be interpreted as overriding 1123 in general. OTOH, 1123 is completely clear that it updates 952.
comment:9 by , 10 years ago
Has patch: | unset |
---|---|
Resolution: | invalid |
Status: | closed → new |
Since all major browsers support underscores, and DNS allows underscores in domain names, URLValidator is bound to fail unexpectedly, from the user's perspective, when it is used to validate user input.
Strictly abiding by the RFC with complete disregard for pragmatism makes no sense. The least this highly opinionated class can offer is a strict flag.
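As a rough idea of what such a strict flag could look like, here is a standalone Python sketch. It does not use or modify Django's URLValidator: the function name `host_is_valid`, the `strict` parameter, and both patterns are assumptions made up for this example, and it only checks the hostname part of the URL.

```python
import re
from urllib.parse import urlsplit

# Hypothetical patterns for illustration; neither is Django's regex.
# Strict: letters, digits, hyphens only (no leading or trailing hyphen).
STRICT_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)
# Lenient: same rule, but underscores are also accepted anywhere in a label.
LENIENT_LABEL = re.compile(r"[a-z0-9_](?:[a-z0-9_-]{0,61}[a-z0-9_])?", re.IGNORECASE)


def host_is_valid(url: str, strict: bool = True) -> bool:
    """Check the URL's hostname labels, optionally tolerating underscores."""
    host = urlsplit(url).hostname
    if not host:
        # No network location could be parsed from the input.
        return False
    pattern = STRICT_LABEL if strict else LENIENT_LABEL
    return all(pattern.fullmatch(label) for label in host.split("."))
```

With the default `strict=True`, `http://my_host.local/` is rejected; passing `strict=False` accepts it, which matches the browser behaviour described above.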
comment:10 by , 10 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Please see TicketClosingReasons/DontReopenTickets, thanks!
Pull request sent from https://github.com/arthur-debert/django/tree/ticket_20264