#33968 closed New feature (wontfix)
Make EmailValidator and URLValidator IDNA 2008 compliant
Reported by: | j-bernard | Owned by: | nobody |
---|---|---|---|
Component: | Core (Other) | Version: | 4.0 |
Severity: | Normal | Keywords: | IDNA EAI EmailValidator UrlValidator RFC |
Cc: | Florian Apolloner | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This ticket is the second in a series of tickets aiming to bring Email Address Internationalization (EAI) compliance to Django. It does so by supporting Internationalized Domain Names (IDN) according to the latest standard (IDNA 2008) and by fixing some of the processing of internationalized domains and email addresses.
Previous ticket: #33967
Domain validation in both EmailValidator and URLValidator is not fully compliant with IDNA 2008 as defined in RFC 5891, section 4.2.
A domain name cannot be validated properly with a regex, so IDN validation should be performed with an appropriate library.
The current validation ignores IDNA errors. Instead, IDNA processing should be used for domain validation, and the regex check should be skipped for domains: a regex may miss some of the IDNA-specific rules and end up accepting invalid domains.
Moreover, the current validation converts the domain to an A-label with Python's encodings.idna module, which implements a deprecated standard (IDNA 2003).
This conversion should be made IDNA 2008 compliant. The most widely used Python IDNA 2008 package is idna, which is among the most downloaded Python packages on PyPI (4th as of the current month) and is referred to in the official Python documentation.
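A minimal, stdlib-only illustration of the IDNA 2003 vs IDNA 2008 difference described above (a sketch, not Django's code). Python's built-in "idna" codec applies IDNA 2003 nameprep, which case-folds "ß" to "ss", whereas IDNA 2008 treats "ß" as a valid character. For this lowercase label IDNA 2008 leaves the string unchanged, so its A-label can be reproduced with the raw "punycode" codec. The third-party idna package is only used if it happens to be installed.

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490 + nameprep).
ascii_2003 = "straße.example".encode("idna")
print(ascii_2003)  # b'strasse.example' (ß was case-folded to "ss")

# IDNA 2008 keeps ß as a valid character. For this label nothing is mapped,
# so the A-label is "xn--" followed by raw RFC 3492 Punycode of the label.
a_label_2008 = "xn--" + "straße".encode("punycode").decode("ascii")
print(a_label_2008)  # xn--strae-oqa

# Optional: the third-party IDNA 2008 package, if installed (pip install idna).
try:
    import idna
except ImportError:
    idna = None

if idna is not None:
    print(idna.encode("straße.example"))  # b'xn--strae-oqa.example'
```

The two results are different DNS names, which is why the choice of standard matters for both validation and delivery.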
Change History (4)
comment:1 by , 2 years ago
Cc: | added |
---|---|
Component: | Core (Mail) → Core (Other) |
Resolution: | → wontfix |
Status: | new → closed |
Type: | Uncategorized → New feature |
comment:3 by , 6 months ago
As of July 2024, some major email platforms and apps were inconsistent about which version of IDNA to follow. Gmail and Microsoft's Outlook.com appear to still use IDNA 2003 (or maybe Unicode UTS #46 transitional) to encode recipient domains. Apple's Mail apps and Thunderbird use IDNA 2008 (or UTS #46 non-transitional).
Django's EmailValidator can incorrectly reject domains that are now valid (under IDNA 2008) but were disallowed by IDNA 2003 (example: editor@މިހާރު.example.mv). This is arguably a bug, but I haven't been able to find any real-world cases of such domains being used for email.
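A quick way to see why the example domain above fails under IDNA 2003, using Python's stdlib "idna" codec. The RFC 3454 bidi rules used by IDNA 2003 require a label containing right-to-left characters to start and end with one, but މިހާރު ends with the vowel sign U+07AA (bidi class NSM). IDNA 2008's Bidi rule (RFC 5893) allows such trailing NSM characters. The helper idna2003_accepts is hypothetical: it checks only the domain part and does not reproduce EmailValidator itself.

```python
# "މިހާރު" spelled out as code points: it starts with a Thaana letter
# (bidi class AL) and ends with the vowel sign U+07AA (bidi class NSM).
MIHAARU = "\u0789\u07a8\u0780\u07a7\u0783\u07aa"


def idna2003_accepts(domain: str) -> bool:
    """Return True if the stdlib IDNA 2003 codec can encode this domain."""
    try:
        domain.encode("idna")
    except UnicodeError:  # the IDNA 2003 bidi check fails for MIHAARU
        return False
    return True


print(idna2003_accepts(MIHAARU + ".example.mv"))  # False
print(idna2003_accepts("example.mv"))  # True
```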
On the URLValidator side, things seem more clear. Browsers follow WHATWG's URL standard, which specifies that:
This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure.
Django's URLValidator correctly allows domains that are valid under UTS #46 ("Unicode IDNA"), and it has since #20003 was fixed in 2015. (The apparent use of IDNA 2003/punycode() in URLValidator is in dead code; I'll open a separate ticket to clean that up.)
It's true that the regular expressions in URLValidator and DomainNameValidator could also allow some strings that are not technically valid domains. But there isn't currently any complete Python implementation of UTS #46. (The idna package provides a partial implementation, and it rejects domains that WHATWG would allow.)
Given all that, it seems better for URLValidator to allow some invalid domains, rather than incorrectly rejecting some valid ones. Here are a couple of real world URLs that both browsers and URLValidator (correctly) allow:
https://މިހާރު.com - domain valid under IDNA 2008 but not IDNA 2003 (the name of a Maldivian newspaper in the local language and script, redirects to a Romanized version of their name)
http://👓.ws - domain valid under UTS #46 but not IDNA 2008 (emoji domain owned by an eyeglasses retailer)
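A minimal sketch of the permissive trade-off argued for above, built around a hypothetical helper lenient_to_ascii that is not Django's URLValidator code. Each non-ASCII label is turned into an A-label with Python's raw "punycode" codec (RFC 3492). Apart from lowercasing, it applies no UTS #46 mapping and no IDNA 2008 or Bidi checks. Only the resulting ASCII form is checked, against an assumed letters-digits-hyphen hostname pattern. It accepts both real-world domains listed above and reproduces WHATWG's ☕.example → xn--53h.example. The cost is that it also accepts some strings that full UTS #46 processing would reject.

```python
import re

# Assumed ASCII hostname rule (not Django's regex): 1-63 letters, digits or
# hyphens per label, with no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)\Z")


def lenient_to_ascii(domain: str) -> str:
    """Hypothetical permissive conversion of a domain to its ASCII form.

    Non-ASCII labels become "xn--" + raw Punycode. No UTS #46 mapping
    (beyond lowercasing), no IDNA 2008 code point rules, no Bidi rules.
    Raises ValueError only if the ASCII result is not a plausible hostname.
    """
    ascii_labels = []
    for label in domain.lower().rstrip(".").split("."):
        if label.isascii():
            ascii_label = label
        else:
            ascii_label = "xn--" + label.encode("punycode").decode("ascii")
        if not _LDH_LABEL.match(ascii_label):
            raise ValueError(f"invalid label: {label!r}")
        ascii_labels.append(ascii_label)
    return ".".join(ascii_labels)


print(lenient_to_ascii("☕.example"))  # xn--53h.example
```

Swapping the raw Punycode step for full UTS #46 processing would tighten this. As noted above, however, there is no complete Python implementation of UTS #46 to swap in.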
Thanks for this ticket. However, adding a new dependency is always controversial and isn't a light decision, so a strong consensus on the mailing list is required. Please first start a discussion on the DevelopersMailingList, where you'll reach a wider audience and see what others think, and follow the guidelines with regard to requesting features.
Personally, I don't think it's worth the complexity. My initial response would be similar to Python's, i.e. "If you need the IDNA 2008 standard from RFC 5891 and RFC 5895, use a third-party validator".