Opened 3 days ago
Last modified 17 minutes ago
#36014 assigned Bug
EmailValidator rejects valid email addresses in some international domains — at Version 2
Reported by: | Mike Edmunds | Owned by: | |
---|---|---|---|
Component: | Core (Other) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Claude Paroz | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
django.core.validators.EmailValidator rejects valid email addresses that use an international domain name (IDN) written in certain scripts. Example (Django 5.1.4, Python 3.12.4):
    from django.core.validators import validate_email

    validate_email("editor@עִתוֹן.example.il")
    # (validates successfully)
    validate_email("editor@މިހާރު.example.mv")
    # django.core.exceptions.ValidationError: ['Enter a valid email address.']
The domain in the failing (second) example* is written in the Thaana script used for Dhivehi, the official language of the Maldives. Although both it and Hebrew (first example) are right-to-left scripts, Thaana requires Unicode markers that are supported by IDNA 2008 encoding but were prohibited in the (now obsolete) IDNA 2003 standard.
Cause: Django's EmailValidator includes a call to punycode() which ends up rejecting domains that weren't permitted by IDNA 2003. (URLValidator used to have similar code, but moved it to an unused path as part of #20003. URLValidator has allowed all IDNs—including both the ones above—since 2015.)
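The rejection can be reproduced without Django. The sketch below assumes that punycode() behaves like Python's built-in "idna" codec, which implements IDNA 2003; the helper name idna2003_encode is hypothetical and exists only for illustration. The Thaana label appears to fail because it ends in a combining vowel sign, which the IDNA 2003 bidirectional-text rule does not allow as the last character of a right-to-left label.

```python
from typing import Optional

# Sketch: reproduce the IDNA 2003 rejection with the standard library alone.
# Assumption: Django's punycode() behaves like Python's built-in "idna" codec.


def idna2003_encode(domain: str) -> Optional[str]:
    """Return the ASCII form of `domain`, or None if the IDNA 2003 codec rejects it.

    Hypothetical helper for illustration; not part of Django.
    """
    try:
        return domain.encode("idna").decode("ascii")
    except UnicodeError:
        # The built-in codec signals an unencodable label with UnicodeError.
        return None


hebrew = idna2003_encode("עִתוֹן.example.il")
thaana = idna2003_encode("މިހާރު.example.mv")

print(hebrew)  # an "xn--" label followed by ".example.il"
print(thaana)  # None: the codec refuses to encode the Thaana label
```

If that assumption holds, EmailValidator's IDN fallback never gets to validate the encoded Thaana domain, so the address is reported as invalid.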
Suggested fix: EmailValidator should instead borrow its domain validation logic from DomainNameValidator, like URLValidator does (since Django 5.1).
Prior discussion: See #33968 and related Django developers discussion from 2022. Those proposed changing django.utils.encoding.punycode() to use IDNA 2008 encoding via the third-party idna package. Besides the controversy around adding a new dependency, it's unclear whether idna.encode() would be the correct choice in all cases. IDNA 2008 rejects some domains that IDNA 2003 allowed, so switching could introduce compatibility issues with existing data. (Also, most email services and apps seem instead to be following UTS #46, which builds on IDNA 2008. But the selection of UTS #46 options varies by vendor.)
Compatibility: this shouldn't break any existing code or data. But it would start allowing email addresses that Django's SMTP EmailBackend is not currently capable of mailing, like the Dhivehi example above. (django.core.mail calls punycode()—IDNA 2003—to prepare addresses for SMTP.) #35714 could address this by enabling SMTPUTF8 and offloading IDNA encoding to the SMTP server. Also, I intend the upcoming patch for #35581 to simplify subclassing the SMTP EmailBackend so users could override IDN handling with their choice of encoding options.
* The Dhivehi word މިހާރު ("now") is the name of an actual Maldivian newspaper. They have registered މިހާރު.com, although it currently redirects to a romanization of their name and doesn't seem to have email enabled. (As Julien Bernard pointed out in the earlier discussion, one of the barriers to IDN and international email adoption is poor support by web technologies. See also ICANN's Universal Acceptance initiative and annual readiness reports.)
(Unlike the previous two tickets, I don't have a patch for this and wasn't planning to work on it. But I wanted to capture the details while all the IDNA stuff is still fresh in my mind.)