Opened 8 years ago
Last modified 10 months ago
#27029 assigned Cleanup/optimization
Make EmailValidator accept non-ASCII characters
Reported by: | Ramin Farajpour Cami | Owned by: | j-bernard |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Florian Apolloner, Carlos Palol, Collin Anderson | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
from django.core.validators import validate_email validate_email('うえあいお@email.com')
if you check this url email chacker with うえあいお@address.com , this is not valid email address ,
Thanks,
Ramin
Change History (36)
comment:1 by , 8 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
follow-up: 3 comment:2 by , 8 years ago
Has patch: | set |
---|---|
Resolution: | duplicate |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Version: | 1.10 → master |
Reopening as I have a patch which targets this specific issue.
comment:3 by , 8 years ago
Replying to claudep:
Reopening as I have a patch which targets this specific issue.
Hi,
Thanks a lot,
comment:4 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:5 by , 8 years ago
I'm just wondering, is there still a use case to keep ASCII-only validation (and hence provide validate_email_ascii
)?
comment:6 by , 8 years ago
Not sure, maybe you want to ask on the DevelopersMailingList. I guess usage might be a bit difficult until #25594 is fixed.
comment:7 by , 8 years ago
Triage Stage: | Ready for checkin → Accepted |
---|
I just tested Firefox and Chrome email validation, and they don't accept non-ASCII in the local part.
In any case, I think we should provide both validators (ASCII-only and Unicode). It might be a bit too soon to unconditionally allow Unicode in emails.
comment:8 by , 8 years ago
Patch needs improvement: | set |
---|
comment:9 by , 8 years ago
Summary: | invalid email addresses input django validate_email → Make EmailVadliator accept non-ASCII characters |
---|---|
Type: | Bug → Cleanup/optimization |
See #21859 for a documentation request to clarify the state of ASCII/Unicode email addresses in Django.
comment:10 by , 8 years ago
Summary: | Make EmailVadliator accept non-ASCII characters → Make EmailValidator accept non-ASCII characters |
---|
follow-up: 13 comment:12 by , 8 years ago
Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.
comment:13 by , 8 years ago
Replying to claudep:
Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.
Will be fix in new version 1.11? i can make suggestions to fix?
follow-up: 15 comment:14 by , 8 years ago
Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)
comment:15 by , 8 years ago
Replying to claudep:
Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)
Owesome, i have one question, you have CONTRIBUTORS LIST i can add this list? or no i should try for send a lot of report for added to list contributors?
comment:16 by , 8 years ago
Yes, you are supposed to have done significant work for Django to be listed there. Of course, that's very subjective, but filling a couple of reports isn't sufficient for that.
comment:17 by , 8 years ago
Very good, i like working with Django community always in security and see code issue for fix it,i see you write PR for this report so I can't add my name to CONTRIBUTORS LIST, :((
Thanks,
comment:18 by , 8 years ago
Hi,
I see here way Contributing to Django
https://docs.djangoproject.com/en/dev/internals/contributing/
You write PR for patch issue ,this means my name not added?
comment:20 by , 8 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
I've added a new patch for this.
comment:21 by , 8 years ago
Patch needs improvement: | set |
---|
I am against this patch, adding more regular expressions is the wrong way to go. I'd like to propose to change the current email validator to just check if "@" is in the address and be done with it. See also https://davidcel.is/posts/stop-validating-email-addresses-with-regex/ -- I think this is something which should have a bit of discussion on the mailing list.
comment:22 by , 8 years ago
Cc: | added |
---|
comment:23 by , 8 years ago
Ideas about simplification are discussed in #26423 and on the django-developers mailing list.
comment:24 by , 7 years ago
if we do allow non-ascii, I wonder if we should be sure the email is "printable" (not allow hidden characters like '\u200b') https://docs.python.org/3/library/stdtypes.html#str.isprintable
comment:25 by , 3 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:26 by , 2 years ago
Commenting here since #33967 has been closed as a duplicate.
Unicode in local-part is allowed by the latest standards, therefore EmailValidator
is preventing valid email addresses to be used in Django. Making the current regex allow Unicode characters instead of [0-9A-Z]
would do the trick.
#26423 won't solve this as HTML5 validator does not allow Unicode in local-part either.
comment:27 by , 2 years ago
I submitted this PR. I made it change as little as possible to at least get Unicode local-part valid.
comment:28 by , 2 years ago
Patch needs improvement: | unset |
---|
comment:29 by , 2 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:30 by , 2 years ago
The new PR seems OK™ — for strings `\w` is equivalent to [a-zA-Z0-9_]
with ASCII, and the unicode examples then pass.
I worry slightly about bringing in a host of lookalike address vulnerabilities. 🤔
I think this needs a discussion to decide the way forward.
- I'm not convinced this is really a distinct issue to #26423.
- The mailing list discussion was essentially unanimous to radically simplify here (rather than continue to tweak).
Florian's comment:21 more or less sums it up:
...propose to change the current email validator to just check if "@" is in the address and be done with it.
We've said similar with URLValidator a number of times.
I'm not sure we shouldn't (again) mark this as a duplicate of #26423, re-purpose that to simplify the validation, make sure How to customise validation shows the way forward clearly, and then close everything else in this area as wontfix
. 🤔
comment:31 by , 2 years ago
Different projects have different requirements. What about providing different validators: a simple one where only <somechar>
@<somechar>
presence is checked, a more elaborate one like the current one, and an equivalent to the previous allowing unicode. The question then is to decide which would be the default.
comment:33 by , 2 years ago
Here is a little context for my use case. In general, I create my own validator whenever it's needed to override the default Django behavior but I have a particular case where django-allauth app is used and is using the EmailValidator. In that case, I cannot easily override it.
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.
If you don't mind implementing a more complex specific validator for internationalized email addresses it would be better to avoid using only a regex. I kept it the simplest as I could in my PR because I'm aware that changing the validator is touchy.
comment:34 by , 22 months ago
Patch needs improvement: | set |
---|
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.
I don't think we can just swap out the current validation for a looser one. Folks will be depending on the existing behaviour.
Maybe we can ship a couple of variants, but I'm not sure what switching method we might allow. I think we need a story there in order to proceed. 🤔
I looked at django-allauth
— it's using the validate_email
instance, in various, deeply-nested places — I think an issue over there, to look at making that pluggable, is needed really. (In the meantime, one could monkey patch validate_email
with whatever validator you wanted to adjust that… — again, not something I think we can just swap out from beneath it.)
comment:35 by , 15 months ago
Cc: | added |
---|
comment:36 by , 10 months ago
Cc: | added |
---|---|
Description: | modified (diff) |
I'm just noticing today that unicode.org has recommendations on email security:
https://www.unicode.org/reports/tr39/#Email_Security_Profiles
- It must be in NFKC format
- It must have level = <restriction level> or less, from
Restriction_Level_Detection
- It must not have mixed number systems according to
Mixed_Number_Detection
- It must satisfy dot-atom-text from RFC 5322 §3.2.3, where atext is extended as follows:
- Where C ≤ U+007F, C is defined as in §3.2.3. (That is, C ∈ [!#-'*+\-/-9=?A-Z\-~]. This list copies what is already in §3.2.3, and follows HTML5 for ASCII.)
- Where C > U+007F, both of the following conditions are true:
- C has
Identifier_Status=Allowed
from General Security Profile - If C is the first character, it must be
XID_Start
from Default Identifier_Syntax in [UAX31]
- C has
It doesn't recommend which "restriction level" to use, and maybe we should allow the user to decide what level to use (defaulting to 1: ASCII-Only).
(Also, it would be nice if Python implemented "Mixed-Script Detection", "Restriction-Level Detection" and "Mixed-Number Detection" as part of unicodedata.)
Sure thing!
Duplicate of #26423