Opened 8 years ago

Last modified 10 months ago

#27029 assigned Cleanup/optimization

Make EmailValidator accept non-ASCII characters

Reported by: Ramin Farajpour Cami Owned by: j-bernard
Component: Core (Other) Version: dev
Severity: Normal Keywords:
Cc: Florian Apolloner, Carlos Palol, Collin Anderson Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Collin Anderson)

from django.core.validators import validate_email
validate_email('うえあいお@email.com')

if you check this url email chacker with うえあいお@address.com , this is not valid email address ,

Thanks,
Ramin

Change History (36)

comment:1 by Claude Paroz, 8 years ago

Resolution: duplicate
Status: newclosed

Sure thing!
Duplicate of #26423

comment:2 by Claude Paroz, 8 years ago

Has patch: set
Resolution: duplicate
Status: closednew
Triage Stage: UnreviewedAccepted
Version: 1.10master

Reopening as I have a patch which targets this specific issue.

in reply to:  2 comment:3 by Ramin Farajpour Cami, 8 years ago

Replying to claudep:

Reopening as I have a patch which targets this specific issue.

Hi,

Thanks a lot,

comment:4 by Tim Graham, 8 years ago

Triage Stage: AcceptedReady for checkin

comment:5 by Claude Paroz, 8 years ago

I'm just wondering, is there still a use case to keep ASCII-only validation (and hence provide validate_email_ascii)?

comment:6 by Tim Graham, 8 years ago

Not sure, maybe you want to ask on the DevelopersMailingList. I guess usage might be a bit difficult until #25594 is fixed.

comment:7 by Claude Paroz, 8 years ago

Triage Stage: Ready for checkinAccepted

I just tested Firefox and Chrome email validation, and they don't accept non-ASCII in the local part.

In any case, I think we should provide both validators (ASCII-only and Unicode). It might be a bit too soon to unconditionally allow Unicode in emails.

comment:8 by Claude Paroz, 8 years ago

Patch needs improvement: set

comment:9 by Tim Graham, 8 years ago

Summary: invalid email addresses input django validate_emailMake EmailVadliator accept non-ASCII characters
Type: BugCleanup/optimization

See #21859 for a documentation request to clarify the state of ASCII/Unicode email addresses in Django.

comment:10 by Tim Graham, 8 years ago

Summary: Make EmailVadliator accept non-ASCII charactersMake EmailValidator accept non-ASCII characters

comment:11 by Ramin Farajpour Cami, 8 years ago

Hi Tim,

Please merge PR ,
Thanks

comment:12 by Claude Paroz, 8 years ago

Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.

in reply to:  12 comment:13 by Ramin Farajpour Cami, 8 years ago

Replying to claudep:

Hey RaminFP,
I plan to come with a new patch, with two versions of the email validator. I don't think that non-ASCII local parts of email addresses are widespread enough to set it by default. The idea is that you could easily opt-in for the Unicode validator of your choice when you define the field in your models.

Will be fix in new version 1.11? i can make suggestions to fix?

comment:14 by Claude Paroz, 8 years ago

Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)

in reply to:  14 comment:15 by Ramin Farajpour Cami, 8 years ago

Replying to claudep:

Yes, the plan is clearly to include that in 1.11. We still have some months ahead :-)

Owesome, i have one question, you have CONTRIBUTORS LIST i can add this list? or no i should try for send a lot of report for added to list contributors?

comment:16 by Claude Paroz, 8 years ago

Yes, you are supposed to have done significant work for Django to be listed there. Of course, that's very subjective, but filling a couple of reports isn't sufficient for that.

comment:17 by Ramin Farajpour Cami, 8 years ago

Very good, i like working with Django community always in security and see code issue for fix it,i see you write PR for this report so I can't add my name to ​CONTRIBUTORS LIST, :((

Thanks,

comment:18 by Ramin Farajpour Cami, 8 years ago

Hi,
I see here way Contributing to Django
https://docs.djangoproject.com/en/dev/internals/contributing/
You write PR for patch issue ,this means my name not added?

comment:19 by Tim Graham, 8 years ago

Yes, we add to AUTHORS based on code contributions not bug reports.

comment:20 by Wout De Puysseleir, 8 years ago

Owner: changed from Ramin Farajpour Cami to Wout De Puysseleir
Patch needs improvement: unset
Status: newassigned

PR

I've added a new patch for this.

comment:21 by Florian Apolloner, 8 years ago

Patch needs improvement: set

I am against this patch, adding more regular expressions is the wrong way to go. I'd like to propose to change the current email validator to just check if "@" is in the address and be done with it. See also https://davidcel.is/posts/stop-validating-email-addresses-with-regex/ -- I think this is something which should have a bit of discussion on the mailing list.

comment:22 by Florian Apolloner, 8 years ago

Cc: Florian Apolloner added

comment:23 by Tim Graham, 8 years ago

Ideas about simplification are discussed in #26423 and on the django-developers mailing list.

comment:24 by Collin Anderson, 7 years ago

if we do allow non-ascii, I wonder if we should be sure the email is "printable" (not allow hidden characters like '\u200b') https://docs.python.org/3/library/stdtypes.html#str.isprintable

comment:25 by Mariusz Felisiak, 3 years ago

Owner: Wout De Puysseleir removed
Status: assignednew

comment:26 by j-bernard, 2 years ago

Commenting here since #33967 has been closed as a duplicate.

Unicode in local-part is allowed by the latest standards, therefore EmailValidator is preventing valid email addresses to be used in Django. Making the current regex allow Unicode characters instead of [0-9A-Z] would do the trick.

#26423 won't solve this as HTML5 validator does not allow Unicode in local-part either.

Last edited 2 years ago by j-bernard (previous) (diff)

comment:27 by j-bernard, 2 years ago

I submitted this PR. I made it change as little as possible to at least get Unicode local-part valid.

comment:28 by Jacob Walls, 2 years ago

Patch needs improvement: unset

Improvement flag was set on prior PR proposing additional regular expressions. Current PR simplifies the existing one per comment.

comment:29 by Mariusz Felisiak, 2 years ago

Owner: set to j-bernard
Status: newassigned

comment:30 by Carlton Gibson, 2 years ago

The new PR seems OK™ — for strings `\w` is equivalent to [a-zA-Z0-9_] with ASCII, and the unicode examples then pass.

I worry slightly about bringing in a host of lookalike address vulnerabilities. 🤔

I think this needs a discussion to decide the way forward.

  1. I'm not convinced this is really a distinct issue to #26423.
  2. The mailing list discussion was essentially unanimous to radically simplify here (rather than continue to tweak).

Florian's comment:21 more or less sums it up:

...propose to change the current email validator to just check if "@" is in the address and be done with it.

We've said similar with URLValidator a number of times.

I'm not sure we shouldn't (again) mark this as a duplicate of #26423, re-purpose that to simplify the validation, make sure How to customise validation shows the way forward clearly, and then close everything else in this area as wontfix. 🤔

Last edited 2 years ago by Carlton Gibson (previous) (diff)

comment:31 by Claude Paroz, 2 years ago

Different projects have different requirements. What about providing different validators: a simple one where only <somechar>@<somechar> presence is checked, a more elaborate one like the current one, and an equivalent to the previous allowing unicode. The question then is to decide which would be the default.

comment:32 by Carlton Gibson, 2 years ago

I think that sounds quite reasonable Claude.

comment:33 by j-bernard, 2 years ago

Here is a little context for my use case. In general, I create my own validator whenever it's needed to override the default Django behavior but I have a particular case where django-allauth app is used and is using the EmailValidator. In that case, I cannot easily override it.
My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.

If you don't mind implementing a more complex specific validator for internationalized email addresses it would be better to avoid using only a regex. I kept it the simplest as I could in my PR because I'm aware that changing the validator is touchy.

comment:34 by Carlton Gibson, 22 months ago

Patch needs improvement: set

My suggestion would then be to make the default validator more permissive to get some flexibility in the kind of use case that I have. One can still include another validation layer on top of that.

I don't think we can just swap out the current validation for a looser one. Folks will be depending on the existing behaviour.

Maybe we can ship a couple of variants, but I'm not sure what switching method we might allow. I think we need a story there in order to proceed. 🤔

I looked at django-allauth — it's using the validate_email instance, in various, deeply-nested places — I think an issue over there, to look at making that pluggable, is needed really. (In the meantime, one could monkey patch validate_email with whatever validator you wanted to adjust that… — again, not something I think we can just swap out from beneath it.)

comment:35 by Carlos Palol, 15 months ago

Cc: Carlos Palol added

comment:36 by Collin Anderson, 10 months ago

Cc: Collin Anderson added
Description: modified (diff)

I'm just noticing today that unicode.org has recommendations on email security:

https://www.unicode.org/reports/tr39/#Email_Security_Profiles

  • It must be in NFKC format
  • It must have level = <restriction level> or less, from Restriction_Level_Detection
  • It must not have mixed number systems according to Mixed_Number_Detection
  • It must satisfy dot-atom-text from RFC 5322 §3.2.3, where atext is extended as follows:
    • Where C ≤ U+007F, C is defined as in §3.2.3. (That is, C ∈ [!#-'*+\-/-9=?A-Z\-~]. This list copies what is already in §3.2.3, and follows HTML5 for ASCII.)
    • Where C > U+007F, both of the following conditions are true:
      • C has Identifier_Status=Allowed from General Security Profile
      • If C is the first character, it must be XID_Start from Default Identifier_Syntax in [UAX31]

It doesn't recommend which "restriction level" to use, and maybe we should allow the user to decide what level to use (defaulting to 1: ASCII-Only).

(Also, it would be nice if Python implemented "Mixed-Script Detection", "Restriction-Level Detection" and "Mixed-Number Detection" as part of unicodedata.)

Note: See TracTickets for help on using tickets.
Back to Top