#21379 closed Bug (fixed)
Add support for Unicode usernames
Reported by: | anonymous | Owned by: | nobody |
---|---|---|---|
Component: | contrib.auth | Version: | dev |
Severity: | Normal | Keywords: | 1.10 |
Cc: | jorgecarleitao | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
    class AbstractUser(AbstractBaseUser, PermissionsMixin):
        """
        An abstract base class implementing a fully featured User model with
        admin-compliant permissions.

        Username, password and email are required. Other fields are optional.
        """
        username = models.CharField(_('username'), max_length=30, unique=True,
            help_text=_('Required. 30 characters or fewer. Letters, numbers and '
                        '@/./+/-/_ characters'),
            validators=[
                validators.RegexValidator(re.compile('^[\w.@+-]+$'), _('Enter a valid username.'), 'invalid')
            ])
re.compile should use re.U
Attachments (1)
Change History (23)
comment:1 by , 11 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 11 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
The above regexp doesn't do what it seems to be doing: on Python 2, [\w.@+-] is equivalent to [a-zA-Z0-9_.@+-].
In Python 3, this will also match all "accented" characters.
The regexp should be updated for consistency between versions:
- If the goal is to allow any letter-like char, add re.UNICODE to the re.compile call.
- If it should instead only allow ASCII letters, the simplest way would be to use the explicit regexp, since the re.ASCII flag exists only in Py3.
Two examples, showing that the "jean-rené" username is accepted in Python 3 and not in Python 2:
On Python 3:
    >>> import re
    >>> from django.core import validators
    >>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
    >>> v('foo.bar')
    >>> v('foo bar')
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "/home/xelnor/dev/venvs/bluesys-tools-py3/lib/python3.3/site-packages/django/core/validators.py", line 39, in __call__
        raise ValidationError(self.message, code=self.code)
    django.core.exceptions.ValidationError: ['Enter a valid username.']
    >>> v('jean-rené')
And on Python 2:
    >>> import re
    >>> from django.core import validators
    >>> v = validators.RegexValidator(re.compile('^[\w.@+-]+$'), "Enter a valid username.", 'invalid')
    >>> v('foo.bar')
    >>> v('foo bar')
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
        raise ValidationError(self.message, code=self.code)
    ValidationError: [u'Enter a valid username.']
    >>> v('jean-rené')
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "/home/xelnor/dev/venvs/bluesys-tools/lib/python2.7/site-packages/django/core/validators.py", line 39, in __call__
        raise ValidationError(self.message, code=self.code)
    ValidationError: [u'Enter a valid username.']
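The two candidate fixes listed above can be compared side by side. This is only a minimal sketch, assuming Python 3 and using the plain `re` module rather than Django's `RegexValidator`; the names `USERNAME_PATTERN`, `unicode_re`, `ascii_re` and `is_valid` are illustrative and not part of Django.

```python
import re

# Pattern copied from the ticket description.
USERNAME_PATTERN = r'^[\w.@+-]+$'

# Option 1: allow any letter-like character. On Python 3 this is already the
# default for str patterns; on Python 2 the flag must be passed explicitly.
unicode_re = re.compile(USERNAME_PATTERN, re.UNICODE)

# Option 2: allow ASCII only by spelling out the character class, which
# behaves the same on both Python versions (re.ASCII is Python 3 only).
ascii_re = re.compile(r'^[a-zA-Z0-9_.@+-]+$')


def is_valid(regex, username):
    """Return True when the username matches the compiled pattern."""
    return regex.search(username) is not None


for name in ('foo.bar', 'foo bar', 'jean-ren\u00e9'):
    # Prints the name, then the result for option 1 and option 2.
    print(name, is_valid(unicode_re, name), is_valid(ascii_re, name))
```

With option 1 the accented name is accepted, with option 2 it is rejected, and the name containing a space is rejected by both.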
comment:3 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
The behavior difference between py2 and py3 is clearly a bug, but I'm not sure which fix is correct.
by , 11 years ago
Attachment: | test_issue21379.diff added |
---|
comment:4 by , 11 years ago
I dug a bit, and the validation is defined in two places:
(1) UserCreationForm and UserChangeForm use '^[\w.@+-]+$', which is compiled to re.compile('^[\w.@+-]+$', re.UNICODE) (https://github.com/django/django/blob/master/django/forms/fields.py#L542).
(2) The AbstractUser ORM field validation uses '^[\w.@+-]+$' but does not pass flags to the RegexValidator, so it compiles as re.compile('^[\w.@+-]+$').
#20694 pointed out that AbstractUser rejects non-ASCII usernames on Python 2, which is consistent with (2). @aaugustin pointed out that we cannot change AbstractUser because that would be backward incompatible, and proposed using UserCreationForm instead. However, I'm not sure this can be fixed with a custom UserCreationForm. My concern is that validation is performed both by the validator constructed from UserCreationForm.username and by the validator of AbstractUser.username. Because both run and both must pass, the result differs between Python 2 and Python 3 because of (2).
To support this, I attach a diff to be run using both versions:
PYTHONPATH=..:$PYTHONPATH python2.7 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username
PYTHONPATH=..:$PYTHONPATH python3.3 runtests.py --settings=test_sqlite django.contrib.auth.tests.test_forms.UserCreationFormTest.test_invalid_non_ascii_username
This diff adds a test for this ticket and, for the sake of this discussion, also prints which regex is used in validators.RegexValidator.__call__:
    print(self.regex.pattern, self.regex.flags)
In Python 2, this prints
    ^[\w.@+-]+$ 32  # 32 = re.UNICODE
    ^[\w.@+-]+$ 0   # 0 = default flag of RegexValidator
and username=u'jsmithé' is correctly rejected.
In Python 3, this prints
    ^[\w.@+-]+$ 32
    ^[\w.@+-]+$ 32
and username=u'jsmithé' is not rejected, so the test fails.
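The flag values printed by the diff can be reproduced without Django. The following is a rough sketch, assuming Python 3; `re.ASCII` is used here only to imitate how Python 2 treats a pattern compiled without flags, and `passes_both` is a hypothetical helper standing in for "the form validator and the model validator must both accept the value".

```python
import re

PATTERN = r'^[\w.@+-]+$'

# How the model field compiles the pattern (no flags passed).
model_re = re.compile(PATTERN)
# How the form field compiles the pattern (re.UNICODE passed explicitly).
form_re = re.compile(PATTERN, re.UNICODE)

# On Python 3, str patterns get re.UNICODE implicitly, so both compiled
# patterns report the same flags.
print(model_re.flags == form_re.flags)
print(bool(model_re.flags & re.UNICODE))

# Imitation of the Python 2 model validator: \w restricted to ASCII.
py2_like_model_re = re.compile(PATTERN, re.ASCII)


def passes_both(form_regex, model_regex, username):
    """Return True only when both patterns accept the username."""
    return bool(form_regex.search(username)) and bool(model_regex.search(username))


username = 'jsmith\u00e9'
print(passes_both(form_re, model_re, username))           # Python 3 situation
print(passes_both(form_re, py2_like_model_re, username))  # Python 2 situation
```

Under these assumptions the username passes both checks in the Python 3 situation and fails the model check in the imitated Python 2 situation, which matches the behavior the test exposes.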
comment:5 by , 11 years ago
Cc: | added |
---|
comment:6 by , 9 years ago
I think we should allow non-Latin characters; users should be able to have usernames in their native language (Hebrew, Chinese, etc.).
I propose we use Unicode normalization:
    unicodedata.normalize('NFKD', input)
This could be used for usernames, passwords and other kinds of input, and should allow non-Latin characters.
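To illustrate why normalization matters for usernames, here is a small sketch using only the standard library. It assumes Python 3; `normalize_username` is a hypothetical helper, and NFKD is used because it is the form proposed in this comment (other forms such as NFKC are equally possible choices).

```python
import unicodedata

# Two spellings of the same visible username.
composed = 'jsmith\u00e9'      # "é" as the single code point U+00E9
decomposed = 'jsmithe\u0301'   # "e" followed by combining acute accent U+0301

# Without normalization the two strings compare as different, so both
# could be registered as distinct usernames.
print(composed == decomposed)


def normalize_username(username, form='NFKD'):
    """Return the username in the given Unicode normalization form."""
    return unicodedata.normalize(form, username)


# After normalization both spellings map to the same string.
print(normalize_username(composed) == normalize_username(decomposed))

# Compatibility normalization also folds look-alike characters, for example
# the fullwidth letter U+FF4A becomes a plain ASCII "j".
print(normalize_username('\uff4asmith') == 'jsmith')
```

Normalizing before both the uniqueness check and storage keeps visually identical names from being treated as different accounts.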
comment:8 by , 9 years ago
Patch needs improvement: | unset |
---|---|
Version: | 1.5 → master |
Should be ready for review.
comment:10 by , 9 years ago
Severity: | Normal → Release blocker |
---|
Marking as a release blocker so that we don't forget to make a decision before releasing 1.10.
comment:11 by , 9 years ago
So besides the original PR, which also added Unicode usernames on Python 2, we now have an alternative PR that keeps ASCII-only usernames on Python 2 and Unicode usernames on Python 3. This is more or less the same situation as in Django 1.9, but with the ability to change the default behavior and with improvements regarding Unicode normalization.
comment:12 by , 9 years ago
Keywords: | 1.10 added |
---|---|
Severity: | Release blocker → Normal |
comment:13 by , 9 years ago
Summary: | class AbstractUser: validators should compile re with Unicode → Add support for Unicode usernames |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Could you elaborate on the issue? Non-ascii characters are not allowed by design and if you want to allow unicode in usernames, you need to use a custom user model (see #20694).