#15778 closed Bug (fixed)
Command createsuperuser fails under some system user names
Reported by: | Owned by: | Alex Gaynor | |
---|---|---|---|
Component: | contrib.auth | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Commands 'createsuperuser' and 'syncdb' can not create a superuser account no way because of database error,
if a system account username contains an 8-bit character.
It fails when the name is automatically searched in the database, even if the user wants to write an ascii username manually.
This is typical for usernames created by Microsoft Windows.
File "C:\Python26\lib\site-packages\django\contrib\auth\management\commands\createsuperuser.py", line 72, in handle User.objects.get(username=default_username) ... File "C:\Python26\lib\site-packages\django\db\backends\sqlite3\base.py", line 234, in execute return Database.Cursor.execute(self, query, params) DatabaseError : You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
Versions: Django 1.3, Python 2.6.4 windows, Sqlite3 3.5.9, dbapi 2.4.1
It is easier to fix it once then to circumvent it twice.
The middle part of the patch:
- default_username = getpass.getuser().replace(' ', '').lower() + default_username = str(getpass.getuser().decode('ascii', 'ignore')).replace(' ', '').lower()
Attachments (2)
Change History (28)
by , 14 years ago
Attachment: | django-createsuperuser-8bit-username-bug.diff added |
---|
comment:1 by , 14 years ago
Needs tests: | set |
---|
comment:2 by , 14 years ago
To reproduce
"Adrián" and "Júlia" are nice frequent Spain names. Microsoft recommended to enter a localized full user name and full company name in Windows installation guides in the past. Localized user names are still very frequent in our country and I got from my company a preinstalled computer with a localized user name. The user name returned by "getpass.getuser()" by Python on Windows is usually encoded in any of possible one byte character sets, typically 'cp1252' for West Europe.
The most convincing and the only complete test would be to create such account on Windows, but probably you mean by the requirement "test" any possible simplified verification. So, I wrote a short temporary patch to Django:
--- a/django/contrib/auth/management/commands/createsuperuser.py 2011-02-22 12:33:04.000000000 +0100 +++ b/django/contrib/auth/management/commands/createsuperuser.py 2011-04-08 18:51:04.682233473 +0200 @@ -59,6 +59,7 @@ # Try to determine the current system user's username to use as a default. try: default_username = getpass.getuser().replace(' ', '').lower() + default_username = 'J\xfalia'.lower() # the name 'Julia' with accented 'u' - fails except (ImportError, KeyError): # KeyError will be raised by os.getpwuid() (called by getuser()) # if there is no corresponding entry in the /etc/passwd file
startproject, startapp, ENGINE is 'django.db.backends.sqlite3'
python manage.py createsuperuser
Python 2.6
This synthetic test anytime fails. (even on Linux, although all possible usernames are in ascii on Linux)
DatabaseError:
"You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings."
Python 2.5
never fails. (probable because its '_sqlite3.so' does not contains that typical error message string.) It normally displays:
Username (Leave blank to use u'j\xfalia'):
Function getpass.getuser() also depends on the precompiled package.
Python from http://www.python.org/ftp/python/2.6.4/python-2.6.4.msi is the described above, it gets international characters (fails).
Python from Cygwin strips international characters from the username in Windows.
comment:3 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
The arcane error message from sqlilte means a non-utf8 bytestring was passed as default_username on the call to User.objects.get(username=default_username)
. 'J\xfalia' is the latin1 (same as Windows cp1252 for this case) encoding for "Júlia".
If instead you passed in 'J\xc3\xbalia', the utf-8 encoding, the User.objects.get
call would work. Due to other code in this area it would not be accepted as a unsername, but you'd get past that exception.
This error from sqlite is new with 2.6, see #7921 for details and the explanation of why you can pass utf-8 encoded bytestrings. Django adapted to the change with 2.6 by installing an adapter to convert all bytestrings passed down to the database to unicode, assuming they have a utf-8 encoding. If they don't and the attempt to decode from utf-8 fails, you will see the error message you are seeing.
If there was some way to know the encoding of the bytestring returned by getpass.getuser()
then the best thing would be to use that known encoding to transform the bytestring into a unicode object.
comment:4 by , 14 years ago
Yes, but that was a somewhat different situation.
I would prefer the original proposed solution to strip international characters from the default_username, because an evantually conversion to unicode would only postpone the problem for later. (Convert something correctly and display that correctly, what would be finally must rejected?)
I thing that a simple solution can be incorporaded in any bugfix release, it need not to wait for Django 1.4
(I know a better solution with striping the accents, not dropping characters, but it is not important now at all, imho. The encoding is not 'utf-8'. It is sys.getfilesystemencoding(), usually 'mbcs', which is a generic name for differnent windows default encodings of the actual installation.)
comment:5 by , 14 years ago
Patch needs improvement: | set |
---|
If we are going to fix it, we should make some effort to fix it reasonably. If I were named Júlia I'd find it a bit irritating for the system to suggest a "good" username for me was jlia. It's not that hard to "translate" accented unicode characters to their non-accented ASCII equivalents, using unicodedata.normalize:
>>> import unicodedata >>> yipes = u'¡¢£¥§©ª«¬®¯°±²³µ¶·¹º»¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ' >>> unicodedata.normalize('NFKD', yipes).encode('ascii', 'ignore') 'a 231oAAAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuyy' >>>
Not perfect, but far better than stripping accented characters entirely.
comment:6 by , 14 years ago
This is much better. :-) I preferred a simple solution and strong arguments such as example names similar to a member of the community and the person who requested a test :-)
I thought so: Without a patch or with any failing patch the person probably lost several hours, because the simple bypass "./manage.py syncdb --noinput; ./manage.py createsuperuser --username=USERNAME" can be easily missed. With a simple patch he lost only several minutes, because unfortunately only a journalist is able to notice every missing letter. (Maybe a user can not login first.)
I am not sure enough that "getpass.getuser()" returns results in the encoding "sys.getfilesystemencoding()" for every language and version of Windows and Python. Otherwise the conversion fails. I only hope that Microsoft was never so proud to implement the right to left picture alphabets to the home directory paths. We should use some "decode(..., 'ignore')".
I suggest:
default_username = getpass.getuser().decode(sys.getfilesystemencoding(), 'ignore') default_username = unicodedata.normalize('NFKD', default_username) \ .encode('ascii', 'ignore').replace(' ', '').lower()
Hopefully the "unicodedata.normalize('NFKD'...)" does not raise an exception, otherwise it should be added to captured exceptions, not only (ImportError, KeyError).
comment:7 by , 14 years ago
Alternative: If some character can not be decoded (maybe bad detected encoding perhaps for a non-latin alphabet), better is to give no suggestions:
try: - default_username = getpass.getuser().replace(' ', '').lower() - except (ImportError, KeyError): + default_username = getpass.getuser().decode(sys.getfilesystemencoding()) + default_username = unicodedata.normalize('NFKD', default_username) \ + .encode('ascii', 'ignore').replace(' ', '').lower() + except (ImportError, KeyError, UnicodeDecodeError): + # UnicodeDecodeError - We are not sure what will do a non-latin Windows version.
by , 14 years ago
Attachment: | createsuperuser.diff added |
---|
comment:9 by , 14 years ago
#15162 has similar guess of the encoding: locale.getdefaultlocale()[1]
- It returns more explicit names e.g 'cp1250', 'cp1252' than sys.getfilesystemencoding() which returns 'mbcs'.
- Decoded characters are the same for both methods. Encoding 'cp1250' is a subset defined for 251 characters, 'mbcs' is defined for all 256 characters.
Conclusion: I think that better is:
locale.getdefaultlocale()[1]
Characters from non-default encoding are replaced by question marks before any conversion by 'getpass.getuser()'. E. g. 'ů' on West Europe Windows or vice-versa 'ù' on East Europe Windows. Better is no suggestion than a bad one. Therefore I add:
+ if not RE_VALID_USERNAME.match(default_username): + default_username = ''
My final patch createsuperuser.diff is ready.
comment:10 by , 14 years ago
Easy pickings: | unset |
---|---|
Owner: | changed from | to
Patch needs improvement: | unset |
Mister Gaynor, please write the unit test for this patch. Thank you.
comment:11 by , 14 years ago
The test can be either formal only and very complicated, because it is not easy to modify output of getpass.getuser() and sys.getfilesystemencoding() or even simulate type of operating system.
This is similar requirement like julien asked tests replicating this bug. I would advice good manual testing and checkout.
Do you want really rely on tests like ?
self.assertEqual('ABC'.lower(), 'abc')
comment:12 by , 14 years ago
I have tested manually polite refusal with expecting suggesting no username for little obscure cases on Greek three weeks ago. (Greek would not be possible to be tested without buing Greek Windows and installing it totally blind. Greek keyboard was not enough because Greek characters are replaced by ascii '?' by getuser() on non Greek Windows, which is not the real case.) Do test only 'Júlia' ('J\xfalia') or 'Adrián'.
Believe that testing is much more complicated than good fixing. I read piece of Windows documentation due to testing, but Django development is not reading how business plan of Microsoft has blemished Windows between the lines.
comment:13 by , 14 years ago
What about using the django.utils.encoding.smart_str() function to deal with the encoding of a non-ASCII username?
comment:14 by , 14 years ago
Patch needs improvement: | set |
---|
The patch is bad.
What .encode('ascii', 'ignore') does is that it ignores the UnicodeError. So what is the point of catching the UnicodeDecodeError (a subclass of UnicodeError) that would never get raised? Then the code block default_username = will never get executed. I suggest fixing the 'ignore' positional argument to 'strict'.
comment:15 by , 14 years ago
The patch is bad.
What .encode('ascii', 'ignore')
does is that it ignores the UnicodeError
. So what is the point of catching the UnicodeDecodeError
(a subclass of UnicodeError
) that would never get raised? Then the code block default_username = ''
will never get executed. I suggest removing the 'ignore'
positional argument; if it is removed, that defaults to 'strict'
, which catches the UnicodeError
-- which is what we want.
comment:17 by , 14 years ago
Patch needs improvement: | unset |
---|
The patch is very good. Please apply it.
comment:18 by , 14 years ago
Needs tests: | unset |
---|
I have created a new user account on my Windows 7 operating system and have added the omega Greek character into the user account name. My account was named userΩtest. And when I reached the point of having to type "yes" at the Django createsuperuser prompt, the prompt was like this: Username (leave blank to use 'userotest')
. So the patch worked. The Greek letter "Ω" (omega) was successfully converted to an "o" character. This patch is bulletproof. You could mock getpass, but what is the point? Please incorporate this patch already.
comment:19 by , 14 years ago
Yes, the createsuperuser.diff patch is the final patch and is tested with very well results and should to be commited to the trunk ASAP.
comment:20 by , 14 years ago
milestone: | 1.3 |
---|---|
Needs tests: | set |
Freddie, we're not putting it in without unit tests.
comment:21 by , 14 years ago
(that comment was me, and the previous ones were Freddie__
from irc, who has some entitlement issues to work through)
comment:22 by , 14 years ago
I am finding the repeated demands to get this in NOW off-putting but I'm also a bit puzzled by the insistence on tests -- testing behavior that requires a current system username value that contains a non-ascii char seems a bit outside of the scope of where we would normally require tests? For this particular case I would have trusted the reports of people who can recreate the issue, and manual review of the code (which I don't have time for at the moment).
comment:24 by , 14 years ago
Replying to SmileyChris:
In [16182]:
What's up with the :params:
and :returns:
stuff in the docstrings?
comment:25 by , 14 years ago
Thanks for fixing. I acknowledge that it is satisfactory closed.
It is not important to catch UnicodeDecodeError
in get_default_username
in [16182]. It can be safely removed, which improves readibility:
def get_default_username(check_db=True): ... default_username = get_system_username() - try: - default_username = unicodedata.normalize('NFKD', default_username)\ - .encode('ascii', 'ignore').replace(' ', '').lower() - except UnicodeDecodeError: - return '' + default_username = unicodedata.normalize('NFKD', default_username)\ + .encode('ascii', 'ignore').replace(' ', '').lower() if not RE_VALID_USERNAME.match(default_username): return ''
Catching of UnicodeDecodeError
was only important in the original patch due to decode(locale.getdefaultlocale...)
, not due to other commands, which can be demonstrated:
# Verificaion that `unicodedata.normalize` does not raises anythyng # for the whole range of valid unicode characters. It is also clear from the documentation. import unicodedata for i in range(0x110000): dummy = unicodedata.normalize('NFKD', unichr(i)).encode('ascii', 'ignore')
- Answer to objection comment:14: The prove or disprove of possibility
UnicodeDecodeError
aftergetpass.getuser().decode
is not easy, because important things are outside of free software. It is safer to catch it. - Objection comment:13 Why? Because the proposed solution would not work at all.
- Comments comment:11 and comment:12 was me - patch author and reporter.
comment:26 by , 14 years ago
Thanks for the follow-up hynekcer. I'll just leave it in there since it doesn't hurt anyone (and it'd be better to still catch a decoding issue on some strange setup than choke).
Could you provide some tests replicating this bug?