Opened 4 years ago

Last modified 4 years ago

#32439 closed Bug

Dumpdata fails on Windows due to non-utf8 system locale — at Initial Version

Reported by: helmstedt Owned by: nobody
Component: Uncategorized Version: 3.1
Severity: Normal Keywords: windows, utf8, encoding, dumpdata
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The command "python manage.py dumpdata -o output.json" fails on Windows when the database contains characters that cannot be represented in the encoding of the system locale. An example of the error is:

"CommandError: Unable to serialize database: 'charmap' codec can't encode character '\u0107' in position 8: character maps to <undefined>" (The character is "ć" in this case.)

The reason for the error is, I think, described in https://stackoverflow.com/questions/64457733/django-dumpdata-fails-on-special-characters/65186947#65186947 with a "hacky" solution. I quote:

"To save json data in django the TextIOWrapper is used:

The default encoding is now locale.getpreferredencoding(False) (...)

In the documentation of the locale.getpreferredencoding function we can read:

Return the encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess.

Here I found "hacky" but working method to overwrite these settings:

In file settings.py of your django project add these lines:

import _locale
_locale._getdefaultlocale = (lambda *args: ['en_US', 'utf8'])"

In Python I can inspect the value returned by "_locale._getdefaultlocale" on my (Danish) Windows installation:

import _locale
_locale._getdefaultlocale()

('da_DK', 'cp1252')

Because the default encoding on my system is cp1252 instead of utf-8, dumpdata tries to create a json file encoded in cp1252 instead of utf-8 and fails when it encounters a character not supported by this encoding.
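The failure can be reproduced without Django. The following is a minimal sketch using only the standard library; the helper name try_write is made up for illustration, and the in-memory stream stands in for the output file that dumpdata opens. It is not Django's actual code path.

```python
import io


def try_write(text, encoding):
    """Write text through a TextIOWrapper using the given encoding.

    Returns None on success, or the UnicodeEncodeError that was raised.
    (Hypothetical helper, for illustration only.)
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError as exc:
        return exc
    return None


name = "\u0107"  # the character "ć" from the error message above

# cp1252 has no mapping for U+0107, so the write fails.
cp1252_error = try_write(name, "cp1252")
# utf-8 can encode every Unicode character, so the write succeeds.
utf8_error = try_write(name, "utf-8")

print("cp1252:", cp1252_error)
print("utf-8:", utf8_error)
```

With cp1252 the write raises the same kind of 'charmap' error as in the report; with utf-8 it goes through.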

I can confirm that the "hacky" solution to override those values will make the data dump work.

Since there doesn't seem to be a setting in Windows to actually set the default locale encoding to UTF-8, Django should provide a way to force UTF-8 encoding (and override the system encoding) when using the dumpdata command.
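To illustrate what forcing UTF-8 could look like, here is a rough sketch that writes JSON to a file opened with an explicit encoding, so the locale default is never consulted. The function dump_as_utf8 and the sample record are hypothetical and only mimic the shape of dumpdata output; this is not how Django implements the command.

```python
import json
import os
import tempfile


def dump_as_utf8(records, path):
    """Write records as JSON, always encoded as UTF-8.

    Hypothetical stand-in for the output step of dumpdata.
    """
    # Passing encoding explicitly bypasses locale.getpreferredencoding().
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False)


# Sample data containing "ć" (U+0107), which cp1252 cannot encode.
records = [{"model": "app.person", "pk": 1, "fields": {"name": "Mili\u0107"}}]

path = os.path.join(tempfile.mkdtemp(), "output.json")
dump_as_utf8(records, path)

# Read the raw bytes back to confirm the file really is UTF-8.
with open(path, "rb") as fh:
    raw = fh.read()

print(raw.decode("utf-8"))
```

As a possible workaround that needs no code changes, Python 3.7 and later have a UTF-8 mode (enabled with "python -X utf8" or the PYTHONUTF8=1 environment variable) that makes locale.getpreferredencoding() report UTF-8; I have not verified it against this exact setup.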

Traceback is attached.

Change History (1)

by helmstedt, 4 years ago

Attachment: traceback.txt added

Traceback of the error
