#23271 closed Bug (fixed)
Makemessages can corrupt existing .po files on Windows
Reported by: | Daniel Menzel | Owned by: | nobody |
---|---|---|---|
Component: | Internationalization | Version: | dev |
Severity: | Release blocker | Keywords: | makemessages utf8 unicode |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Description
Seen on German Windows7 SP1 with 64bit Python 3.4.1 and gettext 0.18.1.
When you have an existing .po file with translations, e.g.
msgid "" msgstr "" "Project-Id-Version: \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2014-03-03 10:44+0100\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: \n" "Language-Team: \n" "Language: de\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" "Plural-Forms: nplurals=2; plural=(n != 1);\n" msgid "Size" msgstr "Größe"
and then run
manage.py makemessages --no_location --no_wrap -l de
to update the .po file, you get a corrupted .po file:
msgid "" msgstr "" "Project-Id-Version: \n" "Report-Msgid-Bugs-To: \n" "POT-Creation-Date: 2014-03-03 10:44+0100\n" "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n" "Last-Translator: \n" "Language-Team: \n" "Language: de\n" "MIME-Version: 1.0\n" "Content-Type: text/plain; charset=UTF-8\n" "Content-Transfer-Encoding: 8bit\n" "Plural-Forms: nplurals=2; plural=(n != 1);\n" msgid "Size" msgstr "Größe"
Investigation
Setting environment variables like LANG, LANGUAGE, LC_ALL, LC_MESSAGES, etc. has no effect on the outcome. Also calling chcp 65001 on the Windows console does not fix the problem.
However, it seems like the described behavior was introduced as a side-effect with this commit https://github.com/django/django/commit/dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f:
All file accesses in makemessages.py were changed to explicitly use utf-8, but the stdout of the gettext binaries (like msgmerge etc.) are still interpreted with the Windows encoding cp1252, as popen_wrapper sets universal_newline=True which in turn uses the encoding returned by locale.getpreferredencoding().
As a test I patched popen_wrapper to interpret output of external processes with utf-8 encoding instead of locale.getpreferredencoding():
def popen_wrapper_utf8(args, os_err_exc_type=django.core.management.base.CommandError): """Monkey-patch for django.core.management.utils.popen_wrapper""" try: p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE, close_fds=os.name != 'nt', universal_newlines=True) except OSError as e: six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s: %s' % (args[0], e.strerror)), sys.exc_info()[2]) output, errors = p.communicate() # Additional utf-8 decoding output = output.encode(locale.getpreferredencoding(False)).decode('utf-8') # return ( output, force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True), p.returncode )
This has the desired effect and prevents the corruption of the .po files.
I also tried changing the value returned by locale.getpreferredencoding() to "utf-8", but that seems impossible on Windows, as Python uses the win32 API GetACP(), which according to MSDN http://msdn.microsoft.com/en-us/library/windows/desktop/dd318070(v=vs.85).aspx only returns ANSI codepages and thus will never return "utf-8".
Change History (13)
comment:1 by , 10 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 10 years ago
I am also without a Windows dev environment. If we have to remove the patch from 1.7, and we think we can do it cleanly, then it's an option.
comment:3 by , 10 years ago
Version: | 1.7-rc-2 → master |
---|
[cdfefbec721b59695e28] has been reverted on the 1.7.x branch. I've applied [67870137b9e1f1] instead to fix #22686, but I'd like to keep the more extensive fix on master. The release blocking flag is now only applicable to master.
comment:5 by , 10 years ago
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:6 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:7 by , 10 years ago
Has patch: | unset |
---|---|
Resolution: | fixed |
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
Python 3 looks good, but now I have test failures on Python 2 and Windows. Can you reproduce?
comment:9 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:10 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Regression fails on my machine (Windows 7 64, Django development version head, with
locale.getpreferredencoding()=='cp1255'
====================================================================== ERROR: test_po_file_encoding_when_updating (i18n.test_extraction.BasicExtractorTests) Update of PO file doesn't corrupt it with non-UTF-8 encoding on Python3+Windows (#23271) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\data\dev\django\django\tests\i18n\test_extraction.py", line 407, in test_po_file_encoding_when_updating management.call_command('makemessages', locale=['pt_BR'], verbosity=0) File "C:\data\dev\django\django\django\core\management\__init__.py", line 118, in call_command return command.execute(*args, **defaults) File "C:\data\dev\django\django\django\core\management\base.py", line 398, in execute output = self.handle(*args, **options) File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 322, in handle self.write_po_file(potfile, locale) File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 455, in write_po_file msgs, errors, status = gettext_popen_wrapper(args) File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 39, in gettext_popen_wrapper stdout, stderr, status_code = popen_wrapper(args, os_err_exc_type=os_err_exc_type) File "C:\data\dev\django\django\django\core\management\utils.py", line 27, in popen_wrapper output, errors = p.communicate() File "C:\Python34\lib\subprocess.py", line 959, in communicate stdout, stderr = self._communicate(input, endtime, timeout) File "C:\Python34\lib\subprocess.py", line 1234, in _communicate stdout = stdout[0] IndexError: list index out of range
Also, this seems also to be the source of #21928 which isn't resolved for other people.
The problem is that Popen still tries to encode msgmerge output stream using DEFAULT_LOCALE_ENCODING, when the data is actually in UTF-8.
A way around it is to call Popen with universal_newlines=true, and then take care of line ending inside gettext_popen_wrapper. I'll post a patch soon.
(BTW the current solution is for Windows only. Linux uses have to set undocumented environment variables. I think we should strive to solve this for Linux users as well).
comment:11 by , 10 years ago
Well, I patched the code and sent a pull request at
https://github.com/django/django/pull/4532
This is my first attempt to contribute code and I seem to have messed up with the process. I am sorry about it. I hope my fix will get considered anyway...
I'm guilty for the regression, but I'm afraid I will not be able to debug this issue as I don't have access to Windows machines.
If noone else can fix it, I can revert the patch in the 1.7 branch to remove the release blocking for 1.7.