Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#23271 closed Bug (fixed)

Makemessages can corrupt existing .po files on Windows

Reported by: Daniel Menzel Owned by: nobody
Component: Internationalization Version: dev
Severity: Release blocker Keywords: makemessages utf8 unicode
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Description

Seen on German Windows7 SP1 with 64bit Python 3.4.1 and gettext 0.18.1.

When you have an existing .po file with translations, e.g.

msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Size"
msgstr "Größe"

and then run

manage.py makemessages --no_location --no_wrap -l de

to update the .po file, you get a corrupted .po file:

msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Size"
msgstr "Größe"

Investigation

Setting environment variables like LANG, LANGUAGE, LC_ALL, LC_MESSAGES, etc. has no effect on the outcome. Also calling chcp 65001 on the Windows console does not fix the problem.

However, it seems like the described behavior was introduced as a side-effect with this commit https://github.com/django/django/commit/dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f:
All file accesses in makemessages.py were changed to explicitly use utf-8, but the stdout of the gettext binaries (like msgmerge etc.) are still interpreted with the Windows encoding cp1252, as popen_wrapper sets universal_newline=True which in turn uses the encoding returned by locale.getpreferredencoding().

As a test I patched popen_wrapper to interpret output of external processes with utf-8 encoding instead of locale.getpreferredencoding():

def popen_wrapper_utf8(args, os_err_exc_type=django.core.management.base.CommandError):
    """Monkey-patch for django.core.management.utils.popen_wrapper"""
    try:
        p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
                  close_fds=os.name != 'nt', universal_newlines=True)
    except OSError as e:
        six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s: %s' %
                                                     (args[0], e.strerror)), sys.exc_info()[2])
    output, errors = p.communicate()

    # Additional utf-8 decoding
    output = output.encode(locale.getpreferredencoding(False)).decode('utf-8')
    #

    return (
        output,
        force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
        p.returncode
    )

This has the desired effect and prevents the corruption of the .po files.

I also tried changing the value returned by locale.getpreferredencoding() to "utf-8", but that seems impossible on Windows, as Python uses the win32 API GetACP(), which according to MSDN http://msdn.microsoft.com/en-us/library/windows/desktop/dd318070(v=vs.85).aspx only returns ANSI codepages and thus will never return "utf-8".

Change History (13)

comment:1 by Claude Paroz, 10 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

I'm guilty for the regression, but I'm afraid I will not be able to debug this issue as I don't have access to Windows machines.

If noone else can fix it, I can revert the patch in the 1.7 branch to remove the release blocking for 1.7.

comment:2 by Andrew Godwin, 10 years ago

I am also without a Windows dev environment. If we have to remove the patch from 1.7, and we think we can do it cleanly, then it's an option.

Last edited 10 years ago by Andrew Godwin (previous) (diff)

comment:3 by Claude Paroz, 10 years ago

Version: 1.7-rc-2master

[cdfefbec721b59695e28] has been reverted on the 1.7.x branch. I've applied [67870137b9e1f1] instead to fix #22686, but I'd like to keep the more extensive fix on master. The release blocking flag is now only applicable to master.

comment:4 by Tim Graham, 10 years ago

Ramiro said he would have some time to look at this before 1.8 alpha.

comment:5 by Tim Graham, 10 years ago

Has patch: set
Triage Stage: AcceptedReady for checkin

comment:6 by Ramiro Morales <ramiro@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 6fb9dee470d57882e378247fd2706d5f9867b5f9:

Fixed #23271 -- Don't corrupt PO files on Windows when updating them.

Make sure PO catalog text fetched from gettext programs via standard
output isn't corrupted by mismatch between assumed (UTF-8) and real
(CP1252) encodings. This can cause mojibake to be written when creating
or updating PO files.

Also fixes #23311.

Thanks to contributor with Trac nick 'danielmenzel' for the report,
excellent research and fix.

comment:7 by Tim Graham, 10 years ago

Has patch: unset
Resolution: fixed
Status: closednew
Triage Stage: Ready for checkinAccepted

Python 3 looks good, but now I have test failures on Python 2 and Windows. Can you reproduce?

comment:8 by Ramiro Morales <cramm0@…>, 10 years ago

In 002a8ffe478b7520a64c9176f9f640633f643b9c:

Fixed breakage by 6fb9dee4 under Python2+Windows.

Refs #23271

comment:9 by Ramiro Morales, 10 years ago

Resolution: fixed
Status: newclosed

comment:10 by daphshez, 10 years ago

Resolution: fixed
Status: closednew

Regression fails on my machine (Windows 7 64, Django development version head, with

locale.getpreferredencoding()=='cp1255'
======================================================================
ERROR: test_po_file_encoding_when_updating (i18n.test_extraction.BasicExtractorTests)
Update of PO file doesn't corrupt it with non-UTF-8 encoding on Python3+Windows (#23271)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\data\dev\django\django\tests\i18n\test_extraction.py", line 407, in test_po_file_encoding_when_updating
    management.call_command('makemessages', locale=['pt_BR'], verbosity=0)
  File "C:\data\dev\django\django\django\core\management\__init__.py", line 118, in call_command
    return command.execute(*args, **defaults)
  File "C:\data\dev\django\django\django\core\management\base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 322, in handle
    self.write_po_file(potfile, locale)
  File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 455, in write_po_file
    msgs, errors, status = gettext_popen_wrapper(args)
  File "C:\data\dev\django\django\django\core\management\commands\makemessages.py", line 39, in
gettext_popen_wrapper
    stdout, stderr, status_code = popen_wrapper(args, os_err_exc_type=os_err_exc_type)
  File "C:\data\dev\django\django\django\core\management\utils.py", line 27, in popen_wrapper
    output, errors = p.communicate()
  File "C:\Python34\lib\subprocess.py", line 959, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "C:\Python34\lib\subprocess.py", line 1234, in _communicate
    stdout = stdout[0]
IndexError: list index out of range

Also, this seems also to be the source of #21928 which isn't resolved for other people.

The problem is that Popen still tries to encode msgmerge output stream using DEFAULT_LOCALE_ENCODING, when the data is actually in UTF-8.

A way around it is to call Popen with universal_newlines=true, and then take care of line ending inside gettext_popen_wrapper. I'll post a patch soon.

(BTW the current solution is for Windows only. Linux uses have to set undocumented environment variables. I think we should strive to solve this for Linux users as well).

comment:11 by daphshez, 10 years ago

Well, I patched the code and sent a pull request at

https://github.com/django/django/pull/4532

This is my first attempt to contribute code and I seem to have messed up with the process. I am sorry about it. I hope my fix will get considered anyway...

comment:12 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 57202a11:

Fixed #23271 -- Fixed makemessages crash/test failure for some locales.

comment:13 by Tim Graham <timograham@…>, 10 years ago

In c45fd57f:

[1.8.x] Fixed #23271 -- Fixed makemessages crash/test failure for some locales.

Backport of 57202a112a966593857725071ecd652a87c157fb from master

Note: See TracTickets for help on using tickets.
Back to Top