Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#28773 closed Bug (fixed)

manage.py makemessages throws syntax error due to incorrectly generated django.pot

Reported by: Hendy Irawan
Owned by: nobody
Component: Internationalization
Version: 1.11
Severity: Normal
Keywords: gettext, makemessages, Windows
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Hendy Irawan)

This only happens once the project has several translations; it does not happen on a clean project with very few translations.
I'm on Windows. This seems related to #28409.

(venv) C:\Users\ceefour\git\samara\samaraweb>python manage.py makemessages -l id_ID --keep-pot -v3
examining files with the extensions: .py, .txt and .html
ignoring file .gitignore in .
ignoring file README.md in .
ignoring file 0001_initial.cpython-36.pyc in .\edu\migrations\__pycache__
ignoring file 0002_category.cpython-36.pyc in .\edu\migrations\__pycache__
ignoring file 0003_profile.cpython-36.pyc in .\edu\migrations\__pycache__
ignoring file 0004_profile_price_range.cpython-36.pyc in .\edu\migrations\__pycache__
ignoring file __init__.cpython-36.pyc in .\edu\migrations\__pycache__
ignoring file admin.cpython-36.pyc in .\edu\__pycache__
ignoring file apps.cpython-36.pyc in .\edu\__pycache__
ignoring file models.cpython-36.pyc in .\edu\__pycache__
ignoring file urls.cpython-36.pyc in .\edu\__pycache__
ignoring file urls_root.cpython-36.pyc in .\edu\__pycache__
ignoring file views.cpython-36.pyc in .\edu\__pycache__
ignoring file views_root.cpython-36.pyc in .\edu\__pycache__
ignoring file __init__.cpython-36.pyc in .\edu\__pycache__
ignoring file context_processors.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file middleware.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file settings.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file urls.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file wsgi.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file __init__.cpython-36.pyc in .\samaraweb\__pycache__
ignoring file samara_dev_public_edu_brand.sql in .\snapshot
ignoring file samara_dev_public_edu_category.sql in .\snapshot
ignoring file samara_dev_public_edu_city.sql in .\snapshot
ignoring file samara_dev_public_edu_country.sql in .\snapshot
ignoring file samara_dev_public_edu_place.sql in .\snapshot
ignoring file samara_dev_public_edu_populatedplace.sql in .\snapshot
ignoring file samara_dev_public_edu_profile.sql in .\snapshot
ignoring file samara_dev_public_edu_profile_categories.sql in .\snapshot
ignoring file samara_dev_public_edu_state.sql in .\snapshot
processing file __init__.py in .\edu
processing file admin.py in .\edu
processing file apps.py in .\edu
processing file 0001_initial.py in .\edu\migrations
processing file 0002_category.py in .\edu\migrations
processing file 0003_profile.py in .\edu\migrations
processing file 0004_profile_price_range.py in .\edu\migrations
processing file __init__.py in .\edu\migrations
processing file models.py in .\edu
processing file base.html in .\edu\templates\edu
processing file footer.html in .\edu\templates\edu
processing file header.html in .\edu\templates\edu
processing file home.html in .\edu\templates\edu
processing file index.html in .\edu\templates\edu
processing file loc_detail.html in .\edu\templates\edu
processing file org_detail.html in .\edu\templates\edu
processing file privacy.html in .\edu\templates\edu
processing file terms.html in .\edu\templates\edu
processing file tests.py in .\edu
processing file urls.py in .\edu
processing file urls_root.py in .\edu
processing file views.py in .\edu
processing file views_root.py in .\edu
processing file manage.py in .
processing file __init__.py in .\samaraweb
processing file context_processors.py in .\samaraweb
processing file middleware.py in .\samaraweb
processing file settings.dev.py in .\samaraweb
processing file settings.prd.py in .\samaraweb
processing file settings.py in .\samaraweb
processing file urls.py in .\samaraweb
processing file wsgi.py in .\samaraweb
processing locale id_ID
CommandError: errors happened while running msgmerge
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61:3: syntax error
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61: keyword "edu" unknown
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61: keyword "templates" unknown
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61: keyword "edu" unknown
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61: keyword "loc_detail" unknown
C:\Users\ceefour\git\samara\samaraweb\edu\locale\django.pot:61: keyword "html" unknown
msgmerge: found 6 fatal errors

The same error sometimes comes from msguniq instead of msgmerge.

Here's a snippet of the generated django.pot where the syntax error occurs (at the second message, whose second location line has lost its leading "#:"). Note that all messages up to this point have correct syntax:

#: .\edu\templates\edu\home.html:45 .\edu\templates\edu\index.html:11
#: .\edu\templates\edu\loc_detail.html:11
#, python-format
msgid "Best Nursery & Preschools in %(loc_name)s"
msgstr ""

#: .\edu\templates\edu\home.html:46
 .\edu\templates\edu\loc_detail.html:12
msgid ""
"Your child goes to school soon? It's time to find the favorite nursery/"
"preschool and kindergarten for your child."
msgstr ""

Note that the .po files are fine, and I can run manage.py compilemessages without problems. I can still add messages manually, but with this feature broken that doesn't scale.
Only makemessages is not working.
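For anyone wanting to locate this kind of breakage quickly, here is a minimal sketch of a checker that flags lines which match none of the usual PO line shapes (comment, msgid, msgstr, msgctxt, quoted continuation, blank). The function name find_suspect_lines and the regular expression are hypothetical helpers written for this illustration, not part of Django or gettext, and the check is a heuristic rather than a full PO parser.

```python
import re

# Simplified line shapes found in a gettext template (heuristic, not a full PO grammar).
_VALID_LINE = re.compile(
    r'^(#.*|msgid(_plural)? ".*"|msgstr(\[\d+\])? ".*"|msgctxt ".*"|".*"|)$'
)


def find_suspect_lines(pot_text):
    """Return (line_number, text) pairs for lines matching no known PO line shape."""
    suspects = []
    for number, line in enumerate(pot_text.splitlines(), start=1):
        if not _VALID_LINE.match(line):
            suspects.append((number, line))
    return suspects


# Sample modelled on the broken snippet above: the second location line lost its "#:".
sample = (
    '#: .\\edu\\templates\\edu\\home.html:46\n'
    ' .\\edu\\templates\\edu\\loc_detail.html:12\n'
    'msgid ""\n'
    '"Your child goes to school soon?"\n'
    'msgstr ""\n'
)

for number, text in find_suspect_lines(sample):
    print(number, repr(text))
```

Running it against the real django.pot (read with open(..., encoding="utf-8")) should point at the same line that msgmerge complains about.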

gettext:

(venv) C:\Users\ceefour\git\samara\samaraweb>gettext -V
gettext (GNU gettext-runtime) 0.19.8.1
Copyright (C) 1995-1997, 2000-2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper.
 gettext: write error

Change History (17)

comment:1 by Hendy Irawan, 7 years ago

Description: modified (diff)

comment:2 by Tim Graham, 7 years ago

Can you provide a sample project that reproduces the problem and point to the code in Django that's at fault?

comment:3 by Hendy Irawan, 7 years ago

Okay, I'm trying to create a sample project, but now I run into this: (https://github.com/pypa/pip/issues/4076, https://github.com/gsnedders/python-webencodings/pull/9)

xgettext: .\venv\Lib\site-packages\pip\_vendor\webencodings\__init__.py:1: Unknown encoding "utf8". Proceeding with ASCII instead.
xgettext: Non-ASCII string at .\venv\Lib\site-packages\pip\_vendor\webencodings\__init__.py:64.
          Please specify the source encoding through --from-code or through a comment
          as specified in http://www.python.org/peps/pep-0263.html.

What's the official way to resolve this in Django?


in reply to:  2 comment:4 by Hendy Irawan, 7 years ago

Replying to Tim Graham:

Can you provide a sample project that reproduces the problem and point to the code in Django that's at fault?

Hi Tim, I've successfully reproduced this bug 100% in my system: https://github.com/ceefour/messagesbug

The file with the syntax error is messagesbug/edu/locale/id_ID/LC_MESSAGES/django.po.
If you look at the file, you can see the syntax error at line 69.

Using my system (Windows 10, gettext 0.19.8.1) this bug is 100% reproducible using this project:

  1. pip install -r requirements.txt
  2. Delete the django.po file
  3. Regenerate the django.po file:

python manage.py makemessages -l id_ID -v3

Generated file will contain syntax error again.

comment:5 by Claude Paroz, 7 years ago

Many thanks for the sample project. I tested it on my Debian box with the same 0.19.8.1 version of gettext. I was not able to reproduce the bug. I'm getting that instead of what you get:

#: edu/templates/edu/home.html:69 edu/templates/edu/music_country.html:34
#, python-format

If this is a Windows-related bug, I'm sorry but I can't help. It remains to be seen if it is a "Django on Windows" error or a "gettext on Windows" error.

in reply to:  5 comment:6 by Hendy Irawan, 7 years ago

Replying to Claude Paroz:

Many thanks for the sample project. I tested it on my Debian box with the same 0.19.8.1 version of gettext. I was not able to reproduce the bug. I'm getting that instead of what you get:

#: edu/templates/edu/home.html:69 edu/templates/edu/music_country.html:34
#, python-format

If this is a Windows-related bug, I'm sorry but I can't help. It remains to be seen if it is a "Django on Windows" error or a "gettext on Windows" error.

Thanks for the (negative) confirmation.

Note that adding "--no-location" works as advertised, and is probably the "least evil" workaround compared to manually adding messages.

Since adding the location is controlled by Django, it's probably a "Django on Windows" bug? I'd like to diagnose more, but I could use some help understanding what exactly happens, or specifically, who creates the django.pot file? (I assume the django.pot file is written before the language-specific django.po files get (re)created, so I'll start there.)

BTW (and self-note), "-l id_ID" is not even needed. Just running "makemessages --keep-pot" is sufficient to trigger the bug (even if that command doesn't generate any file other than django.pot).


in reply to:  5 comment:7 by Hendy Irawan, 7 years ago

Replying to Claude Paroz:

If this is a Windows-related bug, I'm sorry but I can't help. It remains to be seen if it is a "Django on Windows" error or a "gettext on Windows" error.

I did a bit of diagnosis. I edited venv\Lib\site-packages\django\core\management\commands\makemessages.py at write_pot_file() to print the file as it is written.
At least at this stage, Django is writing the correct lines:

#: .\edu\templates\edu\home.html:69
#: .\edu\templates\edu\music_country.html:34
#, python-format
msgid ""
"With the <strong>huge benefits to learning to play a musical instrument</"

So after the file is written by write_pot_file, something else is processing and overwriting that file with broken syntax.

Self notes:

  • write_pot_file is called by process_locale_dir() (line 591)
  • after write_pot_file, cleanup() runs for each build_file in build_files; it is a no-op as far as this bug is concerned
  • process_locale_dir is called by process_files (line 499)
  • process_files is called by build_potfiles (line 402)
  • build_potfiles then runs msguniq; the verbose log shows: Running ['msguniq', '--to-code=utf-8', 'D:\\work\\messagesbug\\messagesbug\\edu\\locale\\django.pot']

When I try to run it manually (the extension *must* be .pot, otherwise you'll get an error):

(venv) D:\work\messagesbug\messagesbug>msguniq --to-code=utf-8 edu\locale\orig.django.pot
warning: Could not open file /usr/share/gettext/styles/po-default.css
...
#: .\edu\templates\edu\home.html:66 .\edu\templates\edu\home.html:68
#: .\edu\templates\edu\music_country.html:15
#: .\edu\templates\edu\music_country.html:40
#: .\edu\templates\edu\music_country.html:44
#, python-format
msgid "Best Music Schools & Teachers in %(loc_name)s"
msgstr ""

 .\edu\templates\edu\music_country.html:34
#, python-format
msgid ""
"With the <strong>huge benefits to learning to play a musical instrument</"

So indeed it is a potential bug with msguniq on Windows. Good progress. The next step is to find a reliable trigger :)

Another update: makemessages.py is creating django.pot with CRLF (Windows) line endings, which are *sometimes* processed incorrectly by msguniq.
If the file is converted to LF line endings, then msguniq processes it correctly.

I'm tempted to _declare_ that .pot (and hence .po) files must be LF-ended, but that seems beyond our (Django) jurisdiction. Since .py files can be either LF or CRLF, why can't .pot/.po files? So I'm going to report this to gettext, specifically for msguniq.
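As a stopgap, the CRLF to LF conversion mentioned above can be done with a few lines of Python before handing the file to msguniq. This is only a sketch; normalize_newlines and convert_file_to_lf are hypothetical helper names, not anything shipped with Django or gettext.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF (and any stray CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file_to_lf(path):
    """Rewrite the file at `path` in place with LF line endings."""
    # Binary mode avoids any platform newline translation on read or write.
    with open(path, "rb") as fp:
        data = fp.read()
    with open(path, "wb") as fp:
        fp.write(normalize_newlines(data))
```

For example, convert_file_to_lf(r"edu\locale\django.pot") followed by the manual msguniq call above should then give the expected output, assuming the line endings are indeed the trigger.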


in reply to:  5 ; comment:8 by Hendy Irawan, 7 years ago

Replying to Claude Paroz:

Can you try this on your Debian box?

eolbug.pot file (save it with CRLF EOL):

#: .\edu\templates\edu\home.html:69
#: .\edu\templates\edu\music_country.html:34
#, python-format
msgid "Hi"
msgstr ""

then run:

msguniq --to-code=utf-8 eolbug.pot

Do you also get the same bug?
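Since many editors silently normalize line endings, it may be easier to generate the test file from a script. The sketch below writes eolbug.pot with explicit CRLF endings into a temporary directory and runs msguniq on it if the tool is on PATH. The helper names write_crlf_pot and run_msguniq are made up for this example; only the msguniq invocation itself comes from the steps above.

```python
import os
import shutil
import subprocess
import tempfile

POT_LINES = [
    r"#: .\edu\templates\edu\home.html:69",
    r"#: .\edu\templates\edu\music_country.html:34",
    "#, python-format",
    'msgid "Hi"',
    'msgstr ""',
]


def write_crlf_pot(path):
    # newline="" disables newline translation, so the explicit "\r\n"
    # is written verbatim on every platform.
    with open(path, "w", encoding="utf-8", newline="") as fp:
        fp.write("\r\n".join(POT_LINES) + "\r\n")


def run_msguniq(path):
    """Run msguniq on `path`; return its stdout, or None if msguniq is missing."""
    if shutil.which("msguniq") is None:
        return None
    result = subprocess.run(
        ["msguniq", "--to-code=utf-8", path],
        capture_output=True,
        text=True,
    )
    return result.stdout


pot_path = os.path.join(tempfile.mkdtemp(), "eolbug.pot")
write_crlf_pot(pot_path)
output = run_msguniq(pot_path)
print(output if output is not None else "msguniq not found on PATH")
```

If the bug is present, the printed output should show a location line without its leading "#:", as in the earlier snippets.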

Even if this is eventually fixed by msguniq, it will take some time.

In the meantime, can I request Django provide a workaround by always writing the pot file using LF EOL?

in reply to:  5 comment:9 by Hendy Irawan, 7 years ago

in reply to:  8 ; comment:10 by Claude Paroz, 7 years ago

Replying to Hendy Irawan:

Can you try it on your Debian box:

(...)

do you also get the same bug ?

Yes, I can reproduce the msguniq bug. Great debugging!

Even if this is eventually fixed by msguniq, it will take some time.

In the meantime, can I request Django provide a workaround by always writing the pot file using LF EOL?

We could, but could that be a problem for some Windows app to display the file content properly?

in reply to:  10 comment:11 by Hendy Irawan, 7 years ago

Replying to Claude Paroz:

Even if this is eventually fixed by msguniq, it will take some time.

In the meantime, can I request Django provide a workaround by always writing the pot file using LF EOL?

We could, but could that be a problem for some Windows app to display the file content properly?

AFAIK, the *only* program that can't manipulate LF files properly is Notepad. I don't know any programmer who uses Notepad to edit code (even for txt files, a lot of programmers I know use Notepad++ by default).
Any reasonable text editor that a Python programmer uses can edit LF files without issue (and they usually have other gimmicks like syntax highlighting, etc.), namely Notepad++, VSCode, PyCharm, Eclipse, etc.

Considering the two alternatives: (1) broken makemessages regardless of what text editor you use, (2) working makemessages. I definitely vote for LF-only.
When gettext finally fixes this (probably will take a while) we can revert this workaround.

Another point: what matters is that the *input* to the gettext tools is LF. I think the .po files themselves (the output, which is editable by the user) can still be CRLF, and I believe that's the default for gettext on Windows. Django just needs to ensure that whatever gets processed by gettext is forced to LF first.
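To illustrate the idea, Python's built-in open() can be told not to translate "\n" into the platform line separator. The sketch below shows that approach; write_pot_with_lf is a hypothetical helper name, and this is not presented as the actual change made in Django.

```python
def write_pot_with_lf(path, msgs):
    """Append `msgs` to `path`, always using LF line endings."""
    # With newline="\n", text mode writes "\n" as-is instead of converting
    # it to os.linesep, so the file gets LF endings even on Windows.
    with open(path, "a", encoding="utf-8", newline="\n") as fp:
        fp.write(msgs)
```

Any file written this way reaches msguniq and msgmerge with LF endings regardless of the platform Django runs on.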

comment:12 by Claude Paroz, 7 years ago

Triage Stage: Unreviewed → Accepted

Thanks for your input, the workaround should be rather straightforward, do you plan to provide a patch?

comment:13 by Hendy Irawan, 7 years ago

Thanks for accepting the ticket. Sorry, I don't think I can send a patch.

comment:14 by Claude Paroz, 7 years ago

Has patch: set

Made a pull request.
Not sure if adding a test for that makes much sense.

comment:15 by Tim Graham, 7 years ago

Triage Stage: Accepted → Ready for checkin

comment:16 by Claude Paroz <claude@…>, 7 years ago

Resolution: fixed
Status: new → closed

In 4f5526e3:

Fixed #28773 -- Forced pot files to use UNIX-style newlines

Thanks Hendy Irawan for the analysis and report.

comment:17 by Claude Paroz <claude@…>, 7 years ago

In aba31aa8:

[2.0.x] Fixed #28773 -- Forced pot files to use UNIX-style newlines

Thanks Hendy Irawan for the analysis and report.
Backport of 4f5526e346861c0b2ffa2ea7229747c883e14432 from master.
