Opened 18 years ago
Closed 18 years ago
#3063 closed defect (fixed)
[patch] i18n broken for msgids with extended characters
Reported by: | Antti Kaihola | Owned by: | hugo |
---|---|---|---|
Component: | Internationalization | Version: | |
Severity: | normal | Keywords: | i18n |
Cc: | Triage Stage: | Design decision needed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Message ids with non-us-ascii characters in a UTF-8 encoded .po file are not recognized. Django returns the original string even though a translation is provided.
The reason is that Python's GNUTranslations.gettext() expects the message id as a unicode string, but gettext() in trans_real.py calls it with UTF-8 encoded message ids.
Even if this is fixed by decoding the message id before calling t.gettext(), there is another problem with missing messages: Python's GNUTranslations.gettext() returns UTF-8 encoded strings, but when a message is not found, it just returns the original unicode message id, and the template engine chokes when trying to join mixed unicode and normal strings.
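The mixed-type join problem can be illustrated with a rough Python 3 analogue. This is only a sketch: the ticket concerns Python 2, where the failure came from an implicit ASCII decode rather than a TypeError, and to_text() is a hypothetical helper, not a Django function.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, decoding bytes if needed."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# One translated piece arrives UTF-8 encoded, the others are already text,
# mirroring the "translated vs. untranslated msgid" mix described above.
pieces = ["Hyvää päivää".encode("utf-8"), ", ", "Näkemiin"]

try:
    "".join(pieces)
    mixed_join_failed = False
except TypeError:
    # Python 3 refuses to join bytes and str in one sequence.
    mixed_join_failed = True

# Normalizing every piece to one type before joining avoids the failure.
rendered = "".join(to_text(piece) for piece in pieces)
```

Normalizing at a single point, rather than at every call site, keeps the rest of the rendering code free of type checks.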
The Python gettext documentation recommends using ugettext() instead of gettext() so everything is handled in unicode.
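As a minimal sketch of the all-unicode behaviour, the following uses Python 3, where GNUTranslations.gettext() always works with text strings and there is no separate ugettext(). The build_mo() helper and the sample messages are assumptions made for illustration; they are not part of the attached patch.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: serialize {msgid: msgstr} into UTF-8 .mo bytes."""
    # The empty msgid carries the catalog header, which declares the charset.
    catalog = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    catalog.update(messages)
    # The .mo format expects entries sorted by their encoded msgid.
    entries = sorted(
        (k.encode("utf-8"), v.encode("utf-8")) for k, v in catalog.items()
    )
    n = len(entries)
    ids = b"".join(k + b"\0" for k, _ in entries)
    strs = b"".join(v + b"\0" for _, v in entries)
    key_start = 7 * 4 + 16 * n  # header plus both index tables
    val_start = key_start + len(ids)
    key_index, val_index = [], []
    koff = voff = 0
    for k, v in entries:
        key_index += [len(k), key_start + koff]
        val_index += [len(v), val_start + voff]
        koff += len(k) + 1
        voff += len(v) + 1
    # Magic, version, entry count, key table offset, value table offset,
    # hash table size and offset (unused here).
    header = struct.pack("<7I", 0x950412DE, 0, n, 7 * 4, 7 * 4 + 8 * n, 0, 0)
    index = struct.pack("<%dI" % (4 * n), *(key_index + val_index))
    return header + index + ids + strs


mo_bytes = build_mo({"Hyvää päivää": "Good day"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))

translated = catalog.gettext("Hyvää päivää")  # msgid found in the catalog
missing = catalog.gettext("Näkemiin")         # msgid absent from the catalog
```

Both the translated string and the untranslated fallback come back as the same text type, so they can be joined safely.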
I know it's exceptional to use non-us-ascii message ids, but my strong opinion is that since we're beginning to live in a unicode world, it shouldn't be a problem anymore. And I'm sure many people have a situation where an existing single-language non-English site is i18n'ed and it's simply most practical just to wrap existing non-English messages in {% trans %} tags and _() calls. A multi-lingual site might even use only languages with non-us-ascii character sets.
I'll post a patch which fixes both problems described above.
Attachments (4)
Change History (9)
by , 18 years ago
Attachment: | ugettext.diff added |
---|
by , 18 years ago
Attachment: | tests_i18n_non-us-ascii_msgids.diff added |
---|
Additions to the template test suite for non-us-ascii UTF-8 message ids
by , 18 years ago
Attachment: | tests_i18n_non-us-ascii_msgids.2.diff added |
---|
Oops, diff root path was wrong. Corrected here.
comment:1 by , 18 years ago
Component: | Internationalization → Translations |
---|---|
Summary: | i18n broken for msgids with extended characters → [patch] i18n broken for msgids with extended characters |
comment:2 by , 18 years ago
Component: | Translations → Internationalization |
---|
Oops, I suppose the "Translations" component refers to the actual .po files. Changed back to i18n. I hope hugo still spots this.
comment:3 by , 18 years ago
Keywords: | i18n added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:4 by , 18 years ago
Triage Stage: | Accepted → Design decision needed |
---|
This is a really expensive change, performance-wise: every string now passes through a call to encode() prior to display. Since the alternative is to do a more or less once-off translation of the source templates to UTF-8, I'm not convinced the performance hit is justified. I'm normally a big advocate for supporting more than just UTF-8 encodings because of the benefits to East Asian locales, but in this case even I'm not sure. Moving back to "design decision needed" for a bit more thinking.
comment:5 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
[patch] Fixes translation of non-us-ascii message ids