Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#21725 closed Bug (fixed)

Javascript translations fail with non-BMP characters

Reported by: Ned Batchelder Owned by: MattBlack
Component: Internationalization Version: 1.6
Severity: Normal Keywords: nlsprint14
Cc: eromijn@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If a translated string includes a non-BMP character (above 0xFFFF), then javascript_catalog in views/i18n.py fails:

Traceback (most recent call last):
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/views/i18n.py", line 264, in javascript_catalog
    csrc.append("catalog['%s'] = '%s';\n" % (javascript_quote(k), javascript_quote(v)))
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/functional.py", line 176, in wrapper
    return func(*args, **kwargs)
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/text.py", line 305, in javascript_quote
    return str(ustring_re.sub(fix, s))
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001d543' in position 31: ordinal not in range(128)

(this is running 1.4.8, but the javascript_quote code hasn't changed since then.)

It should be possible to fix javascript_quote to turn the character into a surrogate pair.

Change History (13)

comment:1 by Claude Paroz, 11 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

comment:2 by MattBlack, 11 years ago

Owner: changed from nobody to MattBlack
Status: newassigned

comment:3 by Baptiste Mispelon <bmispelon@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 1c1dffca757b0b6acaf99d893d68847250ab4146:

Fixed #21725 -- Fixed JavaScript quoting encoding.

Thanks to nedbatchelder for the report.

comment:4 by Ned Batchelder, 11 years ago

Is this really a full fix? Why does this function replace BMP characters with four-digit \uXXXX escapes, but allow non-BMP characters through unchanged? I would have thought that Javascript would need surrogate pairs.

comment:5 by Honza Král, 11 years ago

Resolution: fixed
Status: closednew

I am getting consistent failures since this was merge in python 2 (not python 3) on mac:

======================================================================
FAIL: test_javascript_quote_unicode (utils_tests.test_text.TestUtilsText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/honza/work/django/tests/utils_tests/test_text.py", line 162, in test_javascript_quote_unicode
    self.assertEqual(text.javascript_quote(input), output)
AssertionError: u"<script>alert(\\'Hello \\\\xff.\\n Wel\\ud835\\udd43come\\there\\r\\');<\\/scr [truncated]... != u"<script>alert(\\'Hello \\\\xff.\\n Wel\U0001d543come\\there\\r\\');<\\/script> [truncated]...
- <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>?                                   ^^^^^^^^^^^^
+ <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>?

comment:6 by Baptiste Mispelon, 11 years ago

I did some digging and as it turns out, javascript_quote is not used anymore when doing javascript translation since a506b6981bc48caec30bca3de94d2ac3e6fc1660.

In fact, this function is undocumented, barely tested, and was only used internally for the javascript_catalogue view.

On top of that, it was also completely broken on Python2 if you ever passed it non-ascii input.

I think we should just delete it altogether.

As for the test failure, they pass on my machine both with Python 2 and 3 and our CI server is also in the green. There's a chance it might be related to a wide/narrow build of Python.

comment:7 by Baptiste Mispelon, 11 years ago

Has patch: set

Here's a pull request that adds a regression test to the javascript_catalog suite that makes sure that non-BMP characters in translation files are correctly handled.

It also deprecates javascript_quote altogether (it was undocumented and not used anymore after a506b6981bc48caec30bca3de94d2ac3e6fc1660).

Finally, it also skips the reported failing test on narrow python builds.

comment:9 by Sasha Romijn, 11 years ago

Cc: eromijn@… added
Keywords: nlsprint14 added
Triage Stage: AcceptedReady for checkin

PR 2339 looks fine to me, and tests run (although I tested on a non-wide python).

comment:10 by Baptiste Mispelon <bmispelon@…>, 11 years ago

In 926e18d7d126fcf7f4b2d25ce4155423ac6e2f90:

Deprecated django.utils.text.javascript_quote.

Refs #21725.

comment:11 by Baptiste Mispelon, 11 years ago

Resolution: fixed
Status: newclosed

Thanks for the review.

comment:12 by Claude Paroz <claude@…>, 11 years ago

In ac699cdc174a825e6b78c6f3c6e967bc961413c8:

Really hidden warnings in javascript_quote tests

Refs #21725.

comment:13 by Tim Graham <timograham@…>, 10 years ago

In df3f3bbe2927b9bad80088c6adbf5e8c5ba778c9:

Removed utils.text.javascript_quote() per deprecation timeline; refs #21725.

Note: See TracTickets for help on using tickets.
Back to Top