Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#31570 closed Bug (fixed)

Translations of one language in different territories can override each other

Reported by: Shai Berger Owned by: Carlton Gibson
Component: Internationalization Version: 2.2
Severity: Release blocker Keywords:
Cc: Claude Paroz Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The fix of #30439 created a new problem: Under some circumstances, translations for one language in different territories can override each other. In the case we've run into, users in New Zealand (en-NZ) received translations for South Africa (en-ZA). We've seen this behavior on 2.2.12, and downgrading to 2.2.11 resolved it, hence marking this as a bug in 2.2, but I believe the same issue exists in 3.0 and master.

The circumstances where this happens are not trivial to reproduce; it involves having applications with translations for the "bare" language (en in our case), and may depend on the order of loading of different translations. I will try to come up with a test later.

The issue arises, AFAICT, when the TranslationCatalog installs a reference to the bare-language dictionary ("catalog" in gettext terms) as the first in the list of catalogs for a territorial variation (when an app has only the bare-language translation), and is then willing to update it with the contents of another territorial variant.

The following diff seems to resolve the issue, but I am far from certain that it is the right solution.

  • django/utils/translation/trans_real.py

    diff --git a/django/utils/translation/trans_real.py b/django/utils/translation/trans_real.py
    index eed4705f6e..8042f6fdc4 100644
    a b class TranslationCatalog:  
    9696                cat.update(trans._catalog)
    9797                break
    9898        else:
    99             self._catalogs.insert(0, trans._catalog)
     99            self._catalogs.insert(0, trans._catalog.copy())
    100100            self._plurals.insert(0, trans.plural)
    101101
    102102    def get(self, key, default=None):

Attachments (1)

test-31570.tgz (4.1 KB ) - added by Shai Berger 5 years ago.
Test project showing a problem. Depends on nothing but Django. manage.py test passes on 2.2.11 and 3.0.4, fails on 2.2.12, 3.0.[56] and master.

Download all attachments as: .zip

Change History (22)

comment:1 by Carlton Gibson, 5 years ago

Hey Shai. Thanks for the report. πŸ€”

In the case we've run into, users in New Zealand (en-NZ) received translations for South Africa (en-ZA).
...
I will try to come up with a test later.

Yes, please. I'm struggling to see where the issue is.

I'm wondering if ​the test for the equivalence of the plural forms is robust enough:

    def update(self, trans):
        # Merge if plural function is the same, else prepend.
        for cat, plural in zip(self._catalogs, self._plurals):
            if trans.plural.__code__ == plural.__code__:
                cat.update(trans._catalog)
                break
       else: 
           ...

Q: What guarantees that the __code__ object of the ​generated plural functions evaluates equal or not? (Current status: no idea: it's not as simple as pointer address, but I can't yet find the implementation so... )

Update:

Are the relevant sources, but that's not to have followed the flow fully.
endupdate

There are a number of cases in the existing tests where we enter the different plural forms else-block, beyond the test added for #30439, so maybe there is a subtle behaviour change here. I'd be very grateful for that test to be able to pin it down.

Last edited 5 years ago by Carlton Gibson (previous) (diff)

comment:2 by Shai Berger, 5 years ago

I see that the problem description wasn't clear enough:

We are working on a website for the K-6 range of schools and students. The "K" year (for "Kindergarten") is called "Year 1" in New Zealand, but "Grade R" in South Africa, and some Kiwis were pretty confused when they saw "Grade R" on the site.

comment:3 by Carlton Gibson, 5 years ago

Hey Shai. Thanks for the clarification.

I was assuming that en-NZ and en-ZA have the same number of plural forms? (Is that correct?)

If so, we shouldn't end up in the branch where your suggested/possible edit is. i.e. the catalogs should be merged as they were in the past...

comment:4 by Shai Berger, 5 years ago

The whole TranslationCatalog class is new in the fix for #30439.

The issue is not about mis-comparing plural forms at all. The issue is incorrect handling, in that code, of the "base language" (en) catalog. All the en variants have the same plural forms, of course, and they do compare equal. But the code fails to account for the fact that the en catalog is loaded for en-NZ. When the stars align "correctly", it is loaded in such a way that cat in cat.update in line 96 is the en catalog, and it is updated with the NZ translations when those are loaded, and then with the ZA translations when those are loaded. It is the same (is) dict that is then used for all three languages.

comment:5 by Carlton Gibson, 5 years ago

When the stars align "correctly", it is loaded in such a way that cat in cat.update in line 96 is the en catalog, and it is updated with the NZ translations when those are loaded, and then with the ZA translations when those are loaded. It is the same (is) dict that is then used for all three languages.

OK, it's that bit I haven't pinned down as yet. A reproduce project would be super if you can. Otherwise I'll try again tomorrow.

(I'm not/wasn't seeing how the copy() in the else branch on ln99 comes into play if the plurals compare equal...)

Thanks for your patience with me. :)

comment:6 by Shai Berger, 5 years ago

The else: comes into play when the TranslationCatalog is still empty.

And thanks for your own patience for all my hand-waving with no failing-test code to back it up yet...

comment:7 by Carlton Gibson, 5 years ago

The else: comes into play when the TranslationCatalog is still empty.

Yes, of course. I think I had a dose of code blindness yesterday. 🀯

Nonetheless, I've had another pop at this this morning and can't seem to get it lined up to reproduce the issue.
I can keep trying but a sample project would be a big help.

by Shai Berger, 5 years ago

Attachment: test-31570.tgz added

Test project showing a problem. Depends on nothing but Django. manage.py test passes on 2.2.11 and 3.0.4, fails on 2.2.12, 3.0.[56] and master.

comment:8 by Shai Berger, 5 years ago

Well, I could not make the stars align just right to reproduce the exact behavior I saw in the large project, but there's enough in attachment:test-31570.tgz​ to show that there is a regression.

The project includes three apps -- bare_en and bare_en2 have their own translations, but only for en; the translation for trans (into four English dialects) is done in the project. In "real life", separate apps with their own translations are 3rd-party apps, of course. There's two such apps because I hoped to get an exact reproduction, and when I gave up on that, I just left them there.

I was unable to get the regional dialects to trump each other -- I see en being used instead of en-nz and en-ca, while en-au survives. I don't yet fully understand this.

comment:9 by Carlton Gibson, 5 years ago

Hey Shai β€” Thank you! Super. πŸ‘
I'll look at this again today.

comment:10 by Carlton Gibson, 5 years ago

Triage Stage: Unreviewed β†’ Accepted

OK, super regression confirmed. Thank you Shai.

comment:11 by Claude Paroz, 5 years ago

Carlton has a point with the plural function comparison broken. I cannot even understand why I wrote that :-( If it worked, this bug wouldn't have surface in this use case (as catalogs should be merged as before). Proper comparison of plural equations is not trivial, I fear it's almost impossible to do it perfectly.

comment:12 by Claude Paroz, 5 years ago

Still, I understand I did it because it somewhat magically works in many cases (I didn't dig either in C code to understand why, as it goes beyond my abilities).

Another way to empirically compare the plural functions would be to compare their result on numbers from 1 to 101 (or 105, or…). It would not be perfect but might work for our use case with a rather small performance penalty.

Last edited 5 years ago by Claude Paroz (previous) (diff)

comment:13 by Claude Paroz, 5 years ago

Something like that:

diff --git a/django/utils/translation/trans_real.py b/django/utils/translation/trans_real.py
index eed4705f6e..0dca4230d2 100644
--- a/django/utils/translation/trans_real.py
+++ b/django/utils/translation/trans_real.py
@@ -92,7 +92,7 @@ class TranslationCatalog:
     def update(self, trans):
         # Merge if plural function is the same, else prepend.
         for cat, plural in zip(self._catalogs, self._plurals):
-            if trans.plural.__code__ == plural.__code__:
+            if _plurals_equal(trans.plural, plural):
                 cat.update(trans._catalog)
                 break
         else:
@@ -259,6 +259,17 @@ class DjangoTranslation(gettext_module.GNUTranslations):
         return tmsg
 
 
+def _plurals_equal(plur1, plur2):
+    """
+    Compare plural functions by comparing their result on 0 - 101.
+    Not perfect, but probably sufficient for our use case.
+    """
+    for num in range(102):
+        if plur1(num) != plur2(num):
+            return False
+    return True
+
+
 def translation(language):
     """
     Return a translation object in the default 'django' domain.

in reply to:  11 comment:14 by Shai Berger, 5 years ago

Replying to Claude Paroz:

Carlton has a point with the plural function comparison broken. I cannot even understand why I wrote that :-( If it worked, this bug wouldn't have surface in this use case (as catalogs should be merged as before). Proper comparison of plural equations is not trivial, I fear it's almost impossible to do it perfectly.

As far as I can tell, this is not the issue here. See comment:4.

in reply to:  11 comment:15 by Shai Berger, 5 years ago

Replying to Claude Paroz:

[...] plural function comparison broken. [...] If it worked, this bug wouldn't have surface in this use case

After further investigation: This claim is technically correct, but is not helpful for solving this bug.

To my surprise, I found that there are actually two different implementations of the plural function in the English translation files. From the apps in the default project template, admin and auth have

"Plural-Forms: nplurals=2; plural=(n != 1);\n"

but this line is not found in the en translations (which exist) for contenttypes and sessions, nor in the files initially generated by makemessages. The __code__ comparison considers the default different from the explicit plural equation, but otherwise works.

Now, given that the default plural function is lambda n: int(n != 1), it is indeed the case that if the plural function comparison worked perfectly, all the plural functions involved would have been considered equal, and the problem would not arise.

However, the point of #30439 was exactly the support of files with different plural-functions for the same language. So the effect of the comparison failure is just to allow us to debug with English, a problem which could pop up in another language, where different functions are more common.

As an aside, this was the major missing piece which allowed me to reproduce exactly the behavior I saw in production: Adding the explicit plural forms to my apps' translation files, and making the test run two rounds of translations (as only when the first round ends, all languages are loaded), that is

  • trans/tests.py

    diff trans/tests.py~
    old new class TestTranslations(SimpleTestCase):  
    1010             'en_CA': 'canuck',
    1111             'en': 'local country person',
    1212         }
    13          for language, nick in country_people_nicknames.items():
    14              with self.subTest(language=language):
     13         for rnd in "12":
     14          for language, nick in country_people_nicknames.items():
     15             with self.subTest(language=language, round=rnd):
    1516                 activate(language)
    1617                 self.assertEqual(_('local country person'), nick)

Allowed me to get as output

======================================================================
FAIL: test_region_translations (trans.tests.TestTranslations) (language='en', round='1')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations
    self.assertEqual(_('local country person'), nick)
AssertionError: 'canuck' != 'local country person'
- canuck
+ local country person


======================================================================
FAIL: test_region_translations (trans.tests.TestTranslations) (language='en_NZ', round='2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations
    self.assertEqual(_('local country person'), nick)
AssertionError: 'canuck' != 'kiwi'
- canuck
+ kiwi


======================================================================
FAIL: test_region_translations (trans.tests.TestTranslations) (language='en', round='2')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations
    self.assertEqual(_('local country person'), nick)
AssertionError: 'canuck' != 'local country person'
- canuck
+ local country person

That is, the en-nz (and en) dialect was overridden by en-ca.

Last edited 5 years ago by Shai Berger (previous) (diff)

comment:16 by Carlton Gibson, 4 years ago

Owner: changed from nobody to Carlton Gibson
Status: new β†’ assigned

comment:17 by Carlton Gibson, 4 years ago

Has patch: set

​PR

I've spent far too long on this and still didn't get 100% to the bottom of it, particularly with the fallbacks here.
I did learn some things though... πŸ˜…

The key is that once we have the app with only the subset of all the supported languages, _add_installed_apps_translations() ends up getting a translation ​who's `_catalog` is the exact same dictionary as has been previously used. This is what Shai said already. The suggested copy() resolves that issue.

Thanks for the report Shai, and the reproduce: it's quite particular. :)

comment:18 by Mariusz Felisiak, 4 years ago

Triage Stage: Accepted β†’ Ready for checkin

comment:19 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: β†’ fixed
Status: assigned β†’ closed

In 98fada7:

[3.1.x] Fixed #31570 -- Corrected translation loading for apps providing territorial language variants with different plural equations.

Regression in e3e48b00127c09eafe6439d980a82fc5c591b673.

Thanks to Shai Berger for report, reproduce and suggested fix.

Backport of dd1ca50b096bf0351819aabc862e91a9797ddaca from master

comment:20 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 2638627d:

[3.0.x] Fixed #31570 -- Corrected translation loading for apps providing territorial language variants with different plural equations.

Regression in e3e48b00127c09eafe6439d980a82fc5c591b673.

Thanks to Shai Berger for report, reproduce and suggested fix.

Backport of dd1ca50b096bf0351819aabc862e91a9797ddaca from master

comment:21 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 027840d7:

[2.2.x] Fixed #31570 -- Corrected translation loading for apps providing territorial language variants with different plural equations.

Regression in e3e48b00127c09eafe6439d980a82fc5c591b673.

Thanks to Shai Berger for report, reproduce and suggested fix.

Backport of dd1ca50b096bf0351819aabc862e91a9797ddaca from master.

Note: See TracTickets for help on using tickets.
Back to Top