#35967 assigned Bug

TransactionTestCase.serialized_rollback reads from real database rather than test when using read replica for a model instance created in a migration with a ManyToManyField

Reported by:	Jake Howard	Owned by:	Simon Charette
Component:	Testing framework	Version:	dev
Severity:	Normal	Keywords:
Cc:	Ryan Cheley, Jacob Walls, Simon Charette	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

(Yes, this is a rather specific bug...)

When:

Using a database router to create a read-replica database, configured as a MIRROR in tests, and
Using TransactionTestCase.serialized_rollback, and
Having a model instance created in a migration which has a ManyToMany field

The serializer for serialized_rollback tries to read from the non-test database. If that database doesn't exist yet (for example, in CI), this throws an error:

django.db.utils.OperationalError: no such table: auth_user

If migrations are run (manage.py migrate), thus creating the tables in the non-test database, the tests pass, which proves that it's reading from the wrong connection.

I've created a minimal reproduction of this issue and confirmed that it happens on SQLite and PostgreSQL, with Django 4.2, 5.0, 5.1, and main.
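For readers without the reproduction at hand, the setup described above can be sketched roughly as follows. This is an illustrative settings fragment, not the reporter's actual project: the alias name "replica", the class name ReplicaRouter, and the database file name are assumptions. Only the TEST["MIRROR"] setting and the db_for_read/db_for_write router methods are taken from Django's documented API.

```python
# settings.py-style fragment. The alias "replica", the router name, and the
# file name "db.sqlite3" are assumptions made for illustration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",
    },
    "replica": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",
        # In tests, treat this alias as a mirror of "default" instead of
        # creating a separate test database for it.
        "TEST": {"MIRROR": "default"},
    },
}


class ReplicaRouter:
    """Send reads to the replica and writes to the primary."""

    def db_for_read(self, model, **hints):
        return "replica"

    def db_for_write(self, model, **hints):
        return "default"


# Django accepts router instances (or dotted paths) in this setting.
DATABASE_ROUTERS = [ReplicaRouter()]
```

With a configuration along these lines, any read issued while "default" is being serialized can be routed to "replica", which is where the report says things go wrong.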

Change History (11)

comment:1 by Ryan Cheley, 5 weeks ago

Cc:	Ryan Cheley added

comment:2 by Sarah Boyce, 4 weeks ago

Cc:	Jacob Walls added

comment:3 by Jacob Walls, 4 weeks ago

Component:	Database layer (models, ORM) → Core (Serialization)

Thanks for the crystal clear reproduction. I haven't tested to be sure, but it certainly looks a lot like #23979.

This hack passes your test case, but I'm not going to opine on whether it's correct; I just put this together as a learning exercise and to verify my hunch that this was more to do with the serializer and less to do with the testing framework. If someone more knowledgeable can opine on correctness and backwards compatibility, I'm happy to push this forward.

django/core/serializers/base.py

diff --git a/django/core/serializers/base.py b/django/core/serializers/base.py
index 1fbca9244b..f2f0d4d8d6 100644
@@ … @@ Module for abstract serializer/unserializer base classes.
 from io import StringIO
 from django.core.exceptions import ObjectDoesNotExist
-from django.db import models
+from django.db import models, DEFAULT_DB_ALIAS
 DEFER_FIELD = object()
@@ … @@ class Serializer:
         use_natural_primary_keys=False,
         progress_output=None,
         object_count=0,
+        using=DEFAULT_DB_ALIAS,
         **options,
     ):
         """
@@ … @@ class Serializer:
                         self.selected_fields is None
                         or field.attname in self.selected_fields
                     ):
-                        self.handle_m2m_field(obj, field)
+                        self.handle_m2m_field(obj, field, using=using)
             self.end_object(obj)
             progress_bar.update(count)
             self.first = self.first and False
@@ … @@ class Serializer:
             "subclasses of Serializer must provide a handle_fk_field() method"
         )
-    def handle_m2m_field(self, obj, field):
+    def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
         """
         Called to handle a ManyToManyField.
         """

django/core/serializers/python.py

diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py
index 57edebbb70..93898d3801 100644
@@ … @@ class Serializer(base.Serializer):
             value = self._value_from_field(obj, field)
         self._current[field.name] = value
-    def handle_m2m_field(self, obj, field):
+    def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
         if field.remote_field.through._meta.auto_created:
             if self.use_natural_foreign_keys and hasattr(
                 field.remote_field.model, "natural_key"
@@ … @@ class Serializer(base.Serializer):
                         getattr(obj, field.name)
                         .select_related(None)
                         .only("pk")
+                        .using(using)
                         .iterator()
                     )

django/db/backends/base/creation.py

diff --git a/django/db/backends/base/creation.py b/django/db/backends/base/creation.py
index 6856fdb596..8adfa0d7ca 100644
@@ … @@ class BaseDatabaseCreation:
         # Serialize to a string
         out = StringIO()
-        serializers.serialize("json", get_objects(), indent=None, stream=out)
+        serializers.serialize(
+            "json", get_objects(), indent=None, stream=out, using=self.connection.alias
+        )
         return out.getvalue()
     def deserialize_db_from_string(self, data):
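The effect of pinning the many-to-many read to an explicit alias, as the patch above does with .using(using), can be imitated with the standard library alone. The sketch below is a toy model and not Django code: the connections dict, db_for_read(), and related_pks() are hypothetical stand-ins, and two in-memory SQLite databases play the roles of the migrated test database and the not-yet-migrated database behind the replica alias.

```python
import sqlite3

# Hypothetical stand-ins: "default" is the migrated test database, "replica"
# still points at a database in which no tables have been created.
connections = {
    "default": sqlite3.connect(":memory:"),
    "replica": sqlite3.connect(":memory:"),
}
connections["default"].execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY)")
connections["default"].execute("INSERT INTO auth_user (id) VALUES (1)")


def db_for_read():
    """Toy router: every read goes to the replica alias."""
    return "replica"


def related_pks(using=None):
    """Read primary keys, honouring an explicit alias over the router."""
    alias = using or db_for_read()
    rows = connections[alias].execute("SELECT id FROM auth_user").fetchall()
    return [pk for (pk,) in rows]


def routed_read_error():
    """Return the error message raised by a router-directed read, if any."""
    try:
        related_pks()
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None
```

In this model the routed read fails with the same kind of "no such table" error as in the report, while related_pks(using="default") reads from the intended database.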

comment:4 by Jacob Walls, 4 weeks ago

Owner:	set to Jacob Walls
Status:	new → assigned
Triage Stage:	Unreviewed → Accepted

I'm looking into something a little more backwards compatible.

comment:5 by Simon Charette, 4 weeks ago

> If someone more knowledgeable can opine on correctness and backwards compatibility, I'm happy to push this forward.

Hey Jacob, I admittedly haven't looked into the issue in depth, but I think that forcing the usage of the current alias like you did here is certainly a key towards the solution.

I have a hunch that the actual problem lives in django.test.utils.setup_databases though, and particularly in how BaseDatabaseCreation.create_test_db is implemented. setup_databases currently follows this sequence of operations for each DATABASES entry:

1. Create the test db replacement
2. Repoint settings.DATABASES[alias][name] to the test db replacement
3. Perform migrations
4. Serialize content

I think that what it should do instead is 1, 2, 3 for each DATABASES entry (making sure that all the test databases are set up) and then do 4 for each of them. Something like

django/db/backends/base/creation.py

diff --git a/django/db/backends/base/creation.py b/django/db/backends/base/creation.py
index 6856fdb596..7a0e2a0622 100644

@@ … @@ def create_test_db(
         # who are testing on databases without transactions or who are using
         # a TransactionTestCase still get a clean database on every test run.
         if serialize:
+            # XXX: Emit a deprecation warning when `serialize` is provided.
             self.connection._test_serialized_contents = self.serialize_db_to_string()
         call_command("createcachetable", database=self.connection.alias)

django/test/utils.py

diff --git a/django/test/utils.py b/django/test/utils.py
index ddb85127dc..a2fe8b14cc 100644

@@ … @@ def setup_databases(
     test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases)
     old_names = []
+    serialize_connections = []
     for db_name, aliases in test_databases.values():
         first_alias = None
@@ … @@ def setup_databases(
             if first_alias is None:
                 first_alias = alias
                 with time_keeper.timed("  Creating '%s'" % alias):
-                    serialize_alias = (
-                        serialized_aliases is None or alias in serialized_aliases
-                    )
                     connection.creation.create_test_db(
                         verbosity=verbosity,
                         autoclobber=not interactive,
                         keepdb=keepdb,
-                        serialize=serialize_alias,
                     )
+                    if serialized_aliases is None or alias in serialized_aliases:
+                        serialize_connections.append(connection)
                 if parallel > 1:
                     for index in range(parallel):
                         with time_keeper.timed("  Cloning '%s'" % alias):
@@ … @@ def setup_databases(
                     connections[first_alias].settings_dict
                 )
+    # Serialize content of test databases only once all of them are setup
+    # to account for database routing during serialization.
+    for serialize_connection in serialize_connections:
+        serialize_connection._test_serialized_contents = (
+            serialize_connection.serialize_db_to_string()
+        )
     # Configure the test mirrors.
     for alias, mirror_alias in mirrored_aliases.items():
         connections[alias].creation.set_as_test_mirror(

comment:6 by Simon Charette, 4 weeks ago

Cc:	Simon Charette added

comment:7 by Jacob Walls, 4 weeks ago

Thanks for the idea, I had a look. Turns out minimally adjusting comment:5 to run (by adding .creation in one of a couple places) doesn't fix OP's reproduction. It also doesn't pass the unit test I had written to go with comment:3 below, but that is to be expected since the test has nothing to do with setup_databases(). (It does pass with comment:3.)

tests/backends/base/test_creation.py

diff --git a/tests/backends/base/test_creation.py b/tests/backends/base/test_creation.py
index 7e760e8884..84eb3f4a5f 100644

@@ … @@ class TestDbCreationTests(SimpleTestCase):
 class TestDeserializeDbFromString(TransactionTestCase):
     available_apps = ["backends"]
+    databases = {"default", "other"}
     def test_circular_reference(self):
         # deserialize_db_from_string() handles circular references.
@@ … @@ class TestDeserializeDbFromString(TransactionTestCase):
         self.assertIn('"model": "backends.schoolclass"', data)
         self.assertIn(f'"schoolclasses": [{sclass.pk}]', data)
+    def test_serialize_db_to_string_with_m2m_field_and_router(self):
+        class OtherRouter:
+            def db_for_read(self, model, **hints):
+                return "other"
+        with override_settings(DATABASE_ROUTERS=[OtherRouter()]):
+            obj1 = Object.objects.create()
+            obj2 = Object.objects.create()
+            obj2.related_objects.set([obj1])
+            with mock.patch("django.db.migrations.loader.MigrationLoader") as loader:
+                # serialize_db_to_string() serializes only migrated apps, so mark
+                # the backends app as migrated.
+                loader_instance = loader.return_value
+                loader_instance.migrated_apps = {"backends"}
+                data = connection.creation.serialize_db_to_string()
+        self.assertIn(f'"related_objects": [{obj1.pk}]', data)
+        # Test serialize() directly, in all four cases (json/xml, natural key/without)
 class SkipTestClass:
     def skip_function(self):

My plan was to repeat these tests with plain calls to serialize() in tests/serializers and test all four cases (json vs. xml, natural key vs. without). If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.

comment:8 by Simon Charette, 4 weeks ago

Sorry for sending you on a wild goose chase, Jacob; the patch above was meant to be a draft that needs tweaking, not a final solution.

Once adjusted as follows, it addresses the problem reported by Jake:

django/test/utils.py

diff --git a/django/test/utils.py b/django/test/utils.py
index ddb85127dc..7d66140efa 100644

@@ … @@ def setup_databases(
     test_databases, mirrored_aliases = get_unique_databases_and_mirrors(aliases)
     old_names = []
+    serialize_connections = []
     for db_name, aliases in test_databases.values():
         first_alias = None
@@ … @@ def setup_databases(
             if first_alias is None:
                 first_alias = alias
                 with time_keeper.timed("  Creating '%s'" % alias):
-                    serialize_alias = (
-                        serialized_aliases is None or alias in serialized_aliases
-                    )
                     connection.creation.create_test_db(
                         verbosity=verbosity,
                         autoclobber=not interactive,
                         keepdb=keepdb,
-                        serialize=serialize_alias,
+                        serialize=False,
                     )
+                    if serialized_aliases is None or alias in serialized_aliases:
+                        serialize_connections.append(connection)
                 if parallel > 1:
                     for index in range(parallel):
                         with time_keeper.timed("  Cloning '%s'" % alias):
@@ … @@ def setup_databases(
             connections[mirror_alias].settings_dict
         )
+    # Serialize content of test databases only once all of them are setup
+    # to account for database mirroring and routing during serialization.
+    for serialize_connection in serialize_connections:
+        serialize_connection._test_serialized_contents = (
+            serialize_connection.creation.serialize_db_to_string()
+        )
     if debug_sql:
         for alias in connections:
             connections[alias].force_debug_cursor = True

The above includes three tweaks that were not in comment:5:

1. It explicitly passes serialize=False to create_test_db, as we want to defer this operation until all connections are repointed to test databases.
2. It uses .creation.serialize_db_to_string, as you noticed.
3. It performs serialization only once the mirrors have been set up.

I want to reiterate that the problem here, at least as I see it, is less about how queries are routed during serialization and more that we attempt reads against non-test databases before all DATABASES entries are repointed to their test equivalents.
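The ordering argument can be illustrated with a small, Django-free model. Everything in this sketch is hypothetical (the function serialization_sources, its parameters, and the database names "app" and "test_app"); it only mimics the sequence of repointing, mirror configuration, and serialization discussed in this ticket, and is not the code from the patch.

```python
def serialization_sources(names, mirrors, read_alias, defer):
    """Return, per serialized alias, the database name its reads would hit.

    names: alias -> real database name
    mirrors: mirror alias -> alias it mirrors in tests
    read_alias: callable giving the alias a router would pick for reads
    defer: serialize only after every alias has been repointed
    """
    current = dict(names)
    sources = {}
    pending = []
    for alias in names:
        if alias in mirrors:
            continue  # mirrors are configured later, not created
        current[alias] = "test_" + names[alias]  # create and repoint
        if defer:
            pending.append(alias)
        else:
            # Serialize right away: other aliases may not be repointed yet.
            sources[alias] = current[read_alias(alias)]
    for alias, target in mirrors.items():
        current[alias] = current[target]  # configure the test mirrors
    for alias in pending:
        sources[alias] = current[read_alias(alias)]  # serialize last
    return sources


def always_replica(alias):
    """Toy router decision: reads always go to the replica."""
    return "replica"


names = {"default": "app", "replica": "app"}
mirrors = {"replica": "default"}

eager = serialization_sources(names, mirrors, always_replica, defer=False)
deferred = serialization_sources(names, mirrors, always_replica, defer=True)
```

Under these assumptions, eager serialization reads from the real "app" database because the replica has not been repointed yet, whereas deferred serialization reads from "test_app".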

> If this test is fair, wouldn't this be an issue with the serializers? Maybe I'm missing a reason this test isn't realistic.

This is effectively hard to test, as it relates to the nature of the test suite bootstrapping sequence, but we do have a few examples in tests.test_runner where the reported scenario could be reproduced.

Last edited 4 weeks ago by Simon Charette

comment:9 by Jacob Walls, 4 weeks ago

Component:	Core (Serialization) → Testing framework

Perfect, thanks for clarifying.

comment:10 by Simon Charette, 3 weeks ago

Has patch:	set

Jacob & Jake, I submitted this PR for review and I'd like to have your thoughts on it. I didn't manage to create an exact integration test, as it would require creating a nested test database and ensuring that no database queries can be performed against the testing database, but the approach chosen there so far seems to be to rely heavily on mocking.
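As a rough idea of what a mock-based ordering check can look like, here is a sketch using only unittest.mock. It is not the test from the PR: setup_all() is a hypothetical stand-in for the two-phase setup, and the mock children "default" and "other" stand in for each connection's creation object.

```python
from unittest import mock


def setup_all(creations):
    """Hypothetical two-phase setup: create everything, then serialize."""
    for creation in creations:
        creation.create_test_db(serialize=False)
    for creation in creations:
        creation.serialize_db_to_string()


# Child mocks record their calls on the parent, preserving global order.
manager = mock.Mock()
setup_all([manager.default, manager.other])

# Each recorded call unpacks to (name, args, kwargs).
call_names = [name for name, args, kwargs in manager.mock_calls]
```

Inspecting call_names shows whether every create_test_db call happened before the first serialize_db_to_string call, which is the property the fix depends on.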

comment:11 by Jacob Walls, 3 weeks ago

Owner:	changed from Jacob Walls to Simon Charette
