Opened 13 months ago
Last modified 12 months ago
#34907 assigned Bug
loaddata crashes on objects with natural keys when don't exist on passed database.
Reported by: | Florian | Owned by: | Gaurav Jain |
---|---|---|---|
Component: | Core (Serialization) | Version: | 4.2 |
Severity: | Normal | Keywords: | natural key, multi-db, loaddata |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Hi all,
I'm running into a bug when trying to use fixtures with natural keys and using multiple databases.
If I have two tables A and B, that are routed by a database router to db1 and db2 respectively, then I can load 'regular' fixtures of A in db2. Django will correctly recognize that A data does not need to be loaded into db2, and hence give the (correct) output that 0 objects are loaded from 1 fixture.
However, if I keep everything the same, except that I use natural keys for the data, then loaddata will give an OperationalError exception.
I've created a minimal reproduction here:
https://github.com/fvkluck/improved-dollop
In its readme I've put the reproduction, the full stacktrace, and explain where I got my data. Shortest version of the repro is:
- git clone https://github.com/fvkluck/improved-dollop
- cd improved-dollop/repro
- python manage.py migrate
- python manage.py migrate --database=secondary
- python manage.py loaddata someprimarymodel_natural.json --database=secondary
This leads to an exception (OperationalError). Expected is: '0 objects were loaded from 1 fixture'
Best,
Florian
Change History (9)
comment:2 by , 13 months ago
Component: | Database layer (models, ORM) → Core (Serialization) |
---|---|
Summary: | OperationalError during loaddata of fixtures with natural keys on multi-db → loaddata crashes on objects with natural keys when don't exist on passed database. |
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 13 months ago
Thanks for looking into it. The patch seems to fix it for me (filling in some small details like importing the router):
-
django/core/serializers/python.py
diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py index 7ec894aa00..2ef5a1765a 100644
a b other serializers. 6 6 7 7 from django.apps import apps 8 8 from django.core.serializers import base 9 from django.db import DEFAULT_DB_ALIAS, models 9 from django.db import DEFAULT_DB_ALIAS, models, router 10 10 from django.utils.encoding import is_protected_type 11 11 12 12 … … def Deserializer( 118 118 else: 119 119 raise 120 120 data = {} 121 if not router.allow_migrate(using, Model): 122 continue 121 123 if "pk" in d: 122 124 try: 123 125 data[Model._meta.pk.attname] = Model._meta.pk.to_python(d.get("pk"))
The test script outputs 'Ok' when running all tests with this patch applied.
I can't really figure out what would be a logical unit test for the bug. I've been looking at tests/serializers/test_natural.py
for inspiration. But I guess I'm not familiar enough with this part to come up with a good test case that fails without this patch. What I've tried is modifying natural_key_test
to use a different database (modifications marked with comments):
class TestRouter: def allow_migrate(self, db, app_label, model_name=None, **hints): return False @override_settings(DATABASE_ROUTES=[TestRouter]) # added router def natural_key_test_multidb(self, format): book1 = { "data": "978-1590597255", "title": "The Definitive Guide to Django: Web Development Done Right", } book2 = {"data": "978-1590599969", "title": "Practical Django Projects"} # Create the books. adrian = NaturalKeyAnchor.objects.create(**book1) james = NaturalKeyAnchor.objects.create(**book2) # Serialize the books. string_data = serializers.serialize( format, NaturalKeyAnchor.objects.all(), indent=2, use_natural_foreign_keys=True, use_natural_primary_keys=True, ) # Delete one book (to prove that the natural key generation will only # restore the primary keys of books found in the database via the # get_natural_key manager method). james.delete() # Deserialize and test. books = list(serializers.deserialize(format, string_data, using='other')) # added using self.assertCountEqual( [(book.object.title, book.object.pk) for book in books], [ (book1["title"], None), # None i.s.o. adrian.pk (book2["title"], None), ], )
It feels natural to me that None
is returned when using a different database in which the object doesn't exist. This test succeeds already without the patch. I guess what's missing is that the router is not used when setting up the database, but I'm currently somewhat lost in getting it to do that. Any hints?
I'm going to look into it some more tomorrow.
comment:4 by , 12 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 12 months ago
Cc: | added |
---|---|
Has patch: | unset |
Owner: | removed |
Status: | assigned → new |
comment:7 by , 12 months ago
Hey @Florian, it looks like there's a typo in DATABASE_ROUTES. Although the test works without the patch, applying it breaks other tests connected to loaddata, particularly those linked to a multidb setup, such as django/tests/multiple_database/tests.py: FixtureTestCase.test_pseudo_empty_fixtures.
Due to the patch suppressing this error, loaddata proceeds silently with the subsequent steps. Consequently, it fails to assert the expected string in the mentioned test case and We're encountering a ''RuntimeWarning: No fixture data found for 'pets'. (The file format might be invalid.)
Here is the patchset based on suggested changes. https://github.com/django/django/pull/17528. PR checks failed due to failed tests.
comment:8 by , 12 months ago
Cc: | removed |
---|---|
Has patch: | set |
Owner: | set to |
Status: | new → assigned |
comment:9 by , 12 months ago
Patch needs improvement: | set |
---|
Thanks for the report.
Django doesn't need to search for an object in the database when you don't use natural key, that's why it works.
When natural keys are used Django tries to find a matching object in the database when building an instance based on JSON values. We could probably avoid this with:
django/core/serializers/python.py
Does it work for you?