#33984 closed Bug (fixed)
Related managers cache gets stale after saving a fetched model with new PK
Reported by: | joeli | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.1 |
Severity: | Release blocker | Keywords: | |
Cc: | Keryn Knight | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I'm upgrading our codebase from Django 3.2 to 4.1, and came upon a regression when running our test suite. I bisected it down to commit 4f8c7fd9d9 Fixed #32980 -- Made models cache related managers: https://github.com/django/django/commit/4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6
The main problem is that when you have fetched a model containing a m2m field from the database, and access its m2m field the manager gets cached. If you then set the model's .pk
to None
and do a .save()
to save it as a copy of the old one, the related manager cache is not cleared. Here's some code with inline comments to demonstrate:
# models.py from django.db import models class Tag(models.Model): tag = models.SlugField(max_length=64, unique=True) def __str__(self): return self.tag class Thing(models.Model): tags = models.ManyToManyField(Tag, blank=True) def clone(self) -> 'Thing': # To clone a thing, we first save a list of the tags it has tags = list(self.tags.all()) # Then set its pk to none and save, creating the copy self.pk = None self.save() # In django 3.2 the following sets the original tags for the new instance. # In 4.x it's a no-op because self.tags returns the old instance's manager. self.tags.set(tags) return self @staticmethod def has_bug(): one, _ = Tag.objects.get_or_create(tag='one') two, _ = Tag.objects.get_or_create(tag='two') obj = Thing.objects.create() obj.tags.set([one, two]) new_thing = obj.clone() # new_thing.tags.all() returns the expected tags, but it is actually returning obj.tags.all() # If we fetch new_thing again it returns the actual tags for new_thing, which is empty. # # Even `new_thing.refresh_from_db()` -- which is not required with 3.x -- does not help. # `new_thing._state.related_managers_cache.clear()` works, but feels like something I # shouldn't have to do. return list(new_thing.tags.all()) != list(Thing.objects.get(pk=new_thing.pk).tags.all())
Change History (22)
comment:1 by , 2 years ago
Description: | modified (diff) |
---|
follow-up: 5 comment:2 by , 2 years ago
Cc: | added |
---|---|
Severity: | Normal → Release blocker |
Triage Stage: | Unreviewed → Accepted |
follow-up: 4 comment:3 by , 2 years ago
I am a bit confuse here, Using clone method all the tags are being copied from the old instance still the actual tags for the new instance ( new_thing ) are empty after fetching the new instance again? why the tags of old instance are not cloned ?
comment:4 by , 2 years ago
Replying to Bhuvnesh:
I am a bit confuse here, Using clone method all the tags are being copied from the old instance still the actual tags for the new instance ( new_thing ) are empty after fetching the new instance again? why the tags of old instance are not cloned ?
That is the bug here: in .clone()
after doing self.pk = None
and self.save()
to create a new copy, self.tags
still points to the old thing's related manager. So .clone()
ends up setting the tags for the old instance (that already has said tags). Then it returns a thing that pretends to be the new_thing
but actually returns the tags of the old thing unless you completely refetch it from database.
comment:5 by , 2 years ago
Replying to Mariusz Felisiak:
This is a documented breaking change, however it also broke a documented way for copying model instances, so we need to update docs or fix it.
Thanks, I was looking for that exact link to include in my original report but couldn't find where it was mentioned. Maybe worth noting here that adding self._state.adding = True
before the save in .clone() does not alter the result of this bug report.
FWIW I'd really like to see this fixed rather than addressed in the documentation, as we are relying on this quite heavily in production code to clone objects with deeply nested relations, so changing it to another style would be a pain. Not only that but I don't think there's any other documented way to duplicate objects in the ORM?
comment:6 by , 2 years ago
In theory the "simplest" solution (to keep the caching behaviour and allow the documented duplication method) is to special-case None
as a pk value and always instantiate a fresh manager, circumventing the cache entirely.
I think though that the problem technically extends further, any change to the pk
-- or more specifically, the manager's core filter dependencies -- would become stale subsequently. Unsupported those other cases may be from a documentation standpoint, but I'm sure someone somewhere is doing it...
If there's a path where we (I) can invalidate correctly, and it keeps the overall improvement of #32980, and it's not too invasive or complex, we can try and proceed on that basis. If the change restores the old performance behaviour in the process, there's not much argument (except maybe memory pressure/garbage collection) to keeping the cache functionality, off the top of my head.
Basically there's two things to check there above, and if either of them is a miss, I guess we'd revert #32980 because I can envisage plenty of cases where people use the pk=None
trick to clone an instance, documented as it is.
I may assign this out to myself in the near future if it's otherwise not picked up, assuming I can find the availability.
comment:7 by , 2 years ago
In theory the "simplest" solution (to keep the caching behaviour and allow the documented duplication method) is to special-case None as a pk value and always instantiate a fresh manager, circumventing the cache entirely.
None
is already handled separately for related objects, so maybe it's enough to refresh related_managers_cache
when self.pk is None
and they're cached 🤔 I'm also happy to revert 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and re-open #32980.
comment:9 by , 2 years ago
Replying to Bhuvnesh:
Can anyone write a regression test for this one?
A regression test is already in docs:
entry = Entry.objects.all()[0] # some previous entry old_authors = entry.authors.all() entry.pk = None entry._state.adding = True entry.save() entry.authors.set(old_authors) self.assertCountEqual(entry.authors.all(), [...])
Have you check previous comments?
comment:10 by , 2 years ago
Yes , i read the previous comments! Sorry, i didn't realize that i could make out regression tests from documentation too :)
comment:11 by , 2 years ago
Has patch: | set |
---|
PR
Added a condition to refresh related_managers_cache
when self.pk is None
and self._state.adding is True
. If self.pk = None
will be missing then the instance would not be cloned and if self._state.adding = True
is missing then related_manangers_cache would not be refreshed generating inconsistent results, hence both the conditions are necessary. It is also mentioned in docs.
comment:12 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:13 by , 2 years ago
Patch needs improvement: | set |
---|
comment:14 by , 2 years ago
I found another regression in caching related manager. The following test works on Django 4.0 and crashes with 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6:
-
tests/many_to_many/tests.py
diff --git a/tests/many_to_many/tests.py b/tests/many_to_many/tests.py index 53e870ddad..770808a92c 100644
a b class ManyToManyTests(TestCase): 92 92 a5.authors.remove(user_2.username) 93 93 self.assertSequenceEqual(a5.authors.all(), []) 94 94 95 def test_related_manager_refresh(self): 96 user_1 = User.objects.create(username="Jean") 97 user_2 = User.objects.create(username="Joe") 98 a5 = Article.objects.create(headline="Django lets you create web apps easily") 99 a5.authors.add(user_1.username) 100 self.assertSequenceEqual(user_1.article_set.all(), [a5]) 101 # Change the username on a different instance of the same user. 102 user_1_from_db = User.objects.get(pk=user_1.pk) 103 self.assertSequenceEqual(user_1_from_db.article_set.all(), [a5]) 104 user_1_from_db.username = "Paul" 105 user_1_from_db.save() 106 user_2.username = "Jean" 107 user_2.save() 108 # Add a different article. 109 self.a4.authors.add(user_1_from_db.username) 110 self.assertSequenceEqual(user_1_from_db.article_set.all(), [self.a4]) 111 # Refresh the first instance from DB. 112 user_1.refresh_from_db() 113 self.assertEqual(user_1.username, "Paul") 114 self.assertSequenceEqual(user_1.article_set.all(), [self.a4]) 115 95 116 def test_add_remove_invalid_type(self): 96 117 msg = "Field 'id' expected a number but got 'invalid'." 97 118 for method in ["add", "remove"]:
It shows two issues:
- the second call of
user_1_from_db.article_set.all()
should use a new username (Paul
) but it's still using the old one (Jean
), user_1.refresh_from_db()
doesn't clear cached managers.
I'm going to prepare revert of 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and reopen #32980.
follow-up: 19 comment:18 by , 2 years ago
Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the init even if you check for the the instance being set along with the relationship?
I.E If you have models and a Model form like this
# models.py from django.db import models class Tag(models.Model): tag = models.SlugField(max_length=64, unique=True) class Thing(models.Model): tag = models.ForeignKey(Tag, on_delete=models.CASCADE, related_name="things") active = models.BooleanField(default=True)
# forms.py from django import forms from example.models import Tag class TagForm(forms.ModelForm): class Meta: model = Tag fields = ["tag"] def __init__(self, *args, **kwargs): super(TagForm, self).__init__(*args, **kwargs) if self.instance and self.instance.things: inactive_things = self.instance.things.filter(active=False) # Do something with it
Then have the following test
from unittest.case import TestCase from example.forms import TagForm class TagFormCase(TestCase): def test_required(self): """Test required fields""" form = TagForm({}) self.assertFalse(form.is_valid()) self.assertEqual( { "tag": ["This field is required."], }, form.errors, )
The test would pass in all Django versions up to 4.1. From 4.1.0 onwards it raises a ValueError of ValueError: 'Tag' instance needs to have a primary key value before this relationship can be used.
In order for the test to pass you need to change the flow control to if self.instance.pk and self.instance.things
. You can't use
self.instance.things.count()
as it raises the exception.
However if you overload the primary key on the model to use a UUID such as:
# models.py import uuid from django.db import models class Tag(models.Model): id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False) tag = models.SlugField(max_length=64, unique=True) class Thing(models.Model): tag = models.ForeignKey(Tag, on_delete=models.CASCADE, related_name="things") active = models.BooleanField(default=True)
then the original test will pass as the uuid will be set prior to saving
follow-ups: 20 21 comment:19 by , 2 years ago
Replying to Steven Mapes:
Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the
__init__
even if you check for the the instance being set along with the relationship?
I believe that's #33952 -- that was fixed in 4.1.1
and this ticket in 4.1.2
.
comment:20 by , 2 years ago
Replying to Keryn Knight:
Replying to Steven Mapes:
Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the
__init__
even if you check for the the instance being set along with the relationship?
I believe that's #33952 -- that was fixed in
4.1.1
and this ticket in4.1.2
.
Ah okay, I just tested it with 3.2.16, 4.0.8, 4.1.0, 4.1.1, 4.1.2 and the tests pass in 3.2.16 and 4.0.8 but still fail in the others
comment:21 by , 2 years ago
Replying to Keryn Knight:
Replying to Steven Mapes:
Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the
__init__
even if you check for the the instance being set along with the relationship?
I believe that's #33952 -- that was fixed in
4.1.1
and this ticket in4.1.2
.
Confirming, exists in 4.1.2
.
This is a documented breaking change, however it also broke a documented way for copying model instances, so we need to update docs or fix it.