#34925 closed Bug (fixed)
refresh_from_db() will not iterate through all of the fields listed in the 'fields' parameter.
Reported by: | Andrew Cordery | Owned by: | Trontelj |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
This one was killing me today. If you pass the 'fields' parameter to refresh_from_db
it is supposed to only remove those fields from _prefetched_objects_cache
as described in the docs: https://docs.djangoproject.com/en/4.2/ref/models/instances/#django.db.models.Model.refresh_from_db.
Unfortunately, it modifies the 'fields' variable as it iterates through it:
Line 694 of django.db.models.base.py:
for field in fields: if field in prefetched_objects_cache: del prefetched_objects_cache[field] fields.remove(field)
Modifying the list that we are iterating over causes some list items to be skipped. For example here we would expected to see 'a', 'b', and 'c', removed from dict d
but 'b' is skipped:
In [8]: d = dict(a=1,b=2, c=3) In [9]: fields = ['a','b','c'] In [10]: for f in fields: ...: print(f) ...: if f in d: ...: del d[f] ...: fields.remove(f) ...: a c In [11]: fields Out[11]: ['b'] In [12]: d Out[12]: {'b': 2}
I beieve that all that needs to be done to correct this is to create a copy of the list by wrapping fields in list(), like so:
In [13]: fields = ['a','b','c'] In [14]: d=dict(a=1,b=2, c=3) In [15]: for f in list(fields): ...: print(f) ...: if f in d: ...: del d[f] ...: fields.remove(f) ...: a b c In [16]: d Out[16]: {} In [17]: fields Out[17]: []
Therefore as far as I can tell the fix to the django code would be as easy as wrapping fields in list():
Line 694 of django.db.models.base.py: modified
for field in list(fields): if field in prefetched_objects_cache: del prefetched_objects_cache[field] fields.remove(field)
Change History (16)
comment:2 by , 15 months ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 4.2 → 5.0 |
comment:3 by , 15 months ago
I searched through the codebase with some admitedly crap regex just looking for similar instances of for x in y: ... y.remove(x), and I found one other:
Line 93 of django/core/management/commands/compilemessages.py:
for dirpath, dirnames, filenames in os.walk(".", topdown=True): for dirname in dirnames: if is_ignored_path( os.path.normpath(os.path.join(dirpath, dirname)), ignore_patterns ): dirnames.remove(dirname) elif dirname == "locale": basedirs.append(os.path.join(dirpath, dirname))
This behavior is explicitly encouraged by python https://docs.python.org/3/library/os.html#os.walk, however in my tests this will still lead to the same kind of member skipping behavior. This particular block of code would still work correctly as long as there was never a 'locale' dirname directly after an ignored dirname in the same path.
For example, imagine the following directory structure, where we wanted to extract the two 'locale' dirs (./locale and ./a/locale) and ignore the 'b' dir.
root: a/ locale/ b/ locale/
If you were to run the following code:
import os locale_paths=[] for dirpath, dirnames, filenames in os.walk(".", topdown=True): # os.walk returns dirnames in arbitrary order https://docs.python.org/3/library/os.html#os.listdir, # so sort the dirnames to force the skip directory 'b' to come directly before the 'locale' directory dirnames.sort() print(f'Iterating through {dirpath}/{dirnames}') for dirname in dirnames: path = os.path.join(dirpath, dirname) print(f'Examining {path}...', end='') if dirname == 'b': print(f'Ignoring {path}') dirnames.remove(dirname) elif dirname == 'locale': print(f'Found {path}') locale_paths.append(path) else: print('')
You would get the following output:
Iterating through ./['a', 'b', 'locale'] Examining ./a... Examining ./b...Ignoring ./b Iterating through ./a/['locale'] Examining ./a/locale...Found ./a/locale Iterating through ./a/locale/[] Iterating through ./locale/[] In [33]: locale_paths Out[33]: ['./a/locale']
Notice that ./locale is never 'examined' nor does it appear in the 'locale_paths' output variable. It does however get traversed as shown by the 'Iterating through ./locale/[]'
message, which makes sense as it was not removed from dirnames. This does mean though that if ./locale/ had had a 'locale' subdirectory in it, that directory would have been picked up, even though the parent was missed.
Naturally, changing the line to for dirname in list(dirnames)
also completely resolves this issue.
Not sure any of that matters or is useful to you guys, but it was interesting to me. Moral of the story is don't ever mutate a list we are iterating through, even if the python docs seem to imply that you can! I think their docs should be amended to specifically state that you should NOT mutate it during the initial iteration, and that instead you should iterate through a copy (ex list(dirnames)) or only mutate it once you have examined every member of the list that you are interested in, right before your code exits.
comment:4 by , 15 months ago
Hi Andrew,
Thanks for you investigation, interested in submitting a patch & test? 🏆
comment:5 by , 15 months ago
Hello, I'm new to this community, but I have three years of experience working with Django, and I'm eager to contribute. Can I work on this ticket?
comment:6 by , 15 months ago
@trontelj sure 👍 Submit a PR and assign yourself. If you need assistance please head over to Discord to seek help from a community member.
comment:9 by , 15 months ago
Has patch: | set |
---|
comment:10 by , 15 months ago
Needs tests: | unset |
---|
I have added tests. The failed tests are not related to the changes I made in my code.
comment:11 by , 15 months ago
Needs tests: | set |
---|
comment:12 by , 15 months ago
Needs tests: | unset |
---|---|
Patch needs improvement: | set |
comment:13 by , 14 months ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
comment:16 by , 9 months ago
Hi there. Can anyone confirm that there are no plans to backport this to 4.2 since it is neither a data loss or security issue? It would be nice to have, but I understand if those of us working with the 4.2 LTS release just have to work around this bug. Thanks.
comment:17 by , 9 months ago
Correct. If applicable, backports are done immediately after merging a patch to the main branch.
Wow it's surprising that this flew under the radar for so long. The bug has been around since we introduced prefetched relationship clearing in #29625.
We should definitely not alter the provided
fields
and obviously not remove members while iterating over it.