Opened 3 weeks ago
Last modified 8 days ago
#36168 new Bug
Backwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state
Reported by: | Klaas van Schelven | Owned by: | |
---|---|---|---|
Component: | Migrations | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Jacob Walls | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I managed to debug my way to the broken part of the code, but did not manage to find a nice clean reproduction. Hence the following report starts with broken code and only then explains the problem, which is the unusual order.
This is wrong because it mutates the state on the MigrationExecutor
, i.e. in certain conditions a different graph (without the pruning of replace_migrations
) is set "globally" on the MigrationExecutor
.
But migration_plan
(the function of which the broken code is part) is called _twice_ in at least some flows. In particular: first as part of migrate
, and then as part of _create_project_state
. This means that, for flows where the broken code is called in the first call to migration_plan
(for presumably good reasons, as per the comment above it), the construction of the second plan uses a graph that is too large, causing a failure in some cases.
This fails in the following combination of conditions:
- a project with squashed migrations in more than one app
- where the squashed migrations do not have follow-up migrations (i.e. when the graph is not simplified for replacements, both the squashing migration and the last migration it replaces show up as leaf nodes)
- explicit migration to a replaced migration from the command line.
because of the explicit migration to a replaced migration (3) the graph is updated in the "wrong code" as per the comment. Then, when trying to _create_project_state
the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.
I have encountered this while working on Bugsink, a self-hosted Error Tracker, on this commit. I have not been able to distill a more clean PoC that actually exhibits failure though.
Replacing the mutation with something that leaves the loader untouched (ugly version below) fixes the problem though.
old_loader = self.loader self.loader = MigrationLoader(self.connection) self.loader.replace_migrations = False self.loader.build_graph() result = self.migration_plan(targets, clean_start=clean_start) self.loader = old_loader return result
Change History (9)
comment:1 by , 3 weeks ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 3 weeks ago
ok, finally found a clean reproducer!
https://github.com/vanschelven/squashwithrename
(freshdjango) klaas@pop-os:~/dev/squashwithrename$ rm db.sqlite3 (freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate Operations to perform: Apply all migrations: admin, auth, contenttypes, sessions, squashme, triggerfailingcode Running migrations: Applying contenttypes.0001_initial... OK Applying auth.0001_initial... OK Applying admin.0001_initial... OK Applying admin.0002_logentry_remove_auto_add... OK Applying admin.0003_logentry_add_action_flag_choices... OK Applying contenttypes.0002_remove_content_type_name... OK Applying auth.0002_alter_permission_name_max_length... OK Applying auth.0003_alter_user_email_max_length... OK Applying auth.0004_alter_user_username_opts... OK Applying auth.0005_alter_user_last_login_null... OK Applying auth.0006_require_contenttypes_0002... OK Applying auth.0007_alter_validators_add_error_messages... OK Applying auth.0008_alter_user_username_max_length... OK Applying auth.0009_alter_user_last_name_max_length... OK Applying auth.0010_alter_group_name_max_length... OK Applying auth.0011_update_proxy_permissions... OK Applying auth.0012_alter_user_first_name_max_length... OK Applying sessions.0001_initial... OK Applying squashme.0001_initial... OK Applying squashme.0002_rename_name_foo_rename_squashed_0003_foo_another_field... OK Applying triggerfailingcode.0001_squashed_0002_baz_baz... OK (freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate triggerfailingcode 0001_initial Operations to perform: Target specific migration: 0001_initial, from triggerfailingcode Traceback (most recent call last): File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 297, in rename_field found = fields.pop(old_name) KeyError: 'name' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 22, in <module> main() File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 18, in main execute_from_command_line(sys.argv) File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line utility.execute() File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 413, in run_from_argv self.execute(*args, **cmd_options) File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 459, in execute output = self.handle(*args, **options) File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 107, in wrapper res = handle_func(*args, **kwargs) File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 302, in handle pre_migrate_state = executor._create_project_state(with_applied_migrations=True) File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/executor.py", line 91, in _create_project_state migration.mutate_state(state, preserve=False) File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/migration.py", line 91, in mutate_state operation.state_forwards(self.app_label, new_state) File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/operations/fields.py", line 303, in state_forwards state.rename_field( File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 299, in rename_field raise FieldDoesNotExist( django.core.exceptions.FieldDoesNotExist: squashme.foo has no field named 'name'
comment:3 by , 3 weeks ago
The reason this was somewhat hard to reproduce cleanly is the following:
Per my original report:
the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.
however, there's also this code, which is the adding of a model to a project's state. That code does not bother to check whether a given model already exist, which means that the things happening doubly doesn't trigger an error if both paths start with creating a model, for the simple reason that the model is simply overwritten.
That might actually be a separate bug...
comment:4 by , 3 weeks ago
I've downloaded the project and ran the commands. It migrates without error for me on both Django 5.1 and main 🤔
Apologies, missed the second command to run, reproduced
comment:5 by , 3 weeks ago
Cc: | added |
---|---|
Summary: | migration plan in the context of squashmigrations has too many leaves → Backwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Agreed this can be treated separate to #36166 assuming we want to support backwards migrating to replaced migrations
Thank you for the projects Klaas, really appreciate it ⭐
comment:6 by , 3 weeks ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
comment:7 by , 9 days ago
Thanks for the ping on github. I agree some investigation is needed about whether & where to unset replace_migrations = False
back to True.
Have you looked into pre-walking the graph nodes to find out if any migrations are missing, so we could reload with replace_migrations = False
*before* the loop? I didn't get a chance to do a deeper dive than that.
comment:8 by , 9 days ago
I'm not sure what you mean exactly... in particular with "pre walking". Also I'm not sure what you mean with "the loop"; the whole point is that there are 2 usage locations (and that in certain conditions they need different values for replace_migrations).
I think the graph is cached (and this caching is overridden for the case I highlighted). In my mind, as long as that "clobbering" is done after the function that needs it returns, we're probably good. In my PoC I did that by storing the unclobbered result, but any other solution would do to, I think.
comment:9 by , 8 days ago
Hi Klaas. Sorry if my shorthand was not helpful. What I meant was that I tested your suggested patch against your test project and found that I still couldn't migrate backward to 0001_initial, finding instead this exception:
File "/Users/.../django/django/db/migrations/executor.py", line 229, in _migrate_all_backwards self.unapply_migration(states[migration], migration, fake=fake) ~~~~~~^^^^^^^^^^^ KeyError: <Migration triggerfailingcode.0002_baz_baz>
Because your patch (and the changes in #24900) take place inside a loop over migration targets, by "pre-walking" I wondered if we could avoid rebuilding the graph during the loop and instead just derive whatever information we need to build the graph correctly only once, but I wasn't saying I had confidence in this hypothesis, I was just asking if you had already explored it. From the PR review comments in #24900 it appears the original version was quite different and perhaps more like what I'm suggesting now. (It's hard to say since I don't have the branch anymore.)
Thanks for looking into it, I'd be glad to review a PR.
Hi Klaas, thank you for your efforts into debugging further into the migrations
I will close this until there is either a test to demonstrate undesired behavior or a project to replicate (showing an issue different to #36166)
I would avoid going too low-level when creating a ticket as we can go into this depth in PRs/ticket comments when resolving issues