Opened 3 years ago
Last modified 4 months ago
#33586 new Bug
Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom user.
Reported by: | Jeremy Poulin | Owned by: |
---|---|---|---
Component: | Migrations | Version: | 4.0
Severity: | Normal | Keywords: |
Cc: | Simon Charette, Ülgen Sarıkavak | Triage Stage: | Accepted
Has patch: | yes | Needs documentation: | no
Needs tests: | yes | Patch needs improvement: | yes
Easy pickings: | no | UI/UX: | no
Description
I've created a reproducer app:
https://github.com/jaypoulz/rmigration_bug
After installing dependencies, run the repro.sh script:
https://github.com/jaypoulz/rmigration_bug/blob/main/repro.sh
All this script does is clear the DB, run the initial migrations (which fail because no user exists yet), create a dummy superuser, re-run the migrations so they complete, and then run a reverse migration.
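For orientation, a hedged Python equivalent of those steps (the real script is shell; `call_command` is Django's public API, and the settings module name is assumed from the repository layout):

```python
import os

import django
from django.core.management import call_command

# "rmigration_bug.settings" is an assumption based on the repo name.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "rmigration_bug.settings")
django.setup()

from django.contrib.auth import get_user_model  # import after setup()

call_command("flush", interactive=False)  # approximation of "clear the DB"
try:
    call_command("migrate")  # fails: the data migration needs a user
except Exception as exc:
    print(f"initial migrate failed as expected: {exc}")
get_user_model().objects.create_superuser("admin", "admin@example.com", "pw")
call_command("migrate")                      # now completes
call_command("migrate", "rmigrate", "zero")  # reverse migration -> ValueError
```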
Object A has a single foreign key to a custom User model.
Object B has a foreign-key (many-to-one) relationship to object A. B never needs to be instantiated; it exists only to trigger a mismatch in 'fake' model comparison during the delete call. A minimal reconstruction of the models is sketched below.
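A minimal, hedged reconstruction of these models (the authoritative definitions are in the linked repository):

```python
from django.conf import settings
from django.contrib.auth.models import AbstractUser
from django.db import models


class User(AbstractUser):
    """Custom user; AUTH_USER_MODEL = 'rmigrate.User' is assumed."""


class A(models.Model):
    # Single foreign key to the custom user model.
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)


class B(models.Model):
    # Many-to-one back to A; never instantiated, exists only to create
    # the reverse relation involved in the failing delete().
    a = models.ForeignKey(A, on_delete=models.CASCADE)
```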
ValueError: Cannot query "A object (1)": Must be "A" instance.
This appears to be because delete() first checks whether object A is referenced by any models with an on-delete CASCADE relationship to it, and discovers model B. It then compares the model identifier of B's reference to A with the instance type of the object being deleted (also A).
For some reason, these identifiers do not match.
In other words, even though there are no instances of B, A cannot be deleted, because the model type returned for A via model lookup is not the same as the model type referenced by B.
For some reason, this only occurs when you have a custom User model.
If you comment out the User model, the issue no longer occurs. I've created a branch to prove this as well:
https://github.com/jaypoulz/rmigration_bug/compare/no-custom-user
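A minimal, hedged sketch (not code from the reproducer) of the failure mode described above: historical models are rendered dynamically per project state, and two independent renderings of "the same" model are distinct classes.

```python
# Simplified stand-in for Django's per-state "fake" model rendering; the
# real mechanism lives in django.db.migrations.state.
def render_fake_model(name):
    return type(name, (), {"__module__": "__fake__"})


A_cached = render_fake_model("A")      # class held by one project state
A_rerendered = render_fake_model("A")  # class from a second rendering

a = A_cached()
print(A_cached.__name__ == A_rerendered.__name__)  # True: same name
print(isinstance(a, A_rerendered))                 # False: different class
# Django's relation-compatibility check trips on exactly this kind of
# mismatch and raises:
#   ValueError: Cannot query "A object (1)": Must be "A" instance.
```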
Change History (28)
comment:1 by , 3 years ago

Description: modified (diff)
Summary: Cannot delete object with a foreign key dependency if the object also has a foreign key to a custom User model → Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom User model
comment:2 by , 3 years ago

Description: modified (diff)
comment:3 by , 3 years ago

Summary: Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom User model → Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom user.
Triage Stage: Unreviewed → Accepted
follow-up: comment:13

comment:4 by , 3 years ago

Cc: added
The following patch to the reproduction project demonstrates the core of the issue:
```diff
diff --git a/rmigrate/migrations/0002_create_sample_objects.py b/rmigrate/migrations/0002_create_sample_objects.py
index 9a17299..efd6cbe 100644
--- a/rmigrate/migrations/0002_create_sample_objects.py
+++ b/rmigrate/migrations/0002_create_sample_objects.py
@@ -3,14 +3,20 @@
 
 def up(apps, schema_editor):
     A = apps.get_model('rmigrate', 'A')
+    B = apps.get_model('rmigrate', 'B')
     User = apps.get_model('rmigrate', 'User')
     assert A._meta.apps is apps
+    assert B._meta.apps is apps
+    assert User._meta.apps is apps
     a = A(user=User.objects.first())
     a.save()
 
 def down(apps, schema_editor):
     A = apps.get_model('rmigrate', 'A')
-    print(A.objects.all())
-
+    B = apps.get_model('rmigrate', 'B')
+    User = apps.get_model('rmigrate', 'User')
+    assert A._meta.apps is apps
+    assert B._meta.apps is apps
+    assert User._meta.apps is apps
     A.objects.first().delete()
     print(A.objects.all())
```
The `apps` registry attached to the models is simply wrong when the migration is applied backward. It seems the `.apps` cache invalidation logic in `MigrationExecutor._migrate_all_backwards` and its subsequent calls don't account for a particular case that still needs to be determined.
The following patch, which invalidates the `apps` cache before unapplying migrations, demonstrates that:
```diff
diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
index eb738cf457..b0e0bef0d8 100644
--- a/django/db/migrations/executor.py
+++ b/django/db/migrations/executor.py
@@ -216,6 +216,7 @@ def _migrate_all_backwards(self, plan, full_plan, fake):
                 self.progress_callback("render_success")
 
         for migration, _ in plan:
+            states[migration].__dict__.pop('apps', None)
             self.unapply_migration(states[migration], migration, fake=fake)
             applied_migrations.remove(migration)
```
By avoiding the reuse of cached apps and forcing their re-creation, at the expense of slower application time, the issue can no longer be reproduced.
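For context, a minimal sketch of why popping `'apps'` from `__dict__` forces a re-render, assuming `ProjectState.apps` is a cached property (as it is in Django):

```python
from functools import cached_property


class ProjectStateLike:
    @cached_property
    def apps(self):
        print("rendering models...")  # expensive in the real executor
        return object()


state = ProjectStateLike()
state.apps                        # renders and caches
state.apps                        # served from __dict__; no re-render
state.__dict__.pop("apps", None)  # what the patch above does
state.apps                        # renders again
```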
comment:5 by , 2 years ago

Owner: set
Status: new → assigned
comment:7 by , 2 years ago
I haven't tested it since opening the issue, but I assume the problem is still relevant. I provided a reproducer dummy app in the comments.
Happy to help test the final results. :)
comment:8 by , 2 years ago

I was also able to reproduce the issue, and I tried the solution proposed by Simon; it works perfectly. But, as he said, it is going to cost execution time. So should I submit a patch or not?
comment:9 by , 2 years ago
My interpretation of Simon's work is that the patch in question would address the deficiency in the `.apps` cache invalidation logic.

> The `apps` registry attached to the models is simply wrong when the migration is applied backward. It seems the `.apps` cache invalidation logic in `MigrationExecutor._migrate_all_backwards` and its subsequent calls don't account for a particular case that still needs to be determined.

While the workaround provided does allow us to move forward, I don't think submitting it as a fix is the best way forward. I would liken it to putting a poster up to cover a hole in the wall: it gets the job done, but masks the root cause.
comment:10 by , 2 years ago
Jeremy is right.
We should try to figure out what is wrong in our caching strategy and address the identified issue(s); disabling caching was more of a way to demonstrate there is an issue with it than an actual solution.
comment:12 by , 2 years ago

These lines were loading `apps` when the migration is applied backward, so instead of popping, simply not loading them inside `MigrationExecutor._migrate_all_backwards` should do the work. But I think this is also not the correct solution.
```diff
diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
index eb738cf457..988fb48cd9 100644
--- a/django/db/migrations/executor.py
+++ b/django/db/migrations/executor.py
@@ -200,8 +200,6 @@ class MigrationExecutor:
                 # process.
                 break
             if migration in migrations_to_run:
-                if "apps" not in state.__dict__:
-                    state.apps  # Render all -- performance critical
                 # The state before this migration
                 states[migration] = state
             # The old state keeps as-is, we continue with the new state
```
comment:13 by , 2 years ago

Replying to Simon Charette:

> By avoiding the reuse of cached apps and forcing their re-creation, at the expense of slower application time, the issue can no longer be reproduced.

Hey Simon, can you please explain where we are forcing their re-creation?
comment:14 by , 2 years ago

Proposed solution: load apps while generating the post-migration state inside `_migrate_all_backwards`.
```diff
diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
index eb738cf457..7f1f6bd1cc 100644
--- a/django/db/migrations/executor.py
+++ b/django/db/migrations/executor.py
@@ -200,8 +200,6 @@ class MigrationExecutor:
                 # process.
                 break
             if migration in migrations_to_run:
-                if "apps" not in state.__dict__:
-                    state.apps  # Render all -- performance critical
                 # The state before this migration
                 states[migration] = state
             # The old state keeps as-is, we continue with the new state
@@ -227,6 +225,8 @@ class MigrationExecutor:
         for index, (migration, _) in enumerate(full_plan):
             if migration == last_unapplied_migration:
                 for migration, _ in full_plan[index:]:
+                    if "apps" not in state.__dict__:
+                        state.apps  # Render all -- performance critical
                     if migration in applied_migrations:
                         migration.mutate_state(state, preserve=False)
                 break
```
PS: all tests pass.
comment:15 by , 2 years ago

> These lines were loading apps when the migration is applied backward.

Effectively, that is exactly what they were meant to do. As the comment notes, this used to be done there for performance reasons; you can find more about the rationale by using `git blame` in the area surrounding the bits of code you are proposing to alter.

It's a great sign if the test suite still passes, but it's not enough to merge this change. You should be able to demonstrate and explain why this change is needed by writing a minimal test case, from the issue reported by Jeremy, that can be integrated into the suite and demonstrates the origin of the issue. From there we'd need numbers on the effect it has on performance to determine whether that's the right place to make the changes.

This might seem daunting, but solving issues in this area of the code base is inherently complex, as a lot of time went into making sure it performs reasonably well. The good news is that since 5aa55038ca9ac44b440b56d1fc4e79c876e51393 was merged, a lot of effort went into making sure none of the core operations that Django provides relies on `state.apps` in their `state_forwards`, which makes the layer in charge of emulating `state_backwards` way faster. So there are reasons to believe that this eager rendering of models might not be necessary anymore until the plan is fully applied.

It is still required for most `database_backwards` operations though (see #29898), as well as for `post_migrate` signal emission, so I get a feeling that avoiding the eager rendering, as your patch suggests, will do more harm than good.

My gut tells me that we're failing to do some cache invalidation during one of the `state_forwards` operations, and that this is where the issue should be solved.
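A hedged sketch of the kind of minimal regression test being asked for, using the public `MigrationExecutor` API (app and migration names follow the reproducer project; this is an illustration, not the final test):

```python
from django.db import connection
from django.db.migrations.executor import MigrationExecutor
from django.test import TransactionTestCase


class BackwardsStateTests(TransactionTestCase):
    available_apps = ["rmigrate"]  # assumes the reproducer app is installed

    def test_backwards_state_is_fully_rendered(self):
        executor = MigrationExecutor(connection)
        # Apply forwards, then unapply; the data migration's down() runs
        # against rendered historical models and should not raise.
        executor.migrate([("rmigrate", "0002_create_sample_objects")])
        executor.loader.build_graph()  # refresh applied-migration state
        executor.migrate([("rmigrate", None)])
```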
comment:16 by , 2 years ago

You were right; the problem was inside `RunPython.state_forwards`, as it is not mutating any state. But it looks like we need to reload the apps cache before running a backwards migration after a forward migration:
```python
def state_forwards(self, app_label, state):
    # RunPython objects have no state effect. To add some, combine this
    # with SeparateDatabaseAndState.
    pass
```
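For reference, a hedged illustration of the pattern that comment points at, combining `RunPython` with `SeparateDatabaseAndState` so a data migration can also contribute state changes (app and function names are illustrative):

```python
from django.db import migrations


def forwards(apps, schema_editor):
    # Data migration body; runs against historical models.
    A = apps.get_model("rmigrate", "A")
    A.objects.update()  # placeholder for real data work


class Migration(migrations.Migration):
    dependencies = [("rmigrate", "0001_initial")]
    operations = [
        migrations.SeparateDatabaseAndState(
            database_operations=[migrations.RunPython(forwards)],
            state_operations=[],  # state-only operations would go here
        ),
    ]
```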
One thing we can do is clear the apps cache inside `database_backwards` for the `RunPython` operation. This should also not affect performance much:
```diff
diff --git a/django/db/migrations/operations/special.py b/django/db/migrations/operations/special.py
index 94a6ec72de..6acdfb6cb3 100644
--- a/django/db/migrations/operations/special.py
+++ b/django/db/migrations/operations/special.py
@@ -193,6 +193,7 @@ class RunPython(Operation):
             self.code(from_state.apps, schema_editor)
 
     def database_backwards(self, app_label, schema_editor, from_state, to_state):
+        from_state.clear_delayed_apps_cache()
         if self.reverse_code is None:
             raise NotImplementedError("You cannot reverse this operation")
         if router.allow_migrate(
```
PS: all tests pass.
comment:19 by , 2 years ago

Has patch: set
Needs tests: set
Patch needs improvement: set
> You were right; the problem was inside RunPython.state_forwards, as it is not mutating any state.

Sorry for not being clear, but I think you've misinterpreted me here. I never stated the issue was in `RunPython.state_forwards`, and it's expected that it doesn't mutate state, as it's an operation meant to be used for data migrations. The symptoms of the issue, corruption of project state, can be observed in `RunPython.database_backwards` (and its resulting call to the function provided at `RunPython` initialization, as reported in this ticket), but that's almost certainly not its cause. In other words, `RunPython.state_forwards` cannot be the origin of the state corruption because it doesn't mutate it.
> One thing we can do is clear the apps cache inside database_backwards for the RunPython operation. This should also not affect performance much.

Do you have any numbers to support your performance claim? This is important, as you are forcing the eviction of a cache layer dedicated to making performance better.

> PS: all tests pass.

That's expected, as explained in comment:15, but it cannot be interpreted as the proper solution to the problem.
> Do I need to write a regression test for this as well?

Absolutely! You need to demonstrate what caused the app registry to get corrupted in the first place and not only hide its side effects; that's the crux of the issue.

Even putting the performance regression aside, I don't think the proposed solution is correct. Rendered models are also provided to `post_migrate` signal receivers, which means the state could be corrupted there as well, and the solution is not to clear the cache at that location too, as it goes against the whole reason for the cache layer to exist (why should it exist if we need to clear it every time we rely on it?).
comment:20 by , 2 years ago

Most probably this error is caused by `mutate_state`.

In the above issue, for the forward migration (up), the class dict of the concrete model for an instance of model A is:

```
{'__module__': '__fake__',
 '__doc__': 'A(id, user)',
 '_meta': <Options for A>,
 'DoesNotExist': <class '__fake__.A.DoesNotExist'>,
 'MultipleObjectsReturned': <class '__fake__.A.MultipleObjectsReturned'>,
 'id': <django.db.models.query_utils.DeferredAttribute object at 0x000002554F5E8A60>,
 'user_id': <django.db.models.fields.related_descriptors.ForeignKeyDeferredAttribute object at 0x000002554F5E8A00>,
 'user': <django.db.models.fields.related_descriptors.ForwardManyToOneDescriptor object at 0x000002554F5E89A0>,
 'objects': <django.db.models.manager.ManagerDescriptor object at 0x000002554F5E88B0>,
 'b_set': <django.db.models.fields.related_descriptors.ReverseManyToOneDescriptor object at 0x000002554F527D00>}
```

For the reverse migration (down), the class dict of the concrete model for an instance of model A is:

```
{'__module__': '__fake__',
 '__doc__': 'A(id, user)',
 '_meta': <Options for A>,
 'DoesNotExist': <class '__fake__.A.DoesNotExist'>,
 'MultipleObjectsReturned': <class '__fake__.A.MultipleObjectsReturned'>,
 'id': <django.db.models.query_utils.DeferredAttribute object at 0x000002053C944550>,
 'user_id': <django.db.models.fields.related_descriptors.ForeignKeyDeferredAttribute object at 0x000002053C9445B0>,
 'user': <django.db.models.fields.related_descriptors.ForwardManyToOneDescriptor object at 0x000002053C9441C0>,
 'objects': <django.db.models.manager.ManagerDescriptor object at 0x000002053C945120>}
```

Note that the reverse descriptor `b_set` is missing in the reverse case. Due to this difference, `check_rel_lookup_compatibility` is returning False.
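A hedged way to surface this difference from inside the migration functions themselves (run in each direction; `apps` is the registry passed to the migration function):

```python
A = apps.get_model("rmigrate", "A")
print(sorted(vars(A)))  # 'b_set' is present forwards but missing backwards
```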
Edit:

So the real cause of this error is the deferral of related-model rendering up to the point where it's necessary (just before `RunPython`), introduced in #27666. It was also mentioned there that if someone wants to work with model classes or model instances, they must render the model state (e.g. via `from_state.clear_delayed_apps_cache()`) to make related models available.

A ticket (#27737) also exists stating that the changes made in #27666 might cause errors.
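A hedged sketch of that escape hatch in a custom operation (illustrative; `clear_delayed_apps_cache()` is the real `ProjectState` method mentioned above, and the model name follows the reproducer):

```python
from django.db.migrations.operations.base import Operation


class TouchRelatedModels(Operation):
    """Illustrative operation that needs fully rendered related models."""
    reversible = True

    def state_forwards(self, app_label, state):
        pass  # no state changes, like RunPython

    def database_forwards(self, app_label, schema_editor, from_state, to_state):
        # Render delayed related models before touching model classes,
        # mirroring what RunPython.database_forwards does.
        from_state.clear_delayed_apps_cache()
        A = from_state.apps.get_model(app_label, "A")
        print(A._meta.related_objects)  # reverse relations are now visible

    def database_backwards(self, app_label, schema_editor, from_state, to_state):
        from_state.clear_delayed_apps_cache()
```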
comment:22 by , 2 years ago
Reading the issue, you've not addressed the points in https://code.djangoproject.com/ticket/33586#comment:19 (and previous to that).
Do you have any numbers to support your performance claim? This is important as you are forcing the eviction of a cache layer dedicated to making performance better.
Creating some benchmarks (using pyperf perhaps) looks like a step forward.
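A hedged sketch of such a benchmark with pyperf (a Django settings module must already be configured; app and migration names follow the reproducer):

```python
import django
import pyperf
from django.db import connection
from django.db.migrations.executor import MigrationExecutor

django.setup()  # assumes DJANGO_SETTINGS_MODULE is set


def roundtrip():
    executor = MigrationExecutor(connection)
    executor.migrate([("rmigrate", None)])  # unapply everything
    executor.loader.build_graph()           # refresh applied-migration state
    executor.migrate([("rmigrate", "0002_create_sample_objects")])


runner = pyperf.Runner()
runner.bench_func("rmigrate_backwards_roundtrip", roundtrip)
```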
From comment:15:

> You should be able to demonstrate and explain why this change is needed by writing a minimal test case, from the issue reported by Jeremy, that can be integrated into the suite and demonstrates the origin of the issue. From there we'd need numbers on the effect it has on performance to determine whether that's the right place to make the changes.

So there are a couple of avenues already, but there were other points made.
comment:23 by , 2 years ago

Owner: removed
Status: assigned → new
comment:24 by , 21 months ago

Is this issue fixed?

I was working on this issue and I hit the error `ValueError: Cannot query "A object (1)": Must be "A" instance.` Then I tried this:

1. I removed the migrations file.
2. Then I applied migrations.
3. I unapplied all migrations using `python manage.py migrate rmigrate zero`.

And it worked! Full console output:
```
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py makemigrations
Migrations for 'rmigrate':
  rmigrate\migrations\0001_initial.py
    - Create model User
    - Create model A
    - Create model B
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, rmigrate, sessions
Running migrations:
  No migrations to apply.
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate rmigrate zero
Operations to perform:
  Unapply all migrations: rmigrate
Running migrations:
  Rendering model states... DONE
  Unapplying admin.0003_logentry_add_action_flag_choices... OK
  Unapplying admin.0002_logentry_remove_auto_add... OK
  Unapplying admin.0001_initial... OK
  Unapplying rmigrate.0001_initial... OK
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, rmigrate, sessions
Running migrations:
  Applying rmigrate.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py makemigrations
No changes detected
(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, rmigrate, sessions
Running migrations:
  No migrations to apply.
```
comment:26 by , 21 months ago

Replying to Mariusz Felisiak:

> Is this issue fixed?

No.

Okay, may I know how the file 0002_create_sample_objects.py is created during the migration?
comment:27 by , 8 months ago

Cc: added
comment:28 by , 4 months ago

Is it possible that `migration_plan` is returning the wrong plan for backwards migrations? As per my investigation, it appears that when the migration to be unapplied is encountered, some of the other migrations hadn't yet applied their state. I think it makes sense to start with all apps having their forward state applied before storing the project state for a migration to be unapplied.

Thanks for the detailed report. I was able to reproduce this issue. It seems that `check_rel_lookup_compatibility(model, target_opts, field)` fails because `target_opts` doesn't contain the reverse descriptor `b_set`.
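A hedged way to confirm the missing reverse descriptor from a migration function or shell (names follow the reproducer; `apps` is the historical registry):

```python
A = apps.get_model("rmigrate", "A")
print(hasattr(A, "b_set"))                     # False on the broken path
print([f.name for f in A._meta.get_fields()])  # reverse relation "b" absent
```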