Opened 3 years ago

Last modified 4 months ago

#33586 new Bug

Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom user.

Reported by: Jeremy Poulin Owned by:
Component: Migrations Version: 4.0
Severity: Normal Keywords:
Cc: Simon Charette, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Jeremy Poulin)

I've created a reproducer app:
https://github.com/jaypoulz/rmigration_bug

After installing dependencies, run the repro.sh script:
https://github.com/jaypoulz/rmigration_bug/blob/main/repro.sh

All this script does is clear the DB, run the initial migrations (which will fail because no user exists yet), create a dummy superuser, re-run the migration so that it completes, and then run a reverse migration.

Object A has a single foreign key to a custom User object.
Object B has a foreign key (many to one) relationships to object A. This object never needs to be instantiated, and in fact, only exists to trigger a mismatch in 'Fake' model comparison during the delete call.

https://github.com/jaypoulz/rmigration_bug/blob/main/rmigrate/migrations/0002_create_sample_objects.py#L15

ValueError: Cannot query "A object (1)": Must be "A" instance.

This appears to be because delete first checks if object A is referenced by any models that have a CASCADE on delete relationship, and discovers object B. It then compares the model identifier of object B's reference to object A with the instance type of the object that is going to be deleted (also object A).

For some reason, these identifiers do not match.

In other words, Even though there are no instances of B, A cannot be deleted because the model type returned by A via model lookup is not the same as the model type referenced by B.

For some reason, this only occurs when you have a custom User model.

If you comment out the User model, the issue no longer occurs. I've created a branch to prove this as well:
https://github.com/jaypoulz/rmigration_bug/compare/no-custom-user

Change History (28)

comment:1 by Jeremy Poulin, 3 years ago

Description: modified (diff)
Summary: Cannot delete object with a foreign key dependency if the object also has a foreign key to a custom User modelCannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom User model

comment:2 by Jeremy Poulin, 3 years ago

Description: modified (diff)

comment:3 by Mariusz Felisiak, 3 years ago

Summary: Cannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom User modelCannot delete object (A) referenced by another object (B) if said object (A) has a foreign key to a custom user.
Triage Stage: UnreviewedAccepted

Thanks for the detailed report. I was able to reproduce this issue.

It seems that check_rel_lookup_compatibility(model, target_opts, field) fails because target_opts doesn't contain a reverse descriptor b_set.

comment:4 by Simon Charette, 3 years ago

Cc: Simon Charette added

The following patch to the reproduction project demonstrates the core of the issue

  • rmigrate/migrations/0002_create_sample_objects.py

    diff --git a/rmigrate/migrations/0002_create_sample_objects.py b/rmigrate/migrations/0002_create_sample_objects.py
    index 9a17299..efd6cbe 100644
    a b  
    33
    44def up(apps, schema_editor):
    55    A = apps.get_model('rmigrate', 'A')
     6    B = apps.get_model('rmigrate', 'B')
    67    User = apps.get_model('rmigrate', 'User')
    7 
     8    assert A._meta.apps is apps
     9    assert B._meta.apps is apps
     10    assert User._meta.apps is apps
    811    a = A(user=User.objects.first())
    912    a.save()
    1013
    1114def down(apps, schema_editor):
    1215    A = apps.get_model('rmigrate', 'A')
    13     print(A.objects.all())
    14 
     16    B = apps.get_model('rmigrate', 'B')
     17    User = apps.get_model('rmigrate', 'User')
     18    assert A._meta.apps is apps
     19    assert B._meta.apps is apps
     20    assert User._meta.apps is apps
    1521    A.objects.first().delete()
    1622    print(A.objects.all())

The apps that are attached to the modesl is simply wrong when the migration is applied backward. It seems the .apps cache invalidation logic in MigrationExecutor._migrate_all_backwards and its subsequent call doesn't account for a particular case that still needs to be determined.

The following patch that invalidates the apps cache before unapplying migrations demonstrates that

  • django/db/migrations/executor.py

    diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
    index eb738cf457..b0e0bef0d8 100644
    a b def _migrate_all_backwards(self, plan, full_plan, fake):  
    216216            self.progress_callback("render_success")
    217217
    218218        for migration, _ in plan:
     219            states[migration].__dict__.pop('apps', None)
    219220            self.unapply_migration(states[migration], migration, fake=fake)
    220221            applied_migrations.remove(migration)

By avoiding the reuse of cached apps and forcing their re-creation at the expense of slower application time the issue cannot be reproduced anymore.

Last edited 3 years ago by Simon Charette (previous) (diff)

comment:5 by Bhuvnesh, 2 years ago

Owner: changed from nobody to Bhuvnesh
Status: newassigned

comment:6 by Bhuvnesh, 2 years ago

This issue is still open, right?

comment:7 by Jeremy Poulin, 2 years ago

I haven't tested it since opening the issue, but I assume the problem is still relevant. I provided a reproducer dummy app in the comments.

Happy to help test the final results. :)

comment:8 by Bhuvnesh, 2 years ago

I was also able to reproduce the issue and also tried the solution proposed by simon and it works perfectly. But as he said, it is going to cost it the execution time. So should I submit a patch or not?

comment:9 by Jeremy Poulin, 2 years ago

My interpretation of Simon's work is that the patch in question would address the deficiency in the .apps cache invalidation logic.

The apps that are attached to the modesl is simply wrong when the migration is applied backward. It seems the .apps cache invalidation logic in MigrationExecutor._migrate_all_backwards and its subsequent call doesn't account for a particular case that still needs to be determined.

While the workaround provided does allow us to move forward, I don't think submitting it as a fix is the best way forward. I would liken it to putting a poster up to cover a hole in the wall. It gets the job done, but masks the root cause.

Last edited 2 years ago by Jeremy Poulin (previous) (diff)

comment:10 by Simon Charette, 2 years ago

Jeremy is right.

We should try to figure out what is wrong in our caching strategy and address the identified issue(s); disabling caching was more of a way to demonstrate there is an issue with it than an actual solution.

comment:11 by Bhuvnesh, 2 years ago

oh okay! Looks like its time to do some brainstorming.

comment:12 by Bhuvnesh, 2 years ago

These lines were loading apps when the migration is applied backward. So instead of popping, simply not loading them inside MigrationExecutor._migrate_all_backwards should do the work.
But i think this is also not the correct solution.

  • django/db/migrations/executor.py

    diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
    index eb738cf457..988fb48cd9 100644
    a b class MigrationExecutor:  
    200200                # process.
    201201                break
    202202            if migration in migrations_to_run:
    203                 if "apps" not in state.__dict__:
    204                     state.apps  # Render all -- performance critical
    205203                # The state before this migration
    206204                states[migration] = state
    207205                # The old state keeps as-is, we continue with the new state
Last edited 2 years ago by Bhuvnesh (previous) (diff)

in reply to:  4 comment:13 by Bhuvnesh, 2 years ago

Replying to Simon Charette:

By avoiding the reuse of cached apps and forcing their re-creation at the expense of slower application time the issue cannot be reproduced anymore.

Hey simon , can you please explain where are we forcing their re-creation?

comment:14 by Bhuvnesh, 2 years ago

Proposed solution: Loading apps while generating post migration state.

  • django/db/migrations/executor.py

    diff --git a/django/db/migrations/executor.py b/django/db/migrations/executor.py
    index eb738cf457..7f1f6bd1cc 100644
    a b class MigrationExecutor:  
    200200                # process.
    201201                break
    202202            if migration in migrations_to_run:
    203                 if "apps" not in state.__dict__:
    204                     state.apps  # Render all -- performance critical
    205203                # The state before this migration
    206204                states[migration] = state
    207205                # The old state keeps as-is, we continue with the new state
    class MigrationExecutor:  
    227225        for index, (migration, _) in enumerate(full_plan):
    228226            if migration == last_unapplied_migration:
    229227                for migration, _ in full_plan[index:]:
     228                    if "apps" not in state.__dict__:
     229                        state.apps  # Render all -- performance critical
    230230                    if migration in applied_migrations:
    231231                        migration.mutate_state(state, preserve=False)
    232232                break

Ps: passing all tests.

Version 0, edited 2 years ago by Bhuvnesh (next)

comment:15 by Simon Charette, 2 years ago

These lines were loading apps when the migration is applied backward.

Effectively, it's exactly what they were meant to do for performance reasons so unless we can demonstrate

As the comment references to this use to be done there for performance reason. You can find more about the rationale by using git blame in the area surrounding the bits of codes you are proposing to alter.

It's a great sign if the test suite still passes but it's not enough to merge this change. You should be able to demonstrate and explain why this change is needed by writing a minimal test case from the issue reported by Jeremy that can be integrated to the suite and demonstrates the origin of the issue. From there we'd need numbers on the effect it has has on performance to determine if that's the right place to make the changes.

This might seem daunting but solving issues in this area of the code base is inherently complex as a lot of time went into making sure it performs reasonably well. The good news is that since 5aa55038ca9ac44b440b56d1fc4e79c876e51393 was merged a lot of efforts went into making sure non of the core operations that Django provides relies on state.apps in their state_forwards which makes the layer in charge of emulating state_backwards way faster so there are reasons to believe that this eager rendering of models might not be necessary anymore until the plan is fully applied.

It is still required for most database_backwards operations though (see #29898) as well as post_migrate signals emission so I get a feeling that avoiding the eager rendering as your patch suggests doing will do more harm than good.

My gut tells me that we're failing do some cache invalidation during one of the state_forwards operation and that this where the issue should be solved.

comment:16 by Bhuvnesh, 2 years ago

you were right, the problem was inside RunPython.state_forwards as it is not mutating any state . but it looks like like we need to reload apps cache before running backwards migration after forward migration.

def state_forwards(self, app_label, state):
        # RunPython objects have no state effect. To add some, combine this
        # with SeparateDatabaseAndState.
        pass

one thing we can do is clear apps cache inside database_backwards for RunPython Operation. This should also not affect the performance much.

  • django/db/migrations/operations/special.py

    diff --git a/django/db/migrations/operations/special.py b/django/db/migrations/operations/special.py
    index 94a6ec72de..6acdfb6cb3 100644
    a b class RunPython(Operation):  
    193193            self.code(from_state.apps, schema_editor)
    194194
    195195    def database_backwards(self, app_label, schema_editor, from_state, to_state):
     196        from_state.clear_delayed_apps_cache()
    196197        if self.reverse_code is None:
    197198            raise NotImplementedError("You cannot reverse this operation")
    198199        if router.allow_migrate(

Ps: Passing all tests.

Last edited 2 years ago by Bhuvnesh (previous) (diff)

comment:17 by Bhuvnesh, 2 years ago

Do i need to write regression test also for this?

comment:19 by Simon Charette, 2 years ago

Has patch: set
Needs tests: set
Patch needs improvement: set

you were right, the problem was inside RunPython.state_forwards as it is not mutating any state

Sorry for not being clear but I think you've misinterpreted me here, I never stated the issue was in RunPython.state_forwards and it's expected that it doesn't mutate state as it's an operation that is meant to be used for data migrations. The symptoms of the issue, corruption of project state, can be observed in RunPython.state_forwards (and it's resulting call in the function provided to RunPython initialization as reported in this ticket) but it's almost certainly not its cause. In other words, RunPython.state_forwards cannot be the origin of the state corruption because it doesn't mutate it.

one thing we can do is clear apps cache inside database_backwards for RunPython Operation. This should also not affect the performance much.

Do you have any numbers to support your performance claim? This is important as you are forcing the eviction of a cache layer dedicated to making performance better.

Ps: Passing all tests.

That's expected as explained in comment:15 but it cannot be intepreted as the proper solution to the problem.

Do i need to write regression test also for this?

Absolutely! You need to demonstrate what caused the app registry to get corrupted in the first place and not only hide its side effects; that's the crux of the issue.

Even by putting the performance regression aspect aside I don't think the proposed solution is correct. Rendered models are also provided to post_migrate signal receivers which means that it could be corrupted there as well and the solution here is not to clear the cache at this location as well as it goes against the whole reason to be of the cache layer (why should it exists if we need to clear it everytime we need to rely on it).

Last edited 2 years ago by Simon Charette (previous) (diff)

comment:20 by Bhuvnesh, 2 years ago

Most probably this error is caused due mutate_state .
In the above issue for forward migration (Up) the concrete model for instance of model A is

{'__module__': '__fake__', '__doc__': 'A(id, user)', '_meta': <Options for A>, 'DoesNotExist': <class '__fake__.A.DoesNotExist'>, 'MultipleObjectsReturned': <class '__fake__.A.MultipleObjectsReturned'>, 'id': <django.db.models.query_utils.DeferredAttribute object at 0x000002554F5E8A60>, 'user_id': <django.db.models.fields.related_descriptors.ForeignKeyDeferredAttribute object at 0x000002554F5E8A00>, 'user': <django.db.models.fields.related_descriptors.ForwardManyToOneDescriptor object at 0x000002554F5E89A0>, 'objects': <django.db.models.manager.ManagerDescriptor object at 0x000002554F5E88B0>, 'b_set': <django.db.models.fields.related_descriptors.ReverseManyToOneDescriptor object at 0x000002554F527D00>}

For Reverse migration (down ) the concrete model for instance of model A is

{'__module__': '__fake__', '__doc__': 'A(id, user)', '_meta': <Options for A>, 'DoesNotExist': <class '__fake__.A.DoesNotExist'>, 'MultipleObjectsReturned': <class '__fake__.A.MultipleObjectsReturned'>, 'id': <django.db.models.query_utils.DeferredAttribute object at 0x000002053C944550>, 'user_id': <django.db.models.fields.related_descriptors.ForeignKeyDeferredAttribute object at 0x000002053C9445B0>, 'user': <django.db.models.fields.related_descriptors.ForwardManyToOneDescriptor object at 0x000002053C9441C0>, 'objects': <django.db.models.manager.ManagerDescriptor object at 0x000002053C945120>}

Due to this difference check_rel_lookup_compatibility is returning False.

Edit:
So the real cause of this error is due to preventing rendering of related models up to the point where it's necessary (just before RunPython) in #27666 . Also it was mentioned here that if someone wants to work with on model classes or model instances, they must render model states like from_state.clear_delayed_apps_cache() to make
related models available.
A ticket(#27737) is also available stating that changes made in #27666 might cause error.

Edit2:
Can we not just add state.clear_delayed_apps_cache() inside state_forwards() for RunPython operation? The rendering of related models will still be delayed until RunPython operation, but we'll have access to models and instances for both forward and backward migrations. Currently we are clearing delayed apps cache inside database_forwards() .

Last edited 2 years ago by Bhuvnesh (previous) (diff)

comment:21 by Bhuvnesh, 2 years ago

Can someone please provide some feedback on this? Thanks.

comment:22 by Carlton Gibson, 2 years ago

Reading the issue, you've not addressed the points in https://code.djangoproject.com/ticket/33586#comment:19 (and previous to that).

Do you have any numbers to support your performance claim? This is important as you are forcing the eviction of a cache layer dedicated to making performance better.

Creating some benchmarks (using pyperf perhaps) looks like a step forward.

From comment:15

You should be able to demonstrate and explain why this change is needed by writing a minimal test case from the issue reported by Jeremy that can be integrated to the suite and demonstrates the origin of the issue. From there we'd need numbers on the effect it has has on performance to determine if that's the right place to make the changes.

So there's a couple of avenues already, but there were other points made.

comment:23 by Bhuvnesh, 2 years ago

Owner: Bhuvnesh removed
Status: assignednew

comment:24 by madhuri2, 21 months ago

Is this issue fixed?
I was working on this issue and I found this error ValueError: Cannot query "A object(1)": Must be "A" instance.

Then I tried this
I removed the migrations file.
Then applied migrations.
Unapplied all migrations using: python manage.py migrate rmigrate zero
and it worked!

(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py makemigrations
←[36;1mMigrations for 'rmigrate':←[0m
  ←[1mrmigrate\migrations\0001_initial.py←[0m
    - Create model User
    - Create model A
    - Create model B


(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
←[36;1mOperations to perform:←[0m
←[1m  Apply all migrations: ←[0madmin, auth, contenttypes, rmigrate, sessions
←[36;1mRunning migrations:←[0m
  No migrations to apply.


(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate rmigrate zero
←[36;1mOperations to perform:←[0m
←[1m  Unapply all migrations: ←[0mrmigrate
←[36;1mRunning migrations:←[0m
  Rendering model states...←[32;1m DONE←[0m
  Unapplying admin.0003_logentry_add_action_flag_choices...←[32;1m OK←[0m
  Unapplying admin.0002_logentry_remove_auto_add...←[32;1m OK←[0m
  Unapplying admin.0001_initial...←[32;1m OK←[0m
  Unapplying rmigrate.0001_initial...←[32;1m OK←[0m



(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
←[36;1mOperations to perform:←[0m
←[1m  Apply all migrations: ←[0madmin, auth, contenttypes, rmigrate, sessions
←[36;1mRunning migrations:←[0m
  Applying rmigrate.0001_initial...←[32;1m OK←[0m
  Applying admin.0001_initial...←[32;1m OK←[0m
  Applying admin.0002_logentry_remove_auto_add...←[32;1m OK←[0m
  Applying admin.0003_logentry_add_action_flag_choices...←[32;1m OK←[0m   (venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py makemigrations
No changes detected


(venv) PS C:\Users\Madhu\Desktop\Django\ticket33586\rmigration_bug> python manage.py migrate
←[36;1mOperations to perform:←[0m
←[1m  Apply all migrations: ←[0madmin, auth, contenttypes, rmigrate, sessions
←[36;1mRunning migrations:←[0m
  No migrations to apply.

comment:25 by Mariusz Felisiak, 21 months ago

Is this issue fixed?

No.

in reply to:  25 comment:26 by madhuri2, 21 months ago

Replying to Mariusz Felisiak:

Is this issue fixed?

No.

okay, may I know how this file 0002_create_sample_objects.py is created during the migration

comment:27 by Ülgen Sarıkavak, 8 months ago

Cc: Ülgen Sarıkavak added

comment:28 by Timothy Schilling, 4 months ago

Is it possible that migration_plan is returning the wrong plan for backwards migrations? As per my investigation here it appears that when the migration to be unapplied is encountered, that some of the other migrations hadn't applied their state.

I think it makes sense to start with all apps having their forward state applied before storing the project state for a migration to be unapplied.

Note: See TracTickets for help on using tickets.
Back to Top