Opened 2 years ago

Closed 2 years ago

#33928 closed Cleanup/optimization (fixed)

Performance issues with `on_delete=models.SET_NULL` on large tables

Reported by: Renan GEHAN Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords:
Cc: David Wobrock Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,

I have the following models configuration:

  • Parent model
  • Child model, with a parent_id foreign key to a Parent model, set with on_delete=models.SET_NULL

Each Parent can have a lot of children, in my case roughly 30k.

I'm starting to encounter performance issues that make my jobs timeout, because the SQL queries simply timeout.

I've enabled query logging, and noticed something weird (that is certainly that way on purpose, but I don't understand why).

# Select the parent
SELECT * FROM "parent" WHERE "parent"."id" = 'parent123';

# Select all children
SELECT * FROM "children" WHERE "children"."parent_id" IN ('parent123');

# Update all children `parent_id` column to `NULL`
UPDATE "children" SET "parent_id" = NULL WHERE "children"."id" IN ('child1', 'child2', 'child3', ..., 'child30000');

# Finally delete the parent
DELETE FROM "parent" WHERE "parent"."id" IN ('parent123');

I would have expected the update condition to simply be WHERE "children"."parent_id" = 'parent123', but for some reason it isn't.

In the meantime, I'll switch to on_delete=models.CASCADE, which in my case does the trick, but I was curious about the reason why this happens in the first place.

Thanks in advance

Change History (8)

comment:1 by Simon Charette, 2 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

You are right that is an opportunity for an optimization when SET, SET_DEFAULT, or SET_NULL is used but I wonder if it's worth doing given db_on_delete support (see #21961) make things even better for this use case.

In the meantime, I'll switch to on_delete=models.CASCADE, which in my case does the trick, but I was curious about the reason why this happens in the first place.

This is likely the case because the collector is able to take the fast delete route and avoid object fetching entirely.

If you want to give a shot at a patch you'll want to have a look at Collector.collect and have it skip collection entirely when dealing with SET and friends likely by adding a branch that turns them into fast updates. One branch that worries me is the post-deletion assignment of values to in-memory instances but I can't understand why this is even necessary given all the instances that are collected for field updates are never make their way out of the collector so I would expect it to be entirely unnecessary at least all delete and delete_regress tests pass if I entirely remove it

  • django/db/models/deletion.py

    diff --git a/django/db/models/deletion.py b/django/db/models/deletion.py
    index 2cb3c88444..2eb8e95281 100644
    a b def delete(self):  
    496496                            using=self.using,
    497497                            origin=self.origin,
    498498                        )
    499 
    500         # update collected instances
    501         for instances_for_fieldvalues in self.field_updates.values():
    502             for (field, value), instances in instances_for_fieldvalues.items():
    503                 for obj in instances:
    504                     setattr(obj, field.attname, value)
    505499        for model, instances in self.data.items():
    506500            for instance in instances:
    507501                setattr(instance, model._meta.pk.attname, None)

You'll want to make sure to avoid breaking the admin's collector subclass used to display deletion confirmation pages but from a quick look it doesn't seem to care about field updates.

Tentatively accepting but I think we should revisit when #21961 lands.

Last edited 2 years ago by Simon Charette (previous) (diff)

comment:2 by Simon Charette, 2 years ago

Has patch: set

Renan, lmk if this PR is what you were expecting.

comment:3 by Mariusz Felisiak, 2 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:4 by David Wobrock, 2 years ago

Cc: David Wobrock added

comment:5 by Mariusz Felisiak, 2 years ago

Patch needs improvement: set

comment:6 by Mariusz Felisiak, 2 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In a9be1dc5:

Refs #33928 -- Removed unnecessary attribute assignment on on-delete updates.

Model instances retrieved for bulk field update purposes are not exposed
to the outside world and thus are not required to be kept update to
date.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In 0701bb8e:

Fixed #33928 -- Avoided unnecessary queries when cascade updating.

Models that use SET, SET_NULL, and SET_DEFAULT as on_delete handler
don't have to fetch objects for the sole purpose of passing them back to
a follow up UPDATE query filtered by the retrieved objects primary key.

This was achieved by flagging SET handlers as _lazy_ and having the
collector logic defer object collections until the last minute. This
should ensure that the rare cases where custom on_delete handlers are
defined remain uncalled when when dealing with an empty collection of
instances.

This reduces the number queries required to apply SET handlers from
2 to 1 where the remaining UPDATE use the same predicate as the non
removed SELECT query.

In a lot of ways this is similar to the fast-delete optimization that
was added in #18676 but for updates this time. The conditions only
happen to be simpler in this case because SET handlers are always
terminal. They never cascade to more deletes that can be combined.

Thanks Renan GEHAN for the report.

Note: See TracTickets for help on using tickets.
Back to Top