Opened 6 years ago

Closed 6 years ago

#29568 closed Cleanup/optimization (fixed)

Avoid trying to UPDATE a parent model when its child just got INSERT

Reported by: François Dupayrat Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by François Dupayrat)

Hello,

While sorting through queries, I noticed something unusual: let's say you have non-abstract models inheritance:

class Parent(models.Model):
  parent_field = models.BooleanField()

class DirectChild(Parent):
  direct_child_field = models.BooleanField()

class IndirectChild(DirectChild):
  indirect_child_field = models.BooleanField()

When trying to create IndirectChild, I would expect 3 queries to be done:

  • INSERT Parent
  • INSERT DirectChild
  • INSERT IndirectChild

Instead, since they all share the same PK, when trying to save a IndirectChild not yet persisted to DB, 5 queries are done:

  • INSERT Parent
  • UPDATE DirectChild (since it has a PK)
  • INSERT DirectChild (because previous UPDATE failed)
  • UPDATE IndirectChild (since it has a PK)
  • INSERT IndirectChild (because previous UPDATE failed)

Note: when trying to use IndirectChild.objects.create, only 4 queries are made, since QuerySet.create use force_insert=True (thanks to Tim Graham)

I found a fix that worked for me by modifying _save_parents and save_base in django/db/models/base.py: _save_parents return if something was inserted (default to fAlse if there is no parent), and force_insert if the parent was inserted in both methods (because if there parent was inserted, the child can't possibly exist).

Being a beginner, I'm really wary of breaking something. Could someone if it's something wanted by Django and check if there is obvious mistake?
Here is the modified method (without irrelevant code, materialized as [...]):

    def _save_parents([...]):
        [...]
        inserted = False
        for parent, field in meta.parents.items():
            [...]
            parent_inserted = self._save_parents(cls=parent, using=using, update_fields=update_fields)
            updated = self._save_table(cls=parent, using=using, update_fields=update_fields, force_insert=parent_inserted)
            if not updated:
                inserted = True
            [...]
        return inserted

    def save_base([...]):
        [...]
        with transaction.atomic(using=using, savepoint=False):
            parent_inserted = False
            if not raw:
                parent_inserted = self._save_parents(cls, using, update_fields)
            updated = self._save_table(raw, cls, force_insert or parent_inserted, force_update, using, update_fields)
        [...]

Change History (7)

comment:1 by François Dupayrat, 6 years ago

Description: modified (diff)

comment:2 by François Dupayrat, 6 years ago

Description: modified (diff)

comment:3 by Claude Paroz, 6 years ago

I guess a first experiment would be to make the change and run the test suite to see the outcome.

comment:4 by Tim Graham, 6 years ago

Triage Stage: UnreviewedAccepted

WithIndirectChild.objects.create(parent_field=True, direct_child_field=True, indirect_child_field=True), I see four queries at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a:

1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING "t29568_parent"."id"
2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE "t29568_directchild"."parent_ptr_id" = 1
3. INSERT INTO "t29568_directchild" ("parent_ptr_id", "direct_child_field") VALUES (1, true)
4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id", "indirect_child_field") VALUES (1, true)

Accepting for investigation.

in reply to:  4 comment:5 by François Dupayrat, 6 years ago

Description: modified (diff)

Replying to Claude Paroz:

I guess a first experiment would be to make the change and run the test suite to see the outcome.

Thanks, I'll get that done.

Replying to Tim Graham:

WithIndirectChild.objects.create(parent_field=True, direct_child_field=True, indirect_child_field=True), I see four queries at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a:

1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING "t29568_parent"."id"
2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE "t29568_directchild"."parent_ptr_id" = 1
3. INSERT INTO "t29568_directchild" ("parent_ptr_id", "direct_child_field") VALUES (1, true)
4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id", "indirect_child_field") VALUES (1, true)

Accepting for investigation.

Indeed, I didn't test with IndirectChild.objects.create, but with save() on a IndirectChild not yet saved to DB, since it's done that way by a library (django-tastypie), and outside my control.

I edited the description to reflect that.

comment:6 by François Dupayrat, 6 years ago

Has patch: set

Pull request created: https://github.com/django/django/pull/10197

Tests showed a different behavior than the one listed in the ticket: instead of doing UPDATE queries, SELECT queries were made in-between INSERT INTO queries. Editing Parent, Child and GrandChild to add a Boolean field revert to UPDATE queries.

In both cases, only INSERT INTO queries should be made anyway, so I don't think it's necessary to add fields to Parent, Child or GrandChild.

Summary of queries:

  • with existing I see 6 queries on GrandChild.objects.create()
    INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name", "email", "place_id") VALUES ('', '', '', NULL)
    SELECT (1) AS "a" FROM "model_inheritance_parent" WHERE "model_inheritance_parent"."grandparent_ptr_id" = 1  LIMIT 1
    INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id") SELECT 1
    SELECT (1) AS "a" FROM "model_inheritance_child" WHERE "model_inheritance_child"."parent_ptr_id" = 1  LIMIT 1
    INSERT INTO "model_inheritance_child" ("parent_ptr_id") SELECT 1
    INSERT INTO "model_inheritance_grandchild" ("child_ptr_id") SELECT 1
    
  • by adding a BooleanField with a default value to each of Parent, Child and GrandChild, I still see 6 queries
    INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name", "email", "place_id") VALUES ('', '', '', NULL)
    UPDATE "model_inheritance_parent" SET "parent_field" = 1 WHERE "model_inheritance_parent"."grandparent_ptr_id" = 1
    INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id", "parent_field") SELECT 1, 1
    UPDATE "model_inheritance_child" SET "child_field" = 1 WHERE "model_inheritance_child"."parent_ptr_id" = 1
    INSERT INTO "model_inheritance_child" ("parent_ptr_id", "child_field") SELECT 1, 1
    INSERT INTO "model_inheritance_grandchild" ("child_ptr_id", "grand_child_field") SELECT 1, 1
    
  • with patch I see 4 queries, both for modified and unmodified Models, provided the new fields have default values.
    INSERT INTO "model_inheritance_grandparent" ("first_name", "last_name", "email", "place_id") VALUES ('', '', '', NULL)
    INSERT INTO "model_inheritance_parent" ("grandparent_ptr_id") SELECT 1
    INSERT INTO "model_inheritance_child" ("parent_ptr_id") SELECT 1
    INSERT INTO "model_inheritance_grandchild" ("child_ptr_id") SELECT 1
    

comment:7 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: newclosed

In 861638a:

Fixed #29568 -- Prevented unnecessary UPDATE queries creating child models.

Note: See TracTickets for help on using tickets.
Back to Top