Opened 6 weeks ago

Closed 6 weeks ago

Last modified 6 weeks ago

#35947 closed Cleanup/optimization (wontfix)

prefetch_related makes duplicate queries for the same records by different relations

Reported by: Jake Douglas Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Prefetching the same records via different relations using prefetch_related results in duplicated queries for the same set of records. This is a problem because many web applications traverse relations by many different paths, even in the same request. The smallest example I could reproduce follows:

from django.db import models

# Create your models here.
class House(models.Model):
    pass

class Story(models.Model):
    house = models.ForeignKey(House, related_name = "stories", on_delete = models.CASCADE)

class Room(models.Model):
    story = models.ForeignKey(Story, related_name = "rooms", on_delete = models.CASCADE)
    house = models.ForeignKey(House, related_name = "rooms", on_delete = models.CASCADE)

class Window(models.Model):
    story = models.ForeignKey(Story, related_name = "windows", on_delete = models.CASCADE)
    room = models.ForeignKey(Room, related_name = "windows", on_delete = models.CASCADE)

House.objects.prefetch_related(
    "rooms__windows",
    "stories__rooms__windows"
).all()

This results in something like this:

SELECT ••• FROM "houses_house"
SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE
SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1)
SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE

The same behavior occurs when the last relation in the chain is one-to-one.

Change History (4)

comment:1 by Simon Charette, 6 weeks ago

Resolution: wontfix
Status: newclosed
Type: UncategorizedCleanup/optimization

I don't think we can fix this without a significant refactor to how prefetch related work given the Django ORM does not maintain a map of object identities by related fields so I'm going to won't fix this one.

In order to implement such a feature the prefetch related logic would have to pass a memoization object of the form dict[tuple[type[Model], models.Field], dict[Any, Model]] that maps model types and from field to another map of from field value to model instances to prefetch_related_objects and prefetch_one_level. All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects (as any difference would require an extra query anyway).

If you disagree with the resolution of this ticket feel free to open a thread on the forums about the benefits vs complexity tradeoff of this requested optimization.

comment:2 by Jake Douglas, 6 weeks ago

All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects

For the record, here's an example request from our production application:

(313 queries including 279 similar and 242 duplicates)

comment:3 by Simon Charette, 6 weeks ago

For the record, here's an example request from our production application:
(313 queries including 279 similar and 242 duplicates)

The example you provided should limit the number of SQL queries to 6 which you have detailed to be

SELECT ••• FROM "houses_house"
SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE
SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1)
SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE

Which is one for the top-level houses queryset, two for the rooms__windows prefetch, and three for the stories__rooms__windows prefetch where two are duplicates. As explained above, invasive changes would be required to take that number to 4 under circumstances that your specific schema design seems to be exhibiting.

If your application is executing 50x the number of 6 queries that could be reduced to 4 by implementing this optimization there is likely something else to look on your side before implementing this optimization. In other words prefetch_related is not a feature that systemically make elevated number of queries issues go away; it's a tool that bounds the N+1 query problem and not an identity mapper strategy #17.

Version 1, edited 6 weeks ago by Simon Charette (previous) (next) (diff)

comment:4 by Jake Douglas, 6 weeks ago

The example code I provided is not from our application, it's just a simplification that demonstrates the behavior. Most occurrences of duplicate prefetch queries in our application do not involve denormalized relations.

The quoted query stats are from a request in our application and were intended to demonstrate that this behavior is not rare, but I realize the greater context is missing: almost all of the duplicate queries are coming from prefetch_related, with some queries being issued 20+ times for the most commonly prefetched relations in this request.

Note: See TracTickets for help on using tickets.
Back to Top