Opened 2 hours ago

Closed 72 minutes ago

Last modified 28 minutes ago

#35947 closed Cleanup/optimization (wontfix)

prefetch_related makes duplicate queries for the same records by different relations

Reported by: Jake Douglas Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Prefetching the same records via different relations using prefetch_related results in duplicated queries for the same set of records. This is a problem because many web applications traverse relations by many different paths, even in the same request. The smallest example I could reproduce follows:

from django.db import models

# Create your models here.
class House(models.Model):
    pass

class Story(models.Model):
    house = models.ForeignKey(House, related_name = "stories", on_delete = models.CASCADE)

class Room(models.Model):
    story = models.ForeignKey(Story, related_name = "rooms", on_delete = models.CASCADE)
    house = models.ForeignKey(House, related_name = "rooms", on_delete = models.CASCADE)

class Window(models.Model):
    story = models.ForeignKey(Story, related_name = "windows", on_delete = models.CASCADE)
    room = models.ForeignKey(Room, related_name = "windows", on_delete = models.CASCADE)

House.objects.prefetch_related(
    "rooms__windows",
    "stories__rooms__windows"
).all()

This results in something like this:

SELECT ••• FROM "houses_house"
SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE
SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1)
SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1)
SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE

The same behavior occurs when the last relation in the chain is one-to-one.

Change History (2)

comment:1 by Simon Charette, 72 minutes ago

Resolution: wontfix
Status: newclosed
Type: UncategorizedCleanup/optimization

I don't think we can fix this without a significant refactor to how prefetch related work given the Django ORM does not maintain a map of object identities by related fields so I'm going to won't fix this one.

In order to implement such a feature the prefetch related logic would have to pass a memoization object of the form dict[tuple[type[Model], models.Field], dict[Any, Model]] that maps model types and from field to another map of from field value to model instances to prefetch_related_objects and prefetch_one_level. All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects (as any difference would require an extra query anyway).

If you disagree with the resolution of this ticket feel free to open a thread on the forums about the benefits vs complexity tradeoff of this requested optimization.

comment:2 by Jake Douglas, 28 minutes ago

All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects

For the record, here's an example request from our production application:

(313 queries including 279 similar and 242 duplicates)
Note: See TracTickets for help on using tickets.
Back to Top