#35947 closed Cleanup/optimization (wontfix)
prefetch_related makes duplicate queries for the same records by different relations
Reported by: | Jake Douglas | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Prefetching the same records via different relations using prefetch_related
results in duplicated queries for the same set of records. This is a problem because many web applications traverse relations by many different paths, even in the same request. The smallest example I could reproduce follows:
from django.db import models # Create your models here. class House(models.Model): pass class Story(models.Model): house = models.ForeignKey(House, related_name = "stories", on_delete = models.CASCADE) class Room(models.Model): story = models.ForeignKey(Story, related_name = "rooms", on_delete = models.CASCADE) house = models.ForeignKey(House, related_name = "rooms", on_delete = models.CASCADE) class Window(models.Model): story = models.ForeignKey(Story, related_name = "windows", on_delete = models.CASCADE) room = models.ForeignKey(Room, related_name = "windows", on_delete = models.CASCADE) House.objects.prefetch_related( "rooms__windows", "stories__rooms__windows" ).all()
This results in something like this:
SELECT ••• FROM "houses_house" SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1) SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1) SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1) SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE
The same behavior occurs when the last relation in the chain is one-to-one.
Change History (4)
comment:1 by , 6 weeks ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Type: | Uncategorized → Cleanup/optimization |
comment:2 by , 6 weeks ago
All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects
For the record, here's an example request from our production application:
(313 queries including 279 similar and 242 duplicates)
comment:3 by , 6 weeks ago
For the record, here's an example request from our production application:
(313 queries including 279 similar and 242 duplicates)
The example you provided should limit the number of SQL queries to 6 which you have detailed to be
SELECT ••• FROM "houses_house" SELECT ••• FROM "houses_room" WHERE "houses_room"."house_id" IN (1) SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE SELECT ••• FROM "houses_story" WHERE "houses_story"."house_id" IN (1) SELECT ••• FROM "houses_room" WHERE "houses_room"."story_id" IN (1) SELECT ••• FROM "houses_window" WHERE "houses_window"."room_id" IN (1) # DUPLICATE
Which is one for the top-level houses queryset, two for the rooms__windows
prefetch, and three for the stories__rooms__windows
prefetch where two are duplicates. As explained above, invasive changes would be required to take that number to 4 under circumstances that your specific schema design seems to be exhibiting because of how you denormalized Room.house
(already available through Room.story.house
) and Window.story
(already available through Window.room.story
).
If your application is executing 50x the number of 6 queries that could be reduced to 4 by implementing this optimization there is likely something else to look on your side before implementing this optimization. In other words prefetch_related
is not a feature that systemically make elevated number of queries issues go away; it's a tool that bounds the N+1 query problem and not an identity mapper strategy #17 to accommodate database denomalization needs.
comment:4 by , 6 weeks ago
The example code I provided is not from our application, it's just a simplification that demonstrates the behavior. Most occurrences of duplicate prefetch queries in our application do not involve denormalized relations.
The quoted query stats are from a request in our application and were intended to demonstrate that this behavior is not rare, but I realize the greater context is missing: almost all of the duplicate queries are coming from prefetch_related
, with some queries being issued 20+ times for the most commonly prefetched relations in this request.
I don't think we can fix this without a significant refactor to how prefetch related work given the Django ORM does not maintain a map of object identities by related fields so I'm going to won't fix this one.
In order to implement such a feature the prefetch related logic would have to pass a memoization object of the form
dict[tuple[type[Model], models.Field], dict[Any, Model]]
that maps model types and from field to another map of from field value to model instances toprefetch_related_objects
andprefetch_one_level
. All that to support the likely rare cases where the same relationship is prefetched through different paths and happens to target the exact set of objects (as any difference would require an extra query anyway).If you disagree with the resolution of this ticket feel free to open a thread on the forums about the benefits vs complexity tradeoff of this requested optimization.