Opened 5 years ago

Closed 5 years ago

#31666 closed Cleanup/optimization (wontfix)

Combine prefetch_related queries for FK's to the same target model

Reported by: Adam Johnson Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If you have a source model with two or more FK's to the same target model, prefetch_related() for those FK's could do a single query rather than several. For models with duplicate targets in different FK's, this would also reduce transferred data and duplicate objects in memory.

For example:

from django.db import models


class Author(models.Model):
    pass


class Book(models.Model):
    author1 = models.ForeignKey(Author, on_delete=models.DO_NOTHING, related_name='+')
    author2 = models.ForeignKey(Author, on_delete=models.DO_NOTHING, related_name='+')

If I create some authors and fetch everything, the same author is represented by two objects because they were fetched in different queries:

In [1]: from example.core.models import Author, Book

In [2]: a1 = Author.objects.create()

In [3]: a2 = Author.objects.create()

In [4]: a3 = Author.objects.create()

In [5]: b1 = Book.objects.create(author1=a1, author2=a2)

In [6]: b2 = Book.objects.create(author1=a2, author2=a3)

In [7]: bs = Book.objects.prefetch_related('author1', 'author2')

In [8]: bs[0].author1
Out[8]: <Author: Author object (1)>

In [9]: bs[0].author2
Out[9]: <Author: Author object (2)>

In [10]: bs[1].author1
Out[10]: <Author: Author object (2)>

In [11]: bs[0].author2 is bs[1].author1
Out[11]: False

Change History (3)

comment:1 by Gordon Wrigley, 5 years ago

To my mind the duplicate objects in memory is kinda a feature.
Specifically it seems desirable to me that a prefetch related clause only changes performance not behaviour. Particularly it seems concerning that adding a prefetch_related to this (very contrived) example would change it's behaviour:

b = Book.objects.get()
b.author1.weight = 90
b.author2.weight = 50
print(b.author1.weight + b.author2.weight)
Version 0, edited 5 years ago by Gordon Wrigley (next)

comment:2 by Simon Charette, 5 years ago

To my mind the duplicate objects in memory is kinda a feature. Specifically it seems desirable to me that a prefetch related clause only changes performance not behaviour...

Agreed, different in memory objects should be created as it would also be slightly backward incompatible. The ORM doesn't maintain an identity map of objects (e.g. like SQLAlchemy's session does) so changing it solely for this feature would be confusing.

Performing a single query could be doable using a form of Author.objects.filter(Q(books__author1__in=author1_set) | Q(books__author2__in=author2_set)) which should naturally result in duplicated Author being returned because of the product caused by the two LEFT JOINs to book.

comment:3 by Mariusz Felisiak, 5 years ago

Resolution: wontfix
Status: newclosed

I agree with previous comments. This optimization is backward incompatible, quite niche, and can cause a lot of headache.

Note: See TracTickets for help on using tickets.
Back to Top