Opened 13 years ago

Closed 13 years ago

Last modified 7 years ago

#16554 closed Bug (invalid)

Unnecessary join when using a reverse foreign-key relationship in separate filter or aggregate calls

Reported by: Ben Davis Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Ben Davis Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django allows you to perform queries across reverse foreign key relationships. If, however, you need to access that same relationship in more than one filter or aggregate call, the ORM creates unnecessary joins.

Example

app/models.py:

class Author(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()

class Book(models.Model):
    author = models.ForeignKey(Author, related_name='books')
    title = models.CharField(max_length=100)
    genre = models.CharField(max_length=20, choices=(
        ('SCIFI', 'SciFi'),
        ('FANTASY', 'Fantasy'),
        ('NONFICTION', 'NonFiction')
    ))
    published = models.DateField()
    pages = models.IntegerField()

This particular query will only perform one join on app_books:

qs = Author.objects.filter(books__genre='SCIFI', books__pages__gt=500)
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND "app_book"."pages" > 500 )

However, if you separate the filter() call into two separate calls, you get this:

qs = Author.objects.filter(books__genre='SCIFI')
qs = qs.filter(books__pages__gt=500)
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND T3."pages" > 500 )

As you can see, simply separating out the filters into separate calls creates a new, completely unecessary join.

This also occurs if you need to do an aggregate:

qs = Author.objects.annotate(pages_written=Sum('books__pages'))
qs = qs.filter(books__genre='SCIFI')
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email", SUM("app_book"."pages") AS "pages_written" 
FROM "app_author" 
LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE T3."genre" = 'SCIFI'
GROUP BY "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email"

In this particular case, I imagine the ORM is seeing the need for two joins because one is outer and one is inner, however, I would argue that if an outer join already occurs as a result of an aggregate, it should always use the existing join.

Change History (5)

comment:1 by Russell Keith-Magee, 13 years ago

Resolution: invalid
Status: newclosed

This is working as designed.

A single filter() clause is used to determine the join-sharing relationships in a query; Model.objects.filter(Q1, Q2) is not *necessarily* equivalent to Model.objects.filter(Q1).filter(Q2).

comment:2 by Ben Davis, 13 years ago

I would argue that this is still a bug and should be left open. There is no conceivable situation in which you would need both joins. An extra join can severely impact the performance of a query, especially when dealing with large datasets. The ORM should recognize these situations and optimize as needed by removing the unnecessary joins.

Sure, you can avoid the extra join in the multiple .filter() example, but there are some situations in which it cannot be avoided, as illustrated above with the aggregate query.

Last edited 13 years ago by Ben Davis (previous) (diff)

comment:3 by Ben Davis, 13 years ago

Cc: Ben Davis added

comment:4 by Anssi Kääriäinen, 13 years ago

If I am not mistaken, the aggregate example is valid - it produces wrong results. Author having two different SCIFI books will have pages calculated twice.

But the chained filters is not valid, as one query is asking for a model which has a book with genre of SCIFI and more than 500 pages, the other is asking for a model which has a book with genre of SCIFI and a book which has more than 500 pages. So, in the first example there must be one matching book, in the second example the books matching the condition can be different ones.

I am leaving this as closed, as I don't know what to do for a half-valid ticket and I haven't actually verified that the aggregate example produces wrong results.

comment:5 by Ben Davis, 13 years ago

akaari, thank you for clarifying that. I had to read your comment like five times, but that makes sense now. I opened #16603 for the aggregate bug.

Last edited 7 years ago by Tim Graham (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top