#21760 closed Bug (fixed)
prefetch_related uses an inefficient query to get the related fk data
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.6 |
Severity: | Release blocker | Keywords: | prefetch_related ForeignKey |
Cc: | loic@…, marti@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consider the following models and query:
class Author(models.Model): name = models.CharField(max_length = 20) class Book(models.Model): name = models.CharField(max_length = 20) author = models.ForeignKey(Author, related_name = 'books') Book.objects.all().prefetch_related('author')
The query I'd expect to be issued by the prefetch_related
is:
SELECT * FROM author WHERE id IN (... the author_ids of the books from the main query ...)
However, the query that actually get executed are:
SELECT a.* FROM author a INNER JOIN book b ON (a.id = b.author_id) WHERE b.id IN (... the ids of the books from the main query ...)
There are two problems with this approach:
- It does a join, which wouldn't be necessary if
author_id
s were used instead ofbook.id
s - When one uses
prefetch_related
in such a case (that is, to prefetch a single fk per object rather than a many to many) instead ofselect_related
, it's usually because there are many primary objects (books) and few unique related objects (authors);select_related
, despite getting all the data in one query, could end up returning much more data than is needed because the related objects are duplicated many times over, and will be slower overall. Thus, the approach currently used byprefetch_related
ends up using the set of ids that will almost always be larger (the book ids, rather than the authors).
The reverse case (Author.objects.all().prefetch_related('books')
) does the efficient, joinless query I'd expect -- namely:
SELECT * FROM book WHERE author_id IN (... the ids of the authors from the main query ...)
I've attached a sample project with a test.
Attachments (3)
Change History (19)
by , 11 years ago
comment:1 by , 11 years ago
Summary: | prefetch_related uses an inefficient query to get the related data → prefetch_related uses an inefficient query to get the related fk data |
---|
comment:2 by , 11 years ago
Keywords: | ForeignKey added |
---|
comment:3 by , 11 years ago
Cc: | added |
---|
by , 11 years ago
Attachment: | patch_21760.diff added |
---|
comment:5 by , 11 years ago
Has patch: | set |
---|
Added a patch & test; seems to do the trick and all tests pass.
comment:6 by , 11 years ago
My bad, I only ran the prefetch_related tests. Patch breaks foreign_object.tests.MultiColumnFKTests.test_prefetch_foreignkey_forward_works
.
comment:7 by , 11 years ago
Cc: | added |
---|
comment:8 by , 11 years ago
I am in the process of migrating an app from Django 1.5 to 1.6; this regression also affects us (it did not occur in Django 1.5). In one example this turns a query from 175 bytes into 66 kB, with a corresponding increase in query time, result set size and memory usage.
I can confirm that the attached patch fixes this issue when applied on top of Django 1.6.2. Can it be included in 1.6.3?
comment:9 by , 11 years ago
Severity: | Normal → Release blocker |
---|
Haven't verified the claim that it's a regression from 1.5, but marking it as a release blocker assuming that's the case. A note in the 1.6.3 release notes would also be helpful.
comment:11 by , 11 years ago
We discussed this issue with @akaariai on IRC.
Like we did for #21410, we'll take a practical approach for fixing the regression from Django 1.6 to master and go for the proposed fix. It's worth noting that the inefficient query will still be present for multicolumn relations, but these are not documented and one needs to use private APIs to make them, also these have never seen the "joinless" query, so that doesn't qualify as a regression.
The implementation of ReverseSingleRelatedObjectDescriptor.get_prefetch_queryset()
really ought to be cleaned up at some point, especially when we land the patch for composite fields but it'll require a fair amount of refactoring so it only qualifies for master.
@valtron2000, would you mind updating the patch?
- The "FIXME" comment would have to be updated to explain this new case with a reference to this ticket.
- The current test is pretty fragile, could you rather check in QS.query if there is a join? (e.g. https://gist.github.com/10345520)
- Code needs to be compatible with PY3, use
range
instead ofxrange
, or import it fromsix
if you really wantxrange
. - The changes inside the
if
, do they do anything differently or is it just a cleanup? If it's just a cleanup, only fix theif
condition, and not the implementation. Because right now it's pretty hard to see where the problem was from.
Feel free to open a PR on github.
Edit: Added a couple more things to the patch review.
comment:12 by , 11 years ago
I'll submit a PR soon; thanks for the suggestions.
The fix is a one-liner in the if
, and the rest of the changes were just cleanup.
comment:14 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Project for book example with a test.