Context Navigation

← Previous Ticket
Next Ticket →

#36157 closed Bug

Unusual behaviour when pre-fetching with only applied on the related fields — at Version 1

Reported by:	Tim McCurrach	Owned by:
Component:	Uncategorized	Version:	5.1
Severity:	Normal	Keywords:
Cc:	Tim McCurrach	Triage Stage:	Unreviewed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Tim McCurrach)

When prefetching related models. If you apply .only() to the related queryset, django performs additional lookups for related IDs that have been left out of the only.

It is probably easiest to explain the issue with an example.

Example Situation

Suppose you have these models:

class Blog(models.Model):
    name = models.CharField(max_length=100)

class Post(models.Model):
    name = models.CharField(max_length=100)
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE, related_name="posts")
    ...lots of other big fields

And you create a few items for each:

blog = Blog.objects.create(name="Django Tricks")
blog2 = Blog.objects.create(name="React Tricks")
Post.objects.create(name="prefetching", blog=blog)
Post.objects.create(name="models", blog=blog)
Post.objects.create(name="templates", blog=blog)
Post.objects.create(name="hooks", blog=blog2)
Post.objects.create(name="components", blog=blog2)

If I wish to pre-fetch the posts for some blogs, but only want the names of each post, rather than the content of each post I can do the following:

Blog.objects.prefetch_related(Prefetch("posts", queryset=Post.objects.only("name")))

I would expect this to result in just 2 database queries. One to fetch the data for the Blog instances, and another to fetch the the data for the related Posts. Instead, there is an n+1 issue where there are 5 extra follow up requests for each of the related Post instances. This is the SQL that is generated:

SELECT "app_blog"."id", "app_blog"."name" FROM "app_blog" LIMIT 21
SELECT "app_post"."id", "app_post"."name" FROM "app_post" WHERE "app_post"."blog_id" IN (1, 2)
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 1 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 2 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 3 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 4 LIMIT 21
SELECT "app_post"."id", "app_post"."blog_id" FROM "app_post" WHERE "app_post"."id" = 5 LIMIT 21

I can understand it might be a good idea to have the related-id's for the blog on hand should you need them later. But I also think, that by using .only() you are explicitly telling django - I don't need these. This is a real problem for larger data-sets, where you end up with thousands of extra round-trips to the database.

Context

This is an issue that came up in the wild. I'm using a third-party optimiser that improves the performance of graphQL queries by decorating querysets with only(), select_related() etc. It correctly identifies that I am only using certain fields and applies only() to them, knowing I will never need to access certain related fields. This results in django producing many additional hits to the database.

Change History (1)

comment:1 by Tim McCurrach, 28 hours ago

Description:	modified (diff)

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#36157 closed Bug

Unusual behaviour when pre-fetching with only applied on the related fields — at Version 1

Description (last modified by Tim McCurrach)

Example Situation

Context

Change History (1)

comment:1 by Tim McCurrach, 28 hours ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us