Opened 9 years ago
Last modified 9 years ago
#25464 new New feature
Allow skipping IN clause on prefetch queries — at Initial Version
Reported by: | Erik Cederstrand | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Shai Berger | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When using prefetch_related() on a large queryset, the prefetch query SQL can be inefficient. Consider this:
Category.objects.filter(type=5).prefetch_related('items')
If 100.000 categories have type=5, then an IN clause with 100.000 Category IDs is generated to get the Item objects. Even with a custom queryset using a Prefetch() object, the IN clause is generated, even though it is A) redundant, B) sends a potentially multi-megabyte SQL statement over the wire for the database to process, C) may confuse the query planner to generate an inefficient execution plan, and D) doesn't scale:
Category.objects.filter(type=5).prefetch_related(Prefetch('items', queryset=Item.objects.filter(categoryitem=5)))
Pull request https://github.com/django/django/pull/5356 adds the possibility to skip the IN clause in cases where we are sure that a better queryset will get (at least) the same items as the IN clause would:
Category.objects.filter(type=5).prefetch_related(Prefetch('items', queryset=Item.objects.filter(categoryitem=5), filter_on_instances=False))
In my tests, this speeds up prefetch_related() by 20x-50x on large querysets.