Opened 9 months ago
Closed 9 months ago
#35309 closed Cleanup/optimization (fixed)
Elide ordering of prefetch querysets for single valued relationships
Reported by: | Laurent Lyaudet | Owned by: | Laurent Lyaudet |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | prefetch single-valued order_by |
Cc: | Laurent Lyaudet, Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
While the ordering of multi-valued relationships must be preserved when prefetching relationships is it unnecessary when using prefetch_related
against single valued relationships.
For example, given the following models
class Author(models.Model): name = models.CharField(max_length=200) class Meta: ordering = ["name"] class Book(models.Model): title = models.CharField(max_length=200) author = models.ForeignKey(Author, related_name="books", on_delete=models.CASCADE) class Meta: ordering = ["title"]
The ordering of an author's books in Author.objects.prefetch_related("books")
has a significance as multiple books might be associated with each authors.
It's not the case for a book's author in Book.objects.prefetch_related("author")
through as the relationship can only contain a single author and there is a single way to order the members of a singleton.
In other words sorted([element], key=sort_func)
will result in [element]
for any sort_func
.
This property holds true for all the single valued relationships that the ORM supports (backward and forward 1:1 and forward 1:M) which allows the prefetching to elide any predefined ordering safely to avoid an unnecessary and possibly expensive ordering defined for the related model queryset.
Currently the prefetch of authors will use the order by and add useless load on the DB server.
It would be useful to remove this order by.
#ClimateChangeBrake
Change History (15)
comment:1 by , 9 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 9 months ago
Again a fast and without thought answer.
I already know for this solution with Prefetch.
Continue bashing good ideas because you don't like people giving them.
I'll applaude at the end.
There is no way it is useful to keep an order by when you do a query
SELECT * FROM a WHERE a.id IN (.....100 or more ids here) ORDER BY name;
then add the result in the cache of B objects.
What you reject without thought yields a speed-up of 10 to 15 % on very big prefetches...
comment:3 by , 9 months ago
Resolution: | invalid |
---|---|
Status: | closed → new |
comment:4 by , 9 months ago
Please refrain from assuming bad faith from triagers regarding the resolution of this ticket. The provided resolution was a reflected based on your report details and in no way based on your persona.
What do you suggest should happen for the thousands of projects out there that rely on prefetch_related
to return results in a way that respects Meta.ordering
? We can't simply make the behaviour of prefetch_related
inconsistent with the normal behaviour or related manager access because it performs poorly when defined against an non-indexed field. I think the documentation warning I referred to is unfortunately all we can do to warn about this behaviour. Either use Meta.ordering
and be prepared to deal with its implicit footguns or don't use it and use order_by
where appropriate.
Whether Meta.ordering
should exist in the first place is debatable as it's at the origin of many unexpected behaviour with other features of the ORM (aggregation comes to mind) but making prefetch_related
special case it would not only be backward incompatible but inconsistent with how the rest of the framework treats it.
comment:5 by , 9 months ago
I spent my night on it but I was able to make a patch, and I don't think there will be any regression.
Consider the following models in some project TestNoOrderByForForeignKeyPrefetches and some app test_no_order_by models.py file:
from django.db import models class A(models.Model): name = models.CharField(max_length=200) class Meta: ordering = ["name"] class B(models.Model): name = models.CharField(max_length=200) a = models.ForeignKey(A, related_name="bs", on_delete=models.CASCADE) class Meta: ordering = ["name"]
Then consider the following command TestNoOrderByForForeignKeyPrefetches/test_no_order_by/management/commands/test_no_order_by_command.py :
from django.core.management.base import BaseCommand from django.db import connection from django.db.models import Prefetch, QuerySet, RawQuerySet from django.db.models.fields.related_descriptors import ( ForwardManyToOneDescriptor, ReverseOneToOneDescriptor, ) from TestNoOrderByForForeignKeyPrefetches.test_no_order_by.models import A, B old_prefetch_init = Prefetch.__init__ def new_prefetch_init(self, *args, **kwargs): result = old_prefetch_init(self, *args, **kwargs) if self.queryset is not None: self.queryset._do_not_modify_order_by = True return result Prefetch.__init__ = new_prefetch_init old_get_prefetch_querysets_forward_many_to_one = ForwardManyToOneDescriptor.get_prefetch_querysets old_get_prefetch_querysets_reverse_one_to_one = ReverseOneToOneDescriptor.get_prefetch_querysets def get_prefetch_querysets_forward_many_to_one(self, *args, **kwargs): result = old_get_prefetch_querysets_forward_many_to_one(self, *args, **kwargs) if not hasattr(result[0], '_do_not_modify_order_by'): result = (result[0].order_by(), *result[1:]) return result def get_prefetch_querysets_reverse_one_to_one(self, *args, **kwargs): result = old_get_prefetch_querysets_reverse_one_to_one(self, *args, **kwargs) if not hasattr(result[0], '_do_not_modify_order_by'): result = (result[0].order_by(), *result[1:]) return result ForwardManyToOneDescriptor.get_prefetch_querysets = get_prefetch_querysets_forward_many_to_one ReverseOneToOneDescriptor.get_prefetch_querysets = get_prefetch_querysets_reverse_one_to_one old_clone_queryset = QuerySet._clone def new_clone_queryset(self): result = old_clone_queryset(self) if hasattr(self, '_do_not_modify_order_by'): result._do_not_modify_order_by = True return result QuerySet._clone = new_clone_queryset old_clone_raw_queryset = RawQuerySet._clone def new_clone_raw_queryset(self): result = old_clone_raw_queryset(self) if hasattr(self, '_do_not_modify_order_by'): result._do_not_modify_order_by = True return result RawQuerySet._clone = new_clone_raw_queryset class Command(BaseCommand): help = "Test" def handle(self, *args, **options): B.objects.all().delete() A.objects.all().delete() a1 = A.objects.create(name="a1") a2 = A.objects.create(name="a2") a3 = A.objects.create(name="a3") a4 = A.objects.create(name="a4") a5 = A.objects.create(name="a5") a6 = A.objects.create(name="a6") a7 = A.objects.create(name="a7") b1 = B.objects.create(a=a1, name="b1") b2 = B.objects.create(a=a2, name="b2") b3 = B.objects.create(a=a3, name="b3") b4 = B.objects.create(a=a4, name="b4") b5 = B.objects.create(a=a5, name="b5") b6 = B.objects.create(a=a6, name="b6") b7 = B.objects.create(a=a7, name="b7") bs = list(B.objects.all().prefetch_related("a")) a_s = list(A.objects.all().prefetch_related("bs")) bs = list(B.objects.all().prefetch_related( Prefetch( "a", queryset=A.objects.order_by("-name") ), )) a_s = list(A.objects.all().prefetch_related( Prefetch( "bs", queryset=B.objects.order_by("-name") ), )) print(connection.queries)
If you launch the command with python3 manage.py test_no_order_by_command,
you will see that there are 8 SELECT after the 14 INSERT and that there is only 7 ORDER BY on them as requested.
I will prepare a PR.
comment:7 by , 9 months ago
Here is the PR, I will improve it when requested : https://github.com/django/django/pull/17984 :)
I still have doubts about keeping the order by even with manual Prefetch.
I need to verify if it is possible to bypass the filter by id.
comment:8 by , 9 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Laurent, thanks for this patch, however I agree with Simon.
I appreciate you'd like to reopen the ticket, but please follow the triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList.
comment:9 by , 9 months ago
Cc: | added |
---|---|
Description: | modified (diff) |
Keywords: | single-valued added |
Patch needs improvement: | set |
Resolution: | wontfix |
Status: | closed → new |
Summary: | Remove Order by on models when prefetching by id → Elide ordering of prefetch querysets for single valued relationships |
Triage Stage: | Unreviewed → Accepted |
I'm sorry for the awkward back and forth here but reviewing Laurent's PR made something clear to me that wasn't from the origin report.
The requested optimization here is solely for single valued relationships (backward and forward 1:1 and forward 1:M). In this scenario, as pointed out by Laurent, ORDER BY
doesn't matter as the related collection is either empty or a singleton and thus order_by()
can always be used in their respective get_prefetch_queryset
.
In the light of this realization I've adjusted the report and moved back this ticket to an accepted optimization.
Laurent, as for the patch I suggest simply decorating existing tests that make use of prefetching for single valued relationship (there are plenty in prefetch_related
tests) which assertNumQueries
and use the context queries to assert against the lack of ORDER BY
.
e.g.
with self.assertNumQueries(2) as ctx: list(Book.objects.prefetch_related("author")) self.assertNotIn("ORDER BY", ctx.queries[-1]["sql"])
I think that systematically calling order_by
without the _do_not_modify_order_by
should do.
Again, sorry for the misunderstanding and thank you for your efforts towards contributing this improvement to Django.
comment:10 by , 9 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:11 by , 9 months ago
Hello,
Thanks for the replies and the constructive suggested improvements.
I removed _do_not_modify_order_by
.
I improved to avoid an useless cloning of QuerySet
.
For that I created a protected method QuerySet._in_place_empty_order_by()
just below QuerySet.order_by()
.
I thought it would be the best solution without modifying the "public API" of QuerySet
.
Another solution could be to add kwargs to QuerySet.order_by()
like :
clone: bool = True
, orno_clone: bool = False
, orin_place: bool = False
,- and maybe
clear_default: bool = False
.
But it would require a larger discussion I think.
If you think the name of QuerySet._in_place_empty_order_by()
should be changed for something else, tell me.
I may also directly call queryset.query.clear_ordering(force=True, clear_default=True)
.
Tell me what you prefer to avoid the useless cloning :).
Regarding the tests, thanks for the suggestions.
I applied them in my two tests.
I think this is better practice to have two separate tests where the intent of each test is clear.
And I haven't added the asserts in other tests to avoid confusion on the goal of each test.
However, I understand that you may want to avoid cluttering the tests and the models in tests/prefetch_related/models.py
.
I have see that I can replace the class A35309
with the class Book
and the class B35309
with the class Author
in my first test.
Tell me if you think I should do that.
I have see that I can replace the class A35309
with the class Room
and the class C35309
with the class House
in my second test.
Tell me if you think I should do that.
Best regards,
Laurent Lyaudet
comment:12 by , 9 months ago
Description: | modified (diff) |
---|
comment:13 by , 9 months ago
Description: | modified (diff) |
---|
comment:14 by , 9 months ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Meta.ordering
is working as expected, please refer to its documentation and associated warningIf you don't want this implicit behaviour then don't use
Meta.ordering
. If you want to keep using it but not for particular prefetches than usePrefetch
objects with a queryset that explicitly callsorder_by()
to disable ordering.