Opened 8 months ago

Last modified 8 months ago

#35309 closed Cleanup/optimization

Elide ordering of prefetch querysets for single valued relationships — at Version 12

Reported by: Laurent Lyaudet Owned by: Laurent Lyaudet
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords: prefetch single-valued order_by
Cc: Laurent Lyaudet, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Laurent Lyaudet)

While the ordering of multi-valued relationships must be preserved when prefetching relationships is it unnecessary when using prefetch_related against single valued relationships.

For example, given the following models

class Author(models.Model):
    name = models.CharField(max_length=200)

    class Meta:
        ordering = ["name"]


class Book(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(Author, related_name="books", on_delete=models.CASCADE)

    class Meta:
        ordering = ["title"]

The ordering of an author's books in Author.objects.prefetch_related("books") has a significance as multiple books might be associated with each authors.

It's not the case for a book's author in Book.objects.prefetch_related("author") through as the relationship can only contain a single author and there is a single way to order the members of a singleton.

In other words sorted([element], key=sort_func) will result in [element] for any sort_func.

This property holds true for all the single valued relationships that the ORM supports (backward and forward 1:1 and forward 1:M) which allows the prefetching to elide any predefined ordering safely to avoid an unnecessary and possibly expensive ordering defined for the related model queryset.
Currently the prefetch of authors will use the order by and add useless charge on the DB server.
It would be useful to remove this order by.
#ClimateChangeBrake

Change History (12)

comment:1 by Simon Charette, 8 months ago

Resolution: invalid
Status: newclosed

Meta.ordering is working as expected, please refer to its documentation and associated warning

Ordering is not a free operation. Each field you add to the ordering incurs a cost to your database. Each foreign key you add will implicitly include all of its default orderings as well.

If you don't want this implicit behaviour then don't use Meta.ordering. If you want to keep using it but not for particular prefetches than use Prefetch objects with a queryset that explicitly calls order_by() to disable ordering.

B.objects.prefetch_related(Prefetch("a", A.objects.order_by()))

comment:2 by Laurent Lyaudet, 8 months ago

Again a fast and without thought answer.
I already know for this solution with Prefetch.
Continue bashing good ideas because you don't like people giving them.
I'll applaude at the end.

There is no way it is useful to keep an order by when you do a query

SELECT * FROM a WHERE a.id IN (.....100 or more ids here) ORDER BY name;

then add the result in the cache of B objects.
What you reject without thought yields a speed-up of 10 to 15 % on very big prefetches...

comment:3 by Laurent Lyaudet, 8 months ago

Resolution: invalid
Status: closednew

comment:4 by Simon Charette, 8 months ago

Please refrain from assuming bad faith from triagers regarding the resolution of this ticket. The provided resolution was a reflected based on your report details and in no way based on your persona.

What do you suggest should happen for the thousands of projects out there that rely on prefetch_related to return results in a way that respects Meta.ordering? We can't simply make the behaviour of prefetch_related inconsistent with the normal behaviour or related manager access because it performs poorly when defined against an non-indexed field. I think the documentation warning I referred to is unfortunately all we can do to warn about this behaviour. Either use Meta.ordering and be prepared to deal with its implicit footguns or don't use it and use order_by where appropriate.

Whether Meta.ordering should exist in the first place is debatable as it's at the origin of many unexpected behaviour with other features of the ORM (aggregation comes to mind) but making prefetch_related special case it would not only be backward incompatible but inconsistent with how the rest of the framework treats it.

comment:5 by Laurent Lyaudet, 8 months ago

I spent my night on it but I was able to make a patch, and I don't think there will be any regression.
Consider the following models in some project TestNoOrderByForForeignKeyPrefetches and some app test_no_order_by models.py file:

from django.db import models

class A(models.Model):
    name = models.CharField(max_length=200)

    class Meta:
        ordering = ["name"]


class B(models.Model):
    name = models.CharField(max_length=200)
    a = models.ForeignKey(A, related_name="bs", on_delete=models.CASCADE)

    class Meta:
        ordering = ["name"]

Then consider the following command TestNoOrderByForForeignKeyPrefetches/test_no_order_by/management/commands/test_no_order_by_command.py :

from django.core.management.base import BaseCommand
from django.db import connection
from django.db.models import Prefetch, QuerySet, RawQuerySet
from django.db.models.fields.related_descriptors import (
    ForwardManyToOneDescriptor,
    ReverseOneToOneDescriptor,
)

from TestNoOrderByForForeignKeyPrefetches.test_no_order_by.models import A, B


old_prefetch_init = Prefetch.__init__


def new_prefetch_init(self, *args, **kwargs):
    result = old_prefetch_init(self, *args, **kwargs)
    if self.queryset is not None:
        self.queryset._do_not_modify_order_by = True
    return result


Prefetch.__init__ = new_prefetch_init


old_get_prefetch_querysets_forward_many_to_one = ForwardManyToOneDescriptor.get_prefetch_querysets
old_get_prefetch_querysets_reverse_one_to_one = ReverseOneToOneDescriptor.get_prefetch_querysets


def get_prefetch_querysets_forward_many_to_one(self, *args, **kwargs):
    result = old_get_prefetch_querysets_forward_many_to_one(self, *args, **kwargs)
    if not hasattr(result[0], '_do_not_modify_order_by'):
        result = (result[0].order_by(), *result[1:])
    return result


def get_prefetch_querysets_reverse_one_to_one(self, *args, **kwargs):
    result = old_get_prefetch_querysets_reverse_one_to_one(self, *args, **kwargs)
    if not hasattr(result[0], '_do_not_modify_order_by'):
        result = (result[0].order_by(), *result[1:])
    return result


ForwardManyToOneDescriptor.get_prefetch_querysets = get_prefetch_querysets_forward_many_to_one
ReverseOneToOneDescriptor.get_prefetch_querysets = get_prefetch_querysets_reverse_one_to_one


old_clone_queryset = QuerySet._clone


def new_clone_queryset(self):
    result = old_clone_queryset(self)
    if hasattr(self, '_do_not_modify_order_by'):
        result._do_not_modify_order_by = True
    return result


QuerySet._clone = new_clone_queryset


old_clone_raw_queryset = RawQuerySet._clone


def new_clone_raw_queryset(self):
    result = old_clone_raw_queryset(self)
    if hasattr(self, '_do_not_modify_order_by'):
        result._do_not_modify_order_by = True
    return result


RawQuerySet._clone = new_clone_raw_queryset


class Command(BaseCommand):
    help = "Test"

    def handle(self, *args, **options):
        B.objects.all().delete()
        A.objects.all().delete()

        a1 = A.objects.create(name="a1")
        a2 = A.objects.create(name="a2")
        a3 = A.objects.create(name="a3")
        a4 = A.objects.create(name="a4")
        a5 = A.objects.create(name="a5")
        a6 = A.objects.create(name="a6")
        a7 = A.objects.create(name="a7")

        b1 = B.objects.create(a=a1, name="b1")
        b2 = B.objects.create(a=a2, name="b2")
        b3 = B.objects.create(a=a3, name="b3")
        b4 = B.objects.create(a=a4, name="b4")
        b5 = B.objects.create(a=a5, name="b5")
        b6 = B.objects.create(a=a6, name="b6")
        b7 = B.objects.create(a=a7, name="b7")

        bs = list(B.objects.all().prefetch_related("a"))
        a_s = list(A.objects.all().prefetch_related("bs"))
        bs = list(B.objects.all().prefetch_related(
            Prefetch(
                "a",
                queryset=A.objects.order_by("-name")
            ),
        ))
        a_s = list(A.objects.all().prefetch_related(
            Prefetch(
                "bs",
                queryset=B.objects.order_by("-name")
            ),
        ))
        print(connection.queries)

If you launch the command with python3 manage.py test_no_order_by_command,
you will see that there are 8 SELECT after the 14 INSERT and that there is only 7 ORDER BY on them as requested.

I will prepare a PR.

Last edited 8 months ago by Laurent Lyaudet (previous) (diff)

comment:6 by Laurent Lyaudet, 8 months ago

Has patch: set
Last edited 8 months ago by Laurent Lyaudet (previous) (diff)

comment:7 by Laurent Lyaudet, 8 months ago

Here is the PR, I will improve it when requested : https://github.com/django/django/pull/17984 :)
I still have doubts about keeping the order by even with manual Prefetch.
I need to verify if it is possible to bypass the filter by id.

comment:8 by Mariusz Felisiak, 8 months ago

Resolution: wontfix
Status: newclosed

Laurent, thanks for this patch, however I agree with Simon.

I appreciate you'd like to reopen the ticket, but please follow the triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList.

comment:9 by Simon Charette, 8 months ago

Cc: Simon Charette added
Description: modified (diff)
Keywords: single-valued added
Patch needs improvement: set
Resolution: wontfix
Status: closednew
Summary: Remove Order by on models when prefetching by idElide ordering of prefetch querysets for single valued relationships
Triage Stage: UnreviewedAccepted

I'm sorry for the awkward back and forth here but reviewing Laurent's PR made something clear to me that wasn't from the origin report.

The requested optimization here is solely for single valued relationships (backward and forward 1:1 and forward 1:M). In this scenario, as pointed out by Laurent, ORDER BY doesn't matter as the related collection is either empty or a singleton and thus order_by() can always be used in their respective get_prefetch_queryset.

In the light of this realization I've adjusted the report and moved back this ticket to an accepted optimization.

Laurent, as for the patch I suggest simply decorating existing tests that make use of prefetching for single valued relationship (there are plenty in prefetch_related tests) which assertNumQueries and use the context queries to assert against the lack of ORDER BY.

e.g.

with self.assertNumQueries(2) as ctx:
   list(Book.objects.prefetch_related("author"))
self.assertNotIn("ORDER BY", ctx.queries[-1]["sql"])

I think that systematically calling order_by without the _do_not_modify_order_by should do.

Again, sorry for the misunderstanding and thank you for your efforts towards contributing this improvement to Django.

Last edited 8 months ago by Simon Charette (previous) (diff)

comment:10 by Mariusz Felisiak, 8 months ago

Owner: changed from nobody to Laurent Lyaudet
Status: newassigned

comment:11 by Laurent Lyaudet, 8 months ago

Hello,

Thanks for the replies and the constructive suggested improvements.

I removed _do_not_modify_order_by.

I improved to avoid an useless cloning of QuerySet.
For that I created a protected method QuerySet._in_place_empty_order_by() just below QuerySet.order_by().
I thought it would be the best solution without modifying the "public API" of QuerySet.
Another solution could be to add kwargs to QuerySet.order_by() like :

  • clone: bool = True, or no_clone: bool = False, or in_place: bool = False,
  • and maybe clear_default: bool = False.

But it would require a larger discussion I think.
If you think the name of QuerySet._in_place_empty_order_by() should be changed for something else, tell me.
I may also directly call queryset.query.clear_ordering(force=True, clear_default=True).
Tell me what you prefer to avoid the useless cloning :).

Regarding the tests, thanks for the suggestions.
I applied them in my two tests.
I think this is better practice to have two separate tests where the intent of each test is clear.
And I haven't added the asserts in other tests to avoid confusion on the goal of each test.
However, I understand that you may want to avoid cluttering the tests and the models in tests/prefetch_related/models.py.
I have see that I can replace the class A35309 with the class Book and the class B35309 with the class Author in my first test.
Tell me if you think I should do that.
I have see that I can replace the class A35309 with the class Room and the class C35309 with the class House in my second test.
Tell me if you think I should do that.

Best regards,

Laurent Lyaudet

Last edited 8 months ago by Laurent Lyaudet (previous) (diff)

comment:12 by Laurent Lyaudet, 8 months ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top