Opened 4 months ago

Closed 4 months ago

Last modified 4 months ago

#35751 closed Bug (invalid)

Ordering a model via a m2m field creates unintended side effect for ForeignKeys

Reported by: Dennis Scheiba Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords: ORM ordering
Cc: Dennis Scheiba Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Dennis Scheiba)

Python 3.10 @ Django 5.1.1

Using a many to many relation for ordering (which is something you shouldn't do?) will affect the traversing when the object in question is accessed via a foreign key - the necessary left join for the ordering "spills" into the ORM results, yielding n (number of foreign key references) times m (number of many-to-many relations within the object) objects instead of just the actual n objects referencing the foreign key.

If you comment out the ordering the ORM behaves as expected.

Th ORM results should not "multiply" due to an ordering configuration.

How to reproduce

Given a toy models.py on a new project which looks like

from django.db import models

class Order(models.Model):
    pass

class BookingTime(models.Model):
    date = models.DateTimeField(auto_now=True)


class OrderItem(models.Model):
    order = models.ForeignKey(
        Order,
        on_delete=models.CASCADE,
        related_name="order_items",
    )

    booking_times = models.ManyToManyField(
        "BookingTime",
        related_name="order_items",
    )

    class Meta:
        ordering = [
            # this is the problem!
            'booking_times__date',
        ]

Then on a shell do

In [1]: from foo.models import *

In [2]: booking_times = [BookingTime() for _ in range(4)]

In [3]: [b.save() for b in booking_times]
Out[3]: [None, None, None, None]

In [4]: order = Order()

In [5]: order.save()

In [6]: order_item = OrderItem(order=order)

In [7]: order_item.save()

In [8]: order_item.booking_times.add(*booking_times)

In [9]: order.order_items.count()
Out[9]: 1

In [10]: order.order_items.all()
Out[10]: <QuerySet [<OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>, <OrderItem: OrderItem object (1)>]>

In [11]: print(order.order_items.all().query)
SELECT "foo_orderitem"."id", "foo_orderitem"."order_id" FROM "foo_orderitem" LEFT OUTER JOIN "foo_orderitem_booking_times" ON ("foo_orderitem"."id" = "foo_orderitem_booking_times"."orderitem_id") LEFT OUTER JOIN "foo_bookingtime" ON ("foo_orderitem_booking_times"."bookingtime_id" = "foo_bookingtime"."id") WHERE "foo_orderitem"."order_id" = 2 ORDER BY "foo_bookingtime"."date" ASC

In [12]: order.order_items.all().explain()
Out[12]: '5 0 0 SEARCH foo_orderitem USING COVERING INDEX foo_orderitem_order_id_e19c2dbd (order_id=?)\n11 0 0 SEARCH foo_orderitem_booking_times USING COVERING INDEX foo_orderitem_booking_times_orderitem_id_bookingtime_id_49667055_uniq (orderitem_id=?) LEFT-JOIN\n17 0 0 SEARCH foo_bookingtime USING INTEGER PRIMARY KEY (rowid=?) LEFT-JOIN\n35 0 0 USE TEMP B-TREE FOR ORDER BY'

This also happens in a template, e.g.

{% for item in order.order_items.all %}
  {{ item }}
{% endfor %} 

also yields 4 times the same item instead of just once (due to the left join?).

Change History (6)

comment:1 by Dennis Scheiba, 4 months ago

Description: modified (diff)

comment:2 by Dennis Scheiba, 4 months ago

Description: modified (diff)

comment:3 by Dennis Scheiba, 4 months ago

Description: modified (diff)

comment:4 by Simon Charette, 4 months ago

Resolution: invalid
Status: newclosed

This behaviour is described in the documentation under the note that covers It is permissible to specify a multi-valued field to order the results by.

Using a many to many relation for ordering (which is something you shouldn't do?)

Could you describe wha are you expecting to happen when specifying order_by("booking_times__date") if not for each item to be returned multiple times?

What are you expecting booking_times__date to even mean? Are you looking for order_by(Min("booking_times__date")) instead so the items are ordered by their first booking date, or maybe Max("booking_times__date") so its the item with the booking that ends first, or maybe you interested in ordering by the shortest booking duration so Max("booking_times__date") - Min("booking_times__date")?

See #20842 which introduced this documentation addition and ticket:20842#comment:3 which explains why order_by("booking_times__date") by itself doesn't make much sense and should likely be deprecated.

Version 5, edited 4 months ago by Simon Charette (previous) (next) (diff)

comment:5 by Dennis Scheiba, 4 months ago

I don't have enough expertise in the construction of an ORM to fully understand all implications, but as you mentioned, an ordering by a m2m relation is not explicit enough to be determined, and actually the LOC in question was an overlook from my side. I did not expect that the ordering of an item can be the cause for the multiplication of items and I instead digged through the Manager implementation to find the cause.

Maybe a warning could have been helpful here as it fundamentally changes the way the ORM behaves? I'd think that the edge cases where this actual behavior is wanted can be done in a way which is more canonical within the ORM.

In the end, a warning during boot procedure would have been nice for me and would have saved me some headache - maybe even this issue is already helpful enough for people.

comment:6 by Simon Charette, 4 months ago

I personally think that order_by(multivalued__value) is a foot-gun due to its ambiguity and duplicate row spanning behaviour and should be deprecated but it poses some challenges as it is a behavior that was supported and even documented for over a decade so it's safe to assume that some users might be legitimately depend on it for some nice query needs (e.g. adding distinct() afterwards to deal with the duplicate row issue).

If we were to raise warnings or consider deprecating this feature we should have an upgrade path that covers these use cases. In your particular case

OrderItem.objects.order_by("booking_times__date").distinct()

which results in

SELECT DISTINCT orderitem.*
FROM orderitem
LEFT JOIN orderitem_booking_times ON (orderitem_booking_times.orderitem_id = orderitem.id)
LEFT JOIN bookingtime ON (orderitem_booking_times.bookingtime_id = bookingtime.id)
ORDER BY bookingtime.date

can be replaced with

OrderItem.objects.order_by(Min("booking_times__date"))

which results in

SELECT orderitem.*
FROM orderitem
LEFT JOIN orderitem_booking_times ON (orderitem_booking_times.orderitem_id = orderitem.id)
LEFT JOIN bookingtime ON (orderitem_booking_times.bookingtime_id = bookingtime.id)
GROUP BY orderitem.id
ORDER BY MIN(bookingtime.date)

Since order_by is resolved late in the query compilation process it should be doable to present a deprecation warning that suggests the usage of Min and Max instead.

Note: See TracTickets for help on using tickets.
Back to Top