Opened 2 years ago

Closed 2 years ago

#33772 closed Cleanup/optimization (fixed)

Queryset first() returns different result than queryset[0] when using ExpressionWrapper

Reported by: Pablo Pissi Owned by: Pablo Pissi
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: ExpressionWrapper, first, queryset
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi! I was creating a new app and found the following scenario, seems like an issue to me:

I have three models:

class Transaction(models.Model):
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    description = models.CharField(max_length=255)
    created_by = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="created_transactions")
    created_at = models.DateTimeField(auto_now_add=True)


class TransactionUser(models.Model):
    user = models.ForeignKey(get_user_model(), on_delete=models.CASCADE, related_name="transactions")
    transaction = models.ForeignKey("bills.Transaction", on_delete=models.CASCADE, related_name="users")
    amount = models.DecimalField(max_digits=10, decimal_places=2)
    transaction_type = models.ForeignKey('bills.TransactionType', on_delete=models.PROTECT)


class TransactionType(models.Model):
    name = models.CharField(max_length=50)
    code = models.CharField(max_length=3)
    multiplier = models.DecimalField(max_digits=5, decimal_places=2)

I create the following instances:

        transaction = Transaction(amount=100)
        positive = TransactionType(multiplier=1)
        negative = TransactionType(multiplier=-1)
        
        TransactionUser(transaction=transaction, user=user1, amount=50, transaction_type=negative)
        TransactionUser(transaction=transaction, user=user2, amount=50, transaction_type=negative)
        TransactionUser(transaction=transaction, user=user1, amount=10, transaction_type=positive)

I want to obtain the balance of each transaction by doing the following query:

 TransactionUser.objects \
        .filter(user_id=user_id) \
        .annotate(
            real_amount=ExpressionWrapper(F("amount")*F("transaction_type__multiplier"), output_field=DecimalField()),
        ).values("transaction") \
        .annotate(
            balance=Sum("real_amount")
        )

If I log the queryset returned, the value logged is:

<QuerySet [{'transaction': 1, 'balance': Decimal('-40.0000')}]>

But if I log

queryset.first()["balance"]
>>> -50

And if I log

queryset[0]["balance"]
>>> -40

It's also reproducible in 4.0

Change History (12)

comment:1 by Simon Charette, 2 years ago

Resolution: invalid
Status: newclosed

This is happening because you don't explicitly define an ordering for your queryset so the database can return whatever rows it wants in the case of [0] while .first() explicitly order by pk if no ordering is specified.

Please TicketClosingReasons/UseSupportChannels before filing issues in this ticket tracker.

comment:2 by Pablo Pissi, 2 years ago

Resolution: invalid
Status: closednew

Hi,

I've read the docs about ordering, but in this case the queryset has only one value and the value returned by .first() is not a value that exists on the queryset, is that correct?

It's returning values from the non-aggregated queryset

comment:3 by Carlton Gibson, 2 years ago

Resolution: needsinfo
Status: newclosed

It's also reproducible …

Hi Pablo — it's looks like a usage problem, so the support channels are the place to go, but if you can create a sample project that demonstrates the issue, perhaps in a failing test case, I'm happy to look further.

comment:4 by Pablo Pissi, 2 years ago

Hi again!

Here is a repo with a test that shows the issue: https://github.com/pablop94/django-poc

Let me know if you need more info.

comment:5 by Simon Charette, 2 years ago

The addition of order_by columns to the GROUP BY clause specified by values().annotate() is documented.

The crux of the issue remain; you are asking for an element at a particular index of a collection with an undefined ordering. Adding order_by("example_model") to your test case addresses the irregularity you are experiencing with first().

first() could arguably be adjusted to raise an exception when attempted to be used on an unordered collection to refuse the temptation to guess but that would require a deprecation period. Another approach could be to only raise an exception when first() is called on a unordered queryset with an explicit values() aggregation grouping that doesn't include pk which is the case reported here.

Here's what the code could look like for a systematic error on attempts to use first() on a undefined ordering queryset performing aggregation over specific columns

  • django/db/models/query.py

    diff --git a/django/db/models/query.py b/django/db/models/query.py
    index 67fcb411eb..da3d1ff3a7 100644
    a b async def alatest(self, *fields):  
    10321032
    10331033    def first(self):
    10341034        """Return the first object of a query or None if no match is found."""
    1035         for obj in (self if self.ordered else self.order_by("pk"))[:1]:
     1035        queryset = self
     1036        if not self.ordered:
     1037            if isinstance(self.query.group_by, tuple):
     1038                raise TypeError(
     1039                    "Cannot use first() on an unordered queryset performing aggregation"
     1040                )
     1041            queryset = self.order_by("pk")
     1042        for obj in queryset[:1]:
    10361043            return obj
    10371044
    10381045    async def afirst(self):

And a somewhat more complex implementation that still allow the pk

  • django/db/models/query.py

    diff --git a/django/db/models/query.py b/django/db/models/query.py
    index 67fcb411eb..0534a2cc16 100644
    a b async def alatest(self, *fields):  
    10321032
    10331033    def first(self):
    10341034        """Return the first object of a query or None if no match is found."""
    1035         for obj in (self if self.ordered else self.order_by("pk"))[:1]:
     1035        queryset = self
     1036        if not self.ordered:
     1037            if isinstance(self.query.group_by, tuple) and not any(
     1038                col.output_field is self.model._meta.pk for col in self.query.group_by
     1039            ):
     1040                raise TypeError(
     1041                    "Cannot use first() on an unordered queryset performing aggregation "
     1042                    "over a set of expressions that doesn't include the base model's primary key"
     1043                )
     1044            queryset = self.order_by("pk")
     1045        for obj in queryset[:1]:
    10361046            return obj
    10371047
    10381048    async def afirst(self):

The latter patch seems to be passing the test suite.

Thoughts on the value of such adjustment Carlton?

Last edited 2 years ago by Simon Charette (previous) (diff)

comment:6 by Carlton Gibson, 2 years ago

Resolution: needsinfo
Status: closednew
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Thanks both. Good.

Thoughts on the value of such adjustment Carlton?

We do see tickets periodically running into this problem: I recall a number of occasions seeing links to that same docs section.
I guess for every one that we see there are several/many in the wild we don't

The first() docs link to the Interaction with order_by() section there, presumably to try to forestall this confusion, but I still get the impression it's one of the more subtle points of the ORM behaviour, so if we can provide a better error message to point folks in the right direction that's likely a win. 🤔

I'll provisionally accept on that basis. (Mariusz can object on review if he doesn't like it 😃)

Pablo would you like to work on a PR based on Simon's suggestion?

comment:7 by Pablo Pissi, 2 years ago

Yes, I could work on a PR this weekend. Maybe I can add a documentation note to the first function too.

This change will require a deprecation period right?

comment:8 by Carlton Gibson, 2 years ago

This change will require a deprecation period right?

I don't think so. We're (just) adding a warning for an already incorrect usage.

comment:9 by Carlton Gibson, 2 years ago

Owner: changed from nobody to Pablo Pissi
Status: newassigned

comment:10 by Pablo Pissi, 2 years ago

Has patch: set

Hi!

I've created this PR. I've also updated the last() function: https://github.com/django/django/pull/15769

comment:11 by Mariusz Felisiak, 2 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In d2872948:

Fixed #33772 -- Added QuerySet.first()/last() error message on unordered queryset with aggregation.

Note: See TracTickets for help on using tickets.
Back to Top