Opened 4 years ago

Last modified 2 years ago

#32297 assigned Bug

QuerySet.get() method not working as expected with Window functions

Reported by: Jerin Peter George Owned by: Saeed Hasani Borzadaran
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Mariusz Felisiak, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jerin Peter George)

I have a simple model as below,

class Employee(models.Model):
    name = models.CharField(max_length=40, blank=False, null=False)
    salary = models.PositiveIntegerField()
    department = models.CharField(max_length=40, blank=False, null=False)
    hire_date = models.DateField(blank=False, null=False)
    age = models.IntegerField(blank=False, null=False)
    bonus = models.DecimalField(decimal_places=2, max_digits=15, null=True)

and I queried

qs = Employee.objects.annotate(
    rank=Window(expression=Rank(), order_by=F("salary").desc()),
)

and then I have called the get() method to get the rank of a single specific object

qs.get(pk=pk).rank

Issue

The get(pk=pk).rank returning a constant value irrespective of the kwargs applied to the get() method

Change History (7)

comment:1 by Jerin Peter George, 4 years ago

Description: modified (diff)

comment:2 by Jerin Peter George, 4 years ago

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

in reply to:  2 comment:3 by starryrbs, 4 years ago

If you use qs.get(pk=pk).rank, the generated SQL statement is like this:

SELECT "expressions_window_employee"."id", "expressions_window_employee"."name", "expressions_window_employee"."salary", "expressions_window_employee"."department", "expressions_window_employee"."hire_date", "expressions_window_employee"."age", "expressions_window_employee"."classification_id", "expressions_window_employee"."bonus", RANK() OVER (ORDER BY "expressions_window_employee"."salary" DESC) AS "rank" FROM "expressions_window_employee" WHERE "expressions_window_employee"."id" = 11

If you want to query the value of rank, you can do this:

rank_dict = {item['pk']:item['rank'] for item in qs.values("pk", "rank")}
rank_dict.get(12)

Replying to Jerin Peter George:

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

comment:4 by Jerin Peter George, 4 years ago

I am aware that I can do something similar as you suggested for a "workaround" to get the desired result. But the real question is whether this is the expected behavior for the Window function?

I am sure that I can do a similar query using the Count(...) expression as

from django.db import models


class Musician(models.Model):
    name = models.CharField(max_length=50)


class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE, related_name="albums")
    name = models.CharField(max_length=100)


qs = Musician.objects.annotate(album_count=models.Count("albums"))
print(qs.get(pk=2).album_count)

comment:5 by Carlton Gibson, 4 years ago

Cc: Mariusz Felisiak Simon Charette added
Triage Stage: UnreviewedAccepted

I'm going to provisionally accept this and ask Simon and Mariusz to have a look, since there does seem to be something amiss.

Taking a similar case from one of my own projects, it's not just values (which might be/have been the issue) that shows the strange behaviour, but straight iteration:

>>> list((t.pk, t.rank) for t in Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc()))[:10])
[(1444, 1), (1443, 2), (1442, 3), (1441, 4), (1440, 5), (1439, 6), (1438, 7), (1437, 8), (1436, 9), (1435, 10)]
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1444).rank
1
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1442).rank
1

I'm very much expecting the last line there to output 3.

Version 1, edited 4 years ago by Carlton Gibson (previous) (next) (diff)

comment:6 by Simon Charette, 4 years ago

It's just how window functions behave when mixed with a query level constraint (WHERE) or LIMIT'ing;

In your first example the [:10] performs a LIMIT 10 so the RANK will be over ten results while get(pk=1442) will be against in a single result through WHERE id = 1442 LIMIT 21.

To me this is a similar class of problem to #24462, #28333. If it was possible to use the current result set as a subquery then the following would work as expected

Task.objects.annotate(
    rank=Window(expression=Rank(), order_by=F('logged').desc())
)[0:10].subquery().get(pk=1442)

As it would result in

SELECT * FROM (
    SELECT task.*, RANK() OVER (ORDER BY logged DESC) AS "rank"
    FROM task
    LIMIT 10
) subquery WHERE id=1442 LIMIT 21

Where the window function would span over the full ten set of rows.

Whether or not get should do the subquery pushdown automatically is debatable. I get a feeling it should as the fact it uses LIMIT is more of an implementation detail but it would be slightly backward incompatible.

comment:7 by Saeed Hasani Borzadaran, 2 years ago

Owner: changed from nobody to Saeed Hasani Borzadaran
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top