Context Navigation

← Previous Ticket
Next Ticket →

#32297 assigned Bug

QuerySet.get() method not working as expected with Window functions

Reported by:	Jerin Peter George	Owned by:	Saeed Hasani Borzadaran
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:
Cc:	Mariusz Felisiak, Simon Charette	Triage Stage:	Accepted
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Jerin Peter George)

I have a simple model as below,

class Employee(models.Model):
    name = models.CharField(max_length=40, blank=False, null=False)
    salary = models.PositiveIntegerField()
    department = models.CharField(max_length=40, blank=False, null=False)
    hire_date = models.DateField(blank=False, null=False)
    age = models.IntegerField(blank=False, null=False)
    bonus = models.DecimalField(decimal_places=2, max_digits=15, null=True)

and I queried

qs = Employee.objects.annotate(
    rank=Window(expression=Rank(), order_by=F("salary").desc()),
)

and then I have called the get() method to get the rank of a single specific object

qs.get(pk=pk).rank

Issue

The get(pk=pk).rank returning a constant value irrespective of the kwargs applied to the get() method

Change History (7)

comment:1 by Jerin Peter George, 4 years ago

Description:	modified (diff)

follow-up: 3 comment:2 by Jerin Peter George, 4 years ago

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

in reply to: 2 comment:3 by starryrbs, 4 years ago

If you use qs.get(pk=pk).rank, the generated SQL statement is like this:

SELECT "expressions_window_employee"."id", "expressions_window_employee"."name", "expressions_window_employee"."salary", "expressions_window_employee"."department", "expressions_window_employee"."hire_date", "expressions_window_employee"."age", "expressions_window_employee"."classification_id", "expressions_window_employee"."bonus", RANK() OVER (ORDER BY "expressions_window_employee"."salary" DESC) AS "rank" FROM "expressions_window_employee" WHERE "expressions_window_employee"."id" = 11

If you want to query the value of rank, you can do this:

rank_dict = {item['pk']:item['rank'] for item in qs.values("pk", "rank")}
rank_dict.get(12)

Replying to Jerin Peter George:

Test case to reproduce the issue

import datetime
from decimal import Decimal

from django.db.models import F, Window
from django.db.models.functions import Rank
from django.test import TestCase

from .models import Employee


class WindowFunctionTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        Employee.objects.bulk_create([
            Employee(
                name=e[0],
                salary=e[1],
                department=e[2],
                hire_date=e[3],
                age=e[4],
                bonus=Decimal(e[1]) / 400,
            )
            for e in [
                ('Jones', 45000, 'Accounting', datetime.datetime(2005, 11, 1), 20),
                ('Williams', 37000, 'Accounting', datetime.datetime(2009, 6, 1), 20),
                ('Jenson', 45000, 'Accounting', datetime.datetime(2008, 4, 1), 20),
                ('Adams', 50000, 'Accounting', datetime.datetime(2013, 7, 1), 50),
                ('Smith', 55000, 'Sales', datetime.datetime(2007, 6, 1), 30),
                ('Brown', 53000, 'Sales', datetime.datetime(2009, 9, 1), 30),
                ('Johnson', 40000, 'Marketing', datetime.datetime(2012, 3, 1), 30),
                ('Smith', 38000, 'Marketing', datetime.datetime(2009, 10, 1), 20),
                ('Wilkinson', 60000, 'IT', datetime.datetime(2011, 3, 1), 40),
                ('Moore', 34000, 'IT', datetime.datetime(2013, 8, 1), 40),
                ('Miller', 100000, 'Management', datetime.datetime(2005, 6, 1), 40),
                ('Johnson', 80000, 'Management', datetime.datetime(2005, 7, 1), 50),
            ]
        ])

    def test_rank_with_queryset_get_method(self):
        qs = Employee.objects.annotate(
            rank=Window(expression=Rank(), order_by=F("salary").desc()),
        )
        rank_set = list(qs.values_list("pk", "rank"))
        rank_set_iter = [(emp.pk, emp.rank) for emp in qs]

        self.assertEqual(rank_set, rank_set_iter)  # does queryset iteration has any problem?
        for pk, rank in rank_set:
            self.assertEqual(qs.get(pk=pk).rank, rank)  # does `.get()` has any problem?

comment:4 by Jerin Peter George, 4 years ago

I am aware that I can do something similar as you suggested for a "workaround" to get the desired result. But the real question is whether this is the expected behavior for the Window function?

I am sure that I can do a similar query using the Count(...) expression as

from django.db import models


class Musician(models.Model):
    name = models.CharField(max_length=50)


class Album(models.Model):
    artist = models.ForeignKey(Musician, on_delete=models.CASCADE, related_name="albums")
    name = models.CharField(max_length=100)


qs = Musician.objects.annotate(album_count=models.Count("albums"))
print(qs.get(pk=2).album_count)

comment:5 by Carlton Gibson, 4 years ago

Cc:	Mariusz Felisiak Simon Charette added
Triage Stage:	Unreviewed → Accepted

I'm going to provisionally accept this and ask Simon and Mariusz to have a look, since there does seem to be something amiss.

Taking a similar case from one of my own projects, it's not just values (which might be/have been the issue) the shows the strange behaviour, but straight iteration:

>>> list((t.pk, t.rank) for t in Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc()))[:10])
[(1444, 1), (1443, 2), (1442, 3), (1441, 4), (1440, 5), (1439, 6), (1438, 7), (1437, 8), (1436, 9), (1435, 10)]
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1444).rank
1
>>> Task.objects.annotate(rank=Window(expression=Rank(), order_by=F("logged").desc())).get(pk=1442).rank
1

I'm very much expecting the last line there to output 3.

Version 0, edited 4 years ago by Carlton Gibson (next)

comment:6 by Simon Charette, 4 years ago

It's just how window functions behave when mixed with a query level constraint (WHERE) or LIMIT'ing;

In your first example the [:10] performs a LIMIT 10 so the RANK will be over ten results while get(pk=1442) will be against in a single result through WHERE id = 1442 LIMIT 21.

To me this is a similar class of problem to #24462, #28333. If it was possible to use the current result set as a subquery then the following would work as expected

Task.objects.annotate(
    rank=Window(expression=Rank(), order_by=F('logged').desc())
)[0:10].subquery().get(pk=1442)

As it would result in

SELECT * FROM (
    SELECT task.*, RANK() OVER (ORDER BY logged DESC) AS "rank"
    FROM task
    LIMIT 10
) subquery WHERE id=1442 LIMIT 21

Where the window function would span over the full ten set of rows.

Whether or not get should do the subquery pushdown automatically is debatable. I get a feeling it should as the fact it uses LIMIT is more of an implementation detail but it would be slightly backward incompatible.

comment:7 by Saeed Hasani Borzadaran, 2 years ago

Owner:	changed from nobody to Saeed Hasani Borzadaran
Status:	new → assigned

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#32297 assigned Bug

QuerySet.get() method not working as expected with Window functions

Description (last modified by Jerin Peter George)

Change History (7)

comment:1 by Jerin Peter George, 4 years ago

follow-up: 3 comment:2 by Jerin Peter George, 4 years ago

in reply to: 2 comment:3 by starryrbs, 4 years ago

comment:4 by Jerin Peter George, 4 years ago

comment:5 by Carlton Gibson, 4 years ago

comment:6 by Simon Charette, 4 years ago

comment:7 by Saeed Hasani Borzadaran, 2 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us