#35437 closed Bug (needsinfo)
Exists subquery ignores ordering when it's distinct
Reported by: | Kevin Marsh | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | subquery, exists, ordering, distinct |
Cc: | Kevin Marsh, Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I ran into a bug in the Exists
expression, I'm using it to do some filtering where I just want the latest object grouped by a certain value (eg. latest financial statement object for each company). See the patch for a failing test, but using the Manager
/Employee
models from the test suite you can see in the slightly contrived example that for an Exists
query like
# ... filtering by highest paid employee per manager Exists( Employee.objects.order_by("manager", "-salary").distinct("manager").filter(pk=OuterRef("pk")) ) # ...
Gets transformed into SQL (Postgresql) like this where the ordering has been stripped out
-- ... EXISTS( SELECT DISTINCT ON (U0."manager_id") 1 AS "a" FROM "expressions_employee" U0 WHERE U0."id" = ("expressions_employee"."id") -- Missing ordering which is required for distinct LIMIT 1 ) -- ...
Obviously we want to call clear_ordering
most of the time on Exists
subqueries for performance reasons, but in this case we either shouldn't clear them or loudly raise an exception saying that distinct Exists
queries are not supported
Attachments (1)
Change History (3)
by , 6 months ago
Attachment: | test_ordering_in_distinct_exists_subquery.2.diff added |
---|
comment:1 by , 6 months ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Bug |
The report makes sense. We could go two ways here
- Clear
distinct_fields
as well as it's unnecessary (we only care about the existence of a single row after all) - Avoid forcing the removal of ordering since the
clear_ordering
method is a noop when distinct fields are defined.
comment:2 by , 6 months ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
Actually something is off in your test. Even if the ordering is not cleared all employees will match because you reduce the number of rows of the subquery by employee id
e.g.
SELECT "expressions_employee"."id", "expressions_employee"."firstname", "expressions_employee"."lastname", "expressions_employee"."salary", "expressions_employee"."manager_id", "expressions_employee"."based_in_eu" FROM "expressions_employee" WHERE (EXISTS (SELECT DISTINCT ON (U0."manager_id") 1 AS "a" FROM "expressions_employee" U0 WHERE (U0."manager_id" IS NOT NULL AND U0."id" = ("expressions_employee"."id")) ORDER BY U0."manager_id" ASC, U0."salary" DESC LIMIT 1) AND "expressions_employee"."manager_id" = 2) LIMIT 21;
In other words, your distinct group is by U0."id" = ("expressions_employee"."id")
so you'll always have matching rows even if the ordering is maintained. Try it out for yourself with the following patch
-
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index b3f130c0b4..07da7ae1a1 100644
a b def exists(self, limit=True): 648 648 combined_query.exists(limit=False) 649 649 for combined_query in q.combined_queries 650 650 ) 651 q.clear_ordering( force=True)651 q.clear_ordering() 652 652 if limit: 653 653 q.set_limits(high=1) 654 654 q.add_annotation(Value(1), "a")
This appears to be a misuse of EXISTS
. You want to a subquery I suppose to achieve your purpose.
Employee.objects.filter( id__in=Employee.objects.filter( manager=OuterRef("manager") ).order_by("-salary").values("id")[:1] )
Or a window function
Employee.objects.annotate( highest_paid_id=Window( FirstValue("id"), partition_by="manager", order_by="-salary", ) ).filter(id=F("highest_paid_id"))
Failing test