Opened 10 years ago
Last modified 6 years ago
#24218 new New feature
Use sub-query in ORM when distinct and order_by columns do not match — at Version 1
Reported by: | Miroslav Shubernetskiy | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | subquery distinct order_by |
Cc: | Harro | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
This ticket is to propose a slight change in ORM - use subqueries when querying a model where .distinct()
and .order_by()
(or .extra(order_by=())
) leftmost columns do not match. For example:
Model.objects.all().distinct('foo').order_by('bar')
The above generates the following SQL:
SELECT DISTINCT ON ("app_model"."foo") <lots of columns here> FROM "app_model" ORDER BY "app_model"."bar" ASC;
I am not sure about all backends however the above syntax is not allowed in PostgreSQL which produces the following error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Here are PostgreSQL docs explaining why that is not allowed:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).
This ticket proposes to use subqueries in such situations which would use SQL:
SELECT * FROM ( SELECT DISTINCT ON ("app_model"."foo") <lots of columns here> FROM "app_model" ) result ORDER BY "app_model"."bar" ASC;
The above is perfectly valid SQL and produces expected results (please note that ORDER_BY
is in the outer query to guarantee that distinct results are correctly sorted).
I created a simple proof-of-concept patch by overwriting few things in SQLCompiler.as_sql()
which seems to work pretty well. The patch only creates subquery when the above dilema is encountered which should not have any negative side-effects on existing queries (since such queries were not allowed by SQL). The patch also works on the .count()
queries since Django then strips any ordering hence the subquery is never created.