#24254 closed Bug (fixed)
Queries using distinct() and order_by() cannot be used with the __in lookup
Reported by: | Torsten Bronger | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | bronger@…, Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Consider the following model:
from django.db import models class MyModel(models.Model): name = models.CharField(max_length=200) class Meta: ordering = ("name",)
Then, the following query:
MyModel.objects.filter(pk__in=MyModel.objects.all().distinct()[:10])
results in an invalid SQL query. SQLite complains with "only a single result allowed for a SELECT that is part of an expression".
Change History (13)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
Description: | modified (diff) |
---|
You're right, my example was not minimal. I changed it so that only one model is defined.
comment:3 by , 10 years ago
Has patch: | set |
---|
I've added a patch with a fix and a test at https://github.com/django/django/pull/4078.
comment:4 by , 10 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:5 by , 10 years ago
Patch needs improvement: | set |
---|
comment:6 by , 10 years ago
The proposed patch passes on SQLite, but fails on Postgres, with:
for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: ...eries_person"."id" FROM "queries_person" ORDER BY "queries_p...
and on MySQL, with:
This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
I'm afraid I don't know enough about either database to know what the right fix is.
comment:7 by , 8 years ago
I've hit this issue as well, reaching an identical situation of __in
+ ordering + distinct + slicing.
Typing this out, I realised that if you have ordering and slicing, then you'll need the values in the SELECT
(see the Postgres error mentioned before).
So I would propose a "tuple unwrapping" in that case. So the pk__in=qs
property would expand to SQL like SELECT U1."id" FROM (SELECT DISTINCT ... ) U1
In [61]: print MyModel.objects.filter(pk__in=MyModel.objects.all().distinct()[:10]).query # invalid syntax due to missing tuple unwrapping but includes proper ordering SELECT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" WHERE "test_mymodel"."id" IN (SELECT DISTINCT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" ORDER BY "test_mymodel"."name" ASC LIMIT 10) ORDER BY "test_mymodel"."name" ASC # proposed solution with tuple unwrapping SELECT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" WHERE "test_mymodel"."id" IN (SELECT U0."id" FROM (SELECT DISTINCT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" ORDER BY "test_mymodel"."name" ASC LIMIT 10) U0 ) ORDER BY "test_mymodel"."name" ASC
I think we could always do wrapping and unwrapping of __in
subqueries. The cost is pretty negligible in Postgres (an extra select node), and avoids having to figure out the innards of the subquery select.
comment:8 by , 8 years ago
Cc: | added |
---|---|
Version: | 1.7 → master |
We should be able to avoid unconditional tuple wrapping as it only seems to be required when DISTINCT
is used with ORDER BY
.
comment:9 by , 8 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
Summary: | Passing a query to "pk__in" of another query may result in invalid SQL → Queries using distinct() and order_by() cannot be used with the __in lookup |
comment:10 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
(pending some cosmetic updates)
Have had a little dig into this.
Consider an
__in
lookup with the following on the RHS:M.objects.order_by(sort_field).distinct()[m:n]
.This generates a query that returns both
M
's primary key andsort_field
. Because it returns two fields, it cannot be used as a subquery (at least in SQLite) and so we hit the reported error.I think this behaviour is related to a warning in the docs: "Any fields used in an
order_by()
call are included in the SQLSELECT
columns" (although from reading the code forSQLCompiler.get_extra_select
I think that the fields used in anorder_by()
call are only included in theSELECT
columns when the query hasdistinct
set).It's not obvious to me why any fields used in an
order_by()
call must always be included in theSELECT
columns, so I'd like to propose a fix wherebySQLCompiler.get_extra_select
returns an empty list if it's dealing with a subquery. I've made that change locally, and nothing fails. However, I don't know enough about the ORM to know whether this is a good idea! I hope this is useful.