#30047 closed Cleanup/optimization (worksforme)
Documentation: add a Note about default ordering on models used in Subquery filters
Reported by: | powderflask | Owned by: | nobody |
---|---|---|---|
Component: | Documentation | Version: | 2.1 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
A Subquery filter on models.Obj, e.g., filter(obj__in=Subquery(...))
,requires that the subquery return exactly one column populated with Obj.pk.
Documentation suggests using objects.values('pk')
However, if the models.Obj.Meta.ordering is set, then those ordering fields are included with the subquery results, raising the exception:
django.db.utils.ProgrammingError: subquery has too many columns
This is expected behavior, but confusing and not directly obvious in the code. Could be more clearly documented at:
https://docs.djangoproject.com/en/dev/ref/models/expressions/#limiting-a-subquery-to-a-single-column
with a Note reading something similar to:
"""
To avoid unintended columns being added to the subquery expression, be sure there is no order_by clause on the subquery. In particular, watch for default ordering specified on the subquery model Meta.
"""
Change History (10)
comment:1 by , 6 years ago
comment:2 by , 6 years ago
Resolution: | → worksforme |
---|---|
Status: | new → closed |
I'd be surprised if the ordering affected which fields are selected in the subquery, and in fact I can't reproduce it on either sqlite or postgres. For example, the following works just fine (Permission
defines a default ordering):
>>> User.objects.filter(user_permissions__in=Subquery(Permission.objects.filter(codename='add_contenttype').values('pk'))) <QuerySet [<User: marten>]> >>> User.objects.filter(user_permissions__in=Subquery(Permission.objects.filter(codename='add_contenttype').order_by('codename').values('pk'))) <QuerySet [<User: marten>]>
Looking at the resulting queries, both only select the id
column in the subquery, and not any of the columns from the ordering.
comment:3 by , 6 years ago
Yes, I've just been playing with exactly the same thing: I can't reproduce it as long as I provide the 'pk'
field to values()
With these models:
from django.db import models class Parent(models.Model): name = models.CharField(max_length=255) class Child(models.Model): parent = models.ForeignKey(Parent, related_name='children', on_delete=models.CASCADE) name = models.CharField(max_length=255) class Meta: ordering = ('name',)
It works as expected:
>>> from django.db.models import Subquery >>> from app.models import Parent, Child >>> p = Parent.objects.create(name='A Parent') >>> Child.objects.create(name='First Child', parent=p) <Child: Child object (1)> >>> Child.objects.create(name='A Second Child', parent=p) <Child: Child object (2)> >>> Child.objects.all() <QuerySet [<Child: Child object (2)>, <Child: Child object (1)>]> >>> Child.objects.values() <QuerySet [{'id': 2, 'parent_id': 1, 'name': 'A Second Child'}, {'id': 1, 'parent_id': 1, 'name': 'First Child'}]> >>> Child.objects.values('pk') <QuerySet [{'pk': 2}, {'pk': 1}]> >>> Parent.objects.filter(children__in=Subquery(Child.objects.values())) OperationalError: sub-select returns 3 columns - expected 1 >>> Parent.objects.filter(children__in=Subquery(Child.objects.values('pk'))) <QuerySet [<Parent: Parent object (1)>, <Parent: Parent object (1)>]>
comment:4 by , 6 years ago
Type: | Uncategorized → Cleanup/optimization |
---|
follow-up: 9 comment:5 by , 6 years ago
powderflask I assume you forgot to mention your query was using annotated aggregation functions?
follow-up: 8 comment:6 by , 6 years ago
I'd be surprised if the ordering affected which fields are selected in the subquery, and in fact I can't reproduce it on either sqlite or postgres.
It does when annotated aggregation functions are involved and can be quite a surprise when Meta.ordering
is involved, see #14357.
comment:7 by , 6 years ago
comment:8 by , 6 years ago
Replying to Simon Charette:
It does when annotated aggregation functions are involved and can be quite a surprise when
Meta.ordering
is involved, see #14357.
Any column in SELECT
or ORDER BY
must be present in GROUP BY
, but it's not necessary to add columns from ORDER BY
to the SELECT
clause, and as far as I can tell Django does not add the extra columns to SELECT
. I suspect powderflask has something (e.g. an annotation) that adds an extra column to the subquery, that's not caused by any ordering. At this point we'd have to see the original code/query to be sure.
comment:9 by , 6 years ago
Replying to Simon Charette:
powderflask I assume you forgot to mention your query was using annotated aggregation functions?
Sorry. Clearly didn't isolate this issue far enough before reporting. Yes, that query may have had annotated aggregations.
This was an issue I noticed while doing some probing on a large legacy code base. Can't reproduce it myself now *&#&@
Clearly I jumped to a hasty conclusion about the behaviour I was seeing - feeling the idiot for not isolating this in a simple test case -- sorry for causing extra work.
comment:10 by , 6 years ago
Here is some gross data pulled the bug as it appears in the production app (I will try to reproduce this in a simpler test case when I get some time -- sorry).
There is no aggregation on the query or the subquery.
However the subquery clause does contain multiple joins and another sub-query.
The bug does appear to be as originally report -- that the subquery model.Meta.ordering
appears to cause the ordering fields to be included in the subquery select, even though a values('pk')
clause is added.
Code:
2 models: Prework and Treatment
Treatment has a ForiegnKey to Prework
Subquery is used to filter Treatments by a variety of filters applied to Prework (and its related Project Approvals, and Project ACL models)
Bug appears for the following queryset:
preworks = Prework.objects.filter(**filter_specs) qs = Treatment.objects.filter( prework__in=models.Subquery(preworks.values('pk')) )
with: Prework.Meta.ordering == ('subregion', 'fiscal_year', 'sequence', 'circuit')
Bug disappears when ordering is explicitly removed from the subquery:
preworks = Prework.objects.filter(**filter_specs).order_by()
Here is the WHERE clause for the buggy qs
above (notice extra select fields that match default ordering on sub-query):
SELECT .... WHERE "treatment_area"."prework_id" IN ( SELECT DISTINCT V0."id", 'V0."subregion", V0."fiscal_year", V0."sequence", V0."circuit" FROM "prework" V0 INNER JOIN "project_preworkproject" V1 ON (V0."id" = V1."prework_id") ' INNER JOIN "approvals_projectapproval" V2 ON (V1."id" = V2."project_id") ' WHERE ( NOT (V0."id" = %s) AND NOT (V0."project_type" IN (%s, %s)) ' AND V0."project_stage" = %s AND V0."subregion" = %s AND V0."project_type" = %s ' AND V0."fiscal_year" = %s AND V1."id" IN ( SELECT U0."id" FROM "project_preworkproject" U0 INNER JOIN "project_preworkprojectacl" U1 ON (U0."id" = U1."project_id") WHERE U1."user_id" = %s ) AND V2."type" = %s AND V2."is_final" = %s ) ORDER BY V0."subregion" ASC, V0."fiscal_year" ASC, 'V0."sequence" ASC, V0."circuit" ASC )
Even if it doesn't completely solves the issue the subquery code should be able to clear the ordering under certain circumstances in the first place (
default
akaModel.ordering
included).There's a lot duplicate code between
Subquery.resolve_expression
andQuery.resolve_expression
and most of it should probably live inQuery
, that would take care of this issue by itself.