Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#30047 closed Cleanup/optimization (worksforme)

Documentation: add a Note about default ordering on models used in Subquery filters

Reported by: powderflask Owned by: nobody
Component: Documentation Version: 2.1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

A Subquery filter on models.Obj, e.g., filter(obj__in=Subquery(...)) ,requires that the subquery return exactly one column populated with Obj.pk.
Documentation suggests using objects.values('pk')

However, if the models.Obj.Meta.ordering is set, then those ordering fields are included with the subquery results, raising the exception:

django.db.utils.ProgrammingError: subquery has too many columns

This is expected behavior, but confusing and not directly obvious in the code. Could be more clearly documented at:
https://docs.djangoproject.com/en/dev/ref/models/expressions/#limiting-a-subquery-to-a-single-column
with a Note reading something similar to:
"""
To avoid unintended columns being added to the subquery expression, be sure there is no order_by clause on the subquery. In particular, watch for default ordering specified on the subquery model Meta.
"""

Change History (10)

comment:1 by Simon Charette, 6 years ago

Even if it doesn't completely solves the issue the subquery code should be able to clear the ordering under certain circumstances in the first place (default aka Model.ordering included).

There's a lot duplicate code between Subquery.resolve_expression and Query.resolve_expression and most of it should probably live in Query, that would take care of this issue by itself.

comment:2 by Marten Kenbeek, 6 years ago

Resolution: worksforme
Status: newclosed

I'd be surprised if the ordering affected which fields are selected in the subquery, and in fact I can't reproduce it on either sqlite or postgres. For example, the following works just fine (Permission defines a default ordering):

>>> User.objects.filter(user_permissions__in=Subquery(Permission.objects.filter(codename='add_contenttype').values('pk')))
<QuerySet [<User: marten>]>
>>> User.objects.filter(user_permissions__in=Subquery(Permission.objects.filter(codename='add_contenttype').order_by('codename').values('pk')))
<QuerySet [<User: marten>]>

Looking at the resulting queries, both only select the id column in the subquery, and not any of the columns from the ordering.

comment:3 by Carlton Gibson, 6 years ago

Yes, I've just been playing with exactly the same thing: I can't reproduce it as long as I provide the 'pk' field to values()

With these models:

from django.db import models


class Parent(models.Model):
    name = models.CharField(max_length=255)


class Child(models.Model):
    parent = models.ForeignKey(Parent, related_name='children', on_delete=models.CASCADE)
    name = models.CharField(max_length=255)

    class Meta:
        ordering = ('name',)

It works as expected:

>>> from django.db.models import Subquery
>>> from app.models import Parent, Child
>>> p = Parent.objects.create(name='A Parent')
>>> Child.objects.create(name='First Child', parent=p)
<Child: Child object (1)>
>>> Child.objects.create(name='A Second Child', parent=p)
<Child: Child object (2)>
>>> Child.objects.all()
<QuerySet [<Child: Child object (2)>, <Child: Child object (1)>]>
>>> Child.objects.values()
<QuerySet [{'id': 2, 'parent_id': 1, 'name': 'A Second Child'}, {'id': 1, 'parent_id': 1, 'name': 'First Child'}]>
>>> Child.objects.values('pk')
<QuerySet [{'pk': 2}, {'pk': 1}]>
>>> Parent.objects.filter(children__in=Subquery(Child.objects.values()))
OperationalError: sub-select returns 3 columns - expected 1
>>> Parent.objects.filter(children__in=Subquery(Child.objects.values('pk')))
<QuerySet [<Parent: Parent object (1)>, <Parent: Parent object (1)>]>

comment:4 by Carlton Gibson, 6 years ago

Type: UncategorizedCleanup/optimization

comment:5 by Simon Charette, 6 years ago

powderflask I assume you forgot to mention your query was using annotated aggregation functions?

comment:6 by Simon Charette, 6 years ago

I'd be surprised if the ordering affected which fields are selected in the subquery, and in fact I can't reproduce it on either sqlite or postgres.

It does when annotated aggregation functions are involved and can be quite a surprise when Meta.ordering is involved, see #14357.

comment:7 by Carlton Gibson, 6 years ago

Ah, brilliant — I was thinking this reminded me of #14357, but couldn’t see the connection.

#14357 will be resolved via a deprecation of the current behaviour right… — is an extra comment in the docs worthwhile?
(A PR Refs #14357 -- … would be fine no?)

in reply to:  6 comment:8 by Marten Kenbeek, 6 years ago

Replying to Simon Charette:

It does when annotated aggregation functions are involved and can be quite a surprise when Meta.ordering is involved, see #14357.

Any column in SELECT or ORDER BY must be present in GROUP BY, but it's not necessary to add columns from ORDER BY to the SELECT clause, and as far as I can tell Django does not add the extra columns to SELECT. I suspect powderflask has something (e.g. an annotation) that adds an extra column to the subquery, that's not caused by any ordering. At this point we'd have to see the original code/query to be sure.

in reply to:  5 comment:9 by powderflask, 6 years ago

Replying to Simon Charette:

powderflask I assume you forgot to mention your query was using annotated aggregation functions?

Sorry. Clearly didn't isolate this issue far enough before reporting. Yes, my query includes annotated aggregations.

Version 0, edited 6 years ago by powderflask (next)

comment:10 by powderflask, 6 years ago

Here is some gross data pulled the bug as it appears in the production app (I will try to reproduce this in a simpler test case when I get some time -- sorry).

There is no aggregation on the query or the subquery.
However the subquery clause does contain multiple joins and another sub-query.
The bug does appear to be as originally report -- that the subquery model.Meta.ordering appears to cause the ordering fields to be included in the subquery select, even though a values('pk') clause is added.

Code:
2 models: Prework and Treatment
Treatment has a ForiegnKey to Prework
Subquery is used to filter Treatments by a variety of filters applied to Prework (and its related Project Approvals, and Project ACL models)

Bug appears for the following queryset:

    preworks = Prework.objects.filter(**filter_specs)
    qs = Treatment.objects.filter( prework__in=models.Subquery(preworks.values('pk')) )

with: Prework.Meta.ordering == ('subregion', 'fiscal_year', 'sequence', 'circuit')

Bug disappears when ordering is explicitly removed from the subquery:

    preworks = Prework.objects.filter(**filter_specs).order_by()

Here is the WHERE clause for the buggy qs above (notice extra select fields that match default ordering on sub-query):

SELECT .... WHERE
   "treatment_area"."prework_id" IN 
   (
         SELECT DISTINCT V0."id", 'V0."subregion", V0."fiscal_year", V0."sequence", V0."circuit"
 	  FROM "prework" V0 
 	  INNER JOIN "project_preworkproject" V1 ON (V0."id" = V1."prework_id") '
          INNER JOIN "approvals_projectapproval" V2 ON (V1."id" = V2."project_id") '
          WHERE (
              NOT (V0."id" = %s) AND NOT (V0."project_type" IN (%s, %s)) '
             AND V0."project_stage" = %s 
             AND V0."subregion" = %s 
             AND V0."project_type" = %s '
             AND V0."fiscal_year" = %s 
             AND V1."id" IN 
             (
                SELECT U0."id" FROM "project_preworkproject" U0 
                INNER JOIN "project_preworkprojectacl" U1 ON (U0."id" = U1."project_id") 
                WHERE U1."user_id" = %s
             ) 
             AND V2."type" = %s 
             AND V2."is_final" = %s
          ) 
         ORDER BY V0."subregion" ASC, V0."fiscal_year" ASC, 'V0."sequence" ASC, V0."circuit" ASC
   ) 
Note: See TracTickets for help on using tickets.
Back to Top