#30771 closed Bug (fixed)
Filtering on query result overrides GROUP BY of internal query
Reported by: | Aur Saraf | Owned by: | James Timmins |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | annotate, annotation, filter, groupby, aggregate, aggregation |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
from django.contrib.auth import models a = models.User.objects.filter(email__isnull=True).values('email').annotate(m=Max('id')).values('m') print(a.query) # good # SELECT MAX("auth_user"."id") AS "m" FROM "auth_user" WHERE "auth_user"."email" IS NULL GROUP BY "auth_user"."email" print(a[:1].query) # good # SELECT MAX("auth_user"."id") AS "m" FROM "auth_user" WHERE "auth_user"."email" IS NULL GROUP BY "auth_user"."email" LIMIT 1 b = models.User.objects.filter(id=a[:1]) print(b.query) # GROUP BY U0."id" should be GROUP BY U0."email" # SELECT ... FROM "auth_user" WHERE "auth_user"."id" = (SELECT U0."id" FROM "auth_user" U0 WHERE U0."email" IS NULL GROUP BY U0."id" LIMIT 1)
Change History (12)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
follow-up: 5 comment:3 by , 5 years ago
Thanks for tackling that one James!
If I can provide you some guidance I'd suggest you have a look at lookups.Exact.process_rhs
We probably don't want to perform the clear_select_clause
and add_fields(['pk'])
when the query is already selecting fields.
That's exactly what In.process_rhs
does already by only performing these operations if not getattr(self.rhs, 'has_select_fields', True)
.
comment:4 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 2.2 → master |
comment:5 by , 5 years ago
Thanks so much for the help Simon! This is a great jumping-off point.
There's something that I'm unclear about, which perhaps you can shed some light on. While I was able to replicate the bug with 2.2, when I try to create a test on Master to validate the bug, the group-by behavior seems to have changed.
Here's the test that I created:
def test_exact_selected_field_rhs_subquery(self): author_1 = Author.objects.create(name='one') author_2 = Author.objects.create(name='two') max_ids = Author.objects.filter(alias__isnull=True).values('alias').annotate(m=Max('id')).values('m') authors = Author.objects.filter(id=max_ids[:1]) self.assertFalse(str(max_ids.query)) # This was just to force the test-runner to output the query. self.assertEqual(authors[0], author_2)
And here's the resulting query:
SELECT MAX("lookup_author"."id") AS "m" FROM "lookup_author" WHERE "lookup_author"."alias" IS NULL GROUP BY "lookup_author"."alias", "lookup_author"."name"
It no longer appears to be grouping by the 'alias' field listed in the initial .values()
preceeding the .annotate()
. I looked at the docs and release notes to see if there was a behavior change, but didn't see anything listed.
Do you know if I'm just misunderstanding what's happening here? Or does this seem like a possible regression?
follow-up: 7 comment:6 by , 5 years ago
It's possible that a regression was introduced in between.
Could you try bisecting the commit that changed the behavior
comment:7 by , 5 years ago
Mmm actually disregard that. The second value in the GROUP BY
is due to the ordering
value in the Author class's Meta class.
class Author(models.Model): name = models.CharField(max_length=100) alias = models.CharField(max_length=50, null=True, blank=True) class Meta: ordering = ('name',)
Regarding the bug in question in this ticket, what should the desired behavior be if the inner query is returning multiple fields? With the fix, which allows the inner query to define a field to return/group by, if there are multiple fields used then it will throw a sqlite3.OperationalError: row value misused
.
Is this the desired behavior or should it avoid this problem by defaulting back to pk
if more than one field is selected?
comment:8 by , 5 years ago
I think that we should only default to pk
if no fields are selected. The ORM has preliminary support for multi-column lookups and other interface dealing with subqueries doesn't prevent passing queries with multiple fields so I'd stick to the current __in
lookup behavior.
comment:9 by , 5 years ago
Has patch: | set |
---|
The pull request can be found at the following link: https://github.com/django/django/pull/11797
Workaround: