Opened 3 years ago
Closed 3 years ago
#33655 closed Cleanup/optimization (fixed)
Unnecessary column in a GROUP BY clause with QuerySet.exists()
Reported by: | Marc Perrin | Owned by: | Marc Perrin |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | exists group by |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've got a question about the interaction between exists() and group by.
For example something like:
Manager.values('field').annotate(cnt=Count('id')).filter(cnt__gt=1).exists()
The corresp. query with PostgreSQL looks like this:
SELECT (1) AS "a" FROM "app_model" GROUP BY "app_model"."field", (1) HAVING COUNT("app_model"."id") > 1 LIMIT 1
exists() (see https://github.com/django/django/blob/470708f50d8c13a50770893b8d7181f5218bf3ac/django/db/models/sql/query.py#L563) clears the SELECT clause and replaces it by (if I understand correctly) a hardcoded value 1 (as "a"), along with a limit of 1, which makes sense to me.
But get_group_by() (see https://github.com/django/django/blob/6b53114dd862ec97c282fdfdc83579cbd6d1560d/django/db/models/sql/compiler.py#L79) pushes this hardcoded value to the GROUP BY clause and we end up with the query above.
Now, on PostgreSQL, that works, but to me it sounds like it works by luck/lucky robustness... and certainly the same query without the , (1) in the GROUP BY clause yields the same result.
The problem is that outside of PostgreSQL that GROUP BY clause can be downright invalid...
Note that if I alter exists() to use {'a': 2} instead of {'a': 1}, it does not work anymore (on PostgreSQL), because (2) in the SELECT clause means the hardcoded number 2 while (2) in the GROUP BY clause presumably (??) refers to the 2nd expr of the SELECT clause, but we only got one...
My feeling is that get_group_by() should avoid adding extra/raw sql (or at least extra/raw pure values, if we can detect that...) to the group by expressions?
NB: the existing/old ticket that seems most related to my question would probably be: https://code.djangoproject.com/ticket/24835
Change History (15)
follow-up: 2 comment:1 by , 3 years ago
Cc: | added |
---|---|
Summary: | Interaction between exists() and group by → Unnecessary column in a GROUP BY clause with QuerySet.exists() |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:2 by , 3 years ago
Thanks for the quick answer.
Your suggested workaround seems interesting indeed (provided the value of said column does not affect the exists() return value - for example if it's 0?)
NB: I didn't mean removing all constants from GROUP BY, rather: removing the automatic transmission of constants from the SELECT clause to the GROUP BY clause, since their meaning automatically switches from "constant number n" to "nth element of the select clause" which imo can never be intended.
comment:3 by , 3 years ago
provided the value of said column does not affect the exists() return value - for example if it's 0?
No, it's not. It's enough for the query to return any rows.
follow-up: 5 comment:4 by , 3 years ago
Maybe we should stop relying on extra and use Value(1)
instead which already disables grouping.
-
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index 64e7927f7a..1097de4f79 100644
a b 27 27 OuterRef, 28 28 Ref, 29 29 ResolvedOuterRef, 30 Value, 30 31 ) 31 32 from django.db.models.fields import Field 32 33 from django.db.models.fields.related_lookups import MultiColSource … … def exists(self, using, limit=True): 582 583 q.clear_ordering(force=True) 583 584 if limit: 584 585 q.set_limits(high=1) 585 q.add_extra({"a": 1}, None, None, None, None, None) 586 q.set_extra_mask(["a"]) 586 q.add_annotation(Value(1), "a") 587 587 return q 588 588 589 589 def has_results(self, using):
Anything that relies on extra is a black box to the compiler as it could be an aggregate function or not.
follow-up: 6 comment:5 by , 3 years ago
Replying to Simon Charette:
Maybe we should stop relying on extra and use
Value(1)
instead which already disables grouping.
Great idea!
comment:6 by , 3 years ago
Sounds good indeed!
Replying to Mariusz Felisiak:
Replying to Simon Charette:
Maybe we should stop relying on extra and use
Value(1)
instead which already disables grouping.
Great idea!
follow-up: 10 comment:8 by , 3 years ago
Yeah, I guess I can do that, with Simon's suggestion the whole thing should be straightforward enough.
I'll take the ticket - I'll have to look up the contribution guide etc. / especially about the tests, as I haven't really contributed before.
comment:9 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:10 by , 3 years ago
Replying to Marc Perrin:
Yeah, I guess I can do that, with Simon's suggestion the whole thing should be straightforward enough.
I'll take the ticket - I'll have to look up the contribution guide etc. / especially about the tests, as I haven't really contributed before.
Great, thanks! In this ticket we are forced to check a generated SQL, IMO it's fine to add assertions to an existing test e.g. aggregation.tests.AggregateTestCase.test_aggregation_subquery_annotation_exists
:
with self.assertNumQueries(1) as ctx: self.assertTrue(publisher_qs.exists()) sql = ctx[0]["sql"] self.assertIn("SELECT 1 ", sql) self.assertNotIn("(1)", sql) # There is just one GROUP BY clause (zero commas means at most one # clause). self.assertEqual(sql[sql.index("GROUP BY") :].count(", "), 0)
comment:11 by , 3 years ago
Yeah, looks like the most relevant existing test case indeed, thanks for the pointer
comment:12 by , 3 years ago
Though after debugging this test, it might be a bit more complex than we want/need (e.g. there are 4 fields in the GROUP BY anyway...) and I might just add a new test e.g. "test_aggregation_annotation_exists" that gets close to what I mentioned originally (e.g.
Book.objects.values('publisher').annotate(cnt=Count('isbn')).filter(cnt__gt=1).exists()
)
For reference, with the fix, this query goes from:
SELECT (1) AS "a" FROM "aggregation_book" GROUP BY "aggregation_book"."publisher_id", (1) HAVING COUNT("aggregation_book"."isbn") > 1 LIMIT 1
to:
SELECT 1 AS "a" FROM "aggregation_book" GROUP BY "aggregation_book"."publisher_id" HAVING COUNT("aggregation_book"."isbn") > 1 LIMIT 1
(with the test default settings i.e. sqlite I guess)
I'll try to be diligent with the asserts though, because if the test must work on all backends, even
self.assertIn("SELECT 1 ", sql)
isn't good, because we can get SELECT TOP 1 1 (...) instead of SELECT 1 (...) LIMIT 1
So perhaps a simple
self.assertNotIn(", (1)", sql)
would be enough (I'm not really fond of your assert about GROUP BY either, because it's not really targeting the GROUP BY clause, it includes everything after it incl. HAVING clause)
comment:13 by , 3 years ago
But anyway I've started the PR: https://github.com/django/django/pull/15630 se we can discuss the details in there
comment:14 by , 3 years ago
Has patch: | set |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Thanks for the report. I agree that
(1)
is unnecessary in theGROUP BY
clause unfortunately we cannot always remove constants from it because folks can use them as an alias for n-th column (as you already noticed). Maybe we could select only the first column from theGROUP BY
clause instead:django/db/models/sql/query.py
Tentatively accepted for future investigation.