#31150 closed Bug (fixed)
Subquery annotations are omitted in group by query section if multiple annotation are declared
Reported by: | Johannes Maron | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.0 |
Severity: | Release blocker | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Sadly there is more regression in Django 3.0.2 even after #31094.
Background: It's the same query as #31094. I tried upgrading to Django 3.0.2 and now I get duplicate results. Even tho they query should be distinct. Where on 2.2 the queryset yields 490 results, it's 519 on 3.0.
A quick diff on the queries still reveals a different grouped by section:
This is the new query on 3.0.2:
SELECT DISTINCT "camps_offer"."id", "camps_offer"."title", "camps_offer"."slug", "camps_offer"."is_active", "camps_offer"."modified", "camps_offer"."created", "camps_offer"."provider_id", "camps_offer"."activity_type", "camps_offer"."description", "camps_offer"."highlights", "camps_offer"."important_information", "camps_offer"."min_age", "camps_offer"."max_age", "camps_offer"."food", "camps_offer"."video", "camps_offer"."accommodation", "camps_offer"."accommodation_type", "camps_offer"."room_type", "camps_offer"."room_size_min", "camps_offer"."room_size_max", "camps_offer"."external_url", "camps_offer"."application_form", "camps_offer"."caseload", "camps_offer"."field_trips", MIN(T4."retail_price") AS "min_retail_price", (SELECT U0."id" FROM "camps_servicepackage" U0 INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id") WHERE (U0."company_id" = 1 AND U0."option" = "camps_offer"."activity_type" AND ST_Contains(U2."locations", T4."position")) LIMIT 1) AS "in_package", "camps_provider"."id", "camps_provider"."title", "camps_provider"."slug", "camps_provider"."is_active", "camps_provider"."modified", "camps_provider"."created", "camps_provider"."logo", "camps_provider"."description", "camps_provider"."video", "camps_provider"."external_url", "camps_provider"."terms", "camps_provider"."cancellation_policy", "camps_provider"."privacy_policy", "camps_provider"."application_form" FROM "camps_offer" LEFT OUTER JOIN "camps_bookingoption" ON ("camps_offer"."id" = "camps_bookingoption"."offer_id") INNER JOIN "camps_provider" ON ("camps_offer"."provider_id" = "camps_provider"."id") INNER JOIN "camps_bookingoption" T4 ON ("camps_offer"."id" = T4."offer_id") WHERE ("camps_offer"."is_active" = True AND "camps_provider"."is_active" = True AND T4."end" >= STATEMENT_TIMESTAMP() AND T4."is_active" = True AND "camps_offer"."max_age" >= 5 AND "camps_offer"."min_age" <= 13 AND (SELECT U0."id" FROM "camps_servicepackage" U0 INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id") WHERE (U0."company_id" = 1 AND U0."option" = "camps_offer"."activity_type" AND ST_Contains(U2."locations", T4."position")) LIMIT 1) IS NOT NULL) GROUP BY "camps_offer"."id", T4."position", "camps_provider"."id" ORDER BY "camps_offer"."created" ASC
And what it was (and should be) on 2.2.9:
SELECT DISTINCT "camps_offer"."id", "camps_offer"."title", "camps_offer"."slug", "camps_offer"."is_active", "camps_offer"."modified", "camps_offer"."created", "camps_offer"."provider_id", "camps_offer"."activity_type", "camps_offer"."description", "camps_offer"."highlights", "camps_offer"."important_information", "camps_offer"."min_age", "camps_offer"."max_age", "camps_offer"."food", "camps_offer"."video", "camps_offer"."accommodation", "camps_offer"."accommodation_type", "camps_offer"."room_type", "camps_offer"."room_size_min", "camps_offer"."room_size_max", "camps_offer"."external_url", "camps_offer"."application_form", "camps_offer"."caseload", "camps_offer"."field_trips", MIN(T4."retail_price") AS "min_retail_price", (SELECT U0."id" FROM "camps_servicepackage" U0 INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id") WHERE (U0."company_id" = 1 AND U0."option" = ("camps_offer"."activity_type") AND ST_Contains(U2."locations", (T4."position"))) LIMIT 1) AS "in_package", "camps_provider"."id", "camps_provider"."title", "camps_provider"."slug", "camps_provider"."is_active", "camps_provider"."modified", "camps_provider"."created", "camps_provider"."logo", "camps_provider"."description", "camps_provider"."video", "camps_provider"."external_url", "camps_provider"."terms", "camps_provider"."cancellation_policy", "camps_provider"."privacy_policy", "camps_provider"."application_form" FROM "camps_offer" LEFT OUTER JOIN "camps_bookingoption" ON ("camps_offer"."id" = "camps_bookingoption"."offer_id") INNER JOIN "camps_provider" ON ("camps_offer"."provider_id" = "camps_provider"."id") INNER JOIN "camps_bookingoption" T4 ON ("camps_offer"."id" = T4."offer_id") WHERE ("camps_offer"."is_active" = True AND "camps_provider"."is_active" = True AND T4."end" >= (STATEMENT_TIMESTAMP()) AND T4."is_active" = True AND (SELECT U0."id" FROM "camps_servicepackage" U0 INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id") WHERE (U0."company_id" = 1 AND U0."option" = ("camps_offer"."activity_type") AND ST_Contains(U2."locations", (T4."position"))) LIMIT 1) IS NOT NULL) GROUP BY "camps_offer"."id", (SELECT U0."id" FROM "camps_servicepackage" U0 INNER JOIN "camps_region" U2 ON (U0."region_id" = U2."id") WHERE (U0."company_id" = 1 AND U0."option" = ("camps_offer"."activity_type") AND ST_Contains(U2."locations", (T4."position"))) LIMIT 1), "camps_provider"."id" ORDER BY "camps_offer"."created" ASC
Change History (20)
comment:1 by , 5 years ago
Description: | modified (diff) |
---|
comment:2 by , 5 years ago
comment:3 by , 5 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
I really try but without a queryset I was not able to reproduce this issue.
comment:4 by , 5 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
@felixxm, it seems that Subquery
annotation are omitted from the grouping section in 3.0. I will try to create a test case today.
If you add the following test to 2.2.* anywhere in tests.aggregation
it will pass, in 3.0 it will fail, because the subquery is missing from the grouping bit:
def test_filtered_aggregate_ref_subquery_annotation_e_31150(self): from django.db.models import OuterRef, Subquery aggs = Author.objects.annotate( earliest_book_year=Subquery( Book.objects.filter( contact__pk=OuterRef('pk'), ).order_by('pubdate').values('pubdate__year')[:1] ), ).annotate(max_id=Max('id')) print(aggs.query) self.assertIn(str(aggs.query), """ SELECT "aggregation_author"."id", "aggregation_author"."name", "aggregation_author"."age", (SELECT django_date_extract('year', U0."pubdate") FROM "aggregation_book" U0 WHERE U0."contact_id" = ("aggregation_author"."id") ORDER BY U0."pubdate" ASC LIMIT 1) AS "earliest_book_year", MAX("aggregation_author"."id") AS "max_id" FROM "aggregation_author" GROUP BY "aggregation_author"."id", "aggregation_author"."name", "aggregation_author"."age", (SELECT django_date_extract('year', U0."pubdate") FROM "aggregation_book" U0 WHERE U0."contact_id" = ("aggregation_author"."id") ORDER BY U0."pubdate" ASC LIMIT 1) """)
comment:5 by , 5 years ago
Summary: | SELECT DISTINCT is not not so distict → Subquery annotations are omitted in group by query section if multiple annotation are declared |
---|
comment:6 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|
OK, thank you for the test case Joe.
I added the import line so it pops straight into ,aggregation.tests.AggregateTestCase
and then, yes, there's a change of behaviour between 2.2 and 3.0.
comment:7 by , 5 years ago
The generated query effectively changed but the returned resultset should be the same since the subquery is a function of the outer query's primary key and the query is grouped by the outer query primary key.
I don't think asserting against the generated query string is a proper test case; the generated SQL will change between Django versions but the returned result set should be the same.
comment:8 by , 5 years ago
Yes. I didn’t take the test case as final. Just illustrating the issue. 😀
If it’s equivalent then there’s no issue, but I’m assuming Joe is seeing what he thinks is a substantive regression?
comment:9 by , 5 years ago
Replying to Simon Charette:
The generated query effectively changed but the returned resultset should be the same since the subquery is a function of the outer query's primary key and the query is grouped by the outer query primary key.
I came from a wrong result set and deduced the error from here on out. It took me a while to figure it out to, but sadly grouping by the outer reference is not the same as grouping by the query. Here an example:
with t_w_douplicates as ( select abs(n) as pk, n as val from generate_series(-2, 2, 1) n ) select pk, ( select n from generate_series(2, 2) n where n = val ) from t_w_douplicates where ( select n from generate_series(2, 2) n where n = val ) is null GROUP BY pk, ( select n from generate_series(2, 2) n where n = val );
Which returns
(0, null) (1, null) (2, null)
And just using the outer ref:
with t_w_douplicates as ( select abs(n) as pk, n as val from generate_series(-2, 2, 1) n ) select pk, ( select n from generate_series(2, 2) n where n = val ) from t_w_douplicates where ( select n from generate_series(2, 2) n where n = val ) is null GROUP BY pk, val;
Which returns 4 results:
(2, null) (1, null) (0, null) (1, null)
I don't think asserting against the generated query string is a proper test case; the generated SQL will change between Django versions but the returned result set should be the same.
Certainly, haha, that was just to illustrate how to get from the ORM to the SQL query in the description. The SQL sample now, shows how they produce different result sets.
comment:10 by , 5 years ago
Thanks for the extra details, I think I have a better picture of what's happening here.
In the cases where an outer query spawns a multi-valued relationship and the subquery references one of these multi-valued relationship we must keep grouping by the subquery.
In your original reports the INNER JOIN "camps_bookingoption" T4
joins spawns multiple rows for the same camps.Offer
and then your subquery filter against (T4."position")
.
In ORM terms that means we must keep returning self
in Subquery.get_group_by_cols
when any of our OuterRef
include a __
which could point to a multi-valued relationship.
We could go a bit further and only disable the optimization if any of the outer-ref in the __
chain is multi-valued by relying on getattr(rel, 'many_to_many', True)
introspection but in all cases that means adding a way to taint Col
instances with whether or not they are/could be multi-valued so get_group_by_cols
can return self
if any of get_external_cols
is multi-valued
I could see this tainting as being useful to warn about #10060 in the future since the logic could be based on this column or alias tainting.
I'm happy to submit a patch or discuss any alternatives but it feels like the above solution solves the reported problem while maintaining the optimization for most of the cases.
comment:11 by , 5 years ago
I give a quick shot at the above solution and came with this rough patch
https://github.com/django/django/compare/master...charettes:ticket-31150
Johannes, could you give it a shot and report whether or not it addresses your problem? Do you confirm your original queryset was using an OuterRef
with a reference to an outer-query multi-valued relationship?
comment:12 by , 5 years ago
I tested Simon's patch applied to 3.0.3 and it now works as expected. So yes, I'd say go ahead!
comment:13 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:15 by , 5 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Johannes, I need to repeat my gentle request for a queryset (see comment:2 and comment:6 ) Can you provide a queryset? It's really hard to restore the original queryset from a raw SQL.