Opened 7 years ago
Closed 7 years ago
#29416 closed Bug (fixed)
Undesired subquery added to the GROUP BY clause
Reported by: | Antoine Pinsard | Owned by: | Mariusz Felisiak |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 2.0 |
Severity: | Release blocker | Keywords: | groupby, subquery |
Cc: | Mariusz Felisiak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
I am facing an issue while upgrading from Django 1.11 to Django 2.0.
I have a complex query interacting with a legacy MySQL database, which I simplified below to highlight the issue:
>>> from user.models import Sponsor >>> from django.db.models import ExpressionWrapper, Count, DecimalField >>> from django.db.models.expressions import RawSQL >>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", []) >>> str(Sponsor.objects.all().annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField())).order_by('-report_rate').query)
This code, in Django 1.11.9, gives me the following query:
SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId` ORDER BY `report_rate` DESC
This is the expected behavior and it works well.
However, in Django 2.0.5, the same code gives me this query:
SELECT `ala_sponsor`.`sponId`, [...], ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) ORDER BY `report_rate` DESC
As you can see, the ORM appended the subquery (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2)
to the GROUP BY clause. Which is wrong, and takes forever to execute.
I tried to play with .values('id')
or such as I usually do when I get unexpected GROUP BY. I spent an afternoon on it but there's no way I could get rid of this undesired group by clause. The order_by
is not to blame either. Here is another example of what I tried:
str(Sponsor.objects.all().values('id').annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField())).order_by().query)
Which gives:
SELECT `ala_sponsor`.`sponId`, ((SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) / COUNT(`ala_sponsor_need`.`asnId`)) AS `report_rate` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) ORDER BY NULL
Also note that this is the annotate(report_rate=ExpressionWrapper(nb_reports / Count('deliveries'), output_field=DecimalField()))
that causes this issue. If I only do annotate(nb_reports=nb_reports)
or annotate(nb_deliveries=COUNT('deliveries'))
there is no additional GROUP BY clause generated.
In [40]: str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports).order_by().query) Out[40]: "SELECT `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) AS `nb_reports` FROM `ala_sponsor`" In [41]: str(Sponsor.objects.all().values('id').annotate(nb_deliveries=Count('deliveries')).order_by().query) Out[41]: 'SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`) AS `nb_deliveries` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId` ORDER BY NULL' In [42]: str(Sponsor.objects.all().values('id').annotate(nb_reports=nb_reports, nb_deliveries=Count('deliveries')).order_by().query) Out[42]: "SELECT `ala_sponsor`.`sponId`, COUNT(`ala_sponsor_need`.`asnId`) AS `nb_deliveries`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) AS `nb_reports` FROM `ala_sponsor` LEFT OUTER JOIN `ala_sponsor_need` ON (`ala_sponsor`.`sponId` = `ala_sponsor_need`.`asnSponId`) GROUP BY `ala_sponsor`.`sponId`, (SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2) ORDER BY NULL"
Attachments (1)
Change History (24)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|
comment:2 by , 7 years ago
Description: | modified (diff) |
---|
comment:3 by , 7 years ago
comment:4 by , 7 years ago
So, here is the commit that introduced the change of behavior: https://github.com/django/django/commit/1d070d027c218285b66c0bde8079034b33a87f11
by , 7 years ago
Attachment: | test_regression_29416.py added |
---|
tests/annotations/test_regression_29416.py
comment:5 by , 7 years ago
Cc: | added |
---|
comment:6 by , 7 years ago
I think the issue is the getattr(expr, 'alias', None) not in pk_aliases
. Maybe it should be getattr(expr, 'alias', None) not in pk_aliases and not isinstance(expr, RawSQL)
or something like that?
comment:7 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Probably this condition should be more complex. I will dig into it on DjangoConEU sprints.
comment:8 by , 7 years ago
OK, FYI I was able to workaround the issue:
>>> nb_reports = RawSQL("SELECT COUNT(*) FROM pro_moderation WHERE objType='sponsor' AND objId=ala_sponsor.sponId AND state=2", []) >>> nb_reports.alias = 'nb_reports' >>> nb_reports.target = type('bare', (object,), {'primary_key': True})
So I will be able to upgrade to 2.0 meanwhile.
comment:9 by , 7 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:11 by , 7 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:15 by , 7 years ago
Has patch: | unset |
---|---|
Resolution: | fixed |
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
This broke a MySQL GIS test on MySQL 5.7:
gis_tests.geoapp.test_expressions.GeoExpressionsTests.test_multiple_annotation Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'test_django.geoapp_multifields.point' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by")
comment:16 by , 7 years ago
MySQL documentation for reference.
https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by
Tim, is the only MySQL 5.7 failing against this patch?
comment:18 by , 7 years ago
IMO query in this test (even if we add "ST_Distance(geoapp_multifields.point, ST_GeomFromText('POINT (-95.363151 29.763374)'))
" to the GROUP BY clause) doesn't have much sense. This is not a realistic use case. It also didn't work before 1d070d027c218285b66c0bde8079034b33a87f11.
SELECT `geoapp_city`.`name`, ST_Distance(`geoapp_multifields`.`point`, ST_GeomFromText('POINT (-95.363151 29.763374)')) AS `distance`, COUNT(`geoapp_multifields`.`id`) AS `count` FROM `geoapp_city` LEFT OUTER JOIN `geoapp_multifields` ON (`geoapp_city`.`id` = `geoapp_multifields`.`city_id`) GROUP BY `geoapp_city`.`id` ORDER BY `geoapp_city`.`id` ASC LIMIT 1
We can fix this test by changing:
diff --git a/tests/gis_tests/geoapp/test_expressions.py b/tests/gis_tests/geoapp/test_expressions.py index 2d0ebbcae0..89e83a782f 100644 --- a/tests/gis_tests/geoapp/test_expressions.py +++ b/tests/gis_tests/geoapp/test_expressions.py @@ -3,7 +3,7 @@ from unittest import skipUnless from django.contrib.gis.db.models import F, GeometryField, Value, functions from django.contrib.gis.geos import Point, Polygon from django.db import connection -from django.db.models import Count +from django.db.models import Count, Min from django.test import TestCase, skipUnlessDBFeature from ..utils import postgis @@ -56,7 +56,7 @@ class GeoExpressionsTests(TestCase): poly=Polygon(((1, 1), (1, 2), (2, 2), (2, 1), (1, 1))), ) qs = City.objects.values('name').annotate( - distance=functions.Distance('multifields__point', multi_field.city.point), + distance=Min(functions.Distance('multifields__point', multi_field.city.point)), ).annotate(count=Count('multifields')) self.assertTrue(qs.first())
or by adding multifields__point
to the values
, i.e. City.objects.values('name', 'multifields__point')
.
comment:23 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Can you bisect to find where the behavior changed?