#32006 closed Bug (wontfix)
Aggregation generates wrong SQL query on PostgreSQL partionned tables.
Reported by: | Konstantin Popov | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Normal | Keywords: | QuerySet values annotate GROUP BY |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I use values and annotate on QuerySet to make query for my report.
When I add 'id' or 'pk' field to another fields in values it becomes the only field in GROUP BY clause.
This, obviously, generates an invalid SQL query.
str(Currency.objects.filter(status=3).values('sifr','guid').annotate(cnt=Count('*')).query)
'SELECT "rk7data_currency"."sifr", "rk7data_currency"."guid", COUNT(*) AS "cnt" FROM "rk7data_currency" WHERE "rk7data_currency"."status" = 3 GROUP BY "rk7data_currency"."sifr", "rk7data_currency"."guid"'
str(Currency.objects.filter(status=3).values('sifr','guid','id').annotate(cnt=Count('*')).query)
'SELECT "rk7data_currency"."sifr", "rk7data_currency"."guid", "rk7data_currency"."id", COUNT(*) AS "cnt" FROM "rk7data_currency" WHERE "rk7data_currency"."status" = 3 GROUP BY "rk7data_currency"."id"'
I have checked this on several models with the same result.
I have discovered this on django 2.2.10. Upgrade to 2.2.16 and even to 3.1 does not solve the problem.
Change History (5)
comment:1 by , 4 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Summary: | QuerySet with values with annotate generates wrong SQL query → QuerySet with values with annotate generates wrong SQL query on PostgreSQL. |
comment:2 by , 4 years ago
But this does not work properly! I receive error from postgresql. It says other fields should be in GROUP BY.
This example query is simplified, the real query includes several joined tables.
SELECT "rk7data_globalshift"."shiftdate" AS "date", "rk7data_restaurant"."name" AS "rest", "rk7data_printcheck"."check_number" AS "chck", "rk7data_printcheck"."moment" AS "check_mom", "rk7data_printcheck"."id" AS "check_id", "rk7data_printcheck"."period" AS "check_period", COUNT("rk7data_printcheck"."check_number") AS "qnt", SUM("rk7data_printcheck"."binded_sum") AS "pay" FROM "rk7data_printcheck" INNER JOIN "rk7data_order" ON ("rk7data_printcheck"."order_id" = "rk7data_order"."id" AND "rk7data_printcheck"."period" = "rk7data_order"."period") INNER JOIN "rk7data_globalshift" ON ("rk7data_order"."shift_id" = "rk7data_globalshift"."id") INNER JOIN "rk7data_midserver" ON ("rk7data_globalshift"."midserver_id" = "rk7data_midserver"."id") INNER JOIN "rk7data_restaurant" ON ("rk7data_midserver"."restaurant_id" = "rk7data_restaurant"."id") WHERE ("rk7data_printcheck"."deleted" = %s AND "rk7data_printcheck"."isbill" = %s AND "rk7data_order"."shift_id" IN (SELECT U0."id" FROM "rk7data_globalshift" U0 WHERE U0."shiftdate" BETWEEN %s AND %s) AND "rk7data_printcheck"."period" IN (%s, %s, %s)) GROUP BY "rk7data_globalshift"."shiftdate", "rk7data_restaurant"."name", "rk7data_printcheck"."id" ORDER BY "rest" ASC, "date" ASC, "chck" ASC
And the error message is
ERROR: column "rk7data_printcheck.check_number" must appear in GROUP BY clause or used in aggregate function LINE 1: ... AS "date", "rk7data_restaurant". "Name" AS "rest", "rk7data_p ...
I have found workaround for this, I have wrapped 'id' field with Coalesce function, but I don't think it is good solution.
Will you reopen this issue now?
comment:3 by , 4 years ago
Please provide a reduced set of model that results in the must appear in GROUP BY clause or used in aggregate function
exception in order to help diagnose your issue.
Is rk7data_printcheck
a view or table? If it's a view you'll likely need to mark it as Meta.managed = False
.
Something seems also odd with this query, why is "rk7data_printcheck"."check_number"
both aliased an counted against via COUNT("rk7data_printcheck"."check_number")
.
comment:4 by , 4 years ago
I just now realized what is causing the problem.
I use declarative partitioning in some tables, so the primary key must include the partitioning key and has two fields, 'id' and 'period'.
To use such tables with Django, I developed my own versions of the CreateModel and AddField migration operations, as well as my own version of the ForeignKey field.
However, the Django Model class code still considers its primary key to be a single 'id' field.
I have already discovered a number of side effects of this solution. Now there is another one.
It may be best to leave this ticket closed. Please accept my apologies for the needless concern.
I am ready to provide an abbreviated version of my code here if anyone is interested.
comment:5 by , 4 years ago
Resolution: | invalid → wontfix |
---|---|
Summary: | QuerySet with values with annotate generates wrong SQL query on PostgreSQL. → Aggregation generates wrong SQL query on PostgreSQL partionned tables. |
I have already discovered a number of side effects of this solution. Now there is another one. It may be best to leave this ticket closed. Please accept my apologies for the needless concern. I am ready to provide an abbreviated version of my code here if anyone is interested
Thanks for the following report. The reason why I was asking what kind of table rk7data_printcheck
was is that the group by selected table primary key optimization is known to break for views, foreign tables, and any other table-like objects that PostgreSQL cannot introspect enough to determine this optimization is safe.
The optimization does make a huge difference when aggregating over tables containing columns costly to group so we decided to keep it around even with the aforementioned caveats. We did add a per-model hook to disable though if that can be of any help to you #28107.
PostgreSQL allows grouping by selected PKs, so optimized query works properly, see #19259.