Opened 11 years ago

Closed 11 years ago

#22278 closed Uncategorized (invalid)

Excessive GROUP BY when using the Count annotation

Reported by: Mark Jones Owned by: nobody
Component: Uncategorized Version: 1.6
Severity: Normal Keywords:
Cc: Shai Berger Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The following code illustrates a group by that seems computationally expensive since it is grouping by a large number of fields when it could be grouping by ID.

The generated SQL looks like this:

SELECT "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute", COUNT("foo_shift"."id") AS "overlapping_events" 
FROM "foo_employee" LEFT OUTER JOIN "foo_shift" ON ( "foo_employee"."id" = "foo_shift"."employee_id" ) 
WHERE ("foo_shift"."end_at" <= 2014-03-16 08:28:43.040428-07:00  AND "foo_shift"."start_at" >= 2014-03-16 08:28:43.040407-07:00 ) 
GROUP BY "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute" 
ORDER BY "overlapping_events" ASC, "foo_employee"."last_name" ASC, "foo_employee"."first_name" ASC

Everything after foo_employee.id in the GROUP BY isn't needed. Now the sql statement optimizer might be smart enough to figure this out, but then again, it might not.... Seems like if the pk is in the GROUP BY list, the rest of it isn't needed.

import datetime
from django.db import models

class Employee(models.Model):
    last_name = models.CharField(max_length=64)
    first_name = models.CharField(max_length=64)
    another_attribute = models.CharField(max_length=64)

class Shift(models.Model):
    start_at = models.DateTimeField()
    end_at = models.DateTimeField()
    employee = models.ForeignKey(Employee, related_name="shift_set")


start_at = datetime.datetime.now()
end_at = datetime.datetime.now()
qs = Employee.objects.filter(shift_set__start_at__gte=start_at,
                             shift_set__end_at__lte=end_at
                             ).annotate(overlapping_events=models.Count('shift_set'))
print qs.order_by('overlapping_events', 'last_name', 'first_name').query

Changing the Count('shift_set') to Count('id') has no effect on the group by.

Change History (1)

comment:1 by Shai Berger, 11 years ago

Cc: Shai Berger added
Resolution: invalid
Status: newclosed

Oh, it is needed on any backend that isn't MySql, and that is for two reasons:

1) The SQL standard is stupid; once you have aggregates, it doesn't allow any column to be selected unless it is in the group-by clause.

2) MySQL is stupid, and ignores the above restriction in the SQL standard.

The stupidity of MySQL is because it doesn't actually check that the group-by columns determine all the other selected columns, and so it allows queries with non-deterministic results.

Note: See TracTickets for help on using tickets.
Back to Top