Opened 14 years ago

Closed 6 years ago

#16025 closed Bug (fixed)

distinct does not apply to aggregated querysets

Reported by: Michael Owned by:
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ramiro Morales)

When applying .distinct() to a QuerySet, performing an aggregation using .aggregate() on that will only ensure that the aggregation result is distinct. As a result, this prevents you from doing a query where you want the input to the aggregation to have distinct rows.

Take the example models:

class Category(Model):
  label = CharField(max_length=128)

class Food(Model):
  name = CharField(max_length=128)
  categories = ManyToManyField(Category)
  kilojoules_per_kg = PositiveIntegerField()

If you then did a query on which you selected a couple of Categories as foreign keys (eg: "Pizza Toppings" and "Fruits"), you'd end up with a list of Foods that would contain duplicates where a food item fits in multiple categories (eg: "Apple", "Orange", "Pineapple", "Pineapple", "Bacon" and "Cheese" ...).

If you then wanted to then find the average number of kilojoules per kilogram for items in those groups and eliminate duplicates (Pineapple), the documentation would lead you to do this:

qs.distinct().aggregate(Avg('kilojoules_per_kg'))

However instead of counting Pineapple once, Pineapple would be counted twice, as the distinct() call would only apply to the result of the aggregation, not to the input of it.

>>> qs.distinct().aggregate(Avg('kilojoules_per_kg')) == qs.aggregate(Avg('kilojoules_per_kg'))
True

Instead, .distinct() before .aggregate() should instead cause a subquery to be executed to filter the results before passing it to the aggregation functions. There doesn't seem a way to presently execute a query like this using the Django ORM.

.distinct() after an aggregation should cause the results of the aggregation to be filtered so only distinct records are returned (which is the current behavior).

I've managed to confirm this behavior in Django 1.2.3 and 1.3.

Change History (10)

comment:1 by Michael, 14 years ago

I'm aware wrote "Average" here instead of "Avg" -- I meant "Avg". The rest of the bug still applies, I was trying to make an example of the error that was effecting my application using different models (because my application's models are rather complex), but have the same basic problem as described here.

Last edited 14 years ago by Michael (previous) (diff)

comment:2 by Ramiro Morales, 13 years ago

Component: Database layer (models, ORM)ORM aggregation
Triage Stage: UnreviewedAccepted

Does your Food model have any Meta.ordering model option or does the queryset have a order_by() call somewhere?

comment:3 by Michael, 13 years ago

In the example application I developed it doesn't (the only difference is I've implemented __unicode__, but in the application I'm actually writing it does.

Both ways give the same (erroneous) result.

comment:4 by Ramiro Morales, 13 years ago

Description: modified (diff)

comment:6 by Anssi Kääriäinen, 12 years ago

Component: ORM aggregationDatabase layer (models, ORM)

comment:7 by Vinay Sajip, 11 years ago

Version: 1.31.6

Still there in 1.6 :-(

comment:8 by mikalai, 10 years ago

Owner: changed from nobody to mikalai
Status: newassigned

comment:9 by mikalai, 10 years ago

Owner: mikalai removed
Status: assignednew

comment:10 by Simon Charette, 6 years ago

Resolution: fixed
Status: newclosed

I'm not sure when this got fixed but I can confirm that the DISTINCT is now applied to the subquery aggregated over by the expression provided in the aggregate call and not on the outer query which was effectively useless. I suspect that #23875 fixed it by moving the distinct special casing from the get_count method to the get_aggregation one.

Note: See TracTickets for help on using tickets.
Back to Top