#26430 closed Bug (fixed)
Coalesce in Aggregations ignored when EmptyResultSet returned
Reported by: | Ryan Prater | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | aggregation coalesce in queryset |
Cc: | Simon Charette, Hannes Ljungberg, Anton Agestam, pope1ni | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Using an empty list when using the __in=
returns an EmptyResultSet and prevents an Aggregate Coalesce from working properly. See below:
# Test with matched Queryset. Sum will return 50
OrderItem.objects.filter(pk__in=[1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>>{'test': 50
}
# Test with unmatched Queryset. Sum will return 0
OrderItem.objects.filter(pk__in=[-1]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test':0
}
# Test with unmatched Queryset (using empty list). EmptyResultSet returned because of empty list. Sum will return NONE
OrderItem.objects.filter(pk__in=[]).aggregate(test=Coalesce(Sum('quantity'), Value(0)))
>>> {'test': None
}
Simon Charette on django-users suggested the following:
From what I understand the ORM simply doesn't perform any query in this case
as the pk__in
lookup cannot match any OrderItem
and result in an
EmptyResultSet
exception[1].
This exception is caught in the Query.get_aggregation()
method where all
aggregates are converted to None
[2].
I suppose we should alter the except EmptyResultSet
clause to account for
outer_query.annotation_select
items that are Coalesce()
instances used with
Value()
but I'm unsure about how it should be done.
[1] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/lookups.py#L221-L223
[2] https://github.com/django/django/blob/2e0cd26ffb29189add1e0435913fd1490f52b20d/django/db/models/sql/query.py#L439-L445
See full discussion here:
https://groups.google.com/forum/#!topic/django-users/HGD3Vv3IerA
Change History (21)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
I can agree that detecting specific expressions is overkill and for that matter, not scalable. However, I don't think an "empty list in a filter clause seems like broken behavior". As you mentioned about valid use cases: I'm using this with a SelectMultiple which obviously allows users to select no options. Writing custom validation to prevent an empty query should be the responsibility of the developer, but not enforced at the framework level.
It seems like this could be avoided by modifying the logic of In.rhs
(and/or not throwing the EmptyResultSet
at all?), but this is my first bug and I'm not versed enough to discern.
comment:3 by , 9 years ago
I agree with Josh that the on_empty_result
approach seems like a lot of work for very little gain but I can't see any other way to fix this issue appropriately.
@ryanprater not throwing EmptyResultSet
in the In
lookup would solve your particular issue but the underlying one would remain, using Coalesce
with aggregation can result in an unexpected return value as EmptyResultSet
is thrown by other part of Django (queryset.none()
relies on it).
comment:4 by , 9 years ago
Maybe we could somehow skip throwing EmptyResultSet when doing an .aggregate() call, but throw it when doing normal queries. One idea is to use query.add_context('in_aggregate_query', True), and then check compiler.query.context before throwing the EmptyResultSet.
Another possible idea is to add as_python() methods to pretty much every part of the ORM. The as_python() method would run the expressions as Python code, so you could run whole querysets without actually touching the database. For the case where the query for .aggregate() generates an EmptyResultSet we would run the queryset in Python against one row having all None values, thus producing the wanted answer. This could be extremely convenient for testing, too. OTOH doing this properly will require at least a couple of months of work, so this is for sure overkill for this issue.
comment:5 by , 9 years ago
FWIW I gave a naive try at implementing the get_empty_result()
approach suggested by Josh:
https://github.com/django/django/pull/6361
The full suite pass but coalesced results relying on database functions (e.g. NOW()
) are not supported.
comment:6 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
See #26433 for related issue and a new idea for solving this.
Marking as accepted, I think we can solve this without unreasonable effort.
follow-up: 8 comment:7 by , 5 years ago
This happens to be addressed by https://github.com/django/django/pull/12602
comment:8 by , 4 years ago
Replying to Simon Charette:
This happens to be addressed by https://github.com/django/django/pull/12602
I was looking into reviving https://github.com/django/django/pull/12602.
It seems that some parts of that were already addressed in c2d4926702045e342a668057f0a758eec9db9436.
Notably this solved the None
instead of 0
issue for Count()
in aggregation_regress.tests.AggregationTests.test_empty_filter_aggregate
.
It doesn't seem to be true that this ticket will be addressed by that PR as implied by this comment.
Changing the test to use Coalesce()
as below still fails with total_age
being None
. There is nothing to determine that the second argument of Coalesce()
should be used.
self.assertEqual( - Author.objects.filter(id__in=[]).annotate(Count('friends')).aggregate(Count('pk')), - {'pk__count': 0}, + Author.objects.filter(pk__in=[]).annotate(Count('friends')).aggregate(total_age=Coalesce(Sum('age'), 0)), + {'total_age': 0}, )
In the case of Count()
we get 0 because of Count.convert_value()
explicitly returning 0
if the value is None
.
It seems that https://github.com/django/django/pull/6361 would be the way forward.
Was there a reason you abandoned that approach, Simon? Would you be happy for me to pick that up again?
comment:9 by , 4 years ago
Cc: | added |
---|
comment:10 by , 4 years ago
Cc: | added |
---|
comment:11 by , 4 years ago
Cc: | added |
---|
comment:12 by , 3 years ago
Sorry for the delayed answer Nick.
It seems that https://github.com/django/django/pull/6361 would be the way forward. Was there a reason you abandoned that approach, Simon? Would you be happy for me to pick that up again?
I think it would be the most straightforward way to get there but the main issue with the get_empty_result
approach is coalescence to database expressions e.g. Coalesce('col', Now())
.
It's tempting to mock our way there and cheat for provided expressions in implementing Python equivalent (e.g. Now.get_empty_result -> timezone.now
) but it has its limits (e.g. imagine coalescing to calling a database procedure). In the end EmptyResultSet
is meant to be an optimization so I believe we should turn it off in circumstances where it affects the correctness of the returned results.
A simpler approach, inspired by Anssi's comment:4, would be to add a SQLCompiler.supports_empty_result_set = False
property and set it to False
for SQLAggregateCompiler
. Parts of the code that raise this exception could then be adjusted not to do so when the current compiler doesn't support it.
(completely untested)
-
django/db/models/lookups.py
diff --git a/django/db/models/lookups.py b/django/db/models/lookups.py index 8d3648b393..0b02287d2a 100644
a b def process_rhs(self, compiler, connection): 392 392 except TypeError: # Unhashable items in self.rhs 393 393 rhs = [r for r in self.rhs if r is not None] 394 394 395 if not rhs :395 if not rhs and compiler.supports_empty_result_set: 396 396 raise EmptyResultSet 397 397 398 398 # rhs should be an iterable; use batch_process_rhs() to -
django/db/models/sql/compiler.py
diff --git a/django/db/models/sql/compiler.py b/django/db/models/sql/compiler.py index 7264929da8..6cfcbf7f9f 100644
a b class SQLCompiler: 25 25 r'^(.*)\s(?:ASC|DESC).*', 26 26 re.MULTILINE | re.DOTALL, 27 27 ) 28 supports_empty_result_set = True 28 29 29 30 def __init__(self, query, connection, using): 30 31 self.query = query … … def pre_sql_setup(self): 1636 1637 1637 1638 1638 1639 class SQLAggregateCompiler(SQLCompiler): 1640 # Due to the nature of aggregation coalescence some expressions might 1641 # need to be interpreted on the database to return the correct value 1642 # when dealing with an empty result set. 1643 supports_empty_result_set = False 1644 1639 1645 def as_sql(self): 1640 1646 """ 1641 1647 Create the SQL for this query. Return the SQL string and list of
I guess we could also use an hybrid approach where SQLAggregateCompiler
only sets supports_empty_result_set = False
if any of outer_query.annotation_select.items()
pass through .get_empty_result()
is not a literal value prior to proceeding with the compilation/as_sql
phase. That would keep the optimization for most cases but fallback to the database when it cannot be computed on the Python side.
comment:13 by , 3 years ago
Has patch: | set |
---|
Tried implementing the above in https://github.com/django/django/pull/14430
comment:14 by , 3 years ago
Cc: | added |
---|
comment:15 by , 3 years ago
Needs documentation: | set |
---|---|
Owner: | changed from | to
Patch needs improvement: | set |
Status: | new → assigned |
comment:17 by , 3 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
comment:18 by , 3 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I do agree that this is a bug, but I don't think the correct fix is to detect specific expressions and apply custom logic for each one. Simon mentioned nested expressions which would be very difficult to resolve the correct behaviour for.
For what it's worth, a
Count
will also showNone
in an EmptyResultSet even though its converters should coerceNone
to0
. The reason it doesn't is because the expressions and aggregates aren't actually used. Everything is Noned.If I was writing this from scratch I'd raise an error on receiving an EmptyResultSet. An empty list in a filter clause seems like broken behaviour. We can't do that though, because it's backwards incompatible, and there's probably valid use cases for this behaviour now.
A possible fix would be to implement something like
on_empty_result
for Expressions, to let each subclass decide what it should return if it's not executed. It would have to recursively call this method, and then decide which children should be allowed to propagate their values. But honestly, this seems like a lot of work and complexity for very little gain.Short of documentation changes or a deprecation, I'd be tempted to wontfix this. I'm certainly open to ideas I haven't considered though.