#34564 closed Bug (invalid)
returning None instead of zero in Count annotation
Reported by: | Amin Aminian | Owned by: | Amin Aminian |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | count, orm, annotate |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
I am trying to upgrade Django from 3.2 to 4.2 in my project. After the upgrade, some of my tests broke.
The problem:
I am annotating a queryset with the Count function, and for some objects the count should be zero.
This is my code:

```python
Order.objects.annotate(count=Count("customer", distinct=True))
```

In Django==3.2, if some orders don't have any customers, the count field is zero (as I want).
But in Django==4.2, if some orders don't have any customers, the count field is None, which breaks my code.
I read the source code for both versions. In version 3.2, there is a method called convert_value on the Count class in the aggregates module:

```python
class Count(Aggregate):
    ...
    def convert_value(self, value, expression, connection):
        return 0 if value is None else value
```

As we can see, this method is responsible for returning 0 instead of None.
But this method was removed in version 4.2, and the conversion is now handled by the convert_value property on the BaseExpression class in the expressions module:

```python
class BaseExpression:
    ...

    @cached_property
    def convert_value(self):
        field = self.output_field
        internal_type = field.get_internal_type()
        if internal_type == "FloatField":
            return (
                lambda value, expression, connection: None
                if value is None
                else float(value)
            )
        elif internal_type.endswith("IntegerField"):
            return (
                lambda value, expression, connection: None
                if value is None
                else int(value)
            )
        elif internal_type == "DecimalField":
            return (
                lambda value, expression, connection: None
                if value is None
                else Decimal(value)
            )
        return self._convert_value_noop
```
This property returns a lambda to be called later, and for integer fields that lambda returns None whenever the fetched value is None.
Shouldn't it return 0 when the value is None?
Change History (22)
comment:1 by , 18 months ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
follow-ups: 3 5
comment:2 by , 18 months ago
Cc: | added |
---|
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
Unlike ArrayAgg and friends in fee87345967b3d917b618533585076cbfa43451b, there was no deprecation period to force the usage of default=0 for Count in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.
A simple solution for a backport would be to adjust Count.__init__ to do extra.setdefault("default", 0) (or define it in its signature for enhanced introspection) so Coalesce is used.
Happy to submit a patch if you agree.
Amin, you can use Count(..., default=0) in the meantime.
comment:3 by , 18 months ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
OK, let's accept this.
Replying to Simon Charette:
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
This is a regression in Django 4.0, so it doesn't qualify for a backport anymore.
follow-up: 6 comment:4 by , 18 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 18 months ago
Replying to Simon Charette:
I think this should be considered a release blocker as this is an unintended regression that will affect quite a few users given how common count annotations are over multiple valued relationships.
Unlike ArrayAgg and friends in fee87345967b3d917b618533585076cbfa43451b, there was no deprecation period to force the usage of default=0 for Count in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.
A simple solution for a backport would be to adjust Count.__init__ to do extra.setdefault("default", 0) (or define it in its signature for enhanced introspection) so Coalesce is used.
Happy to submit a patch if you agree.
Amin, you can use Count(..., default=0) in the meantime.
I have read 9f3cce172f6913c5ac74272fa5fc07f847b4e112. The thing I'm not understanding is why we want to return None instead of zero. What is the reason? How does this benefit us?
As far as I have read the changes, in Django==4.2 we have an attribute called empty_result_set_value, which is equal to NotImplemented in BaseExpression:

```python
class BaseExpression:
    empty_result_set_value = NotImplemented
    ...
```
And the aggregate classes override this. All of them have empty_result_set_value = None except the Count and RegrCount classes, where it is empty_result_set_value = 0. And because of this attribute, we can't use default=0:

```python
class Aggregate(Func):
    ...
    empty_result_set_value = None

    def __init__(
        self, *expressions, distinct=False, filter=None, default=None, **extra
    ):
        ...
        if default is not None and self.empty_result_set_value is not None:
            raise TypeError(f"{self.__class__.__name__} does not allow default.")
        ...
```
So as far as I understand, we are treating empty_result_set_value as a kind of default value. So why don't we just return empty_result_set_value instead of None in the convert_value property?
comment:6 by , 18 months ago
Replying to Mariusz Felisiak:
Can it be assigned to me? I would really love to solve it, if we can agree on the solution.
comment:7 by , 18 months ago
Owner: | changed from | to
---|
comment:8 by , 18 months ago
Thanks for your consideration Mariusz, I missed that it wasn't a recent regression, time flies by!
If that helps you Amin: the ORM compiler has some logic in place to prevent doing database queries when it knows the query cannot match any rows (e.g. id__in=[]). When this happens (aggregation, or annotating subqueries that don't match any rows) we still need to provide values to the user that are coherent with what would have happened if the query had been performed; that's when empty_result_set_value kicks in. You might notice in this patch that QuerySet.none() is used; that's a public way of making sure that evaluating the queryset results in no query.
Through a series of patches that tried to make the logic more coherent in this area, and the introduction of Aggregate(..., default=default), which is basically a shorthand for Coalesce(Aggregate(...), default), the long-standing empty().aggregate(cnt=Count()) issue where {"cnt": None} was returned was fixed (due to Count.empty_result_set_value = 0). But surprisingly (this is mind-blowing to me) we didn't have any tests for the case .annotate(cnt=Count("possibly_empty_multi_value")), so convert_value was wrongly removed on the assumption that it was covered by the adjusted logic.
As for the patch I could see us going two ways:

1. Drop Count.empty_result_set_value and use extra["default"] = 0, so Coalesce(..., 0) is systematically used and we inherit the empty_result_set_value from the Coalesce and Value chain. This might require adjusting some tests that check the exact SQL generated or assert that Count(default) cannot be used.
2. Simply add back the convert_value that was removed in 9f3cce172f6913c5ac74272fa5fc07f847b4e112.

1. seems more desirable to me as it pushes the logic to the database and avoids Python-time conversions (convert_value comes at a cost as it is run for each row), while 2. is less invasive as it requires no test changes and keeps the generated SQL the same (no Coalesce wrapping for each usage of Count).
Whatever solution we choose, we should make sure to also address the same issue we have with RegrCount.
comment:9 by , 18 months ago
Thanks Simon for explaining empty_result_set_value usage.
So, as far as I understood, if the query is going to return None, we return empty_result_set_value instead, without running the query. Right? If so, there is still something I'm missing.
My point is: assume we have a query where the compiler does not know that it is going to return None (normal queries like objects.annotate(count=Count("..."))). In this case, the query is run and the result is fetched. Now why can't we use empty_result_set_value when the result is None?
What I am suggesting is code like this:

```python
class BaseExpression:
    ...

    @cached_property
    def convert_value(self):
        field = self.output_field
        internal_type = field.get_internal_type()
        ...
        elif internal_type.endswith("IntegerField"):
            return (
                lambda value, expression, connection: self.empty_result_set_value  # instead of None
                if value is None
                else int(value)
            )
        ...
        return self._convert_value_noop
```
And about Python-time conversions: we are already running this convert_value converter for each row.
comment:10 by , 18 months ago
So, as far as I understood, if the query is going to return None, we return empty_result_set_value instead, without running the query. Right ? If it's right, there is still something that I'm missing.
That's not exactly how it works; there's a difference between returning None and dealing with an empty result. Some aggregate functions might return None even when not dealing with empty results (e.g. when aggregating over a set of values that contains NULL). That is not the case for Count, but it's the rationale for empty_result_set_value being a distinct feature from default, and for how convert_value deals with None.
My point is, assume that we have a query that the compiler does not know that it is going to return None (normal queries like objects.annotate(count=Count("..."))). In this case, the query is going to be run, and the result would be fetched. Now why we can't use empty_result_set_value in case of result is None ?
Because empty_result_set_value is used in a single, very peculiar case: when an EmptyResultSet exception is raised at query compilation time. The change you are proposing would work, but it blurs the line in terms of empty_result_set_value usage. Why should expressions with an IntegerField as an output_field default to empty_result_set_value when other types don't? It also impacts the locality of the change, since one can no longer see that Count is treated differently than the normal COUNT SQL function solely by looking at its definition.
And about Python-time conversions: we are already running this convert_value converter for each row.
Correct, but adding even more logic to it makes it harder to remove eventually, which is something option 1. would eventually benefit from. In reality these converters would only need to run if an explicit output_field is provided; otherwise the ORM has the possibility to generate the proper SQL, and backend peculiarities can be dealt with via get_db_converters.
Since the regression is solely about Count, I believe that either 1. or 2. still remain the two best options.
comment:11 by , 18 months ago
Alright then.
I will start working on the first solution, as it sounds better to me; if anything turns out to be complicated, I will simply fall back to the second solution for now.
Thanks
comment:12 by , 18 months ago
Hi again.
I implemented the first solution, and I want to double-check the outcome...
I removed the empty_result_set_value class attribute from both Count and RegrCount, so they now inherit this attribute from the Aggregate class.
In the __init__ methods of Count and RegrCount, I call super().__init__() with default=0, so the default value is 0 for Count. But I still don't let the user pass a default value for Count; the default is 0 all the time.
So basically Coalesce is appended to the Count query with this default value. I fixed all tests that had SELECT COUNT(...) and changed them to SELECT COALESCE(COUNT(...), 0).
Is everything right and as we wanted ?
Thanks for your time Simon!
comment:13 by , 18 months ago
Yep that sounds about right Amin!
Submitting a PR with these changes should validate that everything works as expected.
Thanks for the report, rubber ducking, and working on a patch!
comment:14 by , 18 months ago
Has patch: | set |
---|
comment:15 by , 18 months ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Triage Stage: | Accepted → Ready for checkin |
comment:16 by , 18 months ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
Ticket is not fixed until PR is merged.
comment:17 by , 18 months ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
Status: | new → assigned |
comment:18 by , 18 months ago
Hi again.
While trying to write a regression test, I faced an issue.
Apparently, the Count function usually returns 0. But I was unlucky enough to face some situations where Count returns None.
One situation is during my tests, when using TimescaleDB (a Postgres extension). In this scenario, I'm getting None instead of 0 when using Count.
And another situation is on a production server, when some complicated queries are executed; I'm getting None there too. I cannot figure out why those lead to None, as I get 0 when using similar queries.
So I'm basically asking: do you know some simple scenarios where the DB would return None? (I suppose the old convert_value function was there for those scenarios.)
And if you don't, is it valid to use TimescaleDB in a test? (Actually I have already written such a test for TDD during this PR, and my change fixes my issue with it.)
follow-up: 20
comment:19 by , 18 months ago
Django doesn't support TimescaleDB so any change against it is not considered a regression from our perspective. You might have run into bugs with TimescaleDB for all we know.
I suppose the old convert_value function was there for those scenarios
It could have been there solely to support a subset of the cases that empty_result_set_value now covers for aggregation and selects. This code was introduced in f59fd15c4928caf3dfcbd50f6ab47be409a43b01 without a clear reason. We'd have to check out the code from that time and run the suite to be sure.
The fact you cannot reproduce against SQLite, Postgres, MySQL, or Oracle might explain why this was not caught by the test suite in the first place, which would make more sense given how common such annotations are.
Here are my attempts at reproducing
comment:20 by , 18 months ago
Replying to Simon Charette:
The fact you cannot reproduce against SQLite, Postgres, MySQL, or Oracle might explain why this was not caught by the test suite in the first place which would make more sense given how common such annotations are.
I'm sure about this problem on Timescale, and as it is an extension of Postgres, I think it is a corner case for Postgres (or possibly other DBs).
I should have tried to do this regression test earlier. It is my bad, sorry about that.
But still, I suggest simply keeping the old convert_value to prevent corner cases, now and in the future.
comment:21 by , 18 months ago
Resolution: | → invalid |
---|---|
Status: | assigned → closed |
Unfortunately this was not even demonstrated to be a corner case for Postgres; if it were, we'd certainly accept a patch for it.
We can't keep code around that we can't test for regressions against, I'm afraid.
If someone is able to write a test demonstrating that Count can return NULL values under the modern SQL standard, or against any supported database backend that might diverge from the standard (Postgres, SQLite, MySQL, Oracle), we can re-open this ticket; but in the meantime this ticket doesn't seem actionable.
comment:22 by , 18 months ago
The PG docs confirm that count(<expression>) must always return a result, as it counts the number of rows for which <expression> is not null:
count ( "any" ) → bigint
Computes the number of input rows in which the input value is not null.
My advice would be to confirm this isn't documented behaviour of Timescale; if it isn't, then report the issue ¯\_(ツ)_/¯.
This was an intended change in 9f3cce172f6913c5ac74272fa5fc07f847b4e112. You can use Coalesce() to return 0 instead, e.g.