Opened 11 months ago
Closed 11 months ago
#35102 closed Cleanup/optimization (fixed)
Performance regression on Model.objects.count() between Django 4.2 and 5.0
Reported by: | Anthony Shaw | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I was running the djangobench tool against the 4.2 and 5.0.1 tags (and main to verify) and noticed that some of the benchmarks had significant regressions.
The one I wanted to highlight is query_count
. The other two with regressions were model creation and query_annotate.
djangobench --control 4.2 --experiment 5.0.1 query_count -p profiles
The profile for both versions shows that in v5, get_aggregation has an additional call to replace_expressions which spends a lot of time in the slow inspect module. In the profile reports I collected this accounts for the 30-40% performance regression on the count aggregation query.
Performance regressions with new features are expected, but I wanted to raise this because the [ASV environment for Django](https://github.com/django/django-asv) seemed to have missed some of these regressions when all of the benchmarks suddenly ran faster.
I've attached profiles for 4.2 and 5.0.1 which can be opened using something like snakeviz
Attachments (2)
Change History (20)
by , 11 months ago
Attachment: | con-query_count added |
---|
comment:1 by , 11 months ago
This is the commit that added the extra call tree https://github.com/django/django/commit/2ee01747c32a7275a7a1a5f7862acba7db764921
follow-up: 4 comment:2 by , 11 months ago
Thanks for the report, I think we could make it faster by avoiding cloning every time:
-
django/db/models/expressions.py
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py index c20de5995a..0e1c265d9e 100644
a b class BaseExpression: 402 402 return clone 403 403 404 404 def replace_expressions(self, replacements): 405 if not replacements: 406 return self 405 407 if replacement := replacements.get(self): 406 408 return replacement 409 if not (source_expressions := self.get_source_expressions()): 410 return self 407 411 clone = self.copy() 408 source_expressions = clone.get_source_expressions()409 412 clone.set_source_expressions( 410 413 [ 411 414 expr.replace_expressions(replacements) if expr else None
Does it work for you?
comment:3 by , 11 months ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
comment:4 by , 11 months ago
Replying to Mariusz Felisiak:
Does it work for you?
According to my experiments, it reduces regression to ~15%.
comment:5 by , 11 months ago
We could avoid cloning in two more places:
-
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index a79d66eb21..911d0c7848 100644
a b class Query(BaseExpression): 422 422 return obj 423 423 424 424 def relabeled_clone(self, change_map): 425 if not change_map: 426 return self 425 427 clone = self.clone() 426 428 clone.change_aliases(change_map) 427 429 return clone -
django/db/models/sql/where.py
diff --git a/django/db/models/sql/where.py b/django/db/models/sql/where.py index 2f23a2932c..ae7a2a1224 100644
a b class WhereNode(tree.Node): 225 225 return clone 226 226 227 227 def replace_expressions(self, replacements): 228 if not replacements: 229 return self 228 230 if replacement := replacements.get(self): 229 231 return replacement 230 232 clone = self.create(connector=self.connector, negated=self.negated)
comment:6 by , 11 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 11 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
Please don't assign ticket to yourself. For now, we're discussing a potential approach, and some improvements have already been proposed.
comment:8 by , 11 months ago
Cc: | added |
---|
get_aggregation has an additional call to replace_expressions which spends a lot of time in the slow inspect module
One way we could cut the time spent in Expression.identity
significantly would be to cache (e.g. at class initialization time) the inspect.signature(cls.__init__)
signature as it represents the vast majority of the time spent by replace_expressions
comment:9 by , 11 months ago
If Expression.hash (Expression.identity()) were to implement another mechanism for determining a unique hash that doesn't depend upon using inspect.signature
the benchmark would be much faster since this is evaluated twice. I suspect this would impact other benchmarks as well, hash functions should be as fast as possible but I understand why that is challenging for something like expressions. inspect.signature is a very slow function that does a huge amount of processing to build a Signature object for the given callable.
follow-up: 11 comment:10 by , 11 months ago
I suspect this would impact other benchmarks as well, hash functions should be as fast as possible but I understand why that is challenging for something like expressions. inspect.signature is a very slow function that does a huge amount of processing to build a Signature object for the given callable.
yeah that's the crux of the issue here. This would require having each expression implement their own correct identity functions based on their __init__
signature which would be a significant maintenance burden and also backward incompatible for third party expressions.
I think we can get a lot of benefit by caching inspect.signature
calls like the following.
-
django/db/models/expressions.py
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py index c20de5995a..0703a134a9 100644
a b def select_format(self, compiler, sql, params): 485 485 class Expression(BaseExpression, Combinable): 486 486 """An expression that can be combined with other expressions.""" 487 487 488 def __init_subclass__(cls, /, **kwargs): 489 super().__init_subclass__(**kwargs) 490 cls._constructor_signature = inspect.signature(cls.__init__) 491 492 _constructor_signature = inspect.signature(BaseExpression.__init__) 493 488 494 @cached_property 489 495 def identity(self): 490 constructor_signature = inspect.signature(self.__init__)491 496 args, kwargs = self._constructor_args 492 signature = constructor_signature.bind_partial(*args, **kwargs)497 signature = self._constructor_signature.bind_partial(self, *args, **kwargs) 493 498 signature.apply_defaults() 494 499 arguments = signature.arguments.items() 495 500 identity = [self.__class__] 496 501 for arg, value in arguments: 502 if arg == 'self': 503 continue 497 504 if isinstance(value, fields.Field): 498 505 if value.name and value.model: 499 506 value = (value.model._meta.label, value.name)
comment:11 by , 11 months ago
Replying to Simon Charette:
I think we can get a lot of benefit by caching
inspect.signature
calls like the following.
That patch alone is 1.4x faster
djangobench --control main --experiment exp_1 query_count -t 200 Running benchmarks: query_count Control: Django 5.1 (in git branch main) Experiment: Django 5.1 (in git branch exp_1) Running 'query_count' benchmark ... Min: 0.000266 -> 0.000186: 1.4310x faster Avg: 0.000292 -> 0.000202: 1.4424x faster Significant (t=23.676856) Stddev: 0.00004 -> 0.00003: 1.4418x smaller (N = 200)
comment:12 by , 11 months ago
And then with Mariusz' patches the overall speedup is 1.8x compared with main:
Running benchmarks: query_count Control: Django 5.1 (in git branch main) Experiment: Django 5.1 (in git branch exp_1) Running 'query_count' benchmark ... Min: 0.000294 -> 0.000163: 1.8003x faster Avg: 0.000317 -> 0.000173: 1.8301x faster Significant (t=53.998054) Stddev: 0.00003 -> 0.00002: 1.4805x smaller (N = 200)
And an overall improvement on 4.2
Running benchmarks: query_count Control: Django 4.2 (in git branch 4.2) Experiment: Django 5.1 (in git branch exp_1) Running 'query_count' benchmark ... Min: 0.000191 -> 0.000174: 1.0997x faster Avg: 0.000263 -> 0.000200: 1.3143x faster Significant (t=3.949122) Stddev: 0.00022 -> 0.00004: 5.2520x smaller (N = 200)
For the query_aggregate benchmark a marginal improvement over 4.2:
djangobench --control 4.2 --experiment exp_1 query_aggregate -t 200 Running benchmarks: query_aggregate Control: Django 4.2 (in git branch 4.2) Experiment: Django 5.1 (in git branch exp_1) Running 'query_aggregate' benchmark ... Min: 0.000211 -> 0.000192: 1.0973x faster Avg: 0.000238 -> 0.000217: 1.0976x faster Significant (t=3.652858) Stddev: 0.00004 -> 0.00007: 1.9551x larger (N = 200)
And a more significant one to main/5.1
djangobench --control main --experiment exp_1 query_aggregate -t 200 Running benchmarks: query_aggregate Control: Django 5.1 (in git branch main) Experiment: Django 5.1 (in git branch exp_1) Running 'query_aggregate' benchmark ... Min: 0.000263 -> 0.000189: 1.3960x faster Avg: 0.000290 -> 0.000211: 1.3708x faster Significant (t=15.063702) Stddev: 0.00005 -> 0.00005: 1.0220x smaller (N = 200)
comment:13 by , 11 months ago
Alternate patch that avoids the cost at class definition time as that might have an effect on Django bootstrapping time given we define tens of Expression
subclasses
-
django/db/models/expressions.py
diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py index c20de5995a..0ecda38353 100644
a b 14 14 from django.db.models.constants import LOOKUP_SEP 15 15 from django.db.models.query_utils import Q 16 16 from django.utils.deconstruct import deconstructible 17 from django.utils.functional import cached_property 17 from django.utils.functional import cached_property, classproperty 18 18 from django.utils.hashable import make_hashable 19 19 20 20 … … def select_format(self, compiler, sql, params): 485 485 class Expression(BaseExpression, Combinable): 486 486 """An expression that can be combined with other expressions.""" 487 487 488 @classproperty 489 @functools.lru_cache(maxsize=128) 490 def _constructor_signature(cls): 491 return inspect.signature(cls.__init__) 492 488 493 @cached_property 489 494 def identity(self): 490 constructor_signature = inspect.signature(self.__init__)491 495 args, kwargs = self._constructor_args 492 signature = constructor_signature.bind_partial(*args, **kwargs)496 signature = self._constructor_signature.bind_partial(self, *args, **kwargs) 493 497 signature.apply_defaults() 494 498 arguments = signature.arguments.items() 495 499 identity = [self.__class__] 496 500 for arg, value in arguments: 501 if arg == 'self': 502 continue 497 503 if isinstance(value, fields.Field): 498 504 if value.name and value.model: 499 505 value = (value.model._meta.label, value.name)
comment:14 by , 11 months ago
Has patch: | set |
---|
Submitted a PR that confirms the suite is happy with both changes.
comment:15 by , 11 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
Triage Stage: | Accepted → Ready for checkin |
comment:18 by , 11 months ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Profile for Django 4.2