#32840 closed Cleanup/optimization (fixed)
Micro-optimisation possibility in Field.get_col
Reported by: | Keryn Knight | Owned by: | Keryn Knight |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Current implementation is:
def get_col(self, alias, output_field=None): if output_field is None: output_field = self if alias != self.model._meta.db_table or output_field != self: from django.db.models.expressions import Col return Col(alias, self, output_field) else: return self.cached_col
If no different output field is provided, is doing the following comparison needlessly as far as I can tell: output_field != self
for which
the default implementation of Field.__eq__
is:
if isinstance(other, Field): return ( self.creation_counter == other.creation_counter and getattr(self, 'model', None) == getattr(other, 'model', None) ) return NotImplemented
in that scenario, because self
and output_field
are literally the same object (down to the id(...)
) the isinstance
resolves to True, the creation counters also are the same and the models are as you'd expect ... the same. There's no short-circuiting via falsy condition available.
I think that the method body can be changed to:
has_diff_output_field = True if output_field is None: output_field = self has_diff_output_field = False if alias != self.model._meta.db_table or (has_diff_output_field and output_field != self): from django.db.models.expressions import Col return Col(alias, self, output_field) else: return self.cached_col
The introduction of has_diff_output_field
being the important part. If it's False
then comparison short-circuiting will prevent the execution of output_field != self
at all.
I'm purposefully avoiding making further investigation/judgement about whether output_field != self
is itself necessary, because it's ostensibly possible for a custom output_field
to be provided which has the same creation_counter
+ model
and I don't know how likely that is.
Across the entire test suite (ignoring those which have skipped), executed with the proposed change didn't seem to break anything (yay) and augmenting the method additionally with:
if has_diff_output_field: print('different') else: print('same')
and counting the results across some 14K tests, there were 87021 different
and 178493 same
.
Quick example of how to get to the method:
>>> tuple(get_user_model().objects.all()) (Pdb) w /path/django/db/models/query.py(280)__iter__() -> self._fetch_all() /path/django/db/models/query.py(1343)_fetch_all() -> self._result_cache = list(self._iterable_class(self)) /path/django/db/models/query.py(51)__iter__() -> results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size) /path/django/db/models/sql/compiler.py(1175)execute_sql() -> sql, params = self.as_sql() /path/django/db/models/sql/compiler.py(523)as_sql() -> extra_select, order_by, group_by = self.pre_sql_setup() /path/django/db/models/sql/compiler.py(55)pre_sql_setup() -> self.setup_query() /path/django/db/models/sql/compiler.py(46)setup_query() -> self.select, self.klass_info, self.annotation_col_map = self.get_select() /path/django/db/models/sql/compiler.py(228)get_select() -> cols = self.get_default_columns() /path/django/db/models/sql/compiler.py(715)get_default_columns() -> column = field.get_col(alias) > /path/django/db/models/fields/__init__.py(396)get_col()
Overall it's:
- 1 comparison if they're the same (it was 1 comparison before too, but that was itself 3 comparisons)
- 1 additional comparison if they're not the same.
The weighting/ratio of the test suite + the fact that the simplest ORM usage suggests (to me) it might have merit.
Addendum: when I say micro, I mean micro. It's not a big time saver, I just happened to notice upon far more calls to __eq__
than I expected.
Change History (6)
comment:1 by , 4 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 3 years ago
Has patch: | set |
---|
PR is here: https://github.com/django/django/pull/14585
For consistency across 2 separate comment areas:
The reason I've not opted for the suggested alternative above (and have instead kept my rough original proposal) is not a criticism of it, but for keeping intent as clear as possible -- the version which was suggested relies on the mental parsing of the x or y
twice (which always causes me to double take) and leaves open an accidental regression in the future should someone justify an implementation of Field.__bool__
which is equally/more costly as Field.__eq__
(I think).
(Obviously I can change that stance, if it's even deemed worthwhile doing it. We'll see)
comment:4 by , 3 years ago
Patch needs improvement: | set |
---|
Updating state based on initial review on the PR.
comment:6 by , 3 years ago
Patch needs improvement: | unset |
---|
Hi Keryn, thanks. I'm going to provisionally Accept this to let Mariusz/Simon have a look at it in PR.
I wonder if we need the initial conditional at all... 🤔
Untested, but same idea I think:
Yes.