Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#32840 closed Cleanup/optimization (fixed)

Micro-optimisation possibility in Field.get_col

Reported by: Keryn Knight Owned by: Keryn Knight
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Current implementation is:

def get_col(self, alias, output_field=None):
    if output_field is None:
        output_field = self
    if alias != self.model._meta.db_table or output_field != self:
        from django.db.models.expressions import Col
        return Col(alias, self, output_field)
    else:
        return self.cached_col

If no different output field is provided, is doing the following comparison needlessly as far as I can tell: output_field != self for which
the default implementation of Field.__eq__ is:

        if isinstance(other, Field):
            return (
                self.creation_counter == other.creation_counter and
                getattr(self, 'model', None) == getattr(other, 'model', None)
            )
        return NotImplemented

in that scenario, because self and output_field are literally the same object (down to the id(...)) the isinstance resolves to True, the creation counters also are the same and the models are as you'd expect ... the same. There's no short-circuiting via falsy condition available.

I think that the method body can be changed to:

has_diff_output_field = True
if output_field is None:
    output_field = self
    has_diff_output_field = False
if alias != self.model._meta.db_table or (has_diff_output_field and output_field != self):
    from django.db.models.expressions import Col
    return Col(alias, self, output_field)
else:
    return self.cached_col

The introduction of has_diff_output_field being the important part. If it's False then comparison short-circuiting will prevent the execution of output_field != self at all.
I'm purposefully avoiding making further investigation/judgement about whether output_field != self is itself necessary, because it's ostensibly possible for a custom output_field to be provided which has the same creation_counter + model and I don't know how likely that is.

Across the entire test suite (ignoring those which have skipped), executed with the proposed change didn't seem to break anything (yay) and augmenting the method additionally with:

        if has_diff_output_field:
            print('different')
        else:
            print('same')

and counting the results across some 14K tests, there were 87021 different and 178493 same.

Quick example of how to get to the method:

>>> tuple(get_user_model().objects.all())
(Pdb) w
  /path/django/db/models/query.py(280)__iter__()
-> self._fetch_all()
  /path/django/db/models/query.py(1343)_fetch_all()
-> self._result_cache = list(self._iterable_class(self))
  /path/django/db/models/query.py(51)__iter__()
-> results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  /path/django/db/models/sql/compiler.py(1175)execute_sql()
-> sql, params = self.as_sql()
  /path/django/db/models/sql/compiler.py(523)as_sql()
-> extra_select, order_by, group_by = self.pre_sql_setup()
  /path/django/db/models/sql/compiler.py(55)pre_sql_setup()
-> self.setup_query()
  /path/django/db/models/sql/compiler.py(46)setup_query()
-> self.select, self.klass_info, self.annotation_col_map = self.get_select()
  /path/django/db/models/sql/compiler.py(228)get_select()
-> cols = self.get_default_columns()
  /path/django/db/models/sql/compiler.py(715)get_default_columns()
-> column = field.get_col(alias)
> /path/django/db/models/fields/__init__.py(396)get_col()

Overall it's:

  • 1 comparison if they're the same (it was 1 comparison before too, but that was itself 3 comparisons)
  • 1 additional comparison if they're not the same.

The weighting/ratio of the test suite + the fact that the simplest ORM usage suggests (to me) it might have merit.

Addendum: when I say micro, I mean micro. It's not a big time saver, I just happened to notice upon far more calls to __eq__ than I expected.

Change History (6)

comment:1 by Carlton Gibson, 4 years ago

Triage Stage: UnreviewedAccepted

Hi Keryn, thanks. I'm going to provisionally Accept this to let Mariusz/Simon have a look at it in PR.

I wonder if we need the initial conditional at all... 🤔

Untested, but same idea I think:

def get_col(self, alias, output_field=None):
    if alias != self.model._meta.db_table or (output_field and output_field != self):
        from django.db.models.expressions import Col
        return Col(alias, self, output_field or self)
    else:
        return self.cached_col

I'm purposefully avoiding making further investigation/judgement about whether output_field != self is itself necessary, because it's ostensibly possible for a custom output_field to be provided which has the same creation_counter + model and I don't know how likely that is.

Yes.

comment:2 by Keryn Knight, 4 years ago

Owner: changed from nobody to Keryn Knight
Status: newassigned

comment:3 by Keryn Knight, 4 years ago

Has patch: set

PR is here: https://github.com/django/django/pull/14585
For consistency across 2 separate comment areas:

The reason I've not opted for the suggested alternative above (and have instead kept my rough original proposal) is not a criticism of it, but for keeping intent as clear as possible -- the version which was suggested relies on the mental parsing of the x or y twice (which always causes me to double take) and leaves open an accidental regression in the future should someone justify an implementation of Field.__bool__ which is equally/more costly as Field.__eq__ (I think).

(Obviously I can change that stance, if it's even deemed worthwhile doing it. We'll see)

comment:4 by Keryn Knight, 4 years ago

Patch needs improvement: set

Updating state based on initial review on the PR.

comment:5 by GitHub <noreply@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 5013798f:

Fixed #32840 -- Optimized Field.get_col().

get_col() used "self" as "output_field" when it was not given, and
unnecessarily compared "self" to "self".

Co-authored-by: Chris Jerdonek <chris.jerdonek@…>
Co-authored-by: Mariusz Felisiak <felisiak.mariusz@…>

comment:6 by Mariusz Felisiak, 4 years ago

Patch needs improvement: unset
Note: See TracTickets for help on using tickets.
Back to Top