Opened 13 years ago

Closed 12 years ago

#17030 closed Cleanup/optimization (wontfix)

Remove special handling of deferred models in queryset iterators

Reported by: Anssi Kääriäinen Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Suor gave me a great idea in ticket #17025: use attname cache for __init__. This turned out to work really well for deferred models: we don't need to special case them any more in queryset iterator loops (we do of course need to get the right deferred model class still).

Because the model __init__ did need to get the data in the same order as the model._meta.fields for *args based initialization, we couldn't use the fast init for deferred models. But if the model's options know the selected columns in the queryset, we suddenly can use the *args based __init__ which is a lot faster. A new model._meta.get_init_attnames() method was added for this.

One might wonder if this is backwards compatible. Well, the __init__ behaviour surely changed for deferred models, but I don't think this matters: the deferred classes are only for internal use of querysets, and thus the user should never call __init__ of deferred models directly.

The speedup from djangobench for raw_deferred is 2.5x. The speedup for query_all and query_all_multifield is about 1.05. There isn't a djangobench test for .defer, but according to my test .only('id')[0] is about 2x faster, and .only('id)[0:100] is about 1.6x faster.

The attached patch passes all test. There is some cleanup needed in get_cached_row, but there is a separate patch for optimizing those parts of the queryset, so I didn't bother to clean it up. However there is something strange going on with select_related querysets and its handling of load_fields. It seems that sometimes the field is passed as field.attname and sometimes as field.name to deferred_model_factory, and I don't believe that to be correct. The patch just checks for both, but that is the wrong cure. I believe the load_fields should always be field.name, not field.attname. Another tickets problem IMHO.

Suor also suggested the idea of using __dict__.update in model.__init__. That would speed up model initialization somewhat, but I believe that is backwards incompatible as __setattr__ would not be called. Is this corret? We are talking about 10-15% speed increase in the case the model has a lot of columns.

Attachments (1)

deferred_init.diff (18.4 KB ) - added by Anssi Kääriäinen 13 years ago.

Download all attachments as: .zip

Change History (5)

by Anssi Kääriäinen, 13 years ago

Attachment: deferred_init.diff added

comment:1 by Aymeric Augustin, 13 years ago

Triage Stage: UnreviewedDesign decision needed

comment:2 by Anssi Kääriäinen, 13 years ago

Cc: anssi.kaariainen@… added

The idea of this ticket isn't backwards compatible. Assume you have a model with fields a and b. Now, if you do qs.only('b'), the __init__ will be called with args=[val_for_b]. If somebody has overriden the model's __init__ this will break the contract of the __init__ args. If args are given, the first one must be the value for first field, but after this patch it will be value for the second field. Hence this is backwards incompatible.

So this would break:

class SomeModel:
   a = TextField()
   b = TextField()

   def __init__(self, *args, **kwargs):
       a_val = args[0] if args else kwargs.get('a')
       if a_val == 'some value':
           do_something()
       super(SomeModel, self).__init__(*args, **kwargs)

Using the attached patch .only('b') will give b's value as first arg. Current code will init the model with kwargsb = b's value.

Wontfixing this is a bit sad, as the speedup of 1.5-2x for .defer/.only queries and over 2x for .raw queries can be important. These methods are often used for performance reasons.

There is one trick that could be used: detect if the model has overridden __init__ method (Model.__init__ <> deferred_class.__init__). If there is no overridden __init__, then it is safe to use the fast-path. However that feels a little dirty, so maybe this one should be given a wontfix: backwards incompatible resolution.

comment:3 by Alex Gaynor, 12 years ago

Triage Stage: Design decision neededAccepted

comment:4 by Anssi Kääriäinen, 12 years ago

Resolution: wontfix
Status: newclosed

I am going to close this one wontfix. Getting faster __init__ for deferred models would be nice. Notably currently deferred models are considerably slower to __init__ than loading the same model in full (the __init__ speed, of course, if you transfer a lot of data things look different). Still, the idea in this specific ticket isn't implementable for backwards compatibility reasons.

Note: See TracTickets for help on using tickets.
Back to Top