Opened 13 years ago
Closed 12 years ago
#17030 closed Cleanup/optimization (wontfix)
Remove special handling of deferred models in queryset iterators
Reported by: | Anssi Kääriäinen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | anssi.kaariainen@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Suor gave me a great idea in ticket #17025: use attname cache for __init__
. This turned out to work really well for deferred models: we don't need to special case them any more in queryset iterator loops (we do of course need to get the right deferred model class still).
Because the model __init__
did need to get the data in the same order as the model._meta.fields for *args based initialization, we couldn't use the fast init for deferred models. But if the model's options know the selected columns in the queryset, we suddenly can use the *args based __init__
which is a lot faster. A new model._meta.get_init_attnames() method was added for this.
One might wonder if this is backwards compatible. Well, the __init__
behaviour surely changed for deferred models, but I don't think this matters: the deferred classes are only for internal use of querysets, and thus the user should never call __init__
of deferred models directly.
The speedup from djangobench for raw_deferred is 2.5x. The speedup for query_all and query_all_multifield is about 1.05. There isn't a djangobench test for .defer, but according to my test .only('id')[0] is about 2x faster, and .only('id)[0:100] is about 1.6x faster.
The attached patch passes all test. There is some cleanup needed in get_cached_row, but there is a separate patch for optimizing those parts of the queryset, so I didn't bother to clean it up. However there is something strange going on with select_related querysets and its handling of load_fields. It seems that sometimes the field is passed as field.attname and sometimes as field.name to deferred_model_factory, and I don't believe that to be correct. The patch just checks for both, but that is the wrong cure. I believe the load_fields should always be field.name, not field.attname. Another tickets problem IMHO.
Suor also suggested the idea of using __dict__
.update in model.__init__
. That would speed up model initialization somewhat, but I believe that is backwards incompatible as __setattr__
would not be called. Is this corret? We are talking about 10-15% speed increase in the case the model has a lot of columns.
Attachments (1)
Change History (5)
by , 13 years ago
Attachment: | deferred_init.diff added |
---|
comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 13 years ago
Cc: | added |
---|
comment:3 by , 12 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:4 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I am going to close this one wontfix. Getting faster __init__
for deferred models would be nice. Notably currently deferred models are considerably slower to __init__
than loading the same model in full (the __init__
speed, of course, if you transfer a lot of data things look different). Still, the idea in this specific ticket isn't implementable for backwards compatibility reasons.
The idea of this ticket isn't backwards compatible. Assume you have a model with fields a and b. Now, if you do qs.only('b'), the
__init__
will be called with args=[val_for_b]. If somebody has overriden the model's__init__
this will break the contract of the__init__
args. If args are given, the first one must be the value for first field, but after this patch it will be value for the second field. Hence this is backwards incompatible.So this would break:
Using the attached patch .only('b') will give b's value as first arg. Current code will init the model with kwargsb = b's value.
Wontfixing this is a bit sad, as the speedup of 1.5-2x for .defer/.only queries and over 2x for .raw queries can be important. These methods are often used for performance reasons.
There is one trick that could be used: detect if the model has overridden
__init__
method (Model.__init__ <> deferred_class.__init__
). If there is no overridden__init__
, then it is safe to use the fast-path. However that feels a little dirty, so maybe this one should be given a wontfix: backwards incompatible resolution.