#16679 closed Cleanup/optimization (fixed)
Speed up signals by caching the receivers per sender
Reported by: | Anssi Kääriäinen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | anssi.kaariainen@…, jdunck@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This is related to ticket #16639, where I tried to speed up db.models.Model __init__
by some kludges in pre_init and post_init signals. Here is another try, this time with a more generic approach.
I have two patches, the first one speeds up just the cases where there are no receivers for the current sender. This is done by caching "no receivers" when send was called and the sender had no receivers. The cache is cleared whenever connect or disconnect is called or whenever a weakref signal is removed by garbage collection. Performance is roughly the same as trunk when there are receivers for the current sender. In the case when there are receivers, but not for the current sender, the performance is much better, as in almost no overhead.
The second patch optimizes things a bit further by not only remembering which senders have receivers, but by also remembering the list of receivers. This will require more memory, but I believe this optimization might be needed for further work (for example deferred models do not currently send signals correctly, and correcting this might mean more work in signal.send()).
There are some problems:
- The sender object must be hashable. In the trunk code only
__eq__
is assumed. - The caches can leak memory. For example the template_rendered signal has that property.
- Code complexity.
- More work if signals change constantly.
The first two problems can be solved by adding a kwarg use_caching to Signal.__init__
(default False). There are some signals that will benefit from caching (model signals at least) and some that would leak memory too much with caching (template_rendered, notably). The use_cache kwarg is included in the second patch.
The rough numbers for pre and post second patch for 10000 simple model __init__
(only id) is:
pre patch:
- no signals at all: 108ms
- one pre_init signal to other model: 0.171ms
- one pre_init and one post_init signal to other model: 0.220ms
- one pre_init signal to initialized model: 0.215ms
post patch 2:
- no signals at all: 103ms
- one pre_init signal to other model: 0.115ms
- one pre_init and one post_init signal to other model: 0.115ms
- one pre_init signal to initialized model: 0.173ms
In the case when there is one post_init and one pre_init signal in the project, one can save around 50% in the best case.
The numbers are nice, but IMHO the real benefit is that with the second patch, having inherited model signals would be possible without paying a huge performance penalty. Fixing deferred objects not firing signals would also be easier without performance problems.
The patches are still somewhat WIP. But I'd like to open discussion before I polish them and test them more.
Attachments (6)
Change History (24)
by , 13 years ago
Attachment: | fast-signals1.diff added |
---|
comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
This clearly needs a review by a core developer before going into trunk => marking as DDN.
comment:2 by , 13 years ago
Triage Stage: | Design decision needed → Accepted |
---|
This seems like a good idea to me - marking accepted.
comment:3 by , 13 years ago
Here are another approach, without cache, store receivers by sender instead.
by , 13 years ago
Attachment: | no_cache_fast_signals.py added |
---|
by , 13 years ago
Attachment: | no_cache_fast_signals.diff added |
---|
by , 13 years ago
Attachment: | no_cache_fast_signals.2.diff added |
---|
comment:4 by , 13 years ago
I quickly tried your patch, but didn't see performance improvement in this case:
class A(models.Model): pass class B(models.Model): pass def recv1(*args, **kwargs): pass signals.pre_init.connect(recv1, sender=B, weak=False) signals.post_init.connect(recv1, sender=B, weak=False) for i in range(0, 1000): A(i)
Note that the connected signals for B are slowing down A __init__
. This is the case where the cached signals (fast-signals2.diff) give the best performance increase.
comment:5 by , 13 years ago
My patch is designed for large amount of signals. Adding some special casing for empty receivers should make it almost as fast as cached signals without actually caching anything. Still storing receivers in proper structure seems cleaner approach than caching.
To make signals even faster we should drop weakrefs, they seem more confusing than useful.
by , 13 years ago
Attachment: | no_cache_fast_signals.3.diff added |
---|
comment:7 by , 13 years ago
As I see it, your approach is the right one, at least as long as we don't want to support inherited signals. There are some problems when trying to make inherited signals work, though.
The problem for inherited signals is that at the time the signal is registered, we don't necessarily know all of the sender's subclasses. And, when a model is created by the apploading code, we don't know the signals yet. So, we would need to update the signal datastructures when a new signal is connected, disconnected or a new subclass is registered in the app cache. In the two first cases, we should check all the subclasses of the sender, and in the last case we would need to check the parents of the sender.
My approach to that problem is that when a signal is sent, we cache the found results. When a receiver is connected or disconnected we flush all the caches. This would make it perhaps easier to support inherited signals, as there is just one point when we need to check the parents and cache the results based on that. On the other hand, my version is very bad if you do dynamically connect / disconnect signals, as the caches would get flushed all the time and the caching would just increase the overhead, not reduce it. I don't know if dynamical connect is a common use case.
Of course, there is no decision if we actually do want to support inherited signals. They would be useful, as one would usually want to have BaseModel
pre_save/post_save sent when InheritedModel
is saved.
comment:8 by , 13 years ago
Yes, my solution is not suitable for inherited signals. And once we have those it should be discarded. But do we really need them? Implementing as little as possible is generally a good idea.
And, about dynamic creation of model subclasses, I do pretty much of it. And in my experience, signals are usually connected in .contribute_to_class() call which is done for every new model class, thus no need for inherited signals here.
comment:9 by , 13 years ago
Patch needs improvement: | set |
---|
Actually now that I think about it, dynamic creation of subclasses will not be a problem for the caching approach. The memory used by the signals cache will be much less than the memory used by the app module cache anyways. For the caching approach the only real killer is dynamic addition / removal of signals. But I think that could be improved somewhat easily.
And I claim that inherited signals would be useful. An external application like Haystack could use them, see this for example: https://github.com/toastdriven/django-haystack/blob/master/haystack/indexes.py#L289. Now, if you save a inherited model, it is not going to do any indexing. They would surely want inherit=True. However if a decision needs to be made between fast __init__
signals and inherited signals, it is the model initialization that is important.
But lets concentrate here on making the signals as fast as possible and nothing else. Even if it is decided that we want inherited signals and thus caching, it would still be a good idea to have proper data structures for signal storage.
The patch seems to fail signals / signals_regress test suites. Some high level comments about what is going on with the datastructures would be useful.
I assume the worst case for your patch, where you have a connected receiver for both sender=None and sender=CurrentSender
isn't a common case?
comment:10 by , 13 years ago
About failures: it's tests that wrong here, they rely on .receivers
property being list of receivers while I use more complex structure. Sure adjusted tests should be added to my patch, I was just giving a try at a moment.
About worst case: Yeah, the worst case is than combine() is needed. It still has linear complexity, not so bad.
About caching: in fast-signals2.diff you use both sender_receivers_cache
and sender_no_receivers_cache
. The latter seems superfluous you can just store [] in the former, and then use something like:
if not self.receivers: return [] if sender in self.sender_receivers_cache: receivers = self.receivers_cache[sender] else: receivers = self.receivers_cache[sender] = self._receivers_for_sender(sender) return deref(receivers)
I also dropped .use_caching
property. What can be a reason not to? Just cache always since it don't change any behavior.
comment:11 by , 13 years ago
As I understand haystack it creates SearchIndex for each model and that search index connects signals to receivers that depend on that search index, so no use for inherited signals here too. Or maybe I just don't see a big simplifying refactoring here which involves inherited signals?
comment:12 by , 13 years ago
The senders_no_receivers cache is there to make the sending marginally faster. But I think if you did on the last line this:
return receivers and deref(receivers) or []
I would be actually as fast as the senders_no_receivers cache version, as you would get rid of that function call. Other than that your code does look much better.
The problem with not having .use_caching property is that if somebody decides to use dynamic classes with the signaling system, you will get really bad memory leak - every dynamic class will be stored in the cache. This is not a problem with model classes, because you will leak the classes anyways in app models cache.
So, what do you think, which way should we go here, caching, your version or both?
The Haystack
use case is following:
class NoteIndex(RealtimeSearchIndex): # Index definition site.register(Note, NoteIndex)
Because you use RealtimeSearchIndex, haystack will connect to the signals, and do automatic updates of the index in the signals. Now, if you have SpecialNote(Note) class, and you save a SpecialNote, you will NOT get the index updated, even if you saved a note. Even a ProxyNote(Note) would break your indexing. There are surely other use cases like that. For example you could have an extension "django-audits", where you register an audit_trail for a model, and it will automatically do auditing for you. Now, if you use subclassing (even proxy classes), suddenly your auditing is not there. See the problem?
Although there are other problems for signals, too: they are not sent when you do qs.update, nor are they sent when you do bulk_create. I have some ideas of how there could be fixed, although that would mean first fetching the to-be-updated resultset in qs.update() and then doing the update in one query. So it would mean that if you update a gazillion rows in one go, you would get a LOT worse performance... But having data modification signals that are only sometimes sent will lead to problems for users who think they can audit their data by just signals. Ah well, getting out of this ticket's topic again... :)
comment:13 by , 13 years ago
Are there any real world scenario for using numerous thrown out dynamic classes? And while they are thrown out they still live long enough to use signals. Probably the right approach here will be just ignore this case.
About inherited signals. I agree that they could simplify some code (making it more implicit by the way). It will also make signals implementation slower and more complex. It seems that somebody should start another ticket or topic on django-developers to resolve this question. Who knows? Maybe a lot of people want this?
About implementation. It appears to me that my approach is cleaner when sender inheritance is not involved, otherwise one should just go with caching. So this solution should be delayed until we get any decision on previous question.
comment:14 by , 13 years ago
Cc: | added |
---|
comment:15 by , 12 years ago
comment:16 by , 12 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
A rebased and slightly simplified version at https://github.com/akaariai/django/tree/ticket_16679.
Benchmarks:
model_init_signals (pre and post init signals defined for model other than initialized, one field for the initialized model):
Running 'model_init_signals' benchmark ... Min: 0.012126 -> 0.007745: 1.5656x faster Avg: 0.012178 -> 0.007774: 1.5666x faster Significant (t=644.833741) Stddev: 0.00004 -> 0.00002: 2.3318x smaller (N = 34)
Benchmark available at: https://github.com/akaariai/djangobench/tree/continuous_bechmarking/djangobench/benchmarks/model_init_signals
(Note that the djangobench used is somewhat hacked version - it discards 2/3 of the results - the highest third and the lowest third - to get consistent results on my laptop).
model_init_self_signals (pre and post init for same model):
Running 'model_init_self_signals' benchmark ... Min: 0.014637 -> 0.011735: 1.2473x faster Avg: 0.014675 -> 0.011762: 1.2477x faster Significant (t=530.411861) Stddev: 0.00002 -> 0.00002: 1.2429x smaller (N = 34)
query_all (no signals at all, 1000 objects from DB)
Running 'query_all' benchmark ... Min: 0.006779 -> 0.006783: 1.0006x slower Avg: 0.006819 -> 0.006825: 1.0010x slower Not significant Stddev: 0.00003 -> 0.00003: 1.1693x larger (N = 34)
All tests pass and I don't see any reason to not add the cache. So, will commit tonight if no objections.
comment:17 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:18 by , 8 years ago
Summary: | Speed up signals by caching the reveicers per sender → Speed up signals by caching the receivers per sender |
---|
Without sender -> receivers cache