Opened 4 months ago
Last modified 4 hours ago
#35801 new Bug
Signals are dispatched to receivers associated with dead senders
Reported by: | bobince | Owned by: | |
---|---|---|---|
Component: | Core (Other) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | bobince, Simon Charette, Lincoln | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
django.dispatch.Signal.receivers is keyed on the id() of the receiver and sender (modulo a thing to make bound methods consistent).
If a sender connected to a signal is destroyed, and a new object is allocated with the same id() and then also connected as a sender to the signal, when the signal fires it will match the original sender and call the receiver that was connected for that sender (as well as the new one).
Signal works around the problem of re-used ids for the receiver by having a weakref to the receiver (since it needs a reference anyway to be able to call it), but it doesn't for sender.
In my case this resulted in post-migrate hooks for the wrong apps being occasionally called in migration tests that mutated INSTALLED_APPS causing AppConfig senders to be re-created, but it can be more simply provoked with:
from django.dispatch import Signal receivers_called = [] def create_receiver(i): def receiver(**kwargs): receivers_called.append(i) return receiver n = 100 receivers = [create_receiver(i) for i in range(n)] signal = Signal() for i in range(n): sender = object() signal.connect(receivers[i], sender=sender) receivers_called = [] _ = signal.send(sender=sender) # only the receiver for the new sender object should be called assert receivers_called == [i], f'Expected [{i}], called {receivers_called}'
(how readily this explodes may depend on local memory allocation differences, but it dies pretty consistently for me on iteratio
n 3.)
Perhaps Signal should be keeping a weakref to each sender as well as receiver, and detecting when it has gone None? Does this need to participate in the _dead_receivers mechanism?
Attachments (1)
Change History (9)
comment:1 by , 4 months ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
follow-up: 3 comment:2 by , 4 months ago
I don't think it's correct to clear receivers_called = [] like that inside the loop
It should be fine as long as the loop is in the same scope as the definition of receivers_called. But yes, it might be clearer to reset the list without rebinding it by doing del receivers_called[:]
.
Sadly, this does not fail for me at any iteration
That's because the new example stores a reference to sender
in signal_calls
, which prevents the sender being destroyed and its id re-used.
If you store id(sender)
or str(sender)
instead you'll get the error again.
comment:3 by , 4 months ago
Cc: | added |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Replying to bobince:
I don't think it's correct to clear receivers_called = [] like that inside the loop
It should be fine as long as the loop is in the same scope as the definition of receivers_called. But yes, it might be clearer to reset the list without rebinding it by doing
del receivers_called[:]
.
Right, I would suggest receivers_called.clear()
Sadly, this does not fail for me at any iteration
That's because the new example stores a reference to
sender
insignal_calls
, which prevents the sender being destroyed and its id re-used.
If you store
id(sender)
orstr(sender)
instead you'll get the error again.
Great catch, that makes perfect sense. I tweaked my script and was able to confirm your report. I'll accept this ticket on that basis, I couldn't find a previous report about this.
Would you like to prepare a patch?
(Adding Simon as cc since he might have some good ideas for a robust fix.)
comment:4 by , 3 months ago
Right, I would suggest receivers_called.clear()
(Ha ha, what is this crazy new-fangled API? Oh... Python 3.3 you say. well. huh!)
Would you like to prepare a patch?
Here's a candidate fix: https://github.com/bobince/django/compare/main..signals_from_dead_senders
It raises some questions about what to do when ‘sender’ isn't weakreffable — which admittedly is uncommon, and would already break if ‘use_caching’, but there's no doc that rules it out as far as I can see. In this commit it quietly falls back to making a strong reference, which would _probably_ be better than the buggy behaviour, but is still a change in behaviour.
So would certainly appreciate thoughts from someone familiar with the background of dispatch before PR'ing.
comment:5 by , 3 months ago
In my case this resulted in post-migrate hooks for the wrong apps being occasionally called in migration tests that mutated INSTALLED_APPS causing AppConfig senders to be re-created
Could you provide a minimal project demonstrating the issue in a non-synthetic manner? Before we commit to a solution (which yours appear to be correct from a quick look) we'd want to establish under which organic conditions the problem can be reproduced.
The Django test suite has many instances of @override_settings(INSTALLED_APPS)
and doesn't run into this issue and I'm not sure it's fair to expect Django to behave properly under memory pressure circumstances that make id
and therefore hash
(as the default implementation of __hash__
is id()
based) to return the same value for two distinct objects.
I think a possible alternative solution could be to identify under which organic circumstances this problem is likely to happen, document that signal connected with a sender must be disconnected prior to the object being garbage collected, and adjust the infringing cases within the framework itself.
by , 2 months ago
Attachment: | minimal.tar.gz added |
---|
comment:6 by , 2 months ago
I found this ticket while investigating a problem we are having in our CI, where we have recurring failures when running python manage.py migrate
.
In our set up, we have a lot of database migrations. Some of our models inherit from the DirtyFieldsMixin
provided by django-dirtyfields
.
In DirtyFieldsMixin.__init__
, it calls post_save.connect
, with sender=self.__class__
. (Code: https://github.com/romgar/django-dirtyfields/blob/f48bbe975c627f3f7577f018992a44b1b1cbad8d/src/dirtyfields/dirtyfields.py#L32).
What we're seeing during the migrations, is that reset_state
is called with an instance of a model that does *not* inherit from DirtyFieldsMixin
. Likely because the class definition is now in the same area of memory as which originally contained the temporary model class of a model that does inherit from DirtyFieldsMixin
.
Trimming that down to a 'minimal project demonstrating the issue in a non-synthetic manner' is however hard, because as you said Simon, it has to do also with memory pressure, and it's hard to simulate the memory pressure of a large project while simultaneously exhibiting a minimal example.
I did manage to trim down the workings to a minimal example:
import random from django.db import migrations def create_instance_withoutdirty_instance(apps, schema_editor): WithoutDirty = apps.get_model("stress", "WithoutDirty") WithoutDirty.objects.create() def clear_apps_cache(apps, schema_editor): # Whenever something schema-related changes, the apps-cache gets cleared. To keep # this test minimal, we clear the cache manually instead of changing the schema. apps.clear_cache() def init_withdirty_instance(apps, schema_editor): WithDirty = apps.get_model("stress", "WithDirty") print(f"WithDirty has ID {id(WithDirty)}") # Call __init__, don't even bother doing more. WithDirty() def update_withoutdirty_instance(apps, schema_editor): WithoutDirty = apps.get_model("stress", "WithoutDirty") print(f"WithoutDirty has ID {id(WithoutDirty)}") instance = WithoutDirty.objects.get() instance.save() class Migration(migrations.Migration): dependencies = [ ('stress', '0001_initial'), ] operations = [ migrations.RunPython(create_instance_withoutdirty_instance), *[ migrations.RunPython( random.choice( [ clear_apps_cache, init_withdirty_instance, update_withoutdirty_instance, ] ) ) for _ in range(1000) ] ]
(see also attached project)
The example works by randomly deciding from three options:
- clear the apps cache (as if a model was added/altered/deleted)
- Call
WithDirty()
, triggering the.connect(...)
call. - Call
WithoutDirty.objects.get().save()
, triggeringpost_save
to notify all its receivers.
When it fails, this is because the temporary WithoutDirty
class is instantiated in the same location of memory as which previously held a temporary WithDirty
class, causing reset_state
to be called with the WithoutDirty
instance.
During testing, occasionally it would not trigger the bug, but then I'd just delete the db.sqlite3
file and run again.
(I also logged this bug under django-dirtyfields, https://github.com/romgar/django-dirtyfields/issues/233, as it might be realistic to state that the case with the migrations might need to be solved there instead. However, I'm not sure if it's possible to fix this in django-dirtyfields without changes in Django).
comment:7 by , 2 weeks ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
Thanks for all the extra details Sjoerd, this is greatly appreciated.
Here's how I think we can resume the problem in a few bullet points
Signal
doesn't provide a way to connect to non-weak sendersSignal.receivers
doesn't hold a strong reference to senders as it usesid
to retrieve them by lookup and under memory pressureid
can return the same value for two distinct objects- The migration framework creates and disposes of
ModelBase
instances (akaModel
subclasses) which creates a lot of memory pressure (see #29898) and cycles throughsender
for all model signals which makes the perfect conditions forid(SomeFakeModel) == id(SomeOtherFakeModel)
collisions.
With all that said I can see a few solutions which should be considered.
First making Signal.connect(weak=True)
also use weakref
for senders like bobince implemented seems like a good step forward. If you could open a PR from your branch that also tests Signal(use_caching=True)
(which should work as it already uses a WeakKeyDictionary
for its sender cache) we should be land this work independently of the following proposed solutions. I believe we should have weakref.ref(sender)
raise a TypeError
instead of silencing though as the proposed code now holds a strong reference to sender
in receivers
which could make things much worst but this can be discussed further during code review.
That should leave us with what should be done with signal receivers connected with weak=False
. In the case of django-dirty-fields
's usage Sjoerd I think they should switch to weak=True
as their reset_state
receiver is defined at the module level anyway so it's causing more harm than good. There are albeit rare use cases for weak=False
and it's a public API so I wonder if the migration framework should do a better job at clearing after itself by implementing __del__
logic for AppConfig
and fake models for the core signals at least.
comment:8 by , 4 hours ago
Cc: | added |
---|
Hello bobince, thank you for taking the time to create this ticket report.
I have investigated and I have used the code that you shared to reproduce. I applied a small change because I don't think it's correct to clear
receivers_called = []
like that inside the loop. I also added somekwargs
when sending theSignal
to ensure we are debugging the right thing. What I find more "correct" is the following, which unless I'm missing something should be equivalent to your proposal. Sadly, this does not fail for me at any iteration (I even increasedn
to be1000
):Do you think you could, instead of a script, provide a failing test case for the Django test suite to illustrate the issue you are describing?