Opened 11 years ago
Last modified 10 months ago
#21461 new New feature
Add pre_update and post_update signals
Reported by: | loic84 | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, Shai Berger, Ben Davis, Sardorbek Imomaliev, Olivier Dalang, Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Quoting Anssi from the ML post https://groups.google.com/forum/#!msg/django-developers/tCzFMpBm5-c/XLHFY0awVJ8J:
The idea is that pre_update listeners get a queryset that isn't executed. Accessing that queryset might be costly, but if it isn't accessed, there isn't much cost of adding a pre_update signals. For post_update the signal handler would be given a list of PK values (the original queryset doesn't work for post_update, the update might cause different instances to be returned than was updated, consider qs.filter(deleted=False).update(deleted=True))
This feature will also provide an upgrade path for #21169 (for FK RelatedManager.remove() more specifically).
Change History (20)
comment:1 by , 11 years ago
Needs documentation: | set |
---|---|
Owner: | changed from | to
Patch needs improvement: | set |
Status: | new → assigned |
comment:2 by , 11 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Now with docs, this should be ready for review.
comment:3 by , 11 years ago
IMO, until we add support for RETURNING
(or equivalent on other backends) we should try to mimic the way delete works, that is.
If no signals are attached to (pre|post)_update
we do a fast update.
If it's not the case we fetch the pk of the objects we're about to update and use the retrieved list to:
- Build the queryset passed to
pre_update
receivers (pk__in=pk_list
); - Issue the update query (
WHERE pk IN pk_list
); - Pass the
pk_list
topost_update
receivers (actually this could also be a queryset built withpk__in
instead of thepk_list
itself).
comment:4 by , 11 years ago
I am not a big fan of first fetching pk_list, it suffers from these problems:
pk__in=hugelist
is problematic (SQLite doesn't work if you have more than 1000 items in the list, PostgreSQL performs badly)- one extra query is needed
- there is still no guarantee that the update is race free. The races would be different. Consider case
qs.filter(count__lte=0).update(count=F('count') + 1)
- two threads executing this statement in parallel could end up updating count from -1 to 1 while that isn't currently possible. In my opinion connecting to pre/post_update altering what .update() does is worse than pre/post update signals not seeing the exact same set of objects that were in fact updated. Notably if deletes happen the latter problem is still there for altered pre/post_update. - when we add support for RETURNING we can't optimize the implementation of pre_update
The reason why I voted for pk_list to post_update is that if the updated pk_list is large, the user doesn't need to execute pk__in=hugelist
query. The query can be split to parts, or maybe the pk list is all that is needed. OTOH I can see how queryset would make the API more consistent. Maybe pass both?
comment:5 by , 11 years ago
Latest attempt:
- Document that there is no guarantee that the
queryset
arg forpre_update
will exactly return the updated rows because of concurrency. - Ensure that
pk_set
forpost_update
is the exact set of updated models. - Use chunked updates on SQLite when there are
post_update
signals.
comment:6 by , 11 years ago
Cc: | added |
---|
comment:7 by , 11 years ago
The latest solution is close to what charettes suggested in comment:3. The biggest distinctions are that pre_update is given the to-be updated queryset instead of to-be-updated queryset.filter(pk_in=pk_set), and for post_update you don't get queryset, you just get pk_set.
The reason for going with pk_set is that in some cases the only thing you need is the set of primary keys, and if that happens to be a large set, then doing pk_in=pk_set query will be expensive (PostgreSQL doesn't like this type of query at all, and SQLite will not even execute a query with more than 1000 pk values.
A possible solution is to provide both the queryset (with pk_in=pk_set filtering pre-applied), *and* the pk_set to pre_/post_ update signals. That way the user has nice API if that is all that is needed, but can also just use the pk_set if that is possible.
comment:8 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:9 by , 11 years ago
Cc: | added |
---|
Not sure about this, but from a short look at the code it seems like the pk_set is calculated before the update. That makes perfect sense, but breaks a little if the update changes pk values. I don't see any sensible behavior around this, but it should probably be documented.
comment:10 by , 11 years ago
@shai you are totally right, the limitation on changing pk is something we already discussed but I agree it needs to be documented.
At this point @akaariai and I are not sure about how to proceed with this patch. Do we only introduce post_signal
? Is there value in the pre_signal
as currently implemented? Are there enough people who want update signals at all?
Opinion welcome.
comment:11 by , 11 years ago
I would love to have an update signal for the purpose of cache invalidation. Currently I use save/delete signals to detect when something has changed to update an expensive calculation. While pk_set
idea solves some problems, it's not ideal for large result sets and has unpredictable consequences.
For example, let's say I don't care about the primary keys of objects that were updated, but rather the objects related to those that were updated. I could run a much less expensive query by doing, for example, queryset.distinct().values_list('related_id')
.
Personally I feel it's better to allow the user be explicit in what is passed around. I would propose that pre_update
accept the queryset, and users have the option to return a dict of extra data that will be available to post_update
receivers. For example:
@receiver(pre_update, sender=MyModel) def on_pre_update(sender, update_fields, queryset, **kwargs): return { 'related_pks': queryset.distinct().values_list('relatedmodel_id') } @receiver(post_update, sender=MyModel) def on_post_update(sender, update_feilds, extra_data, **kwargs): related_pks = extra_data.get('related_pks', []) for obj in RelatedModel.objects.filter(pk__in=related_pks): obj.do_something()
The implementation if QuerySet.update()
would look like this:
def update(self, **kwargs): """ Updates all elements in the current QuerySet, setting all the given fields to the appropriate values. """ assert self.query.can_filter(), \ "Cannot update a query once a slice has been taken." meta = self.model._meta self._for_write = True query = self.query.clone(sql.UpdateQuery) query.add_update_values(kwargs) extra_data = {} if not meta.auto_created: responses = signals.pre_update.send( sender=self.model, update_fields=kwargs, queryset=self, using=self.db) for rcvr, response in responses: extra_data.update(response) with transaction.atomic(using=self.db, savepoint=False): rows = query.get_compiler(self.db).execute_sql(CURSOR) self._result_cache = None if not meta.auto_created: signals.post_update.send( sender=self.model, update_fields=kwargs, extra_data=extra_data, using=self.db) return rows
It's a bit cleaner and addresses any performance concerns, as well as the issue with updated pks. Regarding comment:10, I think there use cases for both pre_update
and post_update
, either together or alone.
comment:12 by , 11 years ago
Just created a PR for my above suggestion, with updated docs. I'm assuming this would now be targeted for 1.8 since we missed the feature freeze.
PR: https://github.com/django/django/pull/2619
Commit: https://github.com/bendavis78/django/commit/62a9ea61639a4a6ba9fde0eabc9f546db0a40de8
comment:13 by , 11 years ago
While the idea in comment:11 is interesting, I think the current suggestion is lacking.
First of all, returning values from signals breaks with well-established assumptions, and I wouldn't haste to change that. Further, the suggested code assumes without checking that the returned values can be used to update a dictionary, and worse: Collapses all returned values into one dictionary, allowing different signal handlers to step on each other's toes with no way to find out about it.
I think that the general direction of letting the user specify which data is received has merit, but this should be implemented via the registration API -- that is, either defining different signals ('pre_update', 'pre_update_with_qset', 'pre_update_with_pks') or specifying different senders (something along the lines of @receiver(pre_update, sender=(MyModel, pre_update.WITH_QSET))
). Then, update can calculate the required data only if it is, in fact, requested.
comment:14 by , 11 years ago
Patch needs improvement: | set |
---|
comment:15 by , 11 years ago
Cc: | added |
---|
Yeah, you're right, using the dict is a messy way to solve the problem. I guess idea was to put the onus of "type" handling on the pre_save & post_save. Ideally the update() method would be agnostic regarding the payloads being passed between the signals. I think you've got the right idea with using sender
, though I don't think the API should be prescribing those types (ie pks vs querysets).
Not sure what the right answer is at this point -- maybe someone will come up with a more elegant approach.
comment:16 by , 5 years ago
Cc: | added |
---|
comment:17 by , 4 years ago
Cc: | added |
---|
comment:18 by , 4 years ago
For reference, here's a package implementing this : https://github.com/martinphellwig/django-query-signals
It includes also other bulk methods such as delete or bulk_create.
From the code, it looks like it suffers from edge case described above (the original queryset doesn't work for post_update, the update might cause different instances to be returned than was updated, consider qs.filter(deleted=False).update(deleted=True)
), but nothing unfixable it seems.
comment:19 by , 22 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:20 by , 10 months ago
Cc: | added |
---|
WIP https://github.com/django/django/pull/1932.