#6707 closed Bug (fixed)
ManyToManyField clears and recreates all data
Reported by: | Owned by: | Loic Bistuer | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | kmike84@…, sfllaw@…, jblaine@…, bradley.ayers@…, flavio.curella@…, botondus@…, info@…, info@…, cmawebsite@…, toracle@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Current ReverseManyRelatedObjectsDescriptor implement set method:
- clear all data
- set to new data.
This may cause a huge data changes, especially when using Django Admin, the default M2M field save_form_data() method use the set method to maintain m2m data from form:
def save_form_data(self, instance, data): setattr(instance, self.attname, data)
In most case we just want to add remove a small set data from m2m relaionship list. This patch use another implementation, first it delete all exist rows but not in the new data list, then only add new data which not in database.
Attachments (2)
Change History (46)
by , 17 years ago
Attachment: | django_m2m_set_method.diff added |
---|
comment:1 by , 17 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
This change runs a whole extra database query, which is far from zero cost. So each approach is better in different circumstances and there's no way to tell which one will be better ahead of time. Thus a change like this would just be changing things, not a universal improvement.
A better approach would be to track what's changed by tracking the old values before updating them, which has been suggested in another ticket. I'm not convinced that's a good idea for all fields, but it might be worthwhile for the multi-valued relations like many-to-many.
comment:2 by , 17 years ago
Yes, as you said, it's depends. And it's true the patch need a extra full database query. However when you use it in some m2m widget like the admin ones, you already do a full database query to show all choices. And the patch don't change the exist data(don't like before remove but add later.), this make sense in some situation. In one of my project, we have lot's signal hook with m2m relationship changed(with the help of #5390), each receiver have to run a heavy database logic, so we need this patch do reduce unnecessary changes.
Anyway it's a balance, not a universal improvement, I write here, but part people need it.
mtredinnick, thanks for feedback. But I didn't find the ticket you mentioned?
comment:3 by , 17 years ago
The problem is there's no way to tell when this will be better than the current behaviour. So saying "it's an improvement somtimes" doesn't help. It's also a distinct unimprovement in other cases (and is not only used in the admin, so putting in something just for that usage isn't a strong argument). For that reason I'm not in favour of making this change because it's just changing which are the worse cases. It happens it would favour your uses, but somebody's always going to have the bad side of the deal with something like this.
#4102 is the ticket I was referring to, although I'm not sure the implementation there is great because of the additional overhead. However in some cases the general idea may make sense (e.g. m2m fields). Testing and profiling is needed.
comment:4 by , 14 years ago
Current behaviour causes m2m_changed signal not to work properly (#13962). Most of times m2m_changed signal is used in Django admin. So m2m_changed signal is mostly useless at this state. Is there a way to detect if a m2m_changed is connected for a model field. This patch may be applied as another save method so Django admin may branch to that method if m2m_changed signal is connected to a receiver.
comment:5 by , 14 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Given that 1.2 gained signals for ManyToMany relationships, it's worth fixing this properly now. Sending out the correct signals seems like a worthwhile tradeoff for the performance cost of implementation, whichever the correct approach may be.
comment:6 by , 14 years ago
Patch needs improvement: | set |
---|
Greetings,
I have some experience with this sort of thing, and would love for the m2m_changed signals to be correctly issued by the admin. I will download the patch and check it out to see what it does, how hard it would be to contribute a fix. One strategy I've used to minimize database access is to create a hidden input that is the list of a given set of relations that is passed with updates, so you can use it to see what has changed since the page was rendered (it can be string-delimited using multiple delimiters, akin to old Pick-style database records which had value marks and sub-value marks, to hold a list of all three keys in each M2M relation when the page was generated). If you did this, you could compare the submitted list against the data structure encoded in this hidden form input to check which needs to be removed, which are new, and which are unchanged. It definitely isn't perfectly thread-safe (though the alternative of hitting the database for that list isn't, either), but it could be an answer if limiting database access is preferable to creating a larger chance of a race condition. You could also add a check for the freshness of that list (a simple count of relations for the current instance might suffice?) and use the passed value if the check passes, or revert to hitting the database if the check fails. If I did this, I'd make the check a separate method (or would love guidance as to the preferred way of making something modular and override-able within django), so that we could refine the implementation of the freshness check later.
In looking at the patch, does the manager already invoke the signals we want invoked as part of its processing? Perhaps I could start by placing that code in my django instance and seeing how the signals behave. At the least, I'd probably add documentation to the patch, and break it out so each line contains a single code task (I must admit to not being very pythonic in that regard - if there is only one thing on a line, then you know if you change that line, you only change that thing - like adding calcium to bones - keeps them tough but flexible).
I have never contributed code to django before, so if this is premature, please help me understand the community norms of where I should begin contributing (probably by proving my follow-through-ness by completing the two documentation patches I agreed to write).
Thanks,
Jonathan Morgan
comment:7 by , 14 years ago
Type: | → Cleanup/optimization |
---|
comment:8 by , 14 years ago
Severity: | → Normal |
---|
comment:9 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
UI/UX: | unset |
comment:10 by , 13 years ago
Cc: | added |
---|
comment:11 by , 13 years ago
Because __set__()
clears and adds, it does a lot of extra work when assignment happens, since each object in the new set has to be INSERTed in the database, even if it's just to add an extra item.
The extra query can be eliminated in _add_items()
, which can be told that its list of *objs
do not already exist so it doesn't have to check for them.
We've hit this problem with m2m_changed on any ModelForm
, so it's not just limited to the admin site.
If there's interest, I can write up a patch that implements this.
comment:12 by , 13 years ago
Cc: | added |
---|
comment:13 by , 13 years ago
I'm interested in any patch that will give me access to the new data and operations performed when a model's m2m field is altered. Right now, m2m_changed is totally useless to me, and is actually dangerous as others have pointed out in other bugs related to this one.
Me: Remove 2 relationships from these 200 that are set.
Admin: Okay! I removed 200 things and added all 198 items back that did not match your 2! And I sent not one single pre_remove or post_remove signal''
Me: That's not at all what I said, and it totally matters how you go about it and that your signals+actions properly reflect what is being asked and not reflecting how you went about doing it.
I suspect it's no surprise that I consider this a bug, not a "cleanup/optimization" as it's now categorized :(
comment:14 by , 13 years ago
Type: | Cleanup/optimization → Bug |
---|
Closed #16073 (which was specifically about incorrect m2m signals in the admin, as a result of this) as duplicate. Marking this a bug rather than a cleanup/optimization; previous to the existence of m2m signals it may have been the latter, but at this point the signaling problem makes it a bug.
comment:15 by , 13 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Status: | new → assigned |
6707_m2m_set.diff includes a patch that performs the minimum amount of DELETE and INSERT statements for the __set__
method. In addition, I have changed _add_items()
so it does not do an extra SELECT if __set__
has already removed duplicates, so we do not perform an extra query.
This patch applies to SVN trunk.
comment:16 by , 13 years ago
It occurred to me, while writing this patch, that the __set__
methods for ManyRelatedObjectsDescriptor and ReverseManyRelatedObjectsDescriptor are basically the same. Should those two Descriptors be unified into a base descriptor?
comment:17 by , 13 years ago
Cc: | added |
---|
This is fantastic, I just wanted to say thank-you very much for writing this latest patch!
comment:18 by , 12 years ago
I haven't tested this ticket, just skimmed the latest patch. To me it seems this is worth getting into Django, the savings can be pretty large if one regularly does small changes to m2m fields using list assignment.
comment:19 by , 12 years ago
akaariai: Are there any improvements to this patch that you can suggest?
comment:20 by , 12 years ago
Not anything I spotted immediately. As said, I haven't gone through this in full detail. If you can get somebody to review the patch it would of course be a plus.
I will try to get time to work on this. I can't promise anything, though.
comment:21 by , 12 years ago
It'd be great to get a core developer to look at whether the suggested refactoring above would be sensible.
comment:22 by , 12 years ago
Cc: | added |
---|
comment:23 by , 12 years ago
Cc: | added |
---|
Ran into this as well on multiple occasions.
This basically makes the m2m_changed signal almost useless :(
comment:24 by , 12 years ago
As I understand there is currently no way to determine which values were added to many2many field in the admin?
comment:25 by , 11 years ago
Cc: | added |
---|
comment:26 by , 11 years ago
Any news on this?
We too are bidden by this bug (and it is a bug when a signal is documented but never actually emitted) and would love to see a proper fix.
comment:27 by , 11 years ago
Cc: | added |
---|
comment:28 by , 11 years ago
This bug was opened "6 years ago" and we are still around the "patch". What would it take to get this fixed?
comment:29 by , 11 years ago
Shouldn't at least the documentation for previous versions including version 1.6 be updated to not include the pre_remove and post_remove actions in the m2m-changed signal?
comment:30 by , 11 years ago
Owner: | changed from | to
---|
comment:32 by , 11 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
There's no documentation yet and I still want to add more tests, but the latest effort for this issue is at https://github.com/django/django/pull/2500.
comment:33 by , 10 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Owner: | changed from | to
Resuming work on this issue, I brought the branch up-to-date and drafted some documentation. It'd be very helpful if some of those affected by this issue could give it a try.
comment:34 by , 10 years ago
I've tested the branch Loic has worked on and my issue still remains. m2m_changed is not triggered when the relationship is altered via a StackInline admin form. No doubt this is a significant bug.
comment:35 by , 10 years ago
Cc: | added |
---|
comment:36 by , 10 years ago
Summary: | Another implementation for ReverseManyRelatedObjectsDescriptor.__set__ method → ManyToManyField clears and recreates all data |
---|
comment:37 by , 10 years ago
I've rebased the PR and I'm quite happy with the result.
I checked regarding comment:34 and indeed when M2M are displayed as admin inlines m2m_changed
isn't triggered. It's because the M2M field is bypassed and the through
table is seen by the admin as a normal model with 2 FKs. That's an orthogonal problem but I added docs about it in the PR.
comment:38 by , 10 years ago
Patch needs improvement: | unset |
---|
comment:39 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:42 by , 9 years ago
@Loic is this commit going to be released for Django 1.7 still ??? I am using Django 1.7.8 and i noticed that its not there yet
comment:43 by , 9 years ago
@psychok7 this is actually considered more of a new feature / optimization, so it's not backported to 1.7. It actually missed the deadline for getting into 1.8 by a few weeks, but it will be in 1.9 which comes out towards the end of the year.
comment:44 by , 9 years ago
Cc: | added |
---|
patch of set method