#4102 closed New feature (fixed)
Allow UPDATE of only specific fields in model.save()
Reported by: | Owned by: | Collin Grady | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | update fields sql row table modified |
Cc: | yannvr@…, simon@…, andrewbadr.etc@…, dcramer@…, philippe.r@…, hv@…, jashugan@…, gav@…, richard.davies@…, omat@…, hwaara@…, andrehcampos@…, miracle2k, graham@…, nikitka@…, wwbj2r68n2@…, Alexander Koshelev, Béres Botond, cg@…, simon@…, maxirobaina@…, kmike84@…, sfllaw@…, anssi.kaariainen@…, dtran, German M. Bravo, Matt Long, bitwolaiye, inactivist@…, niwi@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
At present, django updates every single field when saving - this can be less than desirable performance-wise if you only need to update 1 or 2, and can help avoid some minor threading issues. (two functions that never edit the same columns can never conflict with each other using this)
This patch allows you to specify which columns should be saved, and the rest are left as they currently are in the database.
Example:
>>> p = Poll.objects.get(pk=1) >>> p.question = "How can we help you?" >>> p.save(['question']) # or p.save(save_fields=['question'])
The SQL generated is then only touching the question:
'UPDATE "polls_poll" SET "question"=How can we help you? WHERE "id"=1'
(from connection.queries, which is why the quoting isn't complete)
Attachments (36)
Change History (134)
by , 18 years ago
Attachment: | save_fields.patch added |
---|
comment:1 by , 18 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 18 years ago
Simon, tracking "before" vs. "after" states can't really be done automatically, since we would have to trap every attribute access to the class (which are, after all, just normal Python attribute accesses). Doing an extra database lookup to work out this information would also be bad.
I'm slightly in favour of this feature at the moment, but still thinking about it. I think it would need to be implemented with a manual list of the fields, as in Collin's patc if we did it.
comment:3 by , 18 years ago
I've uploaded a patch based on Collin's original (using only_fields
instead of save_fields
as the keyword argument -- this implies a possible except_fields
keyword argument, or perhaps exclude=False|True
which defaults to False
), including unit tests and docs.
Should an exception be thrown if invalid field names are given in the list?
comment:4 by , 17 years ago
This came up again at work and it was mentioned that you should be able to override __setattr__
for the meta class to trap attribute changes and set a dirty flag for various columns, which could enable some automation of this.
by , 17 years ago
Attachment: | 4102-dirty.patch added |
---|
initial shot at 'dirty' flag, seems to work so far
by , 17 years ago
Attachment: | 4102-dirty.2.patch added |
---|
updated patch to only set dirty if the new value is different than current
by , 17 years ago
Attachment: | 4102-dirty.3.patch added |
---|
fixed another bug - _dity check died if the field didn't exist but was being assigned to
by , 17 years ago
Attachment: | 4102-dirty.4.patch added |
---|
comment:5 by , 17 years ago
Found another issue in the previous patches - if you created an object, for instance Poll(id=1, question='What color is the sky?')
and then tried to save it, it would not recognize that you had modified those attributes.
Added a kwarg to __init__
to decide if the dirty flags should be washed at the end of __init__
, and modified the model creation in query.py
to use the flag, so that queries from the database are not flagged as dirty, but manual object creation can be done either way.
by , 17 years ago
Attachment: | 4102-dirty.5.patch added |
---|
Updated patch against [5832], fixed bug with kwargs instantiation and dirty flags
by , 17 years ago
Attachment: | 4102-dirty-set.patch added |
---|
converted patch to use set on suggestion of SmileyChris, also found another bug when you used a kwargs of a field name instead of attname
by , 17 years ago
Attachment: | 4102-dirty-set.2.patch added |
---|
removed keyword from __init__
, added call to _wash
in query.py
by , 17 years ago
Attachment: | 4102-dirty-set.4.patch added |
---|
sigh, patch
hates me - fixed duplicate test issue
comment:6 by , 17 years ago
Needs tests: | unset |
---|
This is looking nice and shiny. With some docs, it'd be ready for core IMO.
by , 17 years ago
Attachment: | 4102-dirty-set.5.patch added |
---|
Cleaner implementation: doesn't mess with init
by , 17 years ago
Attachment: | 4102-dirty-set.6.patch added |
---|
small fix (a del
exception is an AttributeError
, not a KeyError
)
comment:7 by , 17 years ago
Malcolm asked about the performance impact. It doesn't look good. Here's a quick mock test I created:
>>> from timeit import Timer # A control model >>> setup_a = ''' class Model(object): pass m = Model() ''' # A model which has a __setattr__ which does nothing >>> setup_b = ''' class Model(object): def __setattr__(self, name, value): super(Model, self).__setattr__(name, value) m = Model() ''' # A model which has a __setattr__ which matches this patch >>> setup_c = ''' class Model(object): def __setattr__(self, name, value): if name != '_modified_attrs' and (not hasattr(self, name) or value != getattr(self, name)): if hasattr(self, '_modified_attrs'): if value not in self._modified_attrs: self._modified_attrs.add(name) else: self._modified_attrs = set((value,)) super(Model, self).__setattr__(name, value) m = Model() ''' >>> Timer('m.an_attribute="test"', setup_a).timeit() 0.17214338693884201 >>> Timer('m.an_attribute="test"', setup_b).timeit() 1.6591357281431556 >>> Timer('m.an_attribute="test"', setup_c).timeit() 2.6405875607088092
by , 17 years ago
Attachment: | 4102-dirty-set.7.patch added |
---|
small fix to the set creation - need to add name to the set, not the value
comment:8 by , 17 years ago
I would think by doing value comparisons on the data is a main cause for the overhead. To mark something dirty do it based on the setting of the attribute only and not based on what data is being changed. If I am having to update a model with a 3.4 MB chunk of data it will be comparing that data with the 3 MB that may already be in there. At that point I don't care about only updating the fields I change since it will take much more time than just updating it.
by , 17 years ago
Attachment: | 4102-dirty-set.8.patch added |
---|
fixed another value/name mixup, optimized a few operations
comment:9 by , 17 years ago
More performance checks:
# A model which has a __setattr__ which *really* matches this patch >>> setup_c = ''' class Model(object): def __setattr__(self, name, value): if name != '_modified_attrs': if hasattr(self, '_modified_attrs'): if name not in self._modified_attrs: self._modified_attrs.add(name) else: self._modified_attrs = set((name,)) super(Model, self).__setattr__(name, value) m = Model() ''' # Optimized for speed, and dropped the value check as Brian suggested >>> setup_d = ''' class Model(object): def __setattr__(self, name, value): try: if name not in self._modified_attrs: self._modified_attrs.add(name) except AttributeError: if name != '_modified_attrs': self._modified_attrs = set((name,)) super(Model, self).__setattr__(name, value) m = Model() ''' >>> Timer('m.an_attribute="test"', setup_a).timeit() 0.17163466306471914 >>> Timer('m.an_attribute="test"', setup_b).timeit() 1.7086492836380041 >>> Timer('m.an_attribute="test"', setup_c).timeit() 2.3102257409810463 >>> Timer('m.an_attribute="test"', setup_d).timeit() 1.9328342009948187
So setup_d
is only 13% slower than a plain __setattr__
override... that's not bad...
comment:10 by , 17 years ago
(actually, I dropped the value check in setup_c
too, so it's still more like 2.71 for the speed of a __setattr__
which matches the current patch)
by , 17 years ago
Attachment: | 4102-dirty-set.9.patch added |
---|
Fixed duplicate tests due to svn glitch again
follow-up: 13 comment:11 by , 17 years ago
Attached the earlier tests (slightly corrected) - it seems that the majority of the extra time used is merely from defining __setattr__
, strangely enough...
Results of script, showing single vs double:
Single Assignment Control: 0.29817199707 noop: 1.92452192307 patch: 3.07881999016 optimized: 2.29438686371 Double Assignment Control: 0.298352003098 Noop: 1.92282295227 Patch: 3.08132410049 Optimized: 2.29805707932
The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).
If the performance difference caused by the __setattr__
stuff is too much, I think the original manual idea of being able to specify save/exclude fields in save()
could be a decent compromise.
comment:12 by , 17 years ago
BTW the latest patch obviously is very backwards-incompatible for anyone using obj.dict.update() to update a lot of attributes at once, or using obj.dict.update(form.cleaned_data) as I've seen on a few blog posts.
comment:13 by , 17 years ago
Replying to Collin Grady <cgrady@the-magi.us>:
The odd thing (to me at least) is that the set creation doesn't appear to take any more time than simply adding a value to the set (as shown by how the 2nd assignment takes just as long as the first).
It's because you were doing it in set up, not in the actual test!
Results of script:
Single Assignment Control: 0.168 noop: 1.689 (10.0x) patch: 2.581 (15.3x) optimized: 1.872 (11.1x) property: 1.494 (8.9x) Double Assignment Control: 0.368 noop: 3.400 (9.2x) patch: 5.222 (14.2x) optimized: 3.887 (10.6x) property: 2.951 (8.0x)
by , 17 years ago
Attachment: | 4102-timings.3.py added |
---|
"hyper-optimized" method, and showing the downfall of the properties method
comment:14 by , 17 years ago
Ok, so I've got the timing down to under 6x control.
I think it's also important to mention that the writing of an attribute isn't really where most of the time is going to be taken up anyway - it's not like this is slowing down model access or saving by any real measure.
Single Assignment Control: 0.170 noop: 1.939 (11.4x) patch: 2.566 (15.1x) optimized: 1.852 (10.9x) noop-h-op: 0.787 (4.6x) h-optimiz: 0.988 (5.8x) property: 1.473 (8.6x) new-attr: 1.779 (10.4x) Retreiving Control: 0.139 noop: 0.137 (1.0x) patch: 0.177 (1.3x) optimized: 0.162 (1.2x) noop-h-op: 0.136 (1.0x) h-optimiz: 0.135 (1.0x) property: 0.787 (5.7x) new-attr: 0.992 (7.2x)
by , 17 years ago
Attachment: | 4102-modified.patch added |
---|
Latest version of the patch, reimplemented checking against the existing value after a discussion with Collin because otherwise user input like forms/etc is going to mark everything modified regardless
follow-up: 16 comment:15 by , 17 years ago
Two things:
- while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field.
This is a great debugging mechanism to weed out incidental typos which usually lead to quiet dataloss until you notice it at some time much too late.
- Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare? I don't really believe that is the case which people need this db performance advantage for... Usually save() gets performance-problematic when you want to change your data in bulk... and you just need to update a counter or a date field for thousands of records, but don't want to touch all the textfields
comment:16 by , 17 years ago
Replying to andraz@zemanta.com:
Two things:
- while you are overloading setattr, you could also add an option to allow only defined fields to be set and rising exception on setting nonexistent field.
That is well outside of the scope of this ticket.
- Are you really sure you should optimize for form-update when saving on-change - and thus needing a compare?
You're not being very clear, but I think you are asking why I re-implemented the change to check against existing values.
The point in that change isn't optimization but to help with the minor threading issues. And I think you may have overlooked the fact that it only does this comparison for fields when they are being set to ensure they really are "changing".
comment:17 by , 17 years ago
Cc: | added |
---|
comment:18 by , 17 years ago
Cc: | added |
---|
Why not just set self.db_values = db_values under save() and get() etc? Then compare current values to db_values during the loop that compiles the INSERT/UPDATEs?
comment:19 by , 17 years ago
It would likely be less efficient to store an entire second copy of all model data when you can simply store a modified flag. Especially if you had a large number of fields or fields with large amounts of data in them.
comment:20 by , 17 years ago
Sorry just read Brian's point, re the big blob data scenario. Agreed setattr overloading does sound better in that light, +1 :-)
Another point. Rather than suggest it should be kept _private, why not make it clearly available and show in the docs. It will be very useful for deciding when to update related tables or not, during save() overrides etc.
See this is ugly --
def save(self): if not self.id: retag = True else: # Unfortunately, we need to hit the db to see if things have changed check = Tag.objects.get(pk=self.id) retag = (check.name != self.name) super(Tag, self).save() if retag: self.retag_articles()
Be great to be able to do something like this --
def save(self): retag = ('name' in self.modified_fields) super(Tag, self).save() if retag: self.retag_articles()
comment:21 by , 17 years ago
I don't see how anything stops you from reading it like that already with this patch, it isn't a __
var, and you would then be internal to the model anyway, so it seems fine to me to read from it as is :)
comment:22 by , 17 years ago
Cc: | added |
---|
comment:23 by , 17 years ago
Patch needs improvement: | set |
---|
Need to write a new patch
(22:09:40) Magus-: SmileyChris: oh, btw, your most recent patch for 4102 is busted, I forgot to talk to you about it earlier :) (22:09:50) SmileyChris: oh noes! (22:09:56) SmileyChris: Magus-: what did I bust? (22:09:59) Magus-: you can't use self.__dict__ like that (22:10:03) Magus-: it breaks on fkeys and other things (22:10:09) Magus-: well, only fkeys so far
comment:24 by , 17 years ago
Cc: | added |
---|
comment:25 by , 17 years ago
Cc: | added |
---|
comment:26 by , 17 years ago
Cc: | added |
---|
comment:27 by , 17 years ago
Cc: | added |
---|
comment:28 by , 16 years ago
Patch needs improvement: | unset |
---|
comment:29 by , 16 years ago
You've obviously spent a lot of time investigating the performance overhead incurred by over-riding setattr, but have you done any benchmarking of the performance improvement gained by performing significantly smaller UPDATE statements? It may well be that the improvement from the more efficient UPDATEs makes the performance overhead of setattr irrelevant.
comment:30 by , 16 years ago
Cc: | added |
---|
comment:31 by , 16 years ago
Cc: | added |
---|
comment:32 by , 16 years ago
Owner: | changed from | to
---|
comment:33 by , 16 years ago
Cc: | added |
---|
comment:34 by , 16 years ago
Cc: | removed |
---|
comment:35 by , 16 years ago
Cc: | added |
---|
comment:36 by , 16 years ago
I've a table where some columns are read-only (have SELECT, but no UPDATE privilege on some columns).
Before that patch I was unable to modify an object and save() it, as save() tried to write all the fields, thus triggering an exception.
Now I can, thanks !!! (at the cost of enumerating in each save() all the fields I can/want update).
Maybe this 'read-only' attribute, per column basis, could be specified in the meta class.
Regards
comment:37 by , 16 years ago
Cc: | added |
---|
comment:38 by , 16 years ago
Cc: | added |
---|
comment:39 by , 16 years ago
Patch needs improvement: | set |
---|
Patch 4102.3 does not work correctly with multi-table inheritance -- child._modified_attrs is not visible by the parent, so a modified parent field is not updated. Something like this fixes it:
Index: third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py
===================================================================
--- third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py (revision 3362)
+++ third-party/Django-1.0.2-final+4102.3+2705/django/db/models/base.py (working copy)
@@ -358,7 +358,10 @@
non_pks = [f for f in meta.local_fields if not f.primary_key]
modified_attrs = self._modified_attrs
non_pks = [f for f in non_pks if (f.name in modified_attrs or f.attname in modified_attrs)]
- self._reset_modified_attrs()
+
+ # Reset modified attribute accumulator after parent fields have been updated
+ if cls == self.class:
+ self._reset_modified_attrs()
# First, try an UPDATE. If that doesn't update anything, do an INSERT.
pk_val = self._get_pk_val(meta)
by , 16 years ago
Attachment: | 4102.3.fix-multi-table added |
---|
Patches Django-1.0.2 with patch 4102.3 applied to fix multi-table inheritance
comment:40 by , 16 years ago
Apologies for the formatting snafu above. I've uploaded the patch as attachment 4102.3.fix-multi-table.
comment:41 by , 16 years ago
Cc: | added |
---|
comment:42 by , 15 years ago
Cc: | added |
---|
comment:43 by , 15 years ago
Has patch: | unset |
---|---|
milestone: | → 1.1 |
If query at the same time use function 'select_related' and 'only', it will be wrong.
s = Station_Info.objects.select_related('player_info').only('name','player_infonickname').get(pk=1)
s.name='test11'
s.save()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 420, in save
self.save_base(force_insert=force_insert, force_update=force_update)
File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 462, in save_base
self.save_base(cls=parent, origin=org)
File "/usr/lib/python2.6/dist-packages/django/db/models/base.py", line 487, in save_base
rows = manager.filter(pk=pk_val)._update(values)
File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 449, in _update
return query.execute_sql(None)
File "/usr/lib/python2.6/dist-packages/django/db/models/sql/subqueries.py", line 120, in execute_sql
cursor = super(UpdateQuery, self).execute_sql(result_type)
File "/usr/lib/python2.6/dist-packages/django/db/models/sql/query.py", line 2385, in execute_sql
cursor.execute(sql, params)
File "/usr/lib/python2.6/dist-packages/django/db/backends/util.py", line 19, in execute
return self.cursor.execute(sql, params)
File "/usr/lib/python2.6/dist-packages/django/db/backends/mysql/base.py", line 89, in execute
raise Database.IntegrityError(tuple(e))
IntegrityError: (1048, "Column 'map_info_id' cannot be null")
comment:44 by , 15 years ago
Has patch: | set |
---|---|
milestone: | 1.1 |
Please don't mark tickets against the v1.1 milestone unless they are critical failures. New features are _definitely_ not for the milestone.
comment:45 by , 15 years ago
Cc: | added |
---|
comment:46 by , 15 years ago
#12172 raised a side effect that should be a logical consequence of implementing this feature.
comment:47 by , 15 years ago
Cc: | added |
---|
comment:48 by , 15 years ago
Cc: | added |
---|
comment:49 by , 15 years ago
Cc: | added |
---|
comment:50 by , 15 years ago
Cc: | added; removed |
---|
comment:51 by , 15 years ago
Cc: | removed |
---|
comment:52 by , 14 years ago
Cc: | added |
---|
comment:53 by , 14 years ago
Triage Stage: | Design decision needed → Accepted |
---|
comment:54 by , 14 years ago
Cc: | added |
---|
comment:55 by , 14 years ago
Cc: | added |
---|
comment:56 by , 14 years ago
Cc: | added |
---|
comment:57 by , 14 years ago
Cc: | added |
---|
comment:58 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
by , 13 years ago
Attachment: | 4102-r16288.diff added |
---|
Patch updated to r16288, code was adapted to current code status and tests were converted to unit tests.
comment:59 by , 13 years ago
Easy pickings: | unset |
---|---|
Keywords: | update fields sql row table modified added |
Patch needs improvement: | unset |
by , 13 years ago
Attachment: | 4102_Django1.3_JL20110610.patch added |
---|
modified attrs save patch which deals with pre_save methods
follow-up: 61 comment:60 by , 13 years ago
UI/UX: | unset |
---|
There are a few bugs in this feature that are fixed with my patch (4102_Django1.3_JL20110610.path).
Field pre_save methods, like the DateTime
auto_now, can change the value of a field just before a save, and these changes are not included in the modified_attrs list. I've made modifications to account for changes in field values by pre_save, and I've included a test to account for this change.
I've made a few other changes, which improve and fix bugs in this new feature:
- In the
__setattr__
method (base.py, line 370), I retrieve the_modified_attrs
list using a setdefault method on the object__dict__
. This is needed for situations when a model's__init__
has been overloaded and attribute access occurs before the parent__init__
is called.
- I've moved the
_reset_modified_attrs
call in the save_base to line 580 (base.py, line 580). Some Field pre_save methods, including some that I've developed, need access to the_modified_attrs
list before it is cleared.
In addition, I have a few suggestions that I believe would enhance this feature.
I propose that a 'save_all' flag be included in the save argument list. It could be used to revert to the original behavior to save all fields at once. Furthermore, a setting in the settings.py could change the default behavior of the model to save only modified fields or all.
The new feature could be extended with a new signal named TrackModification
or TrackChange
. On object creation, modification, or deletion, a signal would be emitted with a list of the modified attributes--these would, of course, have to include fields modified by pre_save
. Alternatively, this change could be integrated into the post_save
signal. I propose this as a feature because it would be relatively easy to implement, and very useful to have.
comment:61 by , 13 years ago
Replying to jlorieau:
The new feature could be extended with a new signal named
TrackModification
orTrackChange
. On object creation, modification, or deletion
Just to clarify on this point, I am proposing to emit a signal only when save
(and possibly __init__
and delete
) is called, and not on every invocation of __setattr__
. Basically, we could include the _modified_attrs
information to existing signals (post_save and post_init) for free, making them much more powerful--for example, in invalidating a cached attribute only when that attribute actually changes.
This would be a new ticket that builds on this feature, if this feature is approved.
comment:62 by , 13 years ago
Cc: | added |
---|
comment:63 by , 13 years ago
Cc: | added |
---|
comment:64 by , 13 years ago
Cc: | added |
---|---|
Patch needs improvement: | set |
Some quick notes:
- I don't think this works when using deferred field loading
- The _modified_attrs should go to the model._state, not directly to the model
- The _state.adding should maybe play a part in this. What about: if we are adding, then every field has changed by default? Maybe this is not so hot idea, but could be worth exploring.
- There is a trick one can use to speed up model
__init__
:# Pseudocode class Model(): def __setattr__(args): ... _my_setattr = __setattr__ def __init__(self, ...): # Check if inheriting class has defined a __setattr__ method if self._my_setattr == self.__setattr__: # If not, we can skip our own __setattr__ and use # directly the super __setattr__ _set = super(Model, self).__setattr__ else: # We need to call the __setattr__ defined in inheriting class # (backwards compatibility). _set = self.__setattr__ ... for field, val in izip(fields, args): _set(self, field.attname, val) ...
This way if there is no overriding __setattr__
there is almost no speed loss in Model's __init__
. On the other hand this is a dirty trick that could easily bite...
Also a check could be done to guard against overriding __setattr__
doing direct __dict__
manipulation and not calling our __setattr__
. On Django startup create a instance of every Model class and see if a modification registers to _modified_attrs. If not, raise an Exception. Python documentation says that __setattr__
should call the base __setattr__
for new style classes, but it is a safe bet that there are users who do not do that.
comment:65 by , 13 years ago
Correcting myself: Actually, I do think this works with deferred attributes, the deferred loading of attributes accesses directly the __dict__
of the instance, and if the attribute has not been accessed, save() will not access it. This is actually a big plus to this feature: if one uses .only() to load just the attributes needed in data manipulation and then saves the object back, things work as expected. Currently all the deferred attributes will be fetched before save, one at a time if I am not mistaken...
On the other hand, there is a minor backwards incompatibility. The following is valid at the moment (although not necessarily very wise thing to do):
# Lets fetch an object and update another object with it's data. m = M.objects.get(pk=1) m.pk = 2 # assume pk = 2 is in the DB. m.save() # Old code would have updated all attributes, new code doesn't.
This could be fixed by setting all attrs as changed when the pk has changed.
comment:66 by , 13 years ago
Cc: | added |
---|
comment:67 by , 13 years ago
Cc: | added |
---|
With the current patch in multitable inheritance, when you modify an already existing object and save it, it only saves data to the parent. This happens because _reset_modified_attrs()
is called during the topmost first parent's save_base()
call and further updates are ignored since self._modified_attrs
no longer have anything after that. I'm attaching a new patch to fix this.
by , 13 years ago
Attachment: | #4102-update_modified_fields_only.diff added |
---|
Fixes multitable inheritance
comment:68 by , 13 years ago
The patch has a fundamental problem. Some field types are mutable in-place. This means that the check in __setattr__
will never see a change if the value is mutated in-place, and hence this approach does not work for in-place mutable fields. See this post for some discussion related to this problem & some ideas how to solve the problem.
comment:69 by , 13 years ago
After reviewing the history of this ticket, it seems what started as a simple opt-in feature request of adding a field white-list to the Model's save function morphed into a complicated dirty flag approach that obviously has many edge cases and performance implications given that this ticket has been open for 5 years now. Just compare the very first proposed patch to the latest to see what I mean.
Unfortunately, this seemed to have happened without much discussion about the relative merits of each approach. I would prefer the Django ORM leave it to me to decide which fields I would like to update rather than relying on it to correctly infer which fields I would like updated. Clearly some people feel differently and favor the dirty flag approach for a more hands-off approach. As such, I propose adding support for both methods so that developers can choose the right approach for their use case.
comment:70 by , 13 years ago
Cc: | added |
---|
comment:71 by , 13 years ago
Implementing only_fields kwarg for model .save() sounds like a plan. That would be really useful feature, simple to implement and I think nobody will give a -1 to that idea.
In ticket #17332 is a patch which should allow (with minor changes) override of model._state. Also, the ModelState
does have some methods in it which allow tracking of dirty field state (of course, implemented in a custom state class). That way 3rd party projects could pretty easily implement tracking of dirty field state in the way they wish (hashing, flags, storing the old values...). Using the ._state information you can then implement custom .save() which automatically saves only changed fields. Note that the important parts of the patch are the ModelState
changes, and where .update()/clear() are called, rest is irrelevant to this ticket.
It seems clear that a default behavior of saving only changed fields will not get into Django core.
comment:72 by , 13 years ago
Cc: | added |
---|
when I use #4102-update_modified_fields_only.diff I meet a question that item.save() will update all field in the first time, but update specific changed field in the next.
so I change "self._reset_modified_attrs()" in the end of init not in the top of init. because in init setattr will be called, and problem is come.
comment:74 by , 13 years ago
I have been researching the same, and I agree with akaariai: implementing only_fields kwarg for model .save() sounds like a plan. Furthermore, it is simpler and more explicit.
Attached the patch based on the original and modified for the current version.
by , 13 years ago
Attachment: | only_fields_django-1.4.patch added |
---|
comment:75 by , 13 years ago
Needs tests: | set |
---|
A couple of design considerations:
- if only fields is given should it imply force_update=True. My take is yes, as I don't think there will be a use case for only_fields + insert. Set the field values to NULL instead. In addition only_fields + insert could lead to some complicated situations. So, maybe the kwarg should instead be update_fields?
- what if the model does not have fields given in the update_fields argument? Should it be an error or not. I guess erroring out could be wise, as otherwise this could hide data loss bugs. This should probably be done in .save() as a pre-check, and then .save_base just uses the given set without further checks.
- It might make sense to add the field set to pre/post save signals as an argument if this is doable from backwards compatibility viewpoint.
- Should a deferred model which is then saved have automatically update_fields set so that only the non-deferred fields will be saved. I think currently the fields are loaded from the database and then the update will update the fields to the just fetched values which does not make sense.
This should be tested with model inheritance, too. It is a common pain-point for this kind of feature. The current test doesn't actually seem to test that the name is updated, so in general more tests needed.
comment:76 by , 13 years ago
Cc: | added |
---|
Interesting considerations. I will perform more tests with your considerations in mind.
comment:77 by , 13 years ago
Here the new patch with more tests and some considerations implemented.
by , 13 years ago
Attachment: | update_fields_django-1.4.patch added |
---|
comment:78 by , 13 years ago
Looks good (although I haven't done a full review). Skipping deferred model handling is OK, it does actually contain some possible issues (load deferred model, set attributes from a form, save() must work). I think this could be dealt with, but lets just leave it out for now.
The pre/post save signal should get a new argument. There is some precedence to adding arguments to signals ("using" in 1.3 for example) so this should not be impossible to do from backwards compatibility point of view. The .save() is also documented to take kwargs, so changing save's signature should be OK, too.
by , 13 years ago
Attachment: | update_fields_django-1.4-2.patch added |
---|
follow-up: 82 comment:80 by , 13 years ago
The patch looks good to me, I can't spot any serious failures there. Some corner cases below. I would value other opinions on the issues below.
The patch still lacks the update_fields implies force_update=True part. Does somebody know of a valid use case for "only fields" behavior for inserts? My main concern here is that this breaks default values. Another option (as is done in the patch) is to not imply force_update, but do full save if the save leads to insert, so that the update_fields is in effect only for updates. The documentation is currently misleading about the behavior.
My take on this is to set force_update if update_fields is not None. There are two main use cases for this feature: performance (in which case you do want force_update=True), and "update only these fields, leave others as they are in DB". In the latter case doing an insert would be an error. On the other hand requiring the user to set the force_update=True is risk free from Django's perspective.
Should datetime fields with auto_now=True be always included in the fields list automatically?
If the update_fields is supplied but is empty shouldn't the save be skipped completely in that case (or if it contains only the PK field)?
And still one more thing: the update_fields is passed to the signal handler but its type is not known. If it happens to be a list (or any other mutable container), the signal handler can modify it in place. There is no safety checks for modifications done there. I am not sure if signal handlers being able to modify the list is useful or not, but at least the passed in type should be consistent: either a list or a tuple (actually, set or immutable set could make more sense).
EDIT: Just noted that there is .delete() calls in the end of the test methods: no need for that, Django's TestCase will do the cleanup for you.
comment:81 by , 13 years ago
Cc: | removed |
---|
follow-up: 83 comment:82 by , 13 years ago
Replying to akaariai:
The patch looks good to me, I can't spot any serious failures there. Some corner cases below. I would value other opinions on the issues below.
The patch still lacks the update_fields implies force_update=True part. Does somebody know of a valid use case for "only fields" behavior for inserts? My main concern here is that this breaks default values. Another option (as is done in the patch) is to not imply force_update, but do full save if the save leads to insert, so that the update_fields is in effect only for updates. The documentation is currently misleading about the behavior.
I've been thinking about this and I think you're right! update_fields
have to implies
force_update=True
. Although, I agree considerations!
My take on this is to set force_update if update_fields is not None. There are two main use cases for this feature: performance (in which case you do want force_update=True), and "update only these fields, leave others as they are in DB". In the latter case doing an insert would be an error. On the other hand requiring the user to set the force_update=True is risk free from Django's perspective.
Should datetime fields with auto_now=True be always included in the fields list automatically?
At this point, I think we should not automatically preselected fields "DateTimeField" with "auto_now = True". This option is for advanced use, and in this case one must know what is really saving.
If the update_fields is supplied but is empty shouldn't the save be skipped completely in that case (or if it contains only the PK field)?
If the update_fields is supplied but is empty, now, will have the same behavior when update_fields is not supplied. In my opinion the behavior is correct.
And still one more thing: the update_fields is passed to the signal handler but its type is not known. If it happens to be a list (or any other mutable container), the signal handler can modify it in place. There is no safety checks for modifications done there. I am not sure if signal handlers being able to modify the list is useful or not, but at least the passed in type should be consistent: either a list or a tuple (actually, set or immutable set could make more sense).
I think it should be a mutable object. Thus, if someone needs it, will have the ability to automatically add to the list datetime fields with auto_now = True.
I do not understand what you mean by specifying the type. If the parameter is None, the signal receives None. If the parameter list/tuple, the signal receives list/tuple.
Are you thinking about some kind of type checking?
EDIT: Just noted that there is .delete() calls in the end of the test methods: no need for that, Django's TestCase will do the cleanup for you.
I updated the documentation, but my English is not the best in the world. If you can improve it, I would be very grateful.
by , 13 years ago
Attachment: | update_fields_django-1.4-3.patch added |
---|
comment:83 by , 13 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
Replying to niwi:
At this point, I think we should not automatically preselected fields
DateTimeField
with "auto_now = True". This option is for advanced use, and in this case one must know what is really saving.
Agreed. This is like qs.update() - no addition of auto_now fields there.
If the update_fields is supplied but is empty, now, will have the same behavior when update_fields is not supplied. In my opinion the behavior is correct.
I am not sure about this one. Main concern here is that removing a field from update_fields can lead to implicit addition of all the model's fields to the update_fields which feels counter-intuitive.
I think it should be a mutable object. Thus, if someone needs it, will have the ability to automatically add to the list datetime fields with auto_now = True.
I do not understand what you mean by specifying the type. If the parameter is None, the signal receives None. If the parameter list/tuple, the signal receives list/tuple.
Are you thinking about some kind of type checking?
The problem is if the user gives update_fields as a tuple then the signal handler can not mutate it in place - if it is a set you can mutate it with add - if it is a list then you can append to it. So, I am thinking of converting the update_fields to a common type for the signal handler. However not converting can be seen as a feature, too.
I updated the documentation, but my English is not the best in the world. If you can improve it, I would be very grateful.
I will see what I can do. The most important thing right now is having somewhat correct information in correct places, not grammatical accuracy.
If I am not mistaken there are two open items:
- Should empty, but not None update_fields argument skip save completely instead of doing a normal save. My feeling is skip the save here.
- Should signal handlers be able to mutate the update_fields? It is possible to leave this up to the caller of .save() - if a list was given as update_fields, then it is mutable in the signal handler - if a tuple was given then it isn't. Maybe leaving this as is (but checking that isinstance(update_fields, (list, tuple))) is a good solution here.
I am leaving the "patch needs improvement" checked on grounds of not actually testing the patch...
comment:84 by , 13 years ago
I implemented the last considerations. And I added some more tests.
Thanks for your attention. ;)
by , 13 years ago
Attachment: | update_fields_django-1.4-4.patch added |
---|
comment:85 by , 13 years ago
Here is the latest patch, reviewed by akaariai in the pull request (https://github.com/django/django/pull/4)
by , 13 years ago
Attachment: | update_fields_django-1.5-1.patch added |
---|
comment:86 by , 13 years ago
Current pull request in one commit https://github.com/django/django/pull/33
follow-up: 88 comment:87 by , 13 years ago
The code looks generally sane to me.
A few remarks:
- Does the code behave properly with related models -- ie. you can
save(update_fields=['related'])
rather thansave(update_fields=['related_id'])
? I haven't seen a test for this. - There are tests for inherited models, which is a good idea. I'm not sufficiently familiar with the implementation of the ORM to tell if other features (like proxy models) are worth testing. If you think they are, a few more tests would be nice.
- This isn't a minor feature, it deserves a full paragraph in the release notes.
tests/modeltests/update_only_fields/__init__.py
should be totally empty.
comment:88 by , 13 years ago
Replying to aaugustin:
The code looks generally sane to me.
A few remarks:
- Does the code behave properly with related models -- ie. you can
save(update_fields=['related'])
rather thansave(update_fields=['related_id'])
? I haven't seen a test for this.
I will check this part and add more tests if necessary.
- There are tests for inherited models, which is a good idea. I'm not sufficiently familiar with the implementation of the ORM to tell if other features (like proxy models) are worth testing. If you think they are, a few more tests would be nice.
In my opinion, this does not affect these things. Only does the filter fields that enter the UPDATE statement. The rest of the default behavior is as if suddenly the model happens to have less fields than actually have.
In any case, I see to do more tests to be sure.
- This isn't a minor feature, it deserves a full paragraph in the release notes.
This part of me is complicated for me. My English is not very good.
tests/modeltests/update_only_fields/__init__.py
should be totally empty.
;)
comment:89 by , 13 years ago
Changes:
- add tests for proxied models.
- remove all content from tests/modeltests/update_only_fields/init.py
I confirm, testcases covers the first point (save related objects) and works correctly.
by , 13 years ago
Attachment: | update_fields_django-1.5-2.patch added |
---|
comment:90 by , 13 years ago
I just spotted that m2m fields are allowed in the "update_fields" arg. I guess they have to be forbidden, as this could lead to confusion: "I defined 'm2m' in the update_fields, but it did not get updated. Whats going on?". Maybe removing the PK field should be done, too. You can't actually update the primary key currently, and allowing it in the update_fields is misleading.
Maybe the right fix is to use something like [field.name for field in self._meta.fields if not field.primary_key] instead of get_all_field_names(). Not tested...
I can take care of the above points when committing the patch. Of course, if you want to do the work I am more than happy to let you do it... :)
comment:91 by , 13 years ago
I tried your suggestion and give the desired results. Now the fields are no longer allowed m2n. I added the corresponding test.
Regarding github and pull-requests ... always need to be all in a single commit? actually I'm a bit confused, and I hate to be opening pull-requests different each time you make a change or commit.
With this change I created a new branch, with the 3 commits I've made for this issue. And upload the patch here.
Unfortunately, documentation / paragraph in the release notes, I'll leave to you. I do not feel good ability to write this text.
by , 13 years ago
Attachment: | update_fields_django-1.5-3.patch added |
---|
comment:92 by , 13 years ago
I will take this ticket from here.
We don't have any official guidelines on how to work on github, but here is an approach I like.
# Assume you have django/django as "upstream" remote in your local git repo. # (if not: git add remote upstream git@github.com:django/django.git) # Start work on ticket. # Work in ticket_4102, make it track upstream/master branch. git checkout -b ticket_4102 upstream/master # do some commits: git add files git commit -m 'A commit message...' # check for whitespace errors git diff --check upstream/master # Spot some errors, correct files and commit git add files git commit -m 'DOH! whitespace error!' # The above commit should not go into Django's repository, so squash it # (HEAD~2 refers to last two commits on this branch) git rebase -i HEAD~2 # pick the original commit ("A commit message..."), squash (s as first char on line) the "DOH!" commit. # reword commit messages, usually keep the first message as is, drop the second message. # Run (relevant) tests! # ok, seems the work is ready, push it to your github repo (assumed to be origin here) git push origin ticket_4102 # make a pull request in github. # An annoying core-developer spots a world-ending issue (another whitespace error). # And, there have been some commits to the django/django repo too, so you need to update your patch to head. # In your local branch ticket_4102 rebase your work to current upstream master git fetch upstream git rebase # If conflicts above, correct & continue. # correct & commit the whitespace error, repeat the rebase -i step above # Run (relevant) tests! # Now, push again to origin. You have rebased your work. This changes the commit # history, and those who based their work on your work will hate you (they need to # rebase their work, too). However, this does not matter, as your branch is a # "topic branch" and nobody should trust your branch to be stable in commit history. # So force push changes: git push --force origin ticket_4102 # You have your work in a single corrected commit in the original pull request!
The above isn't tested.
Commit message guidelines (in short): A summary line of 50 chars. Separated with an empty line, paragraphs of 70 chars per line detailing the commit. Limits are soft, so if need be, break the characters per line rules.
A single commit always? No, separate your work into logical "blocks". For this ticket, there is just one logical block, so a single commit.
If the above seems confusing or a lot of work to do, here is another approach: At some point hand over your work to a committer who will finish it off.
As said, the above isn't official Django workflow, just my approach...
comment:93 by , 13 years ago
Thank you very much! It has helped me a lot to understand how git rebase works.
follow-up: 95 comment:94 by , 13 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Pull 41 contains an RFC patch - I will wait for a day or so for final comments on the patch.
comment:95 by , 13 years ago
Replying to akaariai:
Pull 41 contains an RFC patch - I will wait for a day or so for final comments on the patch.
I've just given the code a quick look over; other than a couple of minor typos (comments on the diff), it looks good to me. Congratulations to all involved in killing another 5 year old Django dragon!
comment:96 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fixed #4102 -- Allow update of specific fields in model.save()
Added the ability to update only part of the model's fields in
model.save() by introducing a new kwarg "update_fields". Thanks
to all the numerous reviewers and commenters in the ticket
Changeset: 365853da016f242937a657b488514e2f69fa6d82
comment:97 by , 11 years ago
Any interest in addressing the original interest of this ticket again? django-save-the-changes is one possible solution. I also opened a ticket on that project about the feasibility on porting it to core Django.
comment:98 by , 11 years ago
If, by referring to the "original" interest of the ticket, you're referring to automatically tracking the 'modified' fields -- then the answer is probably no. If you read the ticket history, you'll see why we didn't implement this approach, so unless you've got a new approach to the implementation, there isn't much to discuss.
I like this functionality for the stated performance reasons, but I can't help thinking that this could be done automatically by just looking for anything that's changed (although the tracking of before and after states may be more trouble than it's worth!).
If accepted, this would need a brief bit of documentation for database-api too (just a description and an example). Some tests would be good too.