Opened 7 years ago

Closed 10 months ago

#28344 closed New feature (fixed)

Add from_queryset parameter to Model.refresh_from_db()

Reported by: Patryk Zawadzki Owned by: Aivars Kalvāns
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Raphael Michel, Semyon Pupkov, Andrey Maslov, Alex Vandiver, Aivars Kalvāns, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have a project where we deal with many financial operations that rely on transactions and row-level locking a lot to guard against race conditions. Unfortunately it's a common pattern for a function to accept an instance of model A, start an atomic block, fetch a new instance of the same row with select_for_update(), do a change, call save() and then refresh the original instance. It would all be easier if I could just call instance.refresh_from_db(for_update=True) and it would save a me a DB roundtrip to refresh the original instance passed (Django ORM does not have a session manager so the old object will not otherwise reflect the changes).

Change History (38)

comment:1 by Tim Graham, 7 years ago

Summary: Add `for_update=False` to `refresh_from_db()`Add for_update parameter to Model.refresh_from_db()
Triage Stage: UnreviewedAccepted

Looks okay at first glance. Is the implementation fairly simple?

comment:2 by Mariusz Felisiak, 7 years ago

Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

comment:3 by Simon Charette, 7 years ago

Version: 1.11master

Count me -0 on adding this given all the available select_for_update() options. I fear we might open a can of worms here and have to either say no to nowait, skip_locked and now of support in the future or allow for_update to be a dict delegating kwargs to the underlying select_for_update() call.

From my personal experience using select_for_update() with refresh_from_db() is not a common enough pattern to warrant the addition of this kwarg. Just me two cents.

comment:4 by Mariusz Felisiak, 7 years ago

Owner: Mariusz Felisiak removed
Status: assignednew

The basic implementation is fairly simple, but I also share concerns about full (with all options) support for select_for_update() with refresh_from_db().

comment:5 by Patryk Zawadzki, 7 years ago

Would refresh_for_update make more sense? My case is really common in domains where resources are either shared or have to be protected from race conditions.

comment:6 by Hugo Osvaldo Barrera, 7 years ago

I've had to deal with similar patterns many times (mostly, when avoiding race conditions is critical -- for example, when closing and charging an order).

However, I think I might prefer an alternative pattern for this; I'd very much prefer if a model instance can lock it's own row. Currently, I've to construct a single queryset like :

# Assume obj is a model instance:
MyModel.objects.filter(pk=obj.pk).select_for_update()
obj.some_attr = 13
obj.save()

A cleaner pattern for this sort of of thing would probably be something like:

with obj.lock_for_update():
    obj.some_attr = 13
# Would this be auto-saved at the end of the CM's block?

comment:7 by Patryk Zawadzki, 7 years ago

I don't think a context manager would work as there is no way to unlock a row save for committing the transaction.

Also would self.lock_for_update() be a refresh_from_db in disguise or would you prefer for it not to update the instance (potentially overwriting any unsaved changes)?

comment:8 by Hugo Osvaldo Barrera, 7 years ago

If the transaction is written as I did in my example, you don't need to call refresh_from_db, since you only ever keep the reference to one loaded instance.

I guess that saving when exiting the context manager would solve any doubts.

comment:9 by Patryk Zawadzki, 7 years ago

Hugo, I appreciate your example but in my case I am dealing with financial transactions where I absolutely need to keep a database audit trail. This means I can't just wrap everything in a large transaction and risk it getting rolled back. Instead I have a number of tiny atomic blocks that need to be placed with surgical precision. This means I have to fetch the same object multiple times for it to act as the transaction synchronization point (via SELECT FOR UPDATE within each atomic block).

Maybe it would be easier to introduce transaction.atomic_with_locked(obj) that would be the equivalent of:

with transaction.atomic():
    Foo.objects.select_for_update.get(pk=foo.pk)  # discarding the result is fine as the lock is now in place
    yield

comment:10 by Adam Hooper, 6 years ago

@Patryk that code block you suggest has a bug.

foo = Foo.objects.get(id=1)  # loads entire record (call these _old_ values)
# Now, what happens here if somebody _else_ writes to foo?
with transaction.atomic():
    Foo.objects.select_for_update.get(pk=foo.pk)  # selects and ignores _new_ values
    yield  # Modifies foo ... but foo.save() will save the _old_ values

I think this goes to show how necessary this feature is. Until it comes, I'll use this workaround:

foo = Foo.objects.get(id=1)
with transaction.atomic():
    Foo.objects.select_for_update.get(pk=foo.pk)
    foo.reload_from_db()
    yield

... which selects the row three times

comment:13 by Jure Erznožnik, 6 years ago

I'm investigating this ticket for possible implementation at PyCon UK sprints. Initial thoughts:

I like the pattern suggested above:

with obj.lock_for_update():
    obj.some_attr = 13
# Would this be auto-saved at the end of the CM's block?

However, this one implies both a transaction and an auto-save to me. So my proposal would be to "decorate" the lock_for_update() with three parameters:

  • atomic: bool=True
  • auto-save: bool=True
  • refresh_from_db: bool=True

Would this be an acceptable solution?

Last edited 6 years ago by Jure Erznožnik (previous) (diff)

comment:15 by Raphael Michel, 6 years ago

Cc: Raphael Michel added

comment:16 by Semyon Pupkov, 6 years ago

Cc: Semyon Pupkov added

comment:17 by Andrey Maslov, 5 years ago

Cc: Andrey Maslov added

comment:18 by Alessio Fachechi, 2 years ago

Sorry for commenting on a ticket update 3 years ago, but any plans on it? Honestly I don't think the trick-around is not a common enough pattern...

comment:19 by Jake Douglas, 17 months ago

ActiveRecord has obj.with_lock { do stuff } which I have used extensively in a variety of applications. I would appreciate something similar in Django.

comment:20 by Alex Vandiver, 12 months ago

Cc: Alex Vandiver added

in reply to:  13 comment:21 by Aivars Kalvāns, 11 months ago

Replying to Jure Erznožnik:

with obj.lock_for_update():
    obj.some_attr = 13
# Would this be auto-saved at the end of the CM's block?

However, this one implies both a transaction and an auto-save to me. So my proposal would be to "decorate" the lock_for_update() with three parameters:

  • atomic: bool=True
  • auto-save: bool=True
  • refresh_from_db: bool=True

Can you elaborate on what those parameters would do? If I understand refresh_from_db correctly - it opens the door to the bug mentioned in comment:10.

foo = Foo.objects.filter(...).get()
...
Foo.objects.select_for_update.get(pk=foo.pk)

it leads to a situation where the object foo differs from the now locked database record. That's why I would like to have a single operation to lock the object and refresh the object with the latest "locked" values. I'm all for something like refresh_from_db or refresh_for_update to avoid programming errors and reduce the number of database operations at the same time.

That said, I have objections to RoR way of obj.lock_for_update or obj.with_lock: it implies that the lock will be released once the block ends, but that is not the case. Once locked, the record will be locked until the main transaction is committed or rolled back. Nested transactions (savepoints) may release locks when rolled back, but it differs between databases: Oracle does not release locks, but Postgres does (IMHO). It looks nice, but it gives the wrong impression of how it works.

Last edited 11 months ago by Aivars Kalvāns (previous) (diff)

comment:22 by Aivars Kalvāns, 11 months ago

Cc: Aivars Kalvāns added

comment:23 by Aivars Kalvāns, 11 months ago

with transaction.atomic():
    a = Account.objects.get(id=1)
    with transaction.atomic():
        Account.objects.select_for_update().get(id=a.pk)
        a.refresh_from_db()
    # The lock is not released here
# The lock is released here

For obj.with_lock to match what is happening in the database we would have to ensure it's called outside transaction but that makes it useless when we have to acquire more than one lock

comment:24 by Aivars Kalvāns, 11 months ago

Has patch: set

in reply to:  3 comment:25 by Aivars Kalvāns, 11 months ago

Replying to Simon Charette:

I fear we might open a can of worms here and have to either say no to nowait, skip_locked and now of support in the future or allow for_update to be a dict delegating kwargs to the underlying select_for_update() call.

I think most of those parameters do not apply to operations on the specific model:

of when multiple models have to be locked, a QuerySet should be the right choice

skip_locked does not make sense when working with one specific instance of Model (primary key)

nowait might be interesting - something similar to threading.Lock.acquire(blocking=False). But unlike acquire which returns a boolean, select_for_update raises DatabaseError. Raising an exception from refresh_from_db would suggest the object is in an invalid state. Code that uses the object after the exception would raise eyebrows during code review. Making refresh_from_db return a boolean indicating if it succeeded or not could also lead to issues because people are not used to checking the result of refresh_from_db.

So I think the default select_for_update is the only right solution. Adding nowait might be an option together with a new dedicated method refresh_for_update but I haven't seen use for that in my domain.

comment:26 by Simon Charette, 11 months ago

Cc: Simon Charette added

of when multiple models have to be locked, a QuerySet should be the right choice

What about MTI when a single model involves multiple tables which is something refresh_from_db(fields) supports? MTI is the main reason of was added in the first place so isn't incoherent that refresh_from_db(fields) supports inherited fields but refresh_from_db(for_update) doesn't?

skip_locked does not make sense when working with one specific instance of Model (primary key)

That I agree with.

So I think the default select_for_update is the only right solution.

I'm still struggling to see the appeal for adding this option to be honest. The reported use case is to handle calls that pass model instance and want to lock the object for the duration of the transaction. Thing is refresh_from_db overwrites any attributes that might have been set on the objet passed by reference and results in the creation of a new temporary model instance anyway so what are the benefits over simply doing a queryset lookup with select_for_update by the primary key and doing a few attribute assignments?

in reply to:  26 comment:27 by Aivars Kalvāns, 11 months ago

Replying to Simon Charette:

So I think the default select_for_update is the only right solution.

I'm still struggling to see the appeal for adding this option to be honest. The reported use case is to handle calls that pass model instance and want to lock the object for the duration of the transaction. Thing is refresh_from_db overwrites any attributes that might have been set on the objet passed by reference and results in the creation of a new temporary model instance anyway so what are the benefits over simply doing a queryset lookup with select_for_update by the primary key and doing a few attribute assignments?

I think all this applies to refresh_from_db regardless of locking or not. I don't know what pros and cons were discussed before adding refresh_from_db. Nothing prevents us from assigning attributes after lookup by the primary key but refresh_from_db makes it easier when we do not care about previous values.

The locking part gets a bit more annoying in the financial domain because it's a common approach to optimize for concurrency: first reading models with all related models and performing status and conditions checks and locking models just before modifications. I also write entry journals and audit trails before doing updates and rely on rollbacks to clean that up if updates fail.

comment:30 by Raphael Michel, 11 months ago

To provide some additional perspective with some real-world use cases:

I agree that there is no use case for skip_locked here, and for of only with model inheritance (which I never used). I do see a theoretical use-case for nowait, but I have never come across wanting it in real code.

I have come across wanting a refresh_for_update-style thing as discussed here many times, though. As others mentioned, usually in relation with an encapsulated piece of business logic that gets passed an instance and requires locking. Sure, the business logic could just acquire a new version of that instance, but then the code calling the business logic needs to know that the instance it passed into the business logic no longer is a valid version and *also* needs to refresh their instance, which is not an intuitive interface.

In at least some situation, I even built my own little refresh_from_db within my own code, which certainly is not nice:

https://github.com/pretix/pretix/blob/40cdb0c5077ae34b56e0263cfcf87c3558a1d042/src/pretix/base/models/orders.py#L1797-L1814

So I'm +1 for adding *something* that helps with this, and I'd personally think a new refresh_for_update(using, fields, of, nowait) function is the clearest and most future-safe design.

(Sorry for first updating the description instead of adding a commend, I have not used trac in a while and used the wrong text box. I reverted it immediately)

comment:31 by Simon Charette, 11 months ago

Thanks for chiming in everyone.

So given the main benefits of refresh_from_db is about the encapsulation of object fetching and attribute assignment and the main concerns are about choosing an interface that isn't too limiting giving the large number of options that QuerySet has could an acceptable solution be that instead of focusing on select_for_update we'd allow a from_queryset: QuerySet to be passed to the method and would be used instead of self.__class__._base_manager when provided.

That would allow

queryset = Author.objects.select_for_update()
author.refresh_from_db(from_queryset=queryset)

while allowing any necessary option to be passed to select_for_update but also allow refresh_from_db to respect other criteria such as soft-deletion

queryset = Author.objects.all()  # As opposed to `_base_manager` the default manager could be doing `filter(deleted_at=None)`
author.refresh_from_db(from_queryset=queryset)  # Which would respect raising `DoesNotExist`

That would make implementing a model mixin that supports a select_for_update override quite easy

class BaseModel(models.Model):
    class Meta:
        abstract = True

    def refresh_from_db(self, using=None, fields=None, from_queryset=None, select_for_update=False):
        if select_for_update:
            from_queryset = (
                self.__class__._base_manager.db_manager(using)
                if from_queryset is None
                else from_queryset
            ).select_for_update()
        return super().refresh_from_db(fields=fields, from_queryset=from_queryset)
Last edited 11 months ago by Simon Charette (previous) (diff)

in reply to:  31 comment:32 by Aivars Kalvāns, 11 months ago

Replying to Simon Charette:

Thanks for chiming in everyone.

could an acceptable solution be that instead of focusing on select_for_update we'd allow a from_queryset: QuerySet to be passed to the method and would be used instead of self.__class__._base_manager when provided.

That would do it but I started thinking about what other possibilities it opens. Current parameters using and fields can be applied directly to the queryset and the whole function would be refresh_from_queryset instead of refresh_from_db.

queryset = Author.objects.using('default').only(*fields)
author.refresh_from_queryset(queryset)
or
author.refresh_from_db(from_queryset=queryset)

comment:33 by Simon Charette, 11 months ago

I don't think it would be a problem for refresh_from_db to apply fields and using if explicitly provided with from_queryset

  • django/db/models/base.py

    diff --git a/django/db/models/base.py b/django/db/models/base.py
    index 6eaa600f10..8a9cbd606e 100644
    a b def get_deferred_fields(self):  
    672672            if f.attname not in self.__dict__
    673673        }
    674674
    675     def refresh_from_db(self, using=None, fields=None):
     675    def refresh_from_db(self, using=None, fields=None, from_queryset=None):
    676676        """
    677677        Reload field values from the database.
    678678
    def refresh_from_db(self, using=None, fields=None):  
    704704                    "are not allowed in fields." % LOOKUP_SEP
    705705                )
    706706
    707         hints = {"instance": self}
    708         db_instance_qs = self.__class__._base_manager.db_manager(
    709             using, hints=hints
    710         ).filter(pk=self.pk)
     707        if from_queryset is not None:
     708            if using is not None:
     709                from_queryset = from_queryset.using(using)
     710        else:
     711            hints = {"instance": self}
     712            from_queryset = self.__class__._base_manager.db_manager(
     713                using, hints=hints
     714            )
     715        db_instance_qs = from_queryset.filter(pk=self.pk)
    711716
    712717        # Use provided fields, if not set then reload all non-deferred fields.
    713718        deferred_fields = self.get_deferred_fields()

That would mean that

refresh_from_db(using='other', from_queryset=Author.objects.using('default'))  # would use `other`
refresh_from_db(fields=['first_name'], from_queryset=Author.objects.only('last_name')) # would use ['first_name']

Why do you think the addition of a refresh_from_queryset method is warranted?

Last edited 11 months ago by Simon Charette (previous) (diff)

comment:34 by Aivars Kalvāns, 11 months ago

I'm ok with refresh_from_db, I was thinking out loud that the possibility of passing a queryset makes existing parameters redundant.

comment:35 by Aivars Kalvāns, 11 months ago

Just another idea: using from_queryset allows us to specify select_related on the queryset and with a couple of lines in refresh_from_db we could make it work and avoid extra queries when accessing related fields later.

comment:36 by Simon Charette, 11 months ago

I'm ok with refresh_from_db, I was thinking out loud that the possibility of passing a queryset makes existing parameters redundant.

Makes sense, we do have similar APIs where some flags are redundant with others so I don't think it's a huge deal.

Just another idea: using from_queryset allows us to specify select_related on the queryset and with a couple of lines in refresh_from_db we could make it work and avoid extra queries when accessing related fields later.

yeah it seems like it opens more doors and embrace the facilitation of attribute assignment pattern better.

comment:37 by Aivars Kalvāns, 11 months ago

I updated the PR

comment:38 by Mariusz Felisiak, 11 months ago

Owner: set to Aivars Kalvāns
Status: newassigned

comment:39 by Mariusz Felisiak, 11 months ago

Patch needs improvement: set

comment:40 by Aivars Kalvāns, 11 months ago

Patch needs improvement: unset

Solved all review comments

comment:41 by Mariusz Felisiak, 10 months ago

Summary: Add for_update parameter to Model.refresh_from_db()Add from_queryset parameter to Model.refresh_from_db()

comment:42 by Mariusz Felisiak, 10 months ago

Triage Stage: AcceptedReady for checkin

comment:43 by Mariusz Felisiak <felisiak.mariusz@…>, 10 months ago

Resolution: fixed
Status: assignedclosed

In f92641a6:

Fixed #28344 -- Allowed customizing queryset in Model.refresh_from_db()/arefresh_from_db().

The from_queryset parameter can be used to:

  • use a custom Manager
  • lock the row until the end of transaction
  • select additional related objects
Note: See TracTickets for help on using tickets.
Back to Top