Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#28010 closed New feature (fixed)

Add support for SELECT...FOR UPDATE OF

Reported by: Ran Benita Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: adam@…, Adam Chidlow Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using select_for_update() on a queryset appends FOR UPDATE to the query. If the query joins other tables, the effect is that all tables are locked. I believe this is not what most people using select_for_update() want, instead they only want to lock the row from the main table.

Example

Consider a scenario like this:

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.PositiveIntegerField()


class Order(models.Model):
    product = models.ForeignKey(Product, on_delete=models.PROTECT)
    status = models.CharField(max_length=100, choices=(
        ('NEW', 'New'),
        ('APPROVED', 'Approved'),
        ('CANCELED', 'Canceled'),
    )

    ...

    @classmethod
    def approve(cls, id):
        with transaction.atomic():
            order = Order.objects.select_related('product').select_for_update().get(pk=id)
            # ...Check order.product.price & stuff...
            order.status = 'APPROVED'
            order.save()

    @classmethod
    def cancel(cls, id):
        with transaction.atomic():
            # Similar...

Here, locking is needed in order to serialize approve() and cancel() on the same Order, to avoid race conditions.

I have also used select_related in approve(), in order to avoid an extra query. Potentially, there are many related objects, but to keep the example simple I added just one.

The interaction between select_related and select_for_update has the unintended consequence (IMO) of also locking the related Product row. Hence, if there is some concurrent task which updates product prices for example, it can cause conflicts, slowdowns, deadlocks, and other bad things.

The fix is to remove the select_related. But then extra queries are needed for each related object access without a reasonable workaround (I think?).

Possible solutions

PostgreSQL (which I am using) has an option to specify exactly which tables are to be locked, instead of locking all tables in the query: https://www.postgresql.org/docs/9.6/static/sql-select.html#SQL-FOR-UPDATE-SHARE. The syntax is SELECT ... FOR UPDATE OF orders_order.

Oracle also supports this with similar syntax, however it allows specifying which *columns* to lock: http://docs.oracle.com/javadb/10.8.3.0/ref/rrefsqlj31783.html. I guess there it is possible to lock specific columns of specific rows, while postgres only locks full rows.

I am not familiar with MySQL but it doesn't seem like it supports refining the FOR UPDATE.

Here are some possible solutions I can think of:

  1. Add support for select_for_update to specify which models to lock, e.g. select_for_update('self', 'product'). This will use the OF syntax.
  1. Only support the select_for_update('self') behavior, and make it implicit, i.e. select_for_update only ever locks the main model of the QuerySet.
  1. Add a way to do select_related on an already-existing object, something like a standalone select_related_objects after the existing prefetch_related_objects function: https://docs.djangoproject.com/en/1.10/ref/models/querysets/#prefetch-related-objects, and keep select_for_update as-is. Then it should at least be possible to do one extra query instead of one per related object.

Thanks for reading this far :)

Change History (14)

comment:1 by Simon Charette, 8 years ago

Triage Stage: UnreviewedAccepted
Version: 1.10master

Allowing the usage of FOR UPDATE OF is definitely something we want to add.

Now for the proposed options the one that makes the most sense to me is 1.. 2. would be backward incompatible given we actually lock all rows by default and some backends don't allow locking only a specific table which would make select_for_update() behave quite a bit differently on different backends.

comment:2 by Ran Benita, 8 years ago

Thanks Simon. I will try to implement approach 1. I will start with just supporting self (not implicit), and see if I can get it to work; that would already be useful.

Also, I need to correct my comment on Oracle a bit: while Oracle requires to specify a column, it does not "lock a column", it locks the entire row of the table containing the column. So it's equivalent to Postgres, only we should write SELECT ... FOR UPDATE OF "orders_order.id" in Oracle instead of SELECT ... FOR UPDATE OF "orders_order" in Postgres. Here's the doc: http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_10002.htm#SQLRF55372

Last edited 8 years ago by Ran Benita (previous) (diff)

comment:3 by Ran Benita, 8 years ago

Has patch: set

comment:4 by Adam Wróbel, 8 years ago

Cc: adam@… added

comment:5 by Tim Graham, 8 years ago

Summary: select_for_update() is too eager in presence of joinsAdd support for SELECT...FOR UPDATE OF

comment:6 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: newclosed

In b9f7dce8:

Fixed #28010 -- Added FOR UPDATE OF support to QuerySet.select_for_update().

comment:7 by Adam Chidlow, 7 years ago

This is a great feature. Thanks!

I've found an issue though, the validation seems to be a little-overzealous. It's not allowing reverse unique relationships (e.g. OneToOneField), even though they can be used with select_related.

comment:8 by Ran Benita, 7 years ago

Thanks for testing, Adam. I did not think of this scenario.

Please correct me of the following is incorrect:

Reverse OneToOne fields are nullable (might not exist on the other side), and Django does not allow select_for_update on such fields (you get "django.db.utils.NotSupportedError: FOR UPDATE cannot be applied to the nullable side of an outer join"). Hence, including such a field in of=(...) should not be allowed. In fact, in order to use select_for_update at all on such a query, you must use of=(...) *excluding* the reverse field, otherwise it will fail with the error above.

comment:9 by Adam Chidlow, 7 years ago

Hi Ran, I'm using select_for_update from the reverse side of a OneToOneField succesfuly, it just requires that you exclude(<related_field>=None) as well.

This is what I'm working with at the moment to be able to use of= as well: https://github.com/achidlow/django/commit/83fa6963a0b4b9f02c88b3abc84e20bce60d9035

(I'm not familiar with django's SQL internals so I don't know if that's the right fix, and in the unlikely event that it is, _get_field_choices also needs updating. However it's working for me right now)

comment:10 by Adam Chidlow, 7 years ago

Cc: Adam Chidlow added

comment:11 by Ran Benita, 7 years ago

I sent a patch https://github.com/django/django/pull/9043 to allow reverse relations.

comment:12 by Adam Chidlow, 7 years ago

Awesome, thanks! Do we need to change any flags here to bring the new pull request to someones attention?

comment:13 by Tim Graham <timograham@…>, 7 years ago

In 03049fb8:

Refs #28010 -- Allowed reverse related fields in SELECT FOR UPDATE .. OF.

Thanks Adam Chidlow for polishing the patch.

comment:14 by Tim Graham <timograham@…>, 7 years ago

In 6b5f2e3b:

[2.0.x] Refs #28010 -- Allowed reverse related fields in SELECT FOR UPDATE .. OF.

Thanks Adam Chidlow for polishing the patch.

Backport of 03049fb8d96ccd1f1ed0285486103542de42faba from master

Note: See TracTickets for help on using tickets.
Back to Top