Opened 14 years ago
Closed 8 years ago
#13906 closed Cleanup/optimization (wontfix)
REPEATABLE READ (as used by default on MySQL) breaks at least QuerySet.get_or_create().
Reported by: | Sebastian Noack | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | mysql transaction isolation |
Cc: | graham@…, tomasz.zielinski@…, miloslav.pojman@…, jim.dalton@…, anssi.kaariainen@…, David Gouldin, clelland@…, gerdemb, mmitar@…, TTimo, cal@…, django@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
MySQL is the only database supported by Django that supports transactions but uses an isolation level other than READ COMMITTED: it defaults to REPEATABLE READ. That means that once you execute a SELECT statement, the result set will be the same for all subsequent queries selecting the same rows (even if no rows were returned). This obviously breaks concurrent calls of QuerySet.get_or_create(). It is hard to write code using transactions that runs on multiple RDBMSs using different isolation levels. So I suggest making the MySQL backend set the isolation level to READ COMMITTED on each new connection, and adding a setting to disable this behaviour for those who really know what they are doing, or who have already configured it as the global default in their MySQL server.
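To make the failure mode concrete, here is a condensed sketch of the get_or_create() retry logic (simplified; not Django's exact code) and why REPEATABLE READ defeats it:

```python
from django.core.exceptions import ObjectDoesNotExist
from django.db import IntegrityError

def get_or_create(qs, **kwargs):
    try:
        # (1) The first SELECT fixes this transaction's read snapshot.
        return qs.get(**kwargs), False
    except ObjectDoesNotExist:
        try:
            # (2) A concurrent transaction may have committed the same
            # unique key in the meantime, so this can fail.
            return qs.create(**kwargs), True
        except IntegrityError:
            # (3) Retry the SELECT. Under READ COMMITTED this sees the row
            # the other transaction committed; under REPEATABLE READ it
            # replays the empty snapshot from (1) and raises again.
            return qs.get(**kwargs), False
```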
Attachments (2)
Change History (42)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Can you explain a bit more what you are trying to show and what you expect to happen instead? Could you maybe also provide a text file with the complete SQL sessions instead of some commented pseudocode, to better reproduce what you are talking about? Thanks.
comment:3 by , 14 years ago
I have written about that issue in my blog.
http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
comment:4 by , 14 years ago
Ok, I see your point.
However, I seriously doubt that changing the default transaction isolation level will be accepted, as it is a big backwards-incompatible change. A better option would probably be an isolation level setting in settings.py.
comment:5 by , 14 years ago
The solution I have proposed includes adding a setting, but would make READ COMMITTED the default. Yes, this would break backward compatibility, but the user would just have to set a setting to get the old behaviour back. On the other hand it makes most ORM code even more compatible with MySQL, so code that was tested against any other database backend would work more reliably on MySQL with that change.
Btw, I have looked at modeltests/transactions/models.py and noted that the tests in there are bypassed for MySQL. I haven't looked at the tests in detail, but it looks like there are known issues with MySQL and transactions, and I would not be surprised if that is because of the different isolation level.
comment:6 by , 14 years ago
I've created #14026 to track the request for a settings.py variable. This ticket should just be for changing the default and should probably be hashed out on the django-developers group.
I personally am in favor of putting something like this in milestone:1.3 or milestone:1.4
Although the change would be slightly backwards-incompatible, it would be nowhere near as radical as other changes made in milestone:1.2, such as template caching, which would have broken custom template tags that weren't thread-safe.
comment:7 by , 14 years ago
Triage Stage: Unreviewed → Accepted
Accepted that the problem exists; I have no particular opinion on the right solution, however. It may turn out to just be a documentation issue.
comment:8 by , 14 years ago
Cc: added
Easy pickings: unset
Has patch: set
Keywords: mysql transaction isolation added
Owner: changed from … to …
Severity: → Normal
Status: new → assigned
Type: → Cleanup/optimization
Version: 1.2 → SVN
As a suggestion, I've attached a patch that adds these two lines to conf/project_template/settings.py:
```python
# Recommended for MySQL. See http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
#'OPTIONS': {'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED'},
```
With it commented in, on MySQL, all the tests still pass.
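For reference, the commented-out option from the patch would sit in a full DATABASES entry roughly like this (a sketch; the NAME/USER/PASSWORD values are placeholders):

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'secret',
        # Run once on every new connection, before any queries:
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    }
}
```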
comment:9 by , 14 years ago
Cc: added
There are already tickets on this, one is here:
http://code.djangoproject.com/ticket/6641
along with a discussion: https://groups.google.com/forum/#!topic/django-developers/U4eiCVhEOxU
My opinion is that switching to the READ COMMITTED level might be hazardous because it's less tested:
http://www.mysqlperformanceblog.com/2010/02/11/read-commited-vs-repetable-read-in-tpcc-like-load/
"However, however.
For 10w run I got 60 Deadlock errors “1213, Deadlock found when trying to get lock; try restarting transaction” in READ-COMMITED mode, and 0 Deadlocks in REPEATABLE-READ mode."
and from the first comment below the article:
"For some time we have used Read-Committed and has encountered alot of weird bugs with this isolation level, it seems
that everytime the mysql team made a new feature, it wasnt tested as well with read-committed isolation level as the commonly used repeatable-read."
Therefore, my personal view is that we should fix get_or_create(), and supplement the Django docs with some information about MySQL's default isolation level and the consequences of using it.
follow-up: 13 comment:10 by , 14 years ago
In my opinion using transactions with MySQL is hazardous anyway, because MySQL's entire transaction management is poorly tested, as I have recently proved. It only matters whether your application expects MySQL's buggy behavior or not. And since every other RDBMS supported (and well tested) by Django uses READ COMMITTED, in my opinion Django-based applications will run more reliably on MySQL with READ COMMITTED.
Also note that there is no way to fix get_or_create for REPEATABLE READ, because it would require committing the transaction after the attempt to get the data, and we can't do that if Django's transaction management is in manual state.
comment:11 by , 14 years ago
I was wrong when I said the tests all still pass. I was only running the app tests (./manage.py test) instead of the core tests (./runtests.py). Will update here once I've run the real tests!
comment:12 by , 14 years ago
Owner: removed
Status: assigned → new
Re my previous comment, I can't run the tests. They don't run on MySQL / InnoDB because of #3615, and there's no point testing transaction isolation on MyISAM.
comment:13 by , 14 years ago
Replying to sebastian_noack:
In my opinion using transactions with MySQL is hazardous anyway, because MySQL's entire transaction management is poorly tested, as I have recently proved. It only matters whether your application expects MySQL's buggy behavior or not. And since every other RDBMS supported (and well tested) by Django uses READ COMMITTED, in my opinion Django-based applications will run more reliably on MySQL with READ COMMITTED.
Also note that there is no way to fix get_or_create for REPEATABLE READ, because it would require committing the transaction after the attempt to get the data, and we can't do that if Django's transaction management is in manual state.
I think that you are right about this, and MySQLdb should also work in READ COMMITTED mode by default, or at least it should be strongly advised in the documentation.
comment:14 by , 14 years ago
We moved to READ COMMITTED 2 months ago and have seen far fewer database problems.
@graham_king I can't replicate your bug in MySQL 5.5.8
by , 14 years ago
Attachment: 13906.diff added
Adds commented out OPTIONS to conf/project_template/settings.py setting tx level
comment:15 by , 14 years ago
Milestone: → 1.4
I changed the URL in the patch to link to this ticket, instead of directly to Sebastian's blog.
Do you think the attached patch is the right way to go? If so, the ticket is Accepted, and it only adds a comment, so is it time for attention from a core developer? I'm tentatively adding Milestone 1.4.
I have been bitten by REPEATABLE READ when using Django's ORM from a long-running worker (the type of thing that listens on a message queue), so I'm a strong +1 here.
@adamnelson You mean you can run the ./runtests.py test suite on MySQL / InnoDB? You don't get an error importing the fixture in aggregation? This is probably off-topic, so could you mail me directly: graham@…
comment:16 by , 14 years ago
Cc: added
comment:17 by , 13 years ago
UI/UX: unset
The repeatable read default is tripping up one of our tests:
```
./runtests.py --settings=testdb.innodb select_for_update.SelectForUpdateTests.test_block
Creating test database for alias 'default'...
Creating test database for alias 'other'...
F
======================================================================
FAIL: test_block (modeltests.select_for_update.tests.SelectForUpdateTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kmtracey/django/hg-django/django/test/testcases.py", line 613, in skip_wrapper
    return test_func(*args, **kwargs)
  File "/home/kmtracey/django/hg-django/django/test/testcases.py", line 613, in skip_wrapper
    return test_func(*args, **kwargs)
  File "/home/kmtracey/django/hg-django/tests/modeltests/select_for_update/tests.py", line 209, in test_block
    self.assertEqual('Fred', p.name)
AssertionError: 'Fred' != u'Reinhardt'
----------------------------------------------------------------------
Ran 1 test in 0.301s

FAILED (failures=1)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...
```
It's failing because the test continues to read the already-read value even though another transaction has committed a different value. We either need to fix the test to avoid this behavior or (better) figure out how to fix this problem more globally. Repeatable read does seem to cause more problems than anything else, but I'm a little worried about backwards incompatibility of changing the default behavior under Django. Hmpf.
comment:18 by , 13 years ago
Cc: added
Needs documentation: set
Patch needs improvement: set
It seems like the consensus is pretty strong that the problem is being caused by REPEATABLE READ and that the solution to the problem is switching the isolation mode to READ COMMITTED. It also sounds like, judging from TomaszZielinski's comment above, READ COMMITTED can cause erratic behavior sometimes.
MySQL users are basically going to have to pick their poison: a) go with REPEATABLE READ, which is more proven (and probably more dependable) but which will cause ORM behavior that is inconsistent with other backends, or b) go with READ COMMITTED, which will provide for consistent ORM behavior but could *possibly* result in other issues.
I don't see any other alternatives so I agree that the current patch has the right idea. That said, the current patch is more of a suggested workaround than a solution. I'd like to argue for stronger support of the set isolation level feature, akin to what's available for the Postgresql backend. I also think whatever we do needs documentation.
So I'm thinking about:
- An explicit key `isolation_level` in the db `OPTIONS` dict for MySQL (akin to the `autocommit` key for Postgresql). Possible values would be one of `['READ-UNCOMMITTED', 'READ-COMMITTED', 'REPEATABLE-READ', 'SERIALIZABLE']`.
- A `set_isolation_level` method on the DatabaseWrapper which implements the SET TRANSACTION statement. This should be triggered in `DatabaseWrapper.__init__`. Should probably set an `isolation_level` attribute so the isolation level status can be checked (see the sketch below).
- Tests for the feature.
- Documentation for the feature in database notes, with a brief explanation of the tradeoffs.
This is more or less what the OP was describing as his suggested fix.
I'd be happy to work on a patch along these lines, but just wanted to see if anyone else thinks the above makes sense. Let me know...
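A rough sketch of what the proposed method might look like (the `set_isolation_level` method, the `isolation_level` attribute, and the level list come from the proposal above and are hypothetical, not existing Django API):

```python
from django.db.backends import BaseDatabaseWrapper

MYSQL_ISOLATION_LEVELS = ('READ-UNCOMMITTED', 'READ-COMMITTED',
                          'REPEATABLE-READ', 'SERIALIZABLE')

class DatabaseWrapper(BaseDatabaseWrapper):
    def set_isolation_level(self, level):
        """Issue SET SESSION TRANSACTION ISOLATION LEVEL on this connection."""
        if level not in MYSQL_ISOLATION_LEVELS:
            raise ValueError("Invalid isolation level: %r" % level)
        # MySQL spells the levels with spaces rather than dashes.
        self.cursor().execute("SET SESSION TRANSACTION ISOLATION LEVEL "
                              + level.replace('-', ' '))
        self.isolation_level = level
```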
comment:19 by , 13 years ago
Cc: added
This raises the question: if MySQL has the option of setting different isolation levels, then why not allow it for other databases? If the reason is that the ORM is designed to only handle transactions in isolation level READ COMMITTED, then why allow MySQL the exception?
It would be nice to support different isolation levels. PostgreSQL 9.1 will have true serializable transactions, which would be nice to use in Django applications. Other databases (for example DB2, which does not use MVCC) can have bigger tradeoffs related to transaction isolation level.
comment:20 by , 13 years ago
I would use the SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE option and get fresh data, instead of changing the global transaction isolation level because of one concrete bug.
This should be easy now that the QuerySet.select_for_update method has been implemented [16058].
Note that the "REPEATABLE READ" isolation level is stricter than "READ COMMITTED" - which means that switching to the "READ COMMITTED" level does not guarantee anything extra. If it fixes this problem, it is just an implementation detail which we should not rely on.
comment:21 by , 13 years ago
SELECT FOR UPDATE does not solve this. At least in PostgreSQL if there is no row there, then it does nothing. MySQL's documentation says it has the same behavior. So, if there are 2 transactions inserting, then you would have:
```
SELECT FOR UPDATE   # Sees nothing, thus locks nothing.
                    # Another transaction inserts a row
                    # Another transaction commits
INSERT              # can't insert, IntegrityError
SELECT              # can't see it.
```
I am not completely sure of this, but it seems that in REPEATABLE READ there is nothing you can do. By definition you can't see the row of the other transaction if it was not visible at the beginning of your transaction. A unique index ensures you can't insert it.
comment:22 by , 13 years ago
MySQL manual states that:
A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus, it sets the same locks a searched SQL UPDATE would set on the rows.
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
I have tested it - see below - and it works for me. You need to use the FOR UPDATE statement in the second select, not the first one.
Btw, if SELECT FOR UPDATE is used with the first select, a deadlock can occur - see the second test below.
Using FOR UPDATE with the second select
Client 1 - selects and gets empty result set.
```
mysql> DROP TABLE IF EXISTS t;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE t (c INT);
Query OK, 0 rows affected (0.12 sec)

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE c=1;
Empty set (0.00 sec)
```
Client 2 - inserts new row
```
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE c=1;
Empty set (0.00 sec)

mysql> INSERT INTO t VALUES(1);
Query OK, 1 row affected (0.00 sec)
```
Client 1 - still selects an empty result set, but if FOR UPDATE is used, it blocks (because Client 2 has a lock on that row)
```
mysql> SELECT * FROM t WHERE c=1;
Empty set (0.00 sec)

mysql> SELECT * FROM t WHERE c=1 FOR UPDATE;
```
Client 2 - commits
```
mysql> commit;
Query OK, 0 rows affected (0.04 sec)
```
Client 1 - the blocked SELECT returns the inserted row
```
+------+
|    1 |
+------+
1 row in set (7.03 sec)
```
Using FOR UPDATE with the first SELECT
Client 1 - selects empty result set with FOR UPDATE option.
```
mysql> DROP TABLE IF EXISTS t;
Query OK, 0 rows affected (0.05 sec)

mysql> CREATE TABLE t (c INT);
Query OK, 0 rows affected (0.14 sec)

mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE c=1 FOR UPDATE;
Empty set (0.00 sec)
```
Client 2 - selects an empty result set too; trying to insert a row blocks
```
mysql> set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE c=1 FOR UPDATE;
Empty set (0.00 sec)

mysql> INSERT INTO t VALUES(1);
```
Client 1 - trying to insert causes a deadlock
```
mysql> INSERT INTO t VALUES(1);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```
Client 2 - row is inserted
```
Query OK, 1 row affected (9.99 sec)
```
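Translated to the ORM, the pattern mila demonstrates would be to repeat the lookup with select_for_update() after the IntegrityError - a sketch, not the actual get_or_create() code:

```python
from django.db import IntegrityError

def get_or_create_with_lock(qs, **kwargs):
    try:
        return qs.get(**kwargs), False
    except qs.model.DoesNotExist:
        try:
            return qs.create(**kwargs), True
        except IntegrityError:
            # The locking re-read blocks until the competing transaction
            # commits, then returns the fresh row - even under REPEATABLE
            # READ, as the shell session above demonstrates.
            return qs.select_for_update().get(**kwargs), False
```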
comment:23 by , 13 years ago
@mila - I can see how SELECT FOR UPDATE would solve the particular problem with get_or_create. Though I haven't tried implementing anything, your demonstration is pretty convincing that it could work in this context.
One of my concerns is about the lock though. Since there is no explicit transaction commit in get_or_create, when would SELECT FOR UPDATE release its lock, assuming we implemented it here? The fact that it wouldn't get released until the commit (which in many cases is at the end of a request) could mean that it introduces some unexpected behavior into the mix. For example, if you did a regular SELECT following all this, in the same transaction, would it not block?
Also, this ticket is not confined to just the get_or_create issue. There's the larger question of whether or not READ COMMITTED is a more sensible default for MySQL, to bring it in alignment with the other backends. The only two places I'm aware of where it has been an issue are here (get_or_create) and, interestingly enough, the select_for_update issue that kmtracey reported, but there could be others. I still believe there is a strong case to be made for a consistent isolation level for all backends, particularly since Django is written with the READ COMMITTED model in mind.
In short, I think your idea is valid (though I think there are some issues that would still need to be worked out), but I also still think READ COMMITTED is a more sensible default for Django.
comment:24 by , 13 years ago
Note: I just submitted a patch in #16490 for the issue Karen reported with regards to `SelectForUpdateTests.test_block()`. That issue was better solved (in my opinion) by committing the transaction in the test to get a fresh read rather than waiting on a larger decision on REPEATABLE READ vs. READ COMMITTED. So that particular issue doesn't really have much of a bearing on this discussion after all. `get_or_create()` is really the only place where there is an issue.
follow-up: 26 comment:25 by , 13 years ago
Ok, I stand corrected. MySQL works differently than PostgreSQL here, and SELECT FOR UPDATE will actually work.
But there is another problem with SELECT FOR UPDATE: it returns data which is not visible if you do a regular select. That is, `select * from t where c=1` will return different results than `select * from t where c=1 FOR UPDATE`. As a result, if you later try to select the same info from the table without FOR UPDATE, it doesn't exist.
I am not sure if the cure is worse than the disease. It seems highly confusing that if you get_or_create(), you will get an object, but if you then try to query for the object again from the database for some reason, it is not there! Saving it doesn't work either, as Django's save is implemented as
```
# Check if the object exists; if yes, update, else insert.
SELECT * FROM t WHERE c = 1;
# This returns nothing, because of no SELECT FOR UPDATE,
# so we are going to insert a new row into the DB
INSERT INTO t values...
# Whoops, there actually was a row in there, which is not
# visible to us, even though we got it from the DB in the same transaction...
```
If you do a save(force_update=True) it will actually update the existing row. My opinion is that the above is confusing as hell for users. I don't know what would happen if you do a SELECT FOR UPDATE in save when checking for the existing row. I suppose that could work...
I wonder why we don't do .save() as UPDATE, check if something was updated, and if not, then INSERT. That would save one query in the situations where you are updating. I don't know if an UPDATE without modifications is more expensive than the SELECT, so it might be that situations where you are inserting would be more expensive. The object returned by get_or_create would be successfully saved using the above logic, too. Maybe there is an obvious reason which I am missing?
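The alternative flow being described, sketched in ORM terms (hypothetical helper; not how Model.save() worked at the time - though Django 1.6 later adopted essentially this UPDATE-then-INSERT strategy):

```python
def save_update_first(obj):
    """Persist obj by trying UPDATE first, falling back to INSERT."""
    model = type(obj)
    # Collect the non-pk field values (simplified; ignores FK attname details).
    field_values = dict(
        (f.name, getattr(obj, f.name))
        for f in model._meta.fields
        if f.name != model._meta.pk.name
    )
    # update() returns the number of matched rows; 0 means no row
    # with this pk exists yet.
    updated = model.objects.filter(pk=obj.pk).update(**field_values)
    if not updated:
        obj.save(force_insert=True)
```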
comment:26 by , 13 years ago
Replying to akaariai:
But there is another problem with SELECT FOR UPDATE: it returns data which is not visible if you do a regular select. That is, `select * from t where c=1` will return different results than `select * from t where c=1 FOR UPDATE`. As a result, if you later try to select the same info from the table without FOR UPDATE, it doesn't exist.
I am not sure if the cure is worse than the disease. It seems highly confusing that if you get_or_create(), you will get an object, but if you then try to query for the object again from the database for some reason, it is not there!
Yes, that is a really strange behavior. I can not understand MySQL here.
But note that with the "READ COMMITTED" isolation level (PostgreSQL default) it can always happen if you are selecting one object twice.
```
SELECT * FROM t WHERE c = 1;  # Returns a result
SELECT * FROM t WHERE c = 1;  # Whoops, it's not there, some other transaction has deleted it.
```
Saving it doesn't work either, as Django's save is implemented as

```
# Check if the object exists; if yes, update, else insert.
SELECT * FROM t WHERE c = 1;
# This returns nothing, because of no SELECT FOR UPDATE,
# so we are going to insert a new row into the DB
INSERT INTO t values...
# Whoops, there actually was a row in there, which is not
# visible to us, even though we got it from the DB in the same transaction...
```

If you do a save(force_update=True) it will actually update the existing row. My opinion is that the above is confusing as hell for users. I don't know what would happen if you do a SELECT FOR UPDATE in save when checking for the existing row. I suppose that could work...
I agree again - this is not a behavior I would expect. But this can also happen even now with the "READ COMMITTED" isolation level.
```
SELECT * FROM t WHERE pk = 1;  # Returns nothing, let's do INSERT
INSERT INTO t VALUES ... ;     # Whoops, there actually is a row there, another transaction has just inserted it
```
Using SELECT FOR UPDATE when checking for existence is probably not a good way to go - as I showed in the second test in my previous comment, it can cause a deadlock.
I wonder why we don't do .save() as UPDATE, check if something was updated, and if not, then INSERT. That would save one query in the situations where you are updating. I don't know if an UPDATE without modifications is more expensive than the SELECT, so it might be that situations where you are inserting would be more expensive. The object returned by get_or_create would be successfully saved using the above logic, too. Maybe there is an obvious reason which I am missing?
I wonder too. I think I have never needed the "does not exist, let's insert" branch - because I'm using auto-increment primary keys, and no select is required when pk is None. And even without auto-increment, Model.objects.create is often used for creation.
But "it exists, let's update, I did one extra query" is pretty common - every time you get and then save an object without force_update=True (e.g. every time you use model forms).
comment:27 by , 13 years ago
There is one good reason why .save() should do a select first. It is very explicitly documented that this is what save() does [1]. I wonder if that is really a backwards compatibility requirement even if it is documented that explicitly.
For the matter if databases should be allowed to specify the isolation level, I don't understand the reason for not allowing it. The reasoning goes something like this:
Django is designed to work in READ COMMITTED mode. You can not change this because Django might not work properly in other isolation levels. However, for backwards compatibility reasons MySQL works in REPEATABLE READ mode and because of this it does not work correctly in some cases. We do not allow you to change this because Django might not work properly in other than READ COMMITTED mode. Hmmm...
So at least MySQL users should be allowed to set the isolation level to READ COMMITTED, but IMHO it would be good to allow isolation level settings for all databases.
One choice here would be to allow users to specify other isolation levels and document that Django is designed to work best in READ COMMITTED mode. Use other level at your own risk. The default for new projects is READ COMMITTED, but for backwards compatibility reasons REPEATABLE READ is used for MySQL if not otherwise specified in the settings.py file. The downside is that for new projects, the ISOLATION_LEVEL setting must be in the settings.py file.
The reasons for other databases are that some databases really work much better at other isolation levels (non-MVCC databases, for example), and that you might be using a legacy database that is designed to work at other isolation levels. Also, PostgreSQL 9.1 and true serializable transactions... :)
[1] https://docs.djangoproject.com/en/dev/ref/models/instances/#how-django-knows-to-update-vs-insert
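The opt-in being suggested would look something like this in settings.py (the ISOLATION_LEVEL key is hypothetical; no such setting existed):

```python
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'mydb',  # placeholder
        # Hypothetical per-database setting; Django is documented to
        # work best with READ COMMITTED, other levels at your own risk.
        'ISOLATION_LEVEL': 'READ COMMITTED',
    }
}
```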
comment:28 by , 13 years ago
Cc: added
comment:29 by , 13 years ago
Cc: added
I believe that this can be fixed without changing the isolation level, if MySQL savepoints are properly implemented (cf. #15507).
If we re-order the savepoint setting and the initial read in get_or_create, then rolling back to the savepoint in the exception handler releases the IX lock, effectively rolling back to a point *before the read happened*. At that point, the second SELECT is not constrained by the REPEATABLE READ isolation, and is free to read the current row from the database. I've tested this in the MySQL shell (MySQL 5.0.51a):

```sql
CREATE TABLE savepoint_test (a INT PRIMARY KEY);
```
Before:
| Process 1 | Process 2 |
|---|---|
| BEGIN; | |
| SELECT * FROM savepoint_test WHERE a=1; | |
| > 0 rows returned | |
| | BEGIN; |
| | SELECT * FROM savepoint_test WHERE a=1; |
| | > 0 rows returned |
| | INSERT INTO savepoint_test (a) VALUES (1); |
| | COMMIT; |
| INSERT INTO savepoint_test (a) VALUES (1); | |
| > ERROR: Duplicate entry for key | |
| SELECT * FROM savepoint_test WHERE a=1; | |
| > 0 rows returned | |
After:
| Process 1 | Process 2 |
|---|---|
| BEGIN; | |
| SAVEPOINT p1; | |
| SELECT * FROM savepoint_test WHERE a=1; | |
| > 0 rows returned | |
| | BEGIN; |
| | SAVEPOINT p2; |
| | SELECT * FROM savepoint_test WHERE a=1; |
| | > 0 rows returned |
| | INSERT INTO savepoint_test (a) VALUES (1); |
| | COMMIT; |
| INSERT INTO savepoint_test (a) VALUES (1); | |
| > ERROR: Duplicate entry for key | |
| ROLLBACK TO SAVEPOINT p1; | |
| SELECT * FROM savepoint_test WHERE a=1; | |
| > 1 row returned | |
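In ORM terms, the reordering described here amounts to taking the savepoint before the initial read - roughly like this (a sketch using Django's savepoint API, not the actual patch):

```python
from django.db import IntegrityError, transaction

def get_or_create_with_savepoint(qs, **kwargs):
    sid = transaction.savepoint()  # take the savepoint *before* the first SELECT
    try:
        return qs.get(**kwargs), False
    except qs.model.DoesNotExist:
        try:
            return qs.create(**kwargs), True
        except IntegrityError:
            # Rolling back releases the read snapshot taken since the
            # savepoint, so the retry is free to see the row the other
            # transaction committed (subject to the caveat in the next comment).
            transaction.savepoint_rollback(sid)
            return qs.get(**kwargs), False
```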
comment:30 by , 13 years ago
According to my quick test, this only works if you have done no queries at all before taking the savepoint - no matter which table you select from before the SAVEPOINT p1. So, this would be only a minor improvement: get_or_create works correctly, but only if it is the first operation in the transaction. You can see this if you put any select above the SAVEPOINT p1; for process 1.
The reason seems to be that MySQL takes the snapshot only when first needed. But if it happens to take the snapshot inside a savepoint, for some strange reason it rolls the snapshot back when rollback to savepoint is called, and creates a new snapshot for the second select. If anything, that is a bug in MySQL, because you just had a non-repeatable read in repeatable read isolation.
comment:32 by , 13 years ago
Cc: added
comment:33 by , 13 years ago
Cc: added
comment:34 by , 12 years ago
Cc: added
comment:35 by , 12 years ago
I can confirm that using READ COMMITTED transaction isolation can cause lots of other problems.
See related ticket:
https://code.djangoproject.com/ticket/18557
comment:36 by , 12 years ago
Cc: added
comment:37 by , 12 years ago
Owner: set to …
Status: new → assigned
comment:38 by , 12 years ago
Owner: removed
Status: assigned → new
Since I recently re-implemented transaction management, I had a look at this ticket. Unfortunately, even with the new features (savepoints), I don't see an obvious solution.
When I was working on the transactions branch, I considered adding an `ISOLATION_LEVEL` key in the `DATABASES` entry. Database backends now contain everything required to implement it. However, I decided against it, because it would imply that Django works correctly under any isolation level, and I don't think that's true. (There's a PostgreSQL-only option for backwards compatibility, though.)
`get_or_create` is still usable in the general case. It's just vulnerable to race conditions, despite containing code specifically to deal with that race condition. I don't think it's worth documenting the caveat until it's fixed.
comment:39 by , 10 years ago
Cc: added
comment:40 by , 8 years ago
Resolution: → wontfix
Status: new → closed
In Django 2.0, the MySQL backend will default to read committed (#27683). I think we can close this ticket in light of that. The resolution is "wontfix" in the sense that `get_or_create()` still doesn't work correctly (and probably can't?) under repeatable read.
Setting the transaction level to READ COMMITTED does not solve your problem. A sequence showing erroneous behaviour in REPEATABLE READ:
```
t1: SELECT * FROM foo WHERE condition  # transaction 1 get part
t1: INSERT INTO ...                    # create
t2: SELECT * FROM foo WHERE condition  # transaction 2 get part; t2 sees nothing here
t1: commit()
```
What would (at least partly) solve your problem is a unique index on the where condition; then the DB would be able to block t2's create.
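For example, a unique constraint on the looked-up column (illustrative model; the name `Foo`/`name` is hypothetical):

```python
from django.db import models

class Foo(models.Model):
    # The unique index makes the database reject t2's duplicate INSERT
    # with an IntegrityError instead of silently creating two rows.
    name = models.CharField(max_length=50, unique=True)
```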