Opened 14 years ago
Closed 12 years ago
#13870 closed New feature (fixed)
Document transaction/connection management outside the web server context
Reported by: | Patryk Zawadzki | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | timograham@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
First some background info:
- PostgreSQL provides several transaction isolation levels.
- Django's psycopg backend defaults to ghost-read-preventing isolation.
- psycopg isolation modes automatically create the isolating transaction whenever you execute your first SQL statement.
- psycopg expects you to tell it when you're done with your work by either running
connection.commit()
orconnection.rollback()
- When you terminate the transaction, psycopg will wait for the next SQL statement and start a new isolating transaction.
The problem is: Django breaks assumption 4 and never commits or rollbacks the isolating transaction. Note I'm not talking about the transactions explicitly started by django, I'm talking about the transaction implicitly started by requesting a non-zero isolation level. This has the nasty side-effect of leaving the connection permanently in-transaction.
Try it yourselves: start a django server, make it use the DB and then try to - say - VACUUM FULL
the table it selected from. Or try to ALTER
the table (run a south migration). You will end up with a hanging connection, waiting for Django to leave the transaction block.
The attached patch makes it work for psycopg2 by introducing a new method in the backend class. It needs a call to leave_transaction_management()
so it works best with the TransactionMiddleware
.
Now for a real solution:
I think it would be better to introduce some sort of enter_isolation_block()
/ leave_isolation_block()
API. "Enter" would call set_isolation_level(1)
(currently done unconditionally), "leave" would terminate the isolation transaction. Then introduce a default middleware that wraps all the views in these calls. It just doesn't seem intuitive to mix the transactions implicitly created by isolation API with the explicit transaction management API.
Attachments (2)
Change History (14)
by , 14 years ago
Attachment: | django-terminate-isolating-transactions.patch added |
---|
comment:1 by , 14 years ago
I ran into to this problem from Django 1.0 and ended up using the following workaround to explicitly close the connection after the model was used.
from django.db import connection connection.close()
In my case there was a shared table lock which ended up being held by the process using the model classes (because of the uncommited transaction) which in turn caused a deadlock as another process requiring an exclusive lock on the table would block forever.
comment:2 by , 14 years ago
I have had the same problem on Django 1.2. I just applied this patch and will report back on my progress.
In the past I have partially worked around this problem by manually calling connection.close(). I noticed that when I write custom command-line scripts that use my models via Django's ORM, then I have to close connections manually. I assume the transaction middleware doesn't execute (and therefore close the connection) in that sort of environment, but for some reason it still opens transactions.
The easiest way to notice when things go wrong is, well, every time I try to alter a table during development: The connection just hangs forever. The only way I found to work around it is to restart the postgresql process. (I can't find an easy way to break these locks from the command-line psql client or make them expire.)
It also looks like the transaction middleware code that's supposed to roll back transactions or close the connection is not guaranteed to execute, but I haven't looked very deeply into it. What happens when your code encounters errors and you get an unhandled exception?
Another possible workaround would be to somehow make all locks expire after a short period of time, but I can't find how to do that. Or to close connections from the postgresql server end after a short period of inactivity. Again, not even sure if that's possible. Or perhaps if it was possible to only enter a transaction if you perform an update query? Then at least this wouldn't happen quite so often.
But something is certainly wrong.
Another option would be to just steer clear of automatic transactions.
comment:3 by , 14 years ago
It looks like this might be the same issue that #9964 is trying to address. Can you confirm if it is?
comment:5 by , 14 years ago
Needs documentation: | set |
---|---|
Needs tests: | set |
Triage Stage: | Unreviewed → Design decision needed |
Per this discussion on django-dev (especially Gabriel's comment there), I think this is mainly a documentation issue. Adding a page on using the ORM outside of request/response flow would be best, imho. Dunno if adding 2 non-public method will help, here.
comment:6 by , 14 years ago
Adding methods won't help but moving their invocation to middleware or a decorator will let people decide if they want isolation and when to terminate it when dealing with models outside of views.
comment:7 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:8 by , 13 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Easy pickings: | unset |
Needs documentation: | unset |
Needs tests: | unset |
Patch needs improvement: | set |
Triage Stage: | Design decision needed → Accepted |
UI/UX: | unset |
My first thought was "Django probably started closing connections at the end of each request after this ticket was reported, the problem must be fixed"; the mailing list thread contains this clarification:
The problem is not with regular views but with Celery tasks,
long-running management commands such as daemons and any other place
where you access the ORM from outside of the usual
request→dispatcher→view→response flow. These cases all end up with an
isolating transaction spanning their whole life span. In case of
daemons it results in permanently blocking the database structure (for
example causing "VACUUM FULL" to hang).
Gabriel wrote a complete explanation here: http://stackoverflow.com/questions/1303654/threaded-django-task-doesnt-automatically-handle-transactions-or-db-connections
I'm accepting this as a documentation issue — Django is primarily a web framework and I don't like the idea of complicating transaction management for offline tasks.
by , 12 years ago
Attachment: | 13870.diff added |
---|
comment:9 by , 12 years ago
Cc: | added |
---|---|
Patch needs improvement: | unset |
Summary: | psycopg2 backend never terminates isolating transactions → Document transaction/connection management outside the web server context |
Type: | Bug → New feature |
Version: | 1.2 → master |
Added a doc patch based on Gabriel's stackoverflow post.
comment:10 by , 12 years ago
Patch needs improvement: | set |
---|
I think this needs to be changed:
As soon as you perform an action that needs to write to the database, ...
to mention that only write actions through the ORM count here.
This paragraph seems to be out of place / context:
This goes against the fact that PostgreSQL thinks the transaction requires a ``ROLLBACK`` because Django issued a ``SET`` command for the timezone.
Also, the section discussing the dirty flag isn't accurate. The dirty flag has nothing to do with doing the commit in autocommit mode - the commits are done manually in the write operations Django performs by issuing commit_unless_managed. The docs should only mention that as long as you perform any query which isn't a write using the ORM, then there will be an open transaction and this will not be closed automatically by Django.
Typo: meeded -> needed
The docs additions do not take into account multidb. I think it would be a good idea to add a context manager @close_connections which ensures all connections (known to Django) will be closed. This would simplify the solution part a lot: use @close_connections, be done with it (except if you are playing with threads).
It might be a good idea to commit the docs changes with the above suggested changes apart of the multidb stuff. We might want to do more extensive edit of the transaction handling docs for multidb, or we might want to add the close_connections decorator. But, I don't know what the schedule for those are, so they shouldn't block the currently available improvements.
comment:11 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:12 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Partial workaround (see description)