Opened 14 years ago
Closed 10 years ago
#15802 closed Cleanup/optimization (fixed)
Django stops functioning when the database (PostgreSQL) closes the connection
Reported by: | Owned by: | Aymeric Augustin | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.6 |
Severity: | Normal | Keywords: | database, postgres, postgresql, connection, closed |
Cc: | Jeroen Dekkers, sannies, mike@…, hv@…, anssi.kaariainen@…, reinout@…, depaolim@…, Ilya Antipenko, pdina, rckclmbr@…, Akos Ladanyi | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
In some cases (database restart, network connection lost, hard pgBouncer restart, etc...) the connection to the database is lost without giving Django a notification.
Currently Django doesn't handle that very gracefully... it just keeps trying to close an already closed connection (and gets errors because of that) so after you've somehow lost your database connection for a bit you have to restart all of your mod_wsgi processes (means executing a reload command on 5 servers for me right now) before Django starts working again.
The stacktrace:
Traceback (most recent call last): File "django/core/handlers/wsgi.py", line 267, in __call__ signals.request_finished.send(sender=self.__class__) File "django/dispatch/dispatcher.py", line 162, in send response = receiver(signal=self, sender=sender, **named) File "django/db/__init__.py", line 84, in close_connection conn.close() File "django/db/backends/__init__.py", line 72, in close self.connection.close() InterfaceError: connection already closed
Proposed patch, replace the close()
method in django/db/backends/__init__.py
so it becomes this:
def close(self): if self.connection is not None: try: self.connection.close() self.connection = None except InterfaceError: self.connection = None raise
I see no valid reason for giving a non-recoverable error while closing the database connection. Yes, something is wrong and yes, the exception can be shown. But looping the error is completely futile here.
Attachments (2)
Change History (42)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 14 years ago
comment:4 by , 14 years ago
Although it would be cleaner to only do this in case of an connection already closed
InterfaceError
. I can't really think of a valid reason to keep self.connection
after trying to close it.
Either it succeeded or failed, but the odds are that the connection is not usable anymore in either case. It shouldn't be much of a problem since the exception is re-raised anyway.
But a clean solution (catching this specific error silently) would be preferred indeed.
follow-up: 6 comment:5 by , 13 years ago
Easy pickings: | unset |
---|---|
UI/UX: | unset |
That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?
comment:6 by , 13 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Replying to mtredinnick:
That SQLAlchemy code looks fragile in the medium-term. I tend to agree with comment 4: we make a decent attempt to close the connection and then call it "closed" anyway. Before committing something along those lines, can anybody think of a case where that is going to bite us (I can't)?
Neither can I, I have been thinking about it since I wrote that ugly hack of a patch and I cannot think of anything going wrong. We could be leaking open connections, but those should get garbage collected and even that would be better than disabling that process/thread forever (for example gunicorn can't recover from this so this can kill production alltogether).
I will wait one more day for someone to speak up and then I will commit the patch (just connection = None, not the pseudo detection).
comment:7 by , 13 years ago
I agree with Malcolm. Checking exception messages is fragile, and I can't think of an obvious downside to just treating a closed connection as closed, especially if we log/report the fact that we tried and failed to close the connection (so the process isn't completely transparent).
comment:8 by , 13 years ago
For the record, I have had the code running in production since opening this ticket and I have not seen any problems yet.
So as far as empirical evidence goes, it is safe to commit a patch like that.
follow-up: 11 comment:10 by , 13 years ago
Cc: | added |
---|---|
Has patch: | set |
Resolution: | fixed |
Status: | closed → reopened |
I'm hitting the same problem, but when getting the cursor instead of closing the connection:
File "/home/jener-www/jener/yatemod/backend.py", line 38, in get_account_hashes account = Account.objects.get(username=username, domain=domain) File "/usr/lib/python2.6/dist-packages/django/db/models/manager.py", line 131, in get return self.get_query_set().get(*args, **kwargs) File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 338, in get num = len(clone) File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 81, in __len__ self._result_cache = list(self.iterator()) File "/usr/lib/python2.6/dist-packages/django/db/models/query.py", line 268, in iterator for row in compiler.results_iter(): File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 701, in results_iter for rows in self.execute_sql(MULTI): File "/usr/lib/python2.6/dist-packages/django/db/models/sql/compiler.py", line 755, in execute_sql cursor = self.connection.cursor() File "/usr/lib/python2.6/dist-packages/django/db/backends/__init__.py", line 281, in cursor cursor = util.CursorWrapper(self._cursor(), self) File "/usr/lib/python2.6/dist-packages/django/db/backends/postgresql_psycopg2/base.py", line 173, in _cursor cursor = self.connection.cursor() InterfaceError: connection already closed
The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.
by , 13 years ago
Attachment: | 15802-2.diff added |
---|
follow-up: 12 comment:11 by , 13 years ago
Replying to dekkers:
...
The problem is again that self.connection isn't None, but it isn't usable either, and we will loop over this error again and again. The attached patch will catch the exception and call close(), so that we can continue with creating a new connection.
...
And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.
comment:12 by , 13 years ago
Replying to Honza_Kral:
And does it occur then at every request or just once? If just once I consider that fine - we wouldn't want to hide the fact that something bad happened to the connection from the user.
It happens at every request. With my patch I still get a DatabaseError in execute() once, but I agree that that is fine.
follow-ups: 14 21 comment:13 by , 13 years ago
I'm running django/postgres on an unreliable infrastructur (All network is in principle unreliable!). Of course I am hitting this problem too. Please help me understand the state of the issue and possible workarounds.
State
- Changeset [16708] has been committed to the trunk on 29/Aug/11 and provides an improvement but no complete fix
- Django 1.3.1 has been released 9/Sep/11
- Django 1.3.1 does NOT contain the fix. What a pity!
- The second patch completes the fix but it is not committed anywhere
- Postgres & Django without workaround or fix == no good idea
Workarounds
- Apply the attached patches to my production systems (Somehow I don't like this!)
- Ease the pain by adding a connection pool on each postgres client and connect locally
Solutions
- Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):
.close()
Close the connection now (rather than whenever __del__ is
called). The connection will be unusable from this point
forward; an Error (or subclass) exception will be raised
if any operation is attempted with the connection.
- Wait for Django 1.4
Is patching my installation the preferred solution at the moment? Is there some other workaround?
What is the 'real' solution to this issue? Should django just deal with the different behavior? Should psycopg2 change its behavior?
Since many people seem to have problems with that behavior(just google 'InterfaceError: connection already closed') it seems to be a common understanding that closing a closed connection should not raise an exception. Therefore I'd say we should open a ticket at psycopg2's issue tracker.
Do you agree? Or did I misunderstand the issue and its background?
-
Sebastian
follow-up: 16 comment:15 by , 13 years ago
The problem that is solved by my patch is that the connection is closed by the DB without Django knowing it (this happens when you for example restart the DB). Django tries to use self.connection and doesn't handle the exception. You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.
I currently run Django 1.3.1 with both patches applied without any problems.
comment:16 by , 13 years ago
No, I don't mean to automatically reconnect - that goes way to far. I think psycopg2 should not raise an exception if an already closed connection is closed again. That's all.
You could change the psycopg2 behaviour to automatically reconnect, but that might hide other problems, so I don't think that's a good idea.
comment:17 by , 13 years ago
Cc: | added |
---|
comment:18 by , 13 years ago
Cc: | added |
---|
comment:19 by , 13 years ago
Is there anything I can do to get my patch applied before 1.4 is released?
comment:20 by , 13 years ago
Cc: | added |
---|---|
Patch needs improvement: | set |
In normal HTTP serving situations the connection doesn't get stuck into closed state. When the request is finished, the connections are closed and the already committed .close() fix will remove the broken connection. I am of course talking about 1.4 here, as it has the .close() fix already applied.
However, if using the connection in a long running task, then there will be problems. In that setting, the connection is usually never closed. We should really document that you should call connection.close() after you are done with it. Actually, we should document how to safely use Django's connections in task-running setting, I don't think we have that and there are multiple potential problems in that setting (Idle in TX for example).
The attached patch doesn't apply cleanly (15802-2.diff). It doesn't apply to the base directory, and even if applied into the django/ directory, it doesn't apply cleanly. In addition, if I am reading it correctly it is downright dangerous. I think what happens is this.
obj.save() # succeeds # connection is closed for some reason obj.save() # Hey, ._cursor() gave me a new connection! No error raised anywhere, perhaps I am even in a different transaction...
I don't believe there to be any safe way to automatically create a new connection without the user calling .close() in between. Surprising things could happen otherwise. The only possible exception would be that when leaving transaction management the connection could be closed on error. This could also help in many situations, because usually when you get an error (possibly due to closed connection), you will rollback the transaction (leave transaction management), and thus the connection would get unstuck.
Still one helpful thing would be to return a consistent ConnectionClosedError
when the connection is closed outside Django's control.
In short: I don't think there is anything to do to get this into 1.4. Too big changes required. The only solution you have is somehow detecting closed connections in your code and closing the connection manually in those cases.
comment:21 by , 13 years ago
Replying to anonymous:
- Convince the psycopg2 guys that their interpretation of pep-0249 is wrong as it seems that the mysql guys interpreted the following different (from pep-0249):
I've tried to do that: close() should be idempotent. But the reference DBAPI test suite fails as it explicitly check that the second close() raises an exception. I've discussed about the problem in the DBSIG <http://mail.python.org/pipermail/db-sig/2011-October/005811.html>, but never got to the point to fix the test suite. I should really write to Stuart Bishop about that.
edit: done: http://code.google.com/p/acute-dbapi/issues/detail?id=3
comment:22 by , 12 years ago
Status: | reopened → new |
---|
comment:23 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:24 by , 12 years ago
Cc: | added |
---|
comment:25 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
- connection.close() was made idempotent in Django, regardless of whether this gets fixed or not in pyscopg2.
- Django should not provide automatic reconnection within a request because that can break transaction integrity.
- The problem of long running processes is covered by #17887.
The fix seems to work and the discussion after the ticket was reopened didn't bring up any actionnable item.
comment:26 by , 11 years ago
Cc: | added |
---|
follow-up: 40 comment:31 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I'm reopening this ticket because I want to think about reopening the connection automatically when no transaction is in progress.
There's probably a bunch of issues with that technique, like retrying a query that failed.
But since "connection closed" errors are a problem many users struggle with, it's worth making sure I haven't overlooked a solution.
comment:32 by , 10 years ago
Status: | new → assigned |
---|
comment:33 by , 10 years ago
Cc: | added |
---|
comment:34 by , 10 years ago
Cc: | added |
---|
comment:35 by , 10 years ago
Cc: | added |
---|
follow-up: 39 comment:36 by , 10 years ago
Version: | 1.3 → 1.6 |
---|
I'm seeing this in a Django 1.6.8 project. The exception is:
Traceback (most recent call last): File "/srv/myproject/local/lib/python2.7/site-packages/django/core/handlers/wsgi.py", line 194, in __call__ signals.request_started.send(sender=self.__class__) File "/srv/myproject/local/lib/python2.7/site-packages/django/dispatch/dispatcher.py", line 185, in send response = receiver(signal=self, sender=sender, **named) File "/srv/myproject/local/lib/python2.7/site-packages/django/db/__init__.py", line 91, in close_old_connections conn.abort() File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 381, in abort self.rollback() File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 180, in rollback self._rollback() File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback return self.connection.rollback() File "/srv/myproject/local/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "/srv/myproject/local/lib/python2.7/site-packages/django/db/backends/__init__.py", line 144, in _rollback return self.connection.rollback() django.db.utils.InterfaceError: connection already closed
I can reliably replicate it by running ab
against the app and doing a postgresql restart in the middle of it. All future requests fail until the process is restarted. I see this in both gunicorn and uWSGI. I tried changing the lazy-apps
option in uWSGI, but the issue persists.
comment:37 by , 10 years ago
Cc: | added |
---|
comment:38 by , 10 years ago
Type: | Bug → Cleanup/optimization |
---|
Since 1.7 there is a test dedicated on it. See commit 2e42c859da42a871244aca36162a0aad2b63c02b
The aaugustin proposal probably should be marked as "Cleanup/optimization", I don't see the bug still open since 1.7.
comment:39 by , 10 years ago
Replying to baumer1122:
I'm seeing this in a Django 1.6.8 project. The exception is: (...)
This is a valid bug report.
The same symptoms as the bug originally reported here can be triggered from another location.
The code where this error originates was deprecated in Django 1.6 and removed in Django 1.8:
- https://github.com/django/django/blob/stable/1.6.x/django/db/__init__.py#L88-L94
- https://github.com/django/django/blob/stable/1.8.x/django/db/__init__.py#L63-L64
As a consequence, this variant of the issue can be considered fixed in Django 1.8.
comment:40 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Well it should really only do this in the case that the InterfaceError is a "connection already closed" one, I remember seeing Mike Bayer bitching that there was no programmatic way to do it, so perhaps we should steal the hack he ended up using from him.