Opened 12 years ago
Closed 12 years ago
#19117 closed Bug (wontfix)
Database and memcached connections break after fork.
Reported by: | Sebastian Noack | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | |
Cc: | davidswafford | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If you have a management command that performs CPU-heavy tasks, or when implementing a server for certain background tasks, it's likely that you will use the multiprocessing module to scale across multiple CPUs. However, Django implements connections to the database and memcached as singletons (created on first use, reused forever). So if the database or memcached has been used before forking, the child processes inherit the established connection. And when multiple processes use the same connection at the same time (which can and will happen), the requests fail in an ugly way.
However, the multiprocessing module provides a mechanism for exactly such cases, which lets you clean things up after a fork. My patch uses that mechanism to reset any database and memcached connections that may have been created, so that each child process creates its own connection when it needs one.
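The cleanup mechanism referred to here is presumably `multiprocessing.util.register_after_fork(obj, func)`, which arranges for `func(obj)` to be called in every child process right after a fork (note that it is an internal, undocumented API, and it only applies to the fork start method, i.e. POSIX systems). A minimal sketch with a hypothetical stand-in connection class, not Django's actual wrapper:

```python
import multiprocessing
import multiprocessing.util

class FakeConnection:
    """Hypothetical stand-in for a lazily created DB/memcached connection."""
    def __init__(self):
        self.sock = "socket-opened-in-parent"

    def reset(self):
        # Drop the handle inherited from the parent; a real backend
        # would then reconnect lazily on next use.
        self.sock = None

conn = FakeConnection()

# Ask multiprocessing to call FakeConnection.reset(conn) in every
# child process immediately after a fork.
multiprocessing.util.register_after_fork(conn, FakeConnection.reset)

def child_view(_):
    return conn.sock  # what the child sees after the fork

ctx = multiprocessing.get_context("fork")  # fork start method only
with ctx.Pool(1) as pool:
    child_sock = pool.map(child_view, [0])[0]  # None: reset in the child
parent_sock = conn.sock  # unchanged in the parent
```

The child's inherited connection is wiped by the registered callback, while the parent keeps using its own.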
Attachments (1)
Change History (8)
by , 12 years ago
Attachment: | 0001-Re-connect-database-and-memcached-after-fork.patch added |
---|
comment:1 by , 12 years ago
Needs tests: | set |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 12 years ago
Of course mod_wsgi doesn't have any problems with that, as it forks the process before starting the Python interpreter and importing Django. And I wasn't talking about forking while processing a request; I was talking about a management command or daemon running in the background. In my specific case it's the part of our application stack that dispatches the newsletter. So I have a management command that forks multiple worker processes to render the emails and send them via SMTP. For any management command like that, which runs for more than a few minutes, delegating tasks to child processes makes perfect sense.
comment:3 by , 12 years ago
A quick workaround is to close the database and cache connection before forking; they'll be automatically reopened on the first subsequent access.
I'm not eager to add this code, because it's non-trivial, impossible to test, and rarely useful...
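The workaround above (close before forking, let the lazy reconnect do the rest) can be sketched without any Django dependency, using a hypothetical stand-in for Django's lazily opened connection. In recent Django versions the real-world equivalent would be calling `django.db.connections.close_all()` just before the fork:

```python
import multiprocessing
import os

class LazyConnection:
    """Hypothetical stand-in for Django's lazily opened connection."""
    def __init__(self):
        self._conn = None

    @property
    def conn(self):
        if self._conn is None:  # opened on first use, like Django's
            self._conn = "opened-in-pid-%d" % os.getpid()
        return self._conn

    def close(self):
        self._conn = None

db = LazyConnection()
parent_conn = db.conn  # the parent has already used the "database"

def worker(_):
    return db.conn  # the child reopens lazily on first access

db.close()  # the workaround: close *before* forking
ctx = multiprocessing.get_context("fork")  # POSIX fork start method
with ctx.Pool(1) as pool:
    child_conn = pool.map(worker, [0])[0]
```

Because the connection was closed before the fork, the child's first access opens a fresh connection tagged with its own PID instead of sharing the parent's.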
comment:5 by , 12 years ago
Hey Sebastian,
I've recently hit this issue as well. I'm building a scheduling system that kicks off long-running background jobs. What's the recommended way to clear the DB session when forking? I'm using this with mixed results:
from django.db import transaction

@transaction.commit_manually
def clear_dbsession(*args, **kwargs):
    """Force Django to clear the existing DB session."""
    transaction.commit()
comment:6 by , 12 years ago
Cc: | added |
---|---|
Resolution: | wontfix → |
Status: | closed → new |
comment:7 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
The problem is that many libraries used by Django do not support forking. For example, you can't expect to use a plain psycopg2 connection after a fork.
The thing is, Django isn't designed to be used with fork(). It might work if you close all memcached and database connections before the fork. But then again, it might not. Guaranteeing that everything will just work when using fork() is nearly impossible.
You can discuss this design decision on DevelopersMailingList. Unfortunately the reason for the wontfix is "we can't make this work" rather than "we don't want to make this work", so getting this accepted will likely be hard.
You might want to explore other solutions, for example using subprocess module and explicitly communicating the initial state between processes, or using some message queue solution.
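The subprocess alternative mentioned here amounts to starting a fresh interpreter and passing state explicitly instead of inheriting it through fork. A minimal sketch, where the inline worker code is a hypothetical placeholder for a real script that would call `django.setup()` and open its own connections:

```python
import json
import subprocess
import sys

# Hypothetical worker: in a real setup this would be a separate script
# that initializes Django itself and opens its own connections.
worker_code = (
    "import json, sys\n"
    "tasks = json.load(sys.stdin)\n"
    "print(json.dumps([t['email'].upper() for t in tasks]))\n"
)

tasks = [{"email": "a@example.com"}, {"email": "b@example.com"}]

# Launch a fresh interpreter; nothing is inherited except what we
# explicitly serialize onto its stdin.
proc = subprocess.run(
    [sys.executable, "-c", worker_code],
    input=json.dumps(tasks),
    capture_output=True,
    text=True,
    check=True,
)
results = json.loads(proc.stdout)
```

Since the child never shares the parent's file descriptors for database or memcached sockets, the fork-safety problem doesn't arise at all.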
I'm pretty sure there are other things that might break. That's why the common solution for background tasks is to use task/message queues (like Celery, just to name one). Application servers like gunicorn or mod_wsgi have no trouble spawning multiple Django workers. I don't see any advantage in forking during the processing of a request, so I'm not sure this is a use case we want to support.