#35059 closed Bug (fixed)
ASGI server leaves stale DB connections
Reported by: | James Thorniley | Owned by: | James Thorniley |
---|---|---|---|
Component: | HTTP handling | Version: | 5.0 |
Severity: | Release blocker | Keywords: | |
Cc: | Carlton Gibson, Andrew Godwin | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This seems to be a regression from 4.2 to 5.0.
We are using Django ASGI server via uvicorn, with a mariadb database. When updating our app servers from Django 4.2.8 to 5.0, we experienced a spike in the number of database connections, eventually going over the limit set for the DB and causing 500 errors due to
django.db.utils.OperationalError: (1040, 'Too many connections')
I've created a minimal project to reproduce what I think is the phenomenon here using local docker container for the DB:
https://github.com/jthorniley/django-asgi
I've put complete reproduction steps in that github link, but I will summarise here:
- When running a Django 5.0 ASGI server, some connections are not closed when requests complete, leaving stale connections in the DB. This can be confirmed by running curl against a local server and then
show processlist
in the mysql CLI when the requests are complete. - As a result, if you send a moderate number of parallel requests to the server, it will go past the limit of DB connections (I think the out-of-the-box limit for the mariadb docker container is around 150, but it is configurable). Then new connections will fail with the
Too many connections
error. Obviously its theoretically possible to go over this limit anyway but it happens sooner due to server processes not cleaning up connections after requests complete. - You can downgrade to 4.2.8 and run the same reproduction project and see that there is no issue there. The DB connections are closed after every individual request and hence provided there aren't more parallel requests than allowed connections to the DB you are safe.
Change History (22)
comment:1 by , 11 months ago
comment:2 by , 11 months ago
Triage Stage: | Unreviewed → Accepted |
---|
I also could reproduce this bug using James's repo.
comment:3 by , 11 months ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
Summary: | "Too many connections" regression from 4.2->5.0 - asgi server leaves stale DB connections → ASGI server leaves stale DB connections |
Triage Stage: | Accepted → Unreviewed |
Type: | Uncategorized → Bug |
comment:4 by , 11 months ago
git bisect found commit 7cd187a5ba58d7769039f487faeb9a5a2ff05540 however I think this is a red herring, not sure why it came to that.
Reverting the changes from 64cea1e48f285ea2162c669208d95188b32bbc82 on top of 5.0 fix the problems, so I think that is actually the problematic commit.
This commit moved the send_response
function into a task which can be cancelled if and when http.disconnect
is received. However the send_response
function was also responsible for calling response.close()
which is a sync function called asynchronously using sync_to_async
. This introduced a race condition as if the http.disconnect
is received during the call to the sync_to_async
wrapped code then the response.close()
can effectively be cancelled and the associated clean up (triggered by the request_finished
signal from inside response.close()
never gets called.
If created a PR here for review, it seems to solve the issue for me: https://github.com/django/django/pull/17675
comment:5 by , 10 months ago
Component: | Uncategorized → HTTP handling |
---|---|
Patch needs improvement: | set |
Resolution: | needsinfo |
Severity: | Normal → Release blocker |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
With the test cases, I think we can accept this as a bug in 64cea1e48f285ea2162c669208d95188b32bbc82. Thanks for the investigation James.
The request_finished
signal should be dispatched in the disconnect case.
I left a few initial comments on the PR. Happy to take another look later on.
comment:6 by , 10 months ago
Has patch: | set |
---|---|
Owner: | changed from | to
Patch needs improvement: | unset |
Status: | new → assigned |
Assigning to James following the proposed PR.
comment:7 by , 10 months ago
Patch needs improvement: | set |
---|
Setting patch needs improvements following the latest batch of review comments made by Carlton.
comment:8 by , 10 months ago
Patch needs improvement: | unset |
---|
comment:9 by , 10 months ago
Patch needs improvement: | set |
---|
Setting as patch needs improvement while I wait for the author to reply to my comments.
comment:10 by , 10 months ago
Patch needs improvement: | unset |
---|
follow-up: 12 comment:11 by , 10 months ago
I have this same problem, it caused a massive spike in database connections. I had to revert back to Django 4.2 for now until this is fixed. I came here to report but found your all already on top of it, thank you! Can't wait until this is fixed in 5.0.
comment:12 by , 10 months ago
Replying to Josh Orr:
I have this same problem, it caused a massive spike in database connections. I had to revert back to Django 4.2 for now until this is fixed. I came here to report but found your all already on top of it, thank you! Can't wait until this is fixed in 5.0.
Hi Josh, Does that the proposed patch work for you? Testing would be a huge help.
comment:14 by , 10 months ago
This is pretty much ready for checkin pending two question and one optional test change. I will wait for feedback from either the author or Carlton to merge ASAP (release is planned for Feb 6th). Thanks everyone who helped so far, by either building the fix, reviewing, debugging and/or testing the patch!
comment:15 by , 10 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
I also experienced this with Postgres. Downgraded to Django 4.2.8