Opened 3 years ago

Last modified 8 weeks ago

#33497 new New feature

Database persistent connections do not work with ASGI in 4.0

Reported by: Stenkar Owned by:
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords: ASGI, Database, async
Cc: Sarah Boyce, Carlton Gibson, Florian Apolloner, Andrew Godwin, Anders Kaseorg, Patryk Zawadzki, Mikail, Alex, joeli, Marco Glauser, Rafał Pitoń, Marty Cochrane, lappu, Dmytro Litvinov, Suraj Shaw, Yiwei Gao Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello!

I've discovered that after upgrading Django to ver. 4 (currently 4.0.2), I started to see database FATAL: sorry, too many clients already errors in the Sentry.

For a database, I'm using containerized Postgres 14.1 and the connection between Django and Postgres is done by Unix socket.

Database settings look like this:

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": environ.get("POSTGRES_DB"),
        "USER": environ.get("POSTGRES_USER"),
        "PASSWORD": environ.get("POSTGRES_PASSWORD"),
        "HOST": environ.get("POSTGRES_HOST"),
        "PORT": environ.get("POSTGRES_PORT"),
        "CONN_MAX_AGE": 3600
    }
}

In production, I'm using ASGI (Uvicorn 0.17.4) to run the Django application (4 workers).

When everything is deployed and I have surfed around the Django admin site, then checking Postgres active connections, using SELECT * FROM pg_stat_activity; command, I see that there are 30+ Idle connections made from Django.

After surfing more around the admin site, I can see that more Idle connections have been made by Django.

It looks like the database connections are not reused. At one point some of the Idle connections are closed, but then again more connections have been made when more DB queries are made by Django.

I have one Django 3.2.11 project running on production and all the settings are the same, there are always max 10 persistent connections with the database and everything works fine.

Should that be like this in version 4.0?

Change History (45)

comment:1 by Stenkar, 3 years ago

Component: UncategorizedDatabase layer (models, ORM)

comment:2 by Stenkar, 3 years ago

Summary: Database persistent connection do not work with ASGI in 4.0Database persistent connections do not work with ASGI in 4.0

comment:3 by Mariusz Felisiak, 3 years ago

Cc: Carlton Gibson added
Resolution: needsinfo
Status: newclosed

Thanks for the report. ​Django has a routine to clean up old connections that is tied into the request-response life-cycle, so idle connections should be closed. However, I don't think you've explained the issue in enough detail to confirm a bug in Django. This can be an issue in psycopg2, uvicorn, or in custom middlewares (see #31905) it's hard to say without a reproduce.

Please reopen the ticket if you can debug your issue and provide details about why and where Django is at fault, or if you can provide a sample project with reproducible scenario.

Last edited 3 years ago by Mariusz Felisiak (previous) (diff)

comment:4 by Carlton Gibson, 3 years ago

Cc: Florian Apolloner Andrew Godwin added

Hi Stenkar.

Would you be able to put together a minimal test-project here, so that folks can reproduce quickly.

This may be due to Django 4.0 having per-request contexts for the thread sensitivity of sync_to_async() — See #32889.
If so, that's kind-of a good thing, in that too many open resources is what you'd expect in async code, and up to now, we've not been hitting that, as we've essentially been running serially.

Immediate thought for a mitigation would be to use a connection pool.

Equally, can we limit the number of threads in play using asgiref's `AGSI_THREADS` environment variable? (But see the discussion on the related Daphne issue about whether that's the right place for that at all.)

This is likely a topic we'll need to deal with (eventually) in Django: once you start getting async working, you soon hit resource limits, and handling that with structures for sequencing and maximum parallelism is one of those hard-batteries™ that we maybe should provide. 🤔

comment:5 by Florian Apolloner, 3 years ago

I think https://github.com/django/asgiref/pull/306#issuecomment-991959863 might play into this as well. By using a single thread per connection, persistent connections will never get clean up.

Version 0, edited 3 years ago by Florian Apolloner (next)

comment:6 by Stenkar, 3 years ago

Resolution: needsinfo
Status: closednew

Thank you for the comments.

I created a project where it's possible to spin up a minimal project with docker-compose.

https://github.com/KStenK/django-ticket-33497

I'm not sure that I can find where or what goes wrong in more detail, but I'll give it a try.

comment:7 by Carlton Gibson, 3 years ago

Triage Stage: UnreviewedAccepted
Type: BugNew feature

OK, thanks Stenkar.

I'm going to accept this as a New Feature. It's a change in behaviour from 3.2, but it's precisely in allowing multiple executors for sync_to_async() that it comes up. (In 3.2 it's essentially single-threaded, with only a single connection actually being used.) We need to improve the story here, but it's not a bug in #32889 that we don't have async compatible persistent DB connections yet. (I hope that makes sense.)

A note to the docs about this limitation may be worthwhile.

comment:8 by Florian Apolloner, 3 years ago

Thinking more about this I do not think the problem is new. We have the same problem when persistent connections are used and a new thread is generated per request (for instance in runserver.py). Usually (ie with gunicorn etc) one has a rather "stable" pool of processes or requests; as soon as you switch to new threads per connection this will break. In ASGI this behavior is probably more pronounced since by definition every request is in it's own async task context which then propagates down to the db backend as new connection per request (which in turn will also never reuse connections because the "thread" ids change).

All in all I think we are finally at the point where we need a connection pool in Django. I'd strongly recommend to use something like https://github.com/psycopg/psycopg/tree/master/psycopg_pool/psycopg_pool but abstracted to work for all databases in Django.

comment:9 by Anders Kaseorg, 3 years ago

Cc: Anders Kaseorg added

Possibly related: #33625 (for memcached connections).

comment:10 by Patryk Zawadzki, 2 years ago

Cc: Patryk Zawadzki added

comment:11 by Mikail, 2 years ago

Cc: Mikail added

comment:12 by Patryk Zawadzki, 2 years ago

This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.

The request_finished signal does not terminate them as they are not idle for longer than MAX_CONN_AGE.

The request_started signal does not terminate them as it never sees those connections due to the connection state being asgiref.local and discarded after every request.

Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if MAX_CONN_AGE is combined with ASGI.

comment:13 by Carlton Gibson, 2 years ago

Keywords: async added

comment:14 by Alex, 2 years ago

Cc: Alex added

in reply to:  12 comment:15 by joeli, 2 years ago

Replying to Patryk Zawadzki:

This is marked as a "new feature," but it's an undocumented breaking change between 3.2 and 4.0. Connections that were previously reused and terminated are now just left to linger.

The request_finished signal does not terminate them as they are not idle for longer than MAX_CONN_AGE.

The request_started signal does not terminate them as it never sees those connections due to the connection state being asgiref.local and discarded after every request.

Allowing parallel execution of requests is a great change, but I feel Django should outright refuse to start if MAX_CONN_AGE is combined with ASGI.

I agree. I would even go as far as calling this a regression, not just an undocumented breaking change. No matter the reasons behind it or the technical superiority of the new solution, fact of the matter stands that in 3.2 ASGI mode our code worked fine and reused connections. In 4.x it is broken unless using MAX_CONN_AGE = 0, which disables a feature in Django that used to work.

comment:16 by joeli, 2 years ago

Cc: joeli added

comment:17 by Marco Glauser, 23 months ago

Cc: Marco Glauser added

comment:18 by Rafał Pitoń, 20 months ago

Cc: Rafał Pitoń added

comment:19 by Marty Cochrane, 19 months ago

Cc: Marty Cochrane added

comment:20 by Florian Apolloner, 18 months ago

I have created a draft pull request for database connection pool support in postgresql: https://github.com/django/django/pull/16881

It would be great if people experiencing the problems noted here could test this (this would probably help in getting it merged).

comment:21 by lappu, 18 months ago

Cc: lappu added

We just ran into this while upgrading from 3.2 to 4.2. During a QA round our staging environment MySQL server running on AWS RDS t3.micro instance exceeded its max connections (70? or so, while normally the connections stay below 10).

I git bisected the culprit to be https://github.com/django/django/commit/36fa071d6ebd18a61c4d7f1b5c9d17106134bd44, which is what Carlton Gibson suspected.

We are also running uvicorn.

comment:22 by Andreas Pelme, 15 months ago

We have been using CONN_MAX_AGE=300 since it was introduced in Django 1.6 and rely in it for not having to reconnect to the database in each http request. This change really caught us off guard. We upgraded from django 3.2 to 4.0 and our site went completely down in a matter of seconds when all database connections was instantly depleted.

Giving each http request its own async context makes a lot of sense and is a good change in itself IMO. But I would argue that this change is not backward compatible. CONN_MAX_AGE does still "technically" work but it does clearly not behave as it has been doing for the last 10 years.

Specifying CONN_MAX_AGE is recommended in a lot of places, including Django's own docs:

At the very least, I think this needs to be clearly called out in the release notes and docs on "Persistent connections". I think we need to deprecate/remove CONN_MAX_AGE. Or is there even a reason to keep it around?

I am very much in favor of getting basic db connection pooling into django. We will try to give https://github.com/django/django/pull/16881 a spin and put it in production and report back. Would love to have something like that available out of the box in Django! We use Postgres and would be happy with having such a pool which would replace CONN_MAX_AGE for our use case.

However, that would only work for postgres. What is the situation with mysql/oracle? Does mysqlclient come with a pool like psycopg?

in reply to:  22 comment:23 by Sarah Boyce, 12 months ago

Replying to Andreas Pelme:

However, that would only work for postgres. What is the situation with mysql/oracle? Does mysqlclient come with a pool like psycopg?

Oracle: https://python-oracledb.readthedocs.io/en/latest/user_guide/connection_handling.html#connpooling
mysqlclient doesn't appear to support this out of the box. Looks like mysql-connector-python would have support though: https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html 🤔

comment:24 by Sarah Boyce, 12 months ago

Cc: Sarah Boyce added

comment:25 by Sarah Boyce, 12 months ago

Has patch: set

Will need to add solutions for other backends but don't see a reason why we can't do this incrementally - the current patch is for postgreSQL: https://github.com/django/django/pull/17594

comment:26 by Florian Apolloner, 12 months ago

Well, I guess the main question is if we want to create our own pool implementation or reuse what the database adapters provide and assume that they can do it better for their database than we can do it generically :D

comment:27 by Mariusz Felisiak, 11 months ago

Owner: changed from nobody to Sarah Boyce
Patch needs improvement: set
Status: newassigned

comment:28 by Sarah Boyce, 11 months ago

Patch needs improvement: unset

comment:29 by Dmytro Litvinov, 11 months ago

Cc: Dmytro Litvinov added

comment:30 by Sarah Boyce, 10 months ago

There is already a ticket (and now a PR) to support database connection pools in Oracle: #7732~

Edit: Ah sorry I got too excited and misread this, session pool not connection pool .

Last edited 10 months ago by Sarah Boyce (previous) (diff)

comment:31 by johnthagen, 9 months ago

As a user of Django, I was looking through the steps needed to optimize my application, and came across

It was then after a long amount of research/reading that I came across this ticket. We are planning to soon upgrade our application to use uvicorn workers and run under ASGI so that we can serve web sockets alongside our HTTP API, and thus we now have pause to perform this standard optimization due to this issue.

Questions:

  1. Does this issue affect all ASGI Django HTTP endpoints, even synchronous ones (e.g. no async, no database_sync_to_async(), etc.)?
  2. Should the docs be updated to add a warning about the fact that persistent connections are not currently fully compatible with ASGI? I'd be happy to open a PR as it would have saved me personally a lot of time if I had discovered this earlier in my research process.

comment:32 by Mariusz Felisiak, 9 months ago

Triage Stage: AcceptedReady for checkin

comment:33 by Mariusz Felisiak <felisiak.mariusz@…>, 9 months ago

In fad334e:

Refs #33497 -- Added connection pool support for PostgreSQL.

Co-authored-by: Florian Apolloner <florian@…>
Co-authored-by: Ran Benita <ran@…>

comment:34 by Mariusz Felisiak, 9 months ago

Has patch: unset
Triage Stage: Ready for checkinAccepted

comment:35 by Mariusz Felisiak, 8 months ago

Owner: Sarah Boyce removed
Status: assignednew

comment:36 by Suraj Shaw, 7 months ago

I have created a pull request for database connection pool support in oracle : https://github.com/django/django/pull/17834
It would be great if people experiencing the same problem could test this with oracle backend (this would probably help in getting it merged).

comment:37 by Suraj Shaw, 7 months ago

Cc: Suraj Shaw added

comment:38 by johnthagen, 5 months ago

For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?

comment:39 by Yiwei Gao, 4 months ago

Cc: Yiwei Gao added

comment:40 by nessita <124304+nessita@…>, 3 months ago

In 7380ac5:

Fixed #35688 -- Restored timezone and role setters to be PostgreSQL DatabaseWrapper methods.

Following the addition of PostgreSQL connection pool support in
Refs #33497, the methods for configuring the database role and timezone
were moved to module-level functions. This change prevented subclasses
of DatabaseWrapper from overriding these methods as needed, for example,
when creating wrappers for other PostgreSQL-based backends.

Thank you Christian Hardenberg for the report and to
Florian Apolloner and Natalia Bidart for the review.

Regression in fad334e1a9b54ea1acb8cce02a25934c5acfe99f.

Co-authored-by: Natalia <124304+nessita@…>

comment:41 by Natalia <124304+nessita@…>, 3 months ago

In 26c0667:

[5.1.x] Fixed #35688 -- Restored timezone and role setters to be PostgreSQL DatabaseWrapper methods.

Following the addition of PostgreSQL connection pool support in
Refs #33497, the methods for configuring the database role and timezone
were moved to module-level functions. This change prevented subclasses
of DatabaseWrapper from overriding these methods as needed, for example,
when creating wrappers for other PostgreSQL-based backends.

Thank you Christian Hardenberg for the report and to
Florian Apolloner and Natalia Bidart for the review.

Regression in fad334e1a9b54ea1acb8cce02a25934c5acfe99f.

Co-authored-by: Natalia <124304+nessita@…>

Backport of 7380ac57340653854bc2cfe0ed80298cdac6061d from main.

in reply to:  38 ; comment:42 by dseb, 8 weeks ago

Replying to johnthagen:

For Postgres users, will the new Django 5.1 postgres connection pools mitigate this issue? Should ASGI users use this rather than persistent connections?

It seems that the answer is no, connection pools do not yet mitigate this issue. In my testing, enabling connection pooling in an ASGI context causes connection leaks. Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.

in reply to:  42 ; comment:43 by Florian Apolloner, 8 weeks ago

Even with a very high max connection count and a small timeout value, I'm seeing occasional OperationalError: couldn't get a connection after X sec errors.

This make sense, a small timeout value increases the chance to see this error if you are already running out of connections. Whether there really is a leak or not is impossible to tell without having more information about your codebase and settings.

in reply to:  43 comment:44 by dseb, 8 weeks ago

Replying to Florian Apolloner:

This make sense, a small timeout value increases the chance to see this error if you are already running out of connections.

My mistake, I meant to say "large timeout value". My containers each answer ~2 requests per second, average response time is about 400ms. It shouldn't take many connections to service this workload (even one connection should work, I think), and yet, with pool timeouts set to 30s and the pool max size set to 16, I'm seeing many such errors when pooling is enabled. Which is why it seems to me that the new pooling support hasn't solved this particular problem.

Last edited 8 weeks ago by dseb (previous) (diff)

comment:45 by Florian Apolloner, 8 weeks ago

Note: See TracTickets for help on using tickets.
Back to Top