Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#31134 closed Bug (invalid)

ASGI does not closing DB connections properly.

Reported by: Wojciech Bartosiak Owned by: nobody
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords: asgi database
Cc: Andrew Godwin, Carlton Gibson Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I’m testing Django 3.0.1 async features and ASGI performance and I have noticed that when in the configuration:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': example,
        'USER': example,
        'PASSWORD': example,
        'HOST': 'localhost',
        'PORT': 5432,
        'ATOMIC_REQUESTS': True
    }
}

Or when ATOMIC_REQUESTS is set to False but in View I have something like this:

from django.http import HttpResponse
import datetime
from django.db import transaction


def current_datetime(request):
    with transaction.atomic():
        now = datetime.datetime.now()
        html = "<html><body>It is now %s.</body></html>" % now
    return HttpResponse(html)

Connections/sessions are not being closed and every new request opens a new connection causing:
django.db.utils.OperationalError: FATAL: sorry, too many clients already

This problem does not exist in WSGI mode and/or with/without django_postgrespool2 (with WSGI) this does not exist.
When a server in ASGI mode attacker can use this vector to cut off service.

ASGI command:

uvicorn --workers 5 --interface asgi3 --loop asyncio --lifespan off project.asgi:application

WSGI command:

uvicorn --workers 5 --interface wsgi --lifespan off project.wsgi:application

Change History (12)

comment:1 by Mariusz Felisiak, 5 years ago

Cc: Andrew Godwin Carlton Gibson added
Severity: NormalRelease blocker
Summary: ASGI does not closing DB connections properlyASGI does not closing DB connections properly.
Triage Stage: UnreviewedAccepted

Thanks for this report.

comment:2 by Carlton Gibson, 5 years ago

Hmmm. This isn't as straight-forward as it looks.

Connections/sessions are not being closed and every new request opens a new connection

Testing locally, this isn't true for me. I'm seeing connections go up to the 10-15 range, but then drop again at a later point. It's certainly not the case that connection numbers are rising monotonically with the request count.

Thus, is it PostgreSQL config that's in play? Wojciech, could you help me pin down the exact reproduce case?

in reply to:  2 comment:3 by Wojciech Bartosiak, 5 years ago

machine     x86_64
release     19.2.0
sysname     Darwin
version     Darwin Kernel Version 19.2.0: Sat Nov  9 03:47:04 PST 2019; root:xnu-6153.61.1~20/RELEASE_X86_64
python      3.7.3

My docker-compose.yml file:

version: "3"
services:
  my_db:
    image: postgres
    container_name: "my_db"
    environment:
      - POSTGRES_USER= example
      - POSTGRES_PASSWORD= example
      - POSTGRES_DB= example
    expose:
      - "5432"
    ports:
      - "5432:5432"
    volumes:
      - ./persistence/postgres-data:/var/lib/postgresql/data

PostgreSQL version:

# select version();

PostgreSQL 12.0 (Debian 12.0-2.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
Version 0, edited 5 years ago by Wojciech Bartosiak (next)

in reply to:  description comment:4 by Wojciech Bartosiak, 5 years ago

I have made some tests to dive into the problem.

When I execute ASGI server with only two workers and ab with 50 concurrent connections I can see that connections during test are being closed but after the command, they are being open:

https://i.ibb.co/Wf20Vrf/Screenshot-2020-01-07-at-11-27-08.png
https://i.ibb.co/W3hGhZs/Screenshot-2020-01-07-at-11-27-13.png

comment:5 by Carlton Gibson, 5 years ago

It looks like connections could be closed better — yes, once the server is idle open connections are lingering (I suspect they're not closed until Django tries to re-use them in subsequent requests) — but (as yet) it's not looking like there's an in issue with a runaway number of connections. Upping -c to 200 and higher, the relation to the number of open connections seems to be dependent on the number of workers, rather than the request count.

Example: finishing a test with 4 workers and 200 concurrent requests, I have open DB connections maxing out (and stable) at around 70. From there making single requests one at a time sees the connection count drop by one on most requests — some it skips, presumably because the connection is reused and still valid.

However, CONN_MAX_AGE=0 (the default) should imply the connection being closed each request. (Not necessarily the ideal combination with atomic requests for performance testing but...) This is normally done in the request_finished signal, so I'll look into why that's not happening.

Thanks for the follow-up!

in reply to:  5 comment:6 by Wojciech Bartosiak, 5 years ago

Replying to Carlton Gibson:

It looks like connections could be closed better

When I have switched to django-postgrespool2 I have noticed error described here: https://docs.sqlalchemy.org/en/13/errors.html#error-3o7r
Maybe this information will give you some hint about connections.

HINT: Error 3o7r exists for WSGI and ASGI connections.

This is my configuration to reproduce this error:

DATABASES = {
    'default': {
        'ENGINE': 'django_postgrespool2',
        'NAME': 'example',
        'USER': 'example',
        'PASSWORD': 'example',
        'HOST': 'localhost',
        'PORT': 5432,
        'ATOMIC_REQUESTS': False,
        'DISABLE_SERVER_SIDE_CURSORS': True
    }
}

DATABASE_POOL_CLASS = 'sqlalchemy.pool.QueuePool'

DATABASE_POOL_ARGS = {
    'max_overflow': 5,
    'pool_size': 1,
    'recycle': 5
}

Uvicorn start commands:
ASGI:

uvicorn --workers 5 --interface asgi3 --loop asyncio --lifespan off project.asgi:application

WSGI:

uvicorn --workers 5 --interface wsgi --lifespan off project.wsgi:application

Testing command:

 % ab -c 20 -n 1000 -s 120 http://localhost:8000/a/
This is ApacheBench, Version 2.3 <$Revision: 1843412 $>
[...]
Percentage of the requests served within a certain time (ms)
  50%      7
  66%      8
  75%      8
  80%      9
  90%     10
  95%     14
  98%     28
  99%     79
 100%  30038 (longest request)

Error:

sqlalchemy.exc.TimeoutError: QueuePool limit of size 1 overflow 5 reached, connection timed out, timeout 30 (Background on this error at: http://sqlalche.me/e/3o7r)

comment:7 by Wojciech Bartosiak, 5 years ago

Ok. I must admit that this ticket can be closed.

I finally understood DB connection lifecycle and uvicorn workers. First of all uvicorn for every worker starts 10 threads so for 2 workers there will be 20 opened connections, for 10 workers it will be 100 connections.

Django requires to have non-transactional requests to close the connection after CONN_MAX_AGE and because when I set ATOMIC_REQUESTS to True, Django will always leave the open connection but firstly when the connection was open longer than CONN_MAX_AGE, Django will reestablish that connection once again.

comment:8 by Carlton Gibson, 5 years ago

Resolution: invalid
Severity: Release blockerNormal
Status: newclosed

OK, thanks for the follow-up: that's saved me a cycle tracking that down. (Note to self: always try with uvicorn and Daphne to make sure it's not the protocol server...)
Please do come back if you see follow-up issues.

in reply to:  8 comment:9 by Wojciech Bartosiak, 5 years ago

Replying to Carlton Gibson:

OK, thanks for the follow-up: that's saved me a cycle tracking that down. (Note to self: always try with uvicorn and Daphne to make sure it's not the protocol server...)
Please do come back if you see follow-up issues.

I'm trying to understand the implementation and when trying to use WGSI with QueuePool with CONN_MAX_AGE=0 Django trying to close the connection but QueuePool just returning it to the pool.

This works fine even when I have the sum of pool_size and max_overflow less than 10 (threads per worker) and ab -c30 -n10000 but when I change to ASGI, then after 20-30 requests problem with Queue pool limit size pops out, so this made me think that the closing connection in ASGI is buggy.

Andrew Godwin wrote:

[...]more investigation would be needed to determine if this is the ASGI handler not calling close_old_connections correctly (there's a chance it gets called in the wrong thread, due to the way signals interact with async).

I'm not good in core Django implementations but maybe some more experienced developer would like to investigated this?

comment:10 by Carlton Gibson, 5 years ago

Having looked at it at some length today, it seems it's working as expected. (Not sure about the QueuePool issue, but that looks unrelated—not the issue reported here—but also expected: you've overwhelmed the QueuePool, if I read that SQLAlchemy page correctly...?) Very happy though to look a minimal projects demonstrating a concrete issue with Django.

Last edited 5 years ago by Carlton Gibson (previous) (diff)

in reply to:  10 comment:11 by Wojciech Bartosiak, 5 years ago

Replying to Carlton Gibson:

you've overwhelmed the QueuePool, if I read that SQLAlchemy page correctly...?

With WSGI and this simple transaction example, the connections are being returned correctly, even I'm overwhelming the Queue. With ASGI returning (closing) the DB connection somehow works wrong, maybe signals are not working properly with ASGI?

comment:12 by Wojciech Bartosiak, 5 years ago

Finally, after hours, hours diving into the code, I finally understood how Django sets the connection, release connection and all that signal handlers. I can confirm a couple of things:

  1. ASGI and WSGI works fine with connections
  2. When ASGI enabled, the PgBouncer should be used with CONN_MAX_AGE = 0 and DISABLE_SERVER_SIDE_CURSORS = True
  3. django_postgrespool2 has a bug with connection pooling because accessing to the pool is not thread-safe and adding simple threading.Lock fixed issue - so this needs to be reported
  4. The best performance I have with simple Gunicorn + WSGI with --workers X, --threads 1, CONN_MAX_AGE is not 0

X is (2 x $num_cores) + 1

Note: See TracTickets for help on using tickets.
Back to Top