Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#31905 closed Bug (fixed)

ASGI: Dead persistent postgres connections are not closed when the database is accessed in middleware

Reported by: Michael Galler Owned by: Michael Galler
Component: Database layer (models, ORM) Version: 3.1
Severity: Release blocker Keywords: ASGI Postgres Persistent
Cc: Andrew Godwin Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

How to reproduce:

Create a project with django-admin

Added persitent postgres connection

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'test_db',
        'USER': 'postgres',
        'PASSWORD': '1234',
        'HOST': 'localhost',
        'CONN_MAX_AGE': 60
    }
}

Add a middleware that accesses the database

from django.contrib.auth.models import User
from django.utils.deprecation import MiddlewareMixin


class FoobarMiddleware(MiddlewareMixin):

    def process_request(self, request):
        print(User.objects.all())

Add the middleware to settings, apply migrations.
Start ASGI Server for example with:

uvicorn --loop asyncio --workers 1 foobar.asgi:application --host 0.0.0.0 --reload

Visit localhost:8000 to open persistent connection

Restart Postgres, after that all connections get the following error

Environment:


Request Method: GET
Request URL: http://localhost:8000/

Django Version: 3.1
Python Version: 3.8.5
Installed Applications:
['django.contrib.admin',
 'django.contrib.auth',
 'django.contrib.contenttypes',
 'django.contrib.sessions',
 'django.contrib.messages',
 'django.contrib.staticfiles']
Installed Middleware:
['django.middleware.security.SecurityMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.middleware.common.CommonMiddleware',
 'django.middleware.csrf.CsrfViewMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.contrib.messages.middleware.MessageMiddleware',
 'django.middleware.clickjacking.XFrameOptionsMiddleware',
 'foobar.middleware.FoobarMiddleware']



Traceback (most recent call last):
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\base\base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "c:\code\project\_lib\venv\lib\site-packages\django\utils\asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\postgresql\base.py", line 233, in create_cursor
    cursor = self.connection.cursor()

The above exception (connection already closed) was the direct cause of the following exception:
  File "c:\code\project\_lib\venv\lib\site-packages\asgiref\sync.py", line 330, in thread_handler
    raise exc_info[1]
  File "c:\code\project\_lib\venv\lib\site-packages\django\core\handlers\exception.py", line 38, in inner
    response = await get_response(request)
  File "c:\code\project\_lib\venv\lib\site-packages\django\utils\deprecation.py", line 126, in __acall__
    response = await sync_to_async(self.process_request)(request)
  File "c:\code\project\_lib\venv\lib\site-packages\asgiref\sync.py", line 296, in __call__
    ret = await asyncio.wait_for(future, timeout=None)
  File "C:\Program Files\Python38\lib\asyncio\tasks.py", line 455, in wait_for
    return await fut
  File "C:\Program Files\Python38\lib\concurrent\futures\thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "c:\code\project\_lib\venv\lib\site-packages\asgiref\sync.py", line 334, in thread_handler
    return func(*args, **kwargs)
  File ".\foobar\middleware.py", line 8, in process_request
    print(User.objects.all())
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\models\query.py", line 263, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\models\query.py", line 269, in __len__
    self._fetch_all()
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\models\query.py", line 1303, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\models\query.py", line 53, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\models\sql\compiler.py", line 1152, in execute_sql
    cursor = self.connection.cursor()
  File "c:\code\project\_lib\venv\lib\site-packages\django\utils\asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\base\base.py", line 259, in cursor
    return self._cursor()
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\base\base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\base\base.py", line 237, in _cursor
    return self._prepare_cursor(self.create_cursor(name))
  File "c:\code\project\_lib\venv\lib\site-packages\django\utils\asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "c:\code\project\_lib\venv\lib\site-packages\django\db\backends\postgresql\base.py", line 233, in create_cursor
    cursor = self.connection.cursor()

Exception Type: InterfaceError at /
Exception Value: connection already closed

Only a restart of the application fixes the problem

This problem occurs only with ASGI, with WSGI for example the devserver this does not occur.

Change History (17)

comment:1 by Michael Galler, 4 years ago

I noticed that the signal close_old_connections has a different database wrapper.
In the signal and in the middleware i added a print

signal close_old_connections [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001E90258C040>]
signal close_old_connections [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001E90258C040>]
middleware [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001E90258C730>]

In wsgi mode these objects are the same

comment:2 by Carlton Gibson, 4 years ago

Cc: Andrew Godwin added
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

I noticed that the signal close_old_connections has a different database wrapper. In the signal and in the middleware...

Thanks Micheal — that's a good spot.

Issue could well be in the interplay with django.db.utils:

class ConnectionHandler:
    def __init__(self, databases=None):
        """
        databases is an optional dictionary of database definitions (structured
        like settings.DATABASES).
        """
        self._databases = databases
        # Connections needs to still be an actual thread local, as it's truly
        # thread-critical. Database backends should use @async_unsafe to protect
        # their code from async contexts, but this will give those contexts
        # separate connections in case it's needed as well. There's no cleanup
        # after async contexts, though, so we don't allow that if we can help it.
        self._connections = Local(thread_critical=True)

If there are different _connections in play it would explain how the connection is not getting reset in request_started.

I didn't reproduce yet, but I'm going to provisionally accept on the basis of the report.

Time to call in the cavalry I think — cc-ing Andrew. :)

Last edited 4 years ago by Carlton Gibson (previous) (diff)

comment:3 by Michael Galler, 4 years ago

So i found the problem, the middleware mixin calls the process_request without thread_sensitive, if change this in my middleware the error is gone.

Due to this problem the connections are never closed, and after a while postgres refused the connections because of too many connections

comment:4 by Carlton Gibson, 4 years ago

So i found the problem, the middleware mixin calls the process_request without thread_sensitive...

Yes, that would explain the different _connections. Super. A middleware can run anything, so like signals, they need to be thread-sensitive: makes sense.

(I was going to ask, do you have the views your using too? — If they're sync then we're adding another sync_to_async() call, which would be another opportunity to switch threads... but maybe it's not needed now.)

Last edited 4 years ago by Carlton Gibson (previous) (diff)

comment:5 by Michael Galler, 4 years ago

So i did some more testing:

async view with thread_sensitive=False in middleware

signal [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001B8A0F369D0>]
middleware [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001B8A0F4E430>]
view async [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001B8A1917F10>]

3 different wrappers :D

sync view with thread_sensitive=False in middleware

signal [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001BA0506C760>]
middleware [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001BA0507D3D0>]
view sync [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001BA0506C760>]

2 different wrappers

async view with thread_sensitive=True in middleware

signal [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001735143B880>]
multi org middleware [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x000001735143B880>]
view async [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x0000017351E19AF0>]

2 different wrappers

sync view with thread_sensitive=True in middleware

signal [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x00000237523AA8B0>]
middleware [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x00000237523AA8B0>]
view sync [<django.db.backends.postgresql.base.DatabaseWrapper object at 0x00000237523AA8B0>]

1 wrapper

Views:

async def async_test(request):
    await asyncio.sleep(0.1)
    print('view async', connections.all())
    return HttpResponse('Hello, async world!')


def sync_test(request):
    print('view sync', connections.all())
    return HttpResponse('Hello, sync world!')
Last edited 4 years ago by Michael Galler (previous) (diff)

comment:6 by Carlton Gibson, 4 years ago

Hi Michael,

... the middleware mixin calls the process_request without thread_sensitive, if change this in my middleware the error is gone.

Right, so here we have it:

    async def __acall__(self, request):
        """
        Async version of __call__ that is swapped in when an async request
        is running.
        """
        response = None
        if hasattr(self, 'process_request'):
            response = await sync_to_async(self.process_request)(request)
        response = response or await self.get_response(request)
        if hasattr(self, 'process_response'):
            response = await sync_to_async(self.process_response)(request, response)
        return response

Src

Those sync_to_async() calls should use the thread_sensitive parameter.

We don't really test the actual DB connection teardown but we should be able to assert that the middleware gets run in the same thread as the calling code, and that the connection is the same (thread.get_native_id() and id(connection), assuming we can't pass the actual objects 🤔)

Not 100% sure on the test location but tests/deprecation/test_middleware_mixin.py seems a reasonable candidate. (If we need views and urls, because we need the handler itself then, there are already some set up in tests/handlers, which might then be a better place.)

How does that sound Micheal? Would you fancy making a PR?

comment:7 by Michael Galler, 4 years ago

That fixes only the problem in the middleware, the connections opened in async views are still not closed.

I added a print to the open and close in DatabaseWrapper to see what is happening

on a sync view everything works fine:

new connection 102828
INFO:     127.0.0.1:58775 - "GET /test_sync/ HTTP/1.1" 200 OK

now waiting 10 secs to ensure that connection is closed due 'CONN_MAX_AGE': 5

close connection 102828
new connection 102516
INFO:     127.0.0.1:58775 - "GET /test_sync/ HTTP/1.1" 200 OK

now on async

new connection 98652
new connection 33540
INFO:     127.0.0.1:59297 - "GET /test_async/ HTTP/1.1" 200 OK

#waiting again more than five seconds

close connection 98652
new connection 101424
INFO:     127.0.0.1:59544 - "GET /test_async/ HTTP/1.1" 200 OK

Only one of the two expired connections are closed, the connection in the async view is always reused.
When the postgres server restarts, the async views are broken until the application is restarted, because these are never closed.

The problem is the connection handler create a new databasewrapper for the async view due to thread_critical.
The comment already points out the problem.

    def __init__(self, databases=None):
        """
        databases is an optional dictionary of database definitions (structured
        like settings.DATABASES).
        """
        self._databases = databases
        # Connections needs to still be an actual thread local, as it's truly
        # thread-critical. Database backends should use @async_unsafe to protect
        # their code from async contexts, but this will give those contexts
        # separate connections in case it's needed as well. There's no cleanup
        # after async contexts, though, so we don't allow that if we can help it.
        self._connections = Local(thread_critical=True)

comment:8 by Carlton Gibson, 4 years ago

That fixes only the problem in the middleware, the connections opened in async views are still not closed.

I was thinking (and I need to test more to see) that the connection opened in the async view would need to be handled via sync_to_async() (with thread_sensitive) too.
(Then we'd just have one all the way down...) I think that's safe, but it's not enforced.

Again, not sure without testing but, do we need to make accessing the connection an async unsafe operation, I wonder? 🤔 — in theory any ORM activity in an async context should trigger an error.

But we need to fix the middleware issue — so I think we should push on there.

Last edited 4 years ago by Carlton Gibson (previous) (diff)

comment:9 by Michael Galler, 4 years ago

Has patch: set
Owner: changed from nobody to Michael Galler
Status: newassigned

in reply to:  8 comment:10 by Michael Galler, 4 years ago

I submitted a PR

Replying to Carlton Gibson:

That fixes only the problem in the middleware, the connections opened in async views are still not closed.

I was thinking (and I need to test more to see) that the connection opened in the async view would need to be handled via sync_to_async() (with thread_sensitive) too.
(Then we'd just have one all the way down...) I think that's safe, but it's not enforced.

I think this would be a good solution to overcome this problem.

I found another problem related to async and the SessionMiddleware

The request.user is packed in a SimpleLazyObject

class AuthenticationMiddleware(MiddlewareMixin):
    def process_request(self, request):
        request.user = SimpleLazyObject(lambda: get_user(request))

If its not unpacked in another Middlware you cant access it in an async view
None of the following variants work

async def async_test(request):
    user = request.user # Exception SynchronousOnlyOperation
    user = await sync_to_async(request.user)() # Exception SynchronousOnlyOperation
    user = await sync_to_async(request.user) # Exception SynchronousOnlyOperation

Should I open another ticket for this ?

Last edited 4 years ago by Mariusz Felisiak (previous) (diff)

comment:11 by Carlton Gibson, 4 years ago

Should I open another ticket for this ?

Yes, I think so. If you can narrow it down to a test case, that would be amazing! . Thanks

comment:12 by Carlton Gibson, 4 years ago

Needs tests: set

in reply to:  8 comment:13 by Michael Galler, 4 years ago

Replying to Carlton Gibson:

That fixes only the problem in the middleware, the connections opened in async views are still not closed.

I was thinking (and I need to test more to see) that the connection opened in the async view would need to be handled via sync_to_async() (with thread_sensitive) too.
(Then we'd just have one all the way down...) I think that's safe, but it's not enforced.

Again, not sure without testing but, do we need to make accessing the connection an async unsafe operation, I wonder? 🤔 — in theory any ORM activity in an async context should trigger an error.

But we need to fix the middleware issue — so I think we should push on there.

The problem with the unclosed connections from async views I could overcome with the following signal:


from asgiref.sync import async_to_sync, sync_to_async
from django.db import connections


def close_all_old_connections(**kwargs):
    def _inner_close_if_unusable_or_obsolete():
        # beware dont use connections.all, all() creates new objects if there are none
        for alias in connections:
            try:
                conn = getattr(connections._connections, alias)
                conn.close_if_unusable_or_obsolete()
            except AttributeError:
                pass

    async def close_all_old_connections_async():
        await sync_to_async(_inner_close_if_unusable_or_obsolete)()

    async_to_sync(close_all_old_connections_async)()


try:
    # try to get the active loop
    # if it fails with a RuntimeError the server is running in WSGI mode and we dont need this signals
    loop = asyncio.get_running_loop()
    request_started.connect(close_all_old_connections)
    request_finished.connect(close_all_old_connections)
except RuntimeError:
    pass

I'm not quite sure why I need to wrap it twice, but it works.
It closes obsolete and dead connections correctly.
It also fixes the problem if the middleware calls sync_to_async without thread_sensitive
Maybe you want to adapt this.

comment:14 by Andrew Godwin, 4 years ago

Definitely agree we need to add thread_sensitive to the sync_to_async calls in the middleware - not doing that was my fault. Honestly tempted to say we should make it the default, and if you want a version that doesn't share threads, you can use a variant that does its own DB cleanup (like channels' database_sync_to_async).

Connections opened in async views will be in a different thread to the synchronous close_old_connections handler if they don't use thread_sensitive=True - the answer there is to always use thread_sensitive=True in any code that wraps or does ORM operations (including the request.user lazy-load, for now).

comment:15 by Mariusz Felisiak, 4 years ago

Needs tests: unset
Triage Stage: AcceptedReady for checkin

comment:16 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 547a07f:

Fixed #31905 -- Made MiddlewareMixin call process_request()/process_response() with thread sensitive.

Co-authored-by: Carlton Gibson <carlton.gibson@…>

comment:17 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In b6d2419:

[3.1.x] Fixed #31905 -- Made MiddlewareMixin call process_request()/process_response() with thread sensitive.

Co-authored-by: Carlton Gibson <carlton.gibson@…>

Backport of 547a07fa7ec4364ea9ecd2aabfdd16ee4c63003c from master

Note: See TracTickets for help on using tickets.
Back to Top