Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#35853 closed New feature (wontfix)

Disable garbage collection during transaction.atomic.

Reported by: Matej Spiller Muys Owned by:
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords: gc, atomic, transaction
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We noticed that occasionally atomic blocks are keeping transaction open for longer than expected.
This was traced to GC being run during atomic operation affecting the DB performance since locks are released too late.

We added tracing to GC and transactional.atomic() and we noticed that GC is keeping DB transaction longer from 0.5 to 1 second.

To avoid that we extended atomic object to call gc.disable before creating new transaction and gc.enable after commit or rollback.
This positively impacted database load. Because we use mod_wsgi with a lot of workers and horizontally scaling the app it happens quite often that GC is run inside a DB transaction making whole system slower.

It would be nice to have this buildin in Django (opt-out, opt-in) for better DB performance.

Change History (4)

comment:1 by Natalia Bidart, 3 months ago

Component: UncategorizedDatabase layer (models, ORM)
Resolution: wontfix
Status: newclosed

Hello Matej Spiller Muys, thank you for taking the time to create this report.

It might be worth asking on the Django forum for help/input and then confirming if something new would need adding to Django (and what that would look like). We usually ask contributors to get feedback from the community through a forum discussion first, so we can confirm there is community consensus that Django should add this feature.

From my personal view, this seems like a very specific need arising from a niche use case. I believe you could implement your own atomic decorator on top of Django's and disable the garbage collector on entry and re-enable on exit? I'm not sure this applies to the broader ecosystem, and Django is a framework designed to offer robust and accurate solutions for common scenarios. But if you get positive feedback from the forum, please comment back and link the forum conversation here so we reopen the ticket.

comment:2 by Matej Spiller Muys, 3 months ago

The need for having db transactions as short as possible is quite general rule.

In other languages, GC is not blocking the main thread. In Python that is unfortunately not the case.
Basically if you can get eliminate GC you can scale better. And also DBA will be much happier because it won't hog the database.
We are monitoring the database performance and are trying to have transactions duration at max 0.5 seconds.
If you have shorted transactions you also need less concurrent connections as well.

We are currently having our own atomic decorator that is disabling GC but I filled the issue because it seems like could be useful for wider audience. I will write to forum and see if there is more general need for high performance Django minding DB resources.

Example code that we use:

@contextlib.contextmanager
def disable_gc():
    _gc_was_enabled = gc.isenabled()
    if _gc_was_enabled:
        gc.disable()
    try:
        yield
    finally:
        if _gc_was_enabled:
            gc.enable()

@contextlib.contextmanager
def atomic(using=None, savepoint=True, durable=False):
    with disable_gc(), transaction.atomic(using, savepoint, durable):
        yield

comment:3 by Claude Paroz, 3 months ago

Maybe a benchmark with as sample project could also help making a decision here?

in reply to:  2 comment:4 by Natalia Bidart, 3 months ago

Replying to Matej Spiller Muys:

The need for having db transactions as short as possible is quite general rule.

I disagree with this. While I recognize that managing transaction thresholds and garbage collection may be critical for large-scale services, such as Instagram, I believe that the average Django project does not require such measures. Django has been around for nearly 21 years, and this is the first time such a request has been reported in the issue tracking system. This suggests that the need for disabling GC to allow lowering the transaction max time is not a common concern among the broader Django users.

We are currently having our own atomic decorator that is disabling GC but I filled the issue because it seems like could be useful for wider audience. I will write to forum and see if there is more general need for high performance Django minding DB resources.

Nice, do you have any benchmarks available, following Claude's ask? Please note that besides needing community consensus, we'd need a clear win on performance for a change like the one proposed. This is documented in https://docs.djangoproject.com/en/dev/internals/contributing/bugs-and-features/#requesting-performance-optimizations.

Example code that we use:

@contextlib.contextmanager
def disable_gc():
    _gc_was_enabled = gc.isenabled()
    if _gc_was_enabled:
        gc.disable()
    try:
        yield
    finally:
        if _gc_was_enabled:
            gc.enable()

@contextlib.contextmanager
def atomic(using=None, savepoint=True, durable=False):
    with disable_gc(), transaction.atomic(using, savepoint, durable):
        yield

I believe this code is not thread-safe nor reentrant-safe. For a proposal like this to be considered for inclusion in the core framework, we would need strong guarantees that it meets these criteria and that garbage collection is not inadvertently "disabled forever" due to unforeseen bugs.

While the need for this functionality is valid and presents an interesting challenge, I reiterate that this is a niche use case, perhaps relevant to only 1% or even 0.5% of Django services deployed in the wild. As previously mentioned, Django aims to provide a robust, stable, and secure platform for common web service requirements, and this particular need feels outside the realm of what is considered "common."

Note: See TracTickets for help on using tickets.
Back to Top