Opened 10 months ago

Closed 10 months ago

Last modified 10 months ago

#35279 closed Cleanup/optimization (invalid)

Memory Leak with `prefetch_related`

Reported by: Ken Tong
Owned by: nobody
Component: Database layer (models, ORM)
Version: 4.2
Severity: Normal
Keywords: memory leak
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Memory Leak after calling queryset.prefetch_related() or prefetch_related_objects()

To reproduce:

import gc
from django.db import models
from django.db.models import prefetch_related_objects


class Foo(models.Model):
    id = models.AutoField(primary_key=True)


class Bar(models.Model):
    id = models.AutoField(primary_key=True)
    foo = models.ForeignKey(Foo, on_delete=models.CASCADE)


def prepare_data():
    if Foo.objects.exists():
        return
    foo = Foo()
    foo.save()
    bar = Bar(foo=foo)
    bar.save()


def test1():
    # no prefetch
    for foo in Foo.objects.all():
        for bar in foo.bar_set.all():
            print(foo.id, bar.id)


def test2():
    # queryset.prefetch_related()
    for foo in Foo.objects.prefetch_related("bar_set").all():
        for bar in foo.bar_set.all():
            print(foo.id, bar.id)


def test3():
    # prefetch_related_objects()
    foo_list = list(Foo.objects.all())
    prefetch_related_objects(foo_list, "bar_set")
    for foo in foo_list:
        for bar in foo.bar_set.all():
            print(foo.id, bar.id)


def run():
    prepare_data()

    # warm up
    test1()
    test2()
    test3()

    gc.collect()

    gc.set_debug(gc.DEBUG_LEAK)

    gc.collect()
    print(f"baseline - garbage count: {len(gc.garbage)}")

    test1()
    gc.collect()
    print(f"test1 - garbage count: {len(gc.garbage)}")

    test2()
    gc.collect()
    print(f"test2 - garbage count: {len(gc.garbage)}")

    test3()
    gc.collect()
    print(f"test3 - garbage count: {len(gc.garbage)}")

    gc.set_debug(0)


run()

Output

1 1
1 1
1 1
baseline - garbage count: 0
1 1
test1 - garbage count: 0  # no memory leak
1 1
test2 - garbage count: 23  # 23 objects leaked
1 1
test3 - garbage count: 46  # another 23 objects leaked

Change History (6)

comment:1 by Ken Tong, 10 months ago

Hi Team,

So far I have been adding the line below at the appropriate places to work around the memory leak in my projects. Hopefully there will be a fix and a documented way to properly clean up the cache.

foo._prefetched_objects_cache.pop("bar_set")

Thank you for your attention!

comment:2 by Mariusz Felisiak, 10 months ago

Component: Uncategorized → Database layer (models, ORM)
Triage Stage: Unreviewed → Accepted
Type: Bug → Cleanup/optimization

Interesting, thanks for the report. Tentatively accepted for further investigation.

comment:3 by Antoine Humbert, 10 months ago

The following code snippet shows the same result:

import gc


class Parent:
    def __init__(self):
        self.cache = {}


class Child:
    def __init__(self, parent):
        self.parent = parent


def test():
    foo = Parent()
    bar = Child(parent=foo)
    foo.cache["bars"] = [bar]
    print(foo.cache, bar.parent)


test()
gc.collect()
print(len(gc.garbage))

gc.set_debug(gc.DEBUG_LEAK)
gc.collect()
print(len(gc.garbage))

test()
gc.collect()
print(len(gc.garbage))

This results in the following output:

{'bars': [<__main__.Child object at 0x6f520cdd90>]} <__main__.Parent object at 0x6f520cd6d0>
0
0
{'bars': [<__main__.Child object at 0x6f520b32d0>]} <__main__.Parent object at 0x6f520b1fd0>
gc: collectable <Parent 0x6f520b1fd0>
gc: collectable <Child 0x6f520b32d0>
gc: collectable <list 0x6f520b1600>
gc: collectable <dict 0x6f520b1e80>
4

With the gc.set_debug statement removed, gc.garbage is always empty, so this looks like a side effect of DEBUG_LEAK:

{'bars': [<__main__.Child object at 0x7535cf1d90>]} <__main__.Parent object at 0x7535cf1650>
0
0
{'bars': [<__main__.Child object at 0x7535cd7310>]} <__main__.Parent object at 0x7535cd5fd0>
0

As per the gc documentation:

To debug a leaking program call gc.set_debug(gc.DEBUG_LEAK). Notice that this includes gc.DEBUG_SAVEALL, causing garbage-collected objects to be saved in gc.garbage for inspection.

So, using DEBUG_LEAK causes collected objects to be kept in gc.garbage. I would therefore say that a non-empty gc.garbage in this case does not identify a memory leak. On the contrary, it shows objects that were garbage collected.
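
One way to check this conclusion without DEBUG_LEAK is to hold only a weak reference to one object in the cycle and see whether it is cleared after a collection. The following is a minimal sketch, assuming CPython and the same Parent/Child shape as above; the helper name check_cycle_collection is hypothetical and not part of Django or the gc module.

```python
import gc
import weakref


class Parent:
    def __init__(self):
        self.cache = {}


class Child:
    def __init__(self, parent):
        self.parent = parent


def check_cycle_collection():
    """Return (alive_before, alive_after, new_garbage) for one Parent/Child cycle."""
    gc.set_debug(0)   # no DEBUG_SAVEALL, so collected objects are really freed
    gc.disable()      # avoid an automatic collection between the steps below
    try:
        gc.collect()
        garbage_before = len(gc.garbage)

        parent = Parent()
        child = Child(parent)
        parent.cache["children"] = [child]
        parent_ref = weakref.ref(parent)
        del parent, child  # only the cycle itself keeps the objects alive now

        alive_before = parent_ref() is not None
        gc.collect()
        alive_after = parent_ref() is not None
        return alive_before, alive_after, len(gc.garbage) - garbage_before
    finally:
        gc.enable()


# DEBUG_LEAK is a combination of flags that includes DEBUG_SAVEALL.
print(bool(gc.DEBUG_LEAK & gc.DEBUG_SAVEALL))
print(check_cycle_collection())
```

If the weak reference is alive before gc.collect() and dead afterwards, and gc.garbage did not grow, the cycle was reclaimed and nothing leaked. The same weak-reference check could in principle be applied to a model instance with prefetched objects, though that is not shown here.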

Last edited 10 months ago by Antoine Humbert

comment:4 by Ken Tong, 10 months ago

Thank you for your detailed explanation, Antoine. I confirm that the memory leak was a false alarm, and I am sorry about it.

comment:5 by Ken Tong, 10 months ago

Resolution: invalid
Status: new → closed

comment:6 by Mariusz Felisiak, 10 months ago

Triage Stage: Accepted → Unreviewed

TIL
