Opened 10 years ago

Closed 10 years ago

#25072 closed Bug (fixed)

GDALRaster Garbage Collection not working

Reported by: Daniel Wiesmann Owned by: nobody
Component: GIS Version: dev
Severity: Normal Keywords: memory gis raster
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I noticed that when creating many GDALRasters in a single context, the memory allocated to GDALRasters does not get released properly. For instance, the following loop as an ever growing memory consumption:

from django.contrib.gis.gdal import GDALRaster
n = 1000
for x in range(n):
    rast = GDALRaster({
        'driver': 'MEM',
        'width': 500,
        'height': 500,
        'nr_of_bands': 1,
        'srid': 3086,
        'origin': (500000, 400000),
        'scale': (1, -1),
        'skew': (0, 0),
        'bands': [{
            'nodata_value': 10,
            'data': range(500**2)
        }],
    })

Adding rast.__del__() to the loop solves the problem, but I guess that should be automatic.

Attachments (2)

25072.diff (1.3 KB ) - added by Claude Paroz 10 years ago.
Allow GDALRaster garbage collection
25072_segfault.diff (1.3 KB ) - added by Daniel Wiesmann 10 years ago.
This test leads to a segfault on my system.

Download all attachments as: .zip

Change History (16)

comment:1 by Daniel Wiesmann, 10 years ago

Unfortunately this goes beyond my knowledge of how python works. I am happy to work on this if its productive, but would need help.

comment:2 by Tim Graham, 10 years ago

I don't have much experience with garbage collection, but after doing some reading I think this is expected as there is no guarantee that the garbage collector will run in code like this. If you put that code in a function, is __del__() called when the function exits (i.e. local variables should be garbage collected at that point)?

comment:3 by Claude Paroz, 10 years ago

Triage Stage: UnreviewedAccepted

It might be related to bands. I made some tests, and as soon as I access self.bands, the raster in not garbage collected. Perhaps because there is a circular reference between raster (bands cached property) and bands (band.source). That's a guess for now...

by Claude Paroz, 10 years ago

Attachment: 25072.diff added

Allow GDALRaster garbage collection

comment:4 by Claude Paroz, 10 years ago

Daniel, could you test the attached patch and see if that changes anything in your experiments?

comment:5 by Daniel Wiesmann, 10 years ago

Thanks Claude, I can confirm that your patch solves the problem for my case. I just tried the patch on both the loop posted above and the use case where I spotted the problem in the first place. The memory usage stayed flat in both runs.

comment:6 by Daniel Wiesmann, 10 years ago

I just noticed that I started getting segmentation faults when accessing the GDALBand data method after applying the patch. I will have a closer look at what is going on there.

comment:7 by Claude Paroz, 10 years ago

Yes, you'll get segmentation faults if you use a band while its source has been garbage collected. The patch needs improvements to safeguard against this, if possible.

Last edited 10 years ago by Claude Paroz (previous) (diff)

by Daniel Wiesmann, 10 years ago

Attachment: 25072_segfault.diff added

This test leads to a segfault on my system.

comment:8 by Daniel Wiesmann, 10 years ago

I tracked down the segmentation fault I was getting, see the diff I attached. Somehow accessing the data directly from a queryset without creating intermediate objects gives me a segmentation fault. With an intermediate object it does not.

comment:9 by Claude Paroz, 10 years ago

Here's another possible solution: https://github.com/django/django/pull/4971

That is stop to cache GDALRaster.bands, which is the cause of the other side of the circular reference.

comment:10 by Daniel Wiesmann, 10 years ago

That looks fine to me, I think most rasters have only a few bands.

Maybe we can add a method to get only one specific band by index? Like this single bands can be accessed efficiently. Something along the lines of:

def get_band(self, index):
    if index < 0 or index >= capi.get_ds_raster_count(self._ptr):
        raise GDALException('Band index out of range.')
    return GDALBand(self, index)

comment:11 by Claude Paroz, 10 years ago

Has patch: set

Your wishes are my orders :-) Please check the latest PR version (while still checking the initial memory issue).

comment:12 by Daniel Wiesmann, 10 years ago

Thanks Claude!

I just ran your patch through the loop and applied use case and it worked perfectly, including low memory use.

The issues that came up through this ticket has made me realize that it could be important to only instantiate specific bands (some time series rasters might have lots of bands, its is not common, but happens). That's why I asked for this...

comment:13 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

comment:14 by Claude Paroz <claude@…>, 10 years ago

Resolution: fixed
Status: newclosed

In d72f8862:

Fixed #25072 -- Prevented GDALRaster memory to be uncollectable

Setting GDALRaster.bands as a cached property was creating a circular
reference with objects having del methods, which means the memory
could never be freed.
Thanks Daniel Wiesmann for the report and test, and Tim Graham for the review.

Note: See TracTickets for help on using tickets.
Back to Top