Memory leak in snappy/jpy when passing/returning arrays

It seems that snappy (or jpy) is unable to free the memory for arrays passed to or returned from Java. Below is a minimal example that quickly fills up memory until the JVM crashes.

Python code:

import snappy
import numpy as np


SlcImage = snappy.jpy.get_type('org.jlinda.core.SLCImage')
AbstractMetadata = snappy.jpy.get_type(
    'org.esa.snap.engine_utilities.datamodel.AbstractMetadata')
product = snappy.ProductIO.readProduct('product.dim')
master_root = AbstractMetadata.getAbstractedMetadata(product)
master_meta = SlcImage(master_root, product)
for idx in range(1000):
    temp = np.asarray(master_meta.test(np.ones((1, int(3e7)))))

Java code:

public double[] test(double[] pixel) {
    double[] tr = new double[pixel.length];
    for (int i = 0; i < pixel.length; i++) {
        tr[i] = 30.0 + pixel[i];
    }
    return tr;
}

Should this be freed explicitly?

I managed to resolve this issue. Please see related closed issue on GitHub.


Hi @estebanaguilera, was this fix ever integrated into snappy? Could this be the cause of the memory leaks reported in other posts? Without knowing Java, is it possible to implement your fix manually?

I’m not aware of any fix on the snappy side. My previous solution needs some Java coding.

Regarding your link to other posts, it points to here.

Thanks @estebanaguilera, I’ve updated that post to here: Snappy not freeing memory. So to implement your fix I would need to manually edit some Java code in jpy? Could you point me to where this is, or create a PR with the necessary changes in your GitHub issue?

If you provide an MWE in this thread, I could provide some advice. In any case, this is an issue for the snappy dev team.

Hi @estebanaguilera. Here is what I have. Thanks for your help.

Snappy 7.0.3, Python 3.6.9, Ubuntu 18.04.4 LTS.

from snappy import ProductIO, GPF, jpy
import numpy as np
import resource

# Source is a subset of S3A_OL_1_EFR_.SEN3 file
source_file = '/home/mark/project/product.dim'

# Specify processor parameters (e.g., RayleighCorrection)
HashMap = jpy.get_type('java.util.HashMap')
brr_parameters = HashMap()
brr_parameters.put('sourceBandNames',
                   'Oa06_radiance,Oa07_radiance,Oa08_radiance,Oa09_radiance,'
                   'Oa10_radiance,Oa11_radiance,Oa12_radiance,Oa18_radiance')

for i in range(1000):
    # Read source 
    source = ProductIO.readProduct(source_file)

    # Compute a product using GPF
    bp = GPF.createProduct('RayleighCorrection', brr_parameters, source)

    # Read a band to numpy array
    h = int(bp.getSceneRasterHeight())
    w = int(bp.getSceneRasterWidth())
    band = bp.getBand('rBRR_06').readPixels(0, 0, w, h, np.zeros((h, w), np.float32))

    # Dispose of products
    bp.dispose()
    source.dispose()

    # Print memory usage
    print("Run %d. Memory usage: %d KB\n"
          % (i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

product.zip (209.8 KB)

Assuming the readPixels method is leaking memory, I wouldn’t use snappy but rather a combination of GPT and numpy.

I suggest the following alternative processing approach:

  1. First, use GPT from the command line to apply the RayleighCorrection operator. You can automate this call with Python’s subprocess module. Note that you can pass operator parameters in the gpt call: gpt my_graph.xml -PmyParam=50.2, assuming you have ${myParam} in my_graph.xml.
  2. Then, each output band can be read as follows (this won’t leak memory):
    import numpy as np
    
    path = '/home/mark/project/output.data/rBRR_06.img'
    shape = (44, 57)  # take it from rBRR_06.hdr (samples, lines)
    dtype = '>f8'  # take it from rBRR_06.hdr
                   # if byte order is 1, use `>`, else `<`
                   # if data type is 4, use 'f4'; if it's 5, use 'f8'
    band = np.memmap(path, shape=shape, dtype=dtype, offset=0, mode='r')
    

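For step 1, the gpt call can be scripted from Python like this (a sketch: `gpt` is assumed to be on your PATH, and `my_graph.xml` / `myParam` are the placeholder names from above):

```python
import subprocess

def build_gpt_command(graph_xml, **params):
    """Assemble a gpt command line; each keyword becomes a -Pname=value flag."""
    cmd = ['gpt', graph_xml]
    cmd += ['-P%s=%s' % (name, value) for name, value in params.items()]
    return cmd

cmd = build_gpt_command('my_graph.xml', myParam=50.2)
# cmd == ['gpt', 'my_graph.xml', '-PmyParam=50.2']
# subprocess.run(cmd, check=True)  # uncomment to actually run gpt
```

Because gpt runs in its own process, all JVM memory is returned to the OS when it exits, which is exactly what sidesteps the leak.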
Tip: If you want to do out-of-core processing in python, you can load that band array into a dask array (see here).
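
For completeness, here is roughly what that looks like (a sketch assuming dask is installed; a small demo file stands in for output.data/rBRR_06.img):

```python
import numpy as np
import dask.array as da

# Write a small demo band to disk, then memmap it back
# (stands in for output.data/rBRR_06.img)
data = np.arange(44 * 57, dtype='>f8').reshape(44, 57)
data.tofile('demo_band.img')

band = np.memmap('demo_band.img', shape=(44, 57), dtype='>f8', mode='r')
darr = da.from_array(band, chunks=(44, 16))  # 16 columns per chunk
mean = float(darr.mean().compute())          # chunks stream through memory
```

With a real, large band only the chunks being reduced are resident at any time, so memory stays flat regardless of image size.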

Hi @estebanaguilera

Thanks for your reply.

+1 for the numpy example for reading the product and for the pointer to dask. I am sure this will help a lot of people, as it significantly reduces memory usage by avoiding snappy’s ProductIO.readProduct.

After some testing using my MWE, I found that the leak is not restricted to the readPixels method but can even be reproduced when just creating a product using GPF, or merely reading a product using ProductIO.readProduct and then disposing of it again.
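
For reference, this is the shape of the reduced test (a sketch: the no-op placeholder stands in for the snappy calls so the snippet runs anywhere; swap in ProductIO.readProduct / dispose to reproduce the leak):

```python
import resource

def peak_rss_kb():
    """Peak resident set size of this process (reported in KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def read_and_dispose():
    # Placeholder: with snappy this would be
    #     p = ProductIO.readProduct(source_file)
    #     p.dispose()
    pass

start = peak_rss_kb()
for i in range(100):
    read_and_dispose()
print('Peak RSS grew by %d KB over 100 iterations' % (peak_rss_kb() - start))
```

With the snappy calls in place, the printed growth climbs steadily per iteration even though dispose() is called every time.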

Your suggestion to use GPT instead of snappy is a good one. I made a comment on another thread making the same suggestion.

Seems like that is the only solution until the deeper memory leak with snappy can be identified and corrected.
