Snappy: reading a product completely into memory

I’m working with temporary files and snappy. Is there a way to read a product from disk completely into memory, so that the actual files can be deleted after reading? As far as I understand, org.esa.snap.core.datamodel.Product does not own any data, but only links to files on disk.

Well, you can explicitly read all data into memory (call one of the read methods on all of the bands, e.g., readRasterDataFully()). However, at some point SNAP will try to clean the memory and get rid of some of the data that has been read. You have limited options to prevent that; the best you can do is increase your tile cache. If you are dealing with a large product, this probably won’t work.
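For illustration, a minimal sketch of that approach (assuming product is an already opened snappy Product; the file path is made up):

import snappy

product = snappy.ProductIO.readProduct('input.dim')  # hypothetical path
for band in product.getBands():
    # force SNAP to allocate the band's raster and read every pixel into memory
    band.readRasterDataFully()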
If you don’t need the product object, you could read all data into arrays (band.getSourceImage().getData().getPixels()). Again, this probably isn’t feasible for large products either.
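And a sketch of pulling the pixels into plain arrays instead, under the assumption that numpy buffers map to Java primitive arrays through jpy (java.awt.image.Raster.getPixels needs the region and a target array):

import numpy as np

arrays = {}
for band in product.getBands():
    w, h = band.getRasterWidth(), band.getRasterHeight()
    buf = np.zeros(w * h, dtype=np.float64)
    # Raster.getPixels copies the band's pixels into the numpy buffer,
    # which jpy passes to Java as a double[]
    band.getSourceImage().getData().getPixels(0, 0, w, h, buf)
    arrays[band.getName()] = buf
# the Product and its files on disk are no longer needed
product.dispose()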

Overall, what you want to do is possible for smaller products, but I don’t encourage you to do this. I don’t know what you want to achieve, but I feel there might be a better way.

Hi Tonio,

thanks for the reply. Indeed, I’m running into problems with SNAP trying to clean the memory for large products, and unfortunately I need the actual product instance.

Since you brought it up: there probably is a smarter way to achieve what I want. What I’m trying to do is wrap GPT operator calls using subprocess in Python, so that they look similar to calling snappy directly. The reason is that snappy calls are either not multithreaded or seem to run into problems with the GIL, and I was hoping to circumvent this. For that I need to write the input to disk, possibly a tmpfs in RAM or a fast SSD, which is still feasible. But the output product should also be transparently returned from the function and cleaned up afterwards. My idea was just to read it completely into memory. Below you find my admittedly ugly solution, with the problem that out_prod must be cleaned up manually when it is no longer needed.

import secrets
import subprocess
import tempfile

import snappy

def gpt_wrapper(operator, params, products, tmpdir):
    """ wraps the call to a GPT operator

    This function writes all products to disk
    for processing with GPT and reads the output. The input products are
    cleaned up. The output however must remain in tmpdir since SNAP products
    link to files on disk.

    Parameters
    ----------

    operator: string
        the operator that is called. Execute "gpt -h" to get a list
    params: dictionary
        parameters for the operator
    products: list of SNAP products
    tmpdir: temporary directory for storing files


    """

    opts = []
    gpt = snappy.snap_home + '/bin/gpt'
    for key, val in params.items():
        opts.append('-P{}={}'.format(key, val))

    with tempfile.TemporaryDirectory(dir=tmpdir) as inner_tmpdir:
        in_files = []
        # write products to disk for processing with GPT
        for prod in products:
            prodpath = inner_tmpdir + '/' + prod.getName() + '.dim'
            # write as BEAM-DIMAP so GPT can read it back
            snappy.ProductIO.writeProduct(prod, prodpath, 'BEAM-DIMAP')
            in_files.append(prodpath)

        # get a hopefully unique name for the output product
        target_name = tmpdir + '/' + 'target_' + operator.lower() + '_' + str(
            secrets.randbelow(100000)) + '.dim'

        proc = subprocess.run(
            [gpt, operator, '-t', target_name, *opts, *in_files],
            stdout=subprocess.PIPE)
        out_prod = snappy.ProductIO.readProduct(target_name)
    return out_prod, proc
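For context, a hypothetical call would look like this (the operator name, parameter, and paths are made up for illustration):

prod = snappy.ProductIO.readProduct('/data/input.dim')
out_prod, proc = gpt_wrapper('Subset', {'region': '0,0,1000,1000'}, [prod], '/tmp')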

Okay, I see your problem. Writing a product out and reading it back in causes some overhead, but keeping the output product in the temp folder would be one solution. Do you need this method for a specific purpose, or should it be a general solution? If you want to use the output product as input to another operator, it would be sufficient to call GPT once on the operator at the end of the chain. Or do you not want an output product written at all?
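To illustrate the chaining idea: the whole chain can be expressed as a single GPT graph file, so GPT is invoked only once and only the final product is written to disk. A rough sketch (the operators, parameters, and file paths are just placeholders):

import subprocess

graph_xml = """<graph id="chain">
  <version>1.0</version>
  <node id="read">
    <operator>Read</operator>
    <parameters><file>input.dim</file></parameters>
  </node>
  <node id="subset">
    <operator>Subset</operator>
    <sources><sourceProduct refid="read"/></sources>
    <parameters><region>0,0,1000,1000</region></parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources><sourceProduct refid="subset"/></sources>
    <parameters>
      <file>output.dim</file>
      <formatName>BEAM-DIMAP</formatName>
    </parameters>
  </node>
</graph>
"""

with open('chain.xml', 'w') as f:
    f.write(graph_xml)
# run the whole chain in one GPT invocation
subprocess.run(['gpt', 'chain.xml'])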