Temporary fix for snappy memory issues

Potential fix/work-around to problems observed in these posts:


This issue has been driving me nuts. Neither I nor others could find a way for snappy to process multiple images one after another without it consuming all of the memory on the machine and never releasing it.

After a lot of digging around I stumbled across the subprocess module in Python. Effectively, you spawn a new Python process from a .py file, which runs and then terminates. That termination frees the memory snappy is using, much as killing the script normally would.

The code I use to spawn my processing pipeline is:

import subprocess
pipeline_out = subprocess.check_output(['python', 'src/SarPipeline.py', location_wkt], stderr=subprocess.STDOUT)

Note: pipeline_out is the STDOUT of the script, so to find out which file has just been processed I have print("filepath: " + path_to_file) in src/SarPipeline.py. I can then traverse pipeline_out, extract the line that begins with filepath: and serve that file via my Flask API.
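
A minimal sketch of that parsing step (the filepath: prefix matches the print above; the decoding and error handling are my own additions, not Ciaran's exact code):

# pipeline_out is bytes, so decode it before looking for the marker line
path_to_file = None
for line in pipeline_out.decode('utf-8').splitlines():
    if line.startswith('filepath: '):
        path_to_file = line[len('filepath: '):].strip()
        break
if path_to_file is None:
    raise RuntimeError('SarPipeline.py did not report an output file')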

This is by no means ideal nor pretty, but it works. I can call my SAR API as many times as I like and the memory usage always drops back to next to nothing.

Main issues now:

  • Needs a load of extra error handling and code to find the desired output lines

  • You lose the console output from snappy in logs/when following the process on the command line

  • You must reference the file you’re spawning in the subprocess relative to where the parent script was run from (see the sketch after this list)
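
One way to soften the relative-path issue from the last point is to build an absolute path to the child script from the parent module's own location; a minimal sketch, assuming SarPipeline.py lives in a src/ folder next to the parent script (that layout, and using sys.executable instead of 'python', are my assumptions):

import os
import subprocess
import sys

# Resolve the child script relative to this file, not the current working directory
script_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'src', 'SarPipeline.py')
pipeline_out = subprocess.check_output([sys.executable, script_path, location_wkt],
                                       stderr=subprocess.STDOUT)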

Sorry for the huge post!

Hopefully this will help alleviate any memory/caching issues people are having when trying to provide a service utilising snappy or a pipeline that processes multiple images.

Please feel free to reply to this/message me with any concerns!

Ciaran

10 Likes

Thanks Ciaran for sharing this idea.

I have tried it and can confirm that it provides a way to free memory after performing heavy operations.

Because subprocess can only ingest string arguments, I have used it in combination with the json package to give a bit more flexibility (the idea being to encode/decode Python objects as JSON strings). Building on Ciaran’s example:

import json
import subprocess

mydict = dict(
    parameter1='a',
    parameter2='b',
)
mydict_str = json.dumps(mydict)   # encode the dictionary as a JSON string
pipeline_out = subprocess.check_output(['python', 'src/SarPipeline.py', mydict_str], stderr=subprocess.STDOUT)

# in SarPipeline.py:
import sys
import json
mydict_str = sys.argv[1]
mydict = json.loads(mydict_str)  # decode the JSON string back into a dictionary

As Ciaran pointed out, this solution has many inconveniences, but until the problem is resolved in SNAP it is helpful.

4 Likes

Great shout on the JSONified parameters!

Hopefully this workaround will at least unblock some people’s projects!

In my experience, executing snappy commands in a subprocess is very slow in Python.

If speed is important to you, a faster solution is to execute the GPT tool from the command line and not use snappy at all for the processing. You can then read the product back in with snappy’s ProductIO.readProduct(), making sure to dispose of it afterwards with product.dispose(), and continue processing in Python. Alternatively, you can write it to HDF5 format and read it with h5py. I found it took 14 s with subprocess and snappy, versus 11 s with GPT.

This is nowhere near as fast as using snappy to do the processing (6 s); however, it avoided all the memory problems when bulk processing.
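
A minimal sketch of that pattern, assuming the gpt executable is on the PATH and using placeholder names of my own (graph.xml, input.zip, output.dim) rather than anything from the post above:

import subprocess
import snappy

# Run the processing graph with SNAP's command-line GPT tool
subprocess.check_call(['gpt', 'graph.xml', '-t', 'output.dim', 'input.zip'])

# Read the result back in for any further work in Python, then release the native memory
product = snappy.ProductIO.readProduct('output.dim')
try:
    width = product.getSceneRasterWidth()    # example of continued processing
    height = product.getSceneRasterHeight()
finally:
    product.dispose()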

2 Likes

Thanks @Ciaran_Evans! I found a (relatively) clean solution based on your idea.

For every SNAP process, use concurrent.futures.ProcessPoolExecutor (as in the example linked here); then you still have logs, the function can stay in the same file, you can pass arguments normally (not only strings), etc.

Note that you can’t pass SNAP’s Products (and some other objects) as arguments. There is a workaround here for some cases.
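
A minimal sketch of this pattern, assuming a worker function that imports snappy itself and returns only plain Python values (the function and file names are my own illustration):

from concurrent.futures import ProcessPoolExecutor

def run_snap_job(input_path, output_path):
    # Import snappy inside the worker so all native memory lives in the child process
    import snappy
    product = snappy.ProductIO.readProduct(input_path)
    snappy.ProductIO.writeProduct(product, output_path, 'BEAM-DIMAP')
    product.dispose()
    return output_path   # return plain values, not SNAP Products

# A fresh single-worker executor per job; when the with-block exits, the child
# process is shut down and the memory snappy allocated is released.
with ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(run_snap_job, 'input.zip', 'output.dim').result()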

Hope it helped someone!

1 Like