Temporary fix for snappy memory issues


#1

Potential fix/work-around to problems observed in these posts:


This issue has been driving me nuts. Neither I nor others could find a way for snappy to process multiple images one after the other without it gradually consuming all of the machine’s memory and never releasing it.

After a lot of digging around I stumbled across Python’s subprocess module. Effectively, you spawn a new Python process from a .py file, which runs and then terminates. When that process exits, the operating system reclaims all the memory snappy was using, much as if you had killed the script by hand.

The line of code I use to spawn my processing pipeline (after import subprocess) is:
pipeline_out = subprocess.check_output(['python', 'src/SarPipeline.py', location_wkt], stderr=subprocess.STDOUT)

Note: pipeline_out is the combined STDOUT and STDERR of the script (because of stderr=subprocess.STDOUT). To find out which file has just been processed, I have print("filepath: " + path_to_file) in src/SarPipeline.py. I can then go through pipeline_out, extract the line that begins with filepath: and serve that file via my Flask API.
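The extraction step described above could look something like the sketch below. The helper name extract_filepath is hypothetical (it is not from my actual pipeline), and it assumes the child prints a line starting with filepath: as described. On Python 3, check_output returns bytes, so the output is decoded before searching.

```python
def extract_filepath(pipeline_out, prefix="filepath: "):
    """Return the path printed after `prefix`, or None if no such line exists."""
    # check_output returns bytes on Python 3; decode before searching.
    if isinstance(pipeline_out, bytes):
        pipeline_out = pipeline_out.decode("utf-8", errors="replace")
    for line in pipeline_out.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Return the first match, without the prefix.
            return line[len(prefix):].strip()
    return None
```

Returning None when no matching line is found lets the caller decide how to report a failed run, rather than crashing inside the parsing code.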

This is by no means ideal or pretty, but it works. I can call my SAR API as many times as I like and the memory usage always drops back to next to nothing afterwards.

Main issues now:

  • Needs a load of extra error handling and code to find the desired output lines

  • You lose the live console output from snappy in logs or when following the process on the command line, because it is captured into pipeline_out instead

  • You must reference the file you’re spawning in the subprocess relative to where the parent script was run from
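A rough sketch of how the first and last points might be softened is below. The function names run_pipeline and script_next_to are hypothetical, and using sys.executable instead of 'python' is my assumption that the child should run under the same interpreter as the parent. If the child fails, check_output raises CalledProcessError, and the captured output is attached to the error so it is not lost.

```python
import os
import subprocess
import sys


def run_pipeline(script_path, *args):
    """Run `script_path` in a fresh interpreter and return its combined output as text.

    Raises RuntimeError, with the child's output attached, if the script fails.
    """
    try:
        out = subprocess.check_output(
            [sys.executable, script_path] + list(args),
            stderr=subprocess.STDOUT,
        )
    except subprocess.CalledProcessError as exc:
        # exc.output holds whatever the child printed before it failed.
        child_output = exc.output.decode("utf-8", errors="replace")
        raise RuntimeError(
            "pipeline exited with code %d:\n%s" % (exc.returncode, child_output)
        )
    return out.decode("utf-8", errors="replace")


def script_next_to(current_file, *relative_parts):
    """Build an absolute path relative to `current_file`, not the working directory."""
    base_dir = os.path.dirname(os.path.abspath(current_file))
    return os.path.join(base_dir, *relative_parts)
```

Assuming the parent script sits next to the src directory, the call would then be something like run_pipeline(script_next_to(__file__, 'src', 'SarPipeline.py'), location_wkt), which no longer depends on where the parent was started from.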

Sorry for the huge post!

Hopefully this will help alleviate any memory/caching issues people are having when trying to provide a service utilising snappy or a pipeline that processes multiple images.

Please feel free to reply to this/message me with any concerns!

Ciaran


Snappy not freeing memory
#2

Thanks Ciaran for sharing this idea.

I have tried it and confirm that it provides a way to free the memory usage after performing heavy operations.

Because subprocess only accepts string arguments, I have combined it with the “json” package to gain a bit more flexibility (the idea being to encode Python objects as JSON strings in the parent and decode them in the child). Building on Ciaran’s example:

import json
import subprocess

mydict = dict(
    parameter1='a',
    parameter2='b',
)
mydict_str = json.dumps(mydict)   # encode dictionary to JSON string
pipeline_out = subprocess.check_output(['python', 'src/SarPipeline.py', mydict_str], stderr=subprocess.STDOUT)

# in SarPipeline.py:
import sys
import json
mydict_str = sys.argv[1]
mydict = json.loads(mydict_str)  # decode JSON string into dictionary
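The same encoding could presumably be used in the other direction, so that the child hands a result back as JSON on a marked line of its output. The sketch below is only an illustration: CHILD_CODE is a stand-in for src/SarPipeline.py (run with -c so the example is self-contained), and the result: prefix, the run_child name and the returned file path are all made up.

```python
import json
import subprocess
import sys

# Stand-in for src/SarPipeline.py; with -c, the first extra argument is sys.argv[1].
CHILD_CODE = """
import json, sys
params = json.loads(sys.argv[1])
print("some log output from the processing step")
result = {"filepath": "/tmp/" + params["parameter1"] + ".tif"}  # hypothetical result
print("result: " + json.dumps(result))
"""


def run_child(params):
    """Send `params` to the child as JSON and return the dict it prints back."""
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE, json.dumps(params)],
        stderr=subprocess.STDOUT,
    ).decode("utf-8")
    for line in out.splitlines():
        # Only the marked line is parsed; other output is treated as logging.
        if line.startswith("result: "):
            return json.loads(line[len("result: "):])
    return None
```

Marking the result line with a prefix keeps it distinguishable from whatever else ends up in the captured output.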

As Ciaran pointed out, this solution has many drawbacks, but until the problem is resolved in SNAP it is helpful.


#3

Great shout on the JSONified parameters!

Hopefully this workaround will at least unblock some people’s projects!