Using SNAP on Amazon Web Services


#1

I was wondering if anyone has experience running SNAP on Amazon Web Services, particularly with regard to the performance of the Graph Processing Tool (gpt).

The processes I will be running are basic processing of Sentinel-1 SLC imagery, including stages like splitting, coregistration, terrain flattening, merging, and terrain correction. Running my graph once takes about an hour on my laptop (4 cores, 16 GB RAM).

Specifically, I would like to know:

  • Is it more efficient to have several instances and run processes in parallel across them, or is it better to have one very powerful instance and run processes sequentially?
  • Is there a point at which adding more memory or cores no longer improves SNAP's performance?
  • What are the best parameters for gpt in gpt.vmoptions?

Any other advice on this topic would be very welcome. Otherwise I’ll just give it a go and report back!

Thanks,


#2

Hi,

I’ve no experience with AWS, but we run our own cluster, so I can give some general advice for running gpt.
I think it is best to run one processing chain on one instance. If you split it across instances, you need to transfer the results from one instance to the other, and this will slow down the processing.

In gpt.vmoptions you should set the -Xmx value for the Java heap space.
If you have 16 GB of RAM you can set

-Xmx13G

Also important is the tile cache size gpt uses. This is set with the -c parameter of gpt.
Alternatively, you can change its default value in etc/snap.properties.
The property is named snap.jai.tileCacheSize. I think a good value is 70%–80% of the heap space.
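Putting those numbers together: a 16 GB machine with -Xmx13G would get a tile cache of roughly 75% of the heap. A minimal sketch (the graph name and product paths below are placeholders, not from the thread):

```shell
# Tile cache at ~75% of a 13 GB heap, in MB:
heap_gb=13
cache_mb=$(( heap_gb * 1024 * 75 / 100 ))
echo "$cache_mb"  # 9984

# Pass it per run on the gpt command line (placeholder graph/paths):
#   gpt my_graph.xml -c ${cache_mb}M -t output.dim input.zip
# Or make it the default in etc/snap.properties (value is in MB):
#   snap.jai.tileCacheSize=9984
```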

The snap.jai.defaultTileSize property also influences performance. It is likewise found in the snap.properties file, and it can also be specified on the command line or in gpt.vmoptions as a system property.
This property defines the size of the tiles that are computed on each core, and it therefore influences memory usage, and performance if you run out of memory.
A huge tile size can result in a single tile covering the whole product; then only one core is used and the computation is not performed in parallel.
A too-small size can result in too many threads, which causes thread-management overhead.
You just need to play with these settings to find the best values for your use case.
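As a sketch of the two ways to set the tile size (the graph name, product paths, and chosen values below are placeholders; -D passes a Java system property and -q sets the number of parallel threads in gpt):

```shell
# Default in etc/snap.properties (tile edge length in pixels):
#   snap.jai.defaultTileSize=512

# Or per run, as a system property on the gpt command line,
# optionally limiting parallelism with -q:
gpt my_graph.xml -Dsnap.jai.defaultTileSize=512 -q 4 -t output.dim input.zip
```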


#3

Here’s a tutorial from ASF on how to run SNAP on AWS:

https://www.asf.alaska.edu/asf-tutorials/data-recipes/rtc-sentinel-1-in-the-cloud/


#4

The tutorial posted by @mengdahl is a good start; in fact, it is quite comfortable to work with SNAP on an AWS instance. My recommendations on this topic are:

  • select an Ubuntu server as the instance
  • in an optimal case, you have a prepared Docker image to set up your machine; otherwise, install Python 3.x or 2.x via Anaconda, install jpy, install SNAP, and configure SNAP to talk to your Python installation (the same steps you would perform on your local machine)
  • use an Ubuntu/Unix-based desktop machine as the local operator in order to avoid X11 forwarding conflicts (this happened to me when I worked with a team that used Macs, so be aware of this possible problem); do not worry about the hardware of the local machine, it just has to reach the internet, and an old laptop running Knoppix is an inexpensive solution…
  • you can use SNAP via Python or gpt; however, if you want to use the GUI, cd to your snap/bin dir, I guess it should be something like this:
    cd /home/user/snap/bin
    and launch SNAP with
    sh snap
    Now it depends on your internet connection how usable the GUI is, but in my experience it’s quite good :slight_smile:
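The GUI steps above can be sketched end to end; the user name and instance address below are placeholders, and this assumes X11 forwarding is permitted on the instance:

```shell
# From the local Ubuntu machine, connect with X11 forwarding enabled:
ssh -X ubuntu@<instance-public-dns>

# Then, on the instance, launch the SNAP GUI:
cd /home/ubuntu/snap/bin
sh snap
```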

The handling of the input and output data, including downloading results after processing, is explained very well in the ASF tutorial.