Running SNAP on a cluster

Dear all,

I am trying to run SNAP gpt commands from the command line on CREODIAS virtual machines. To speed up processing, I open several connections to the virtual machine and run SNAP independently in each one.

Is there a known limit to how many SNAP instances I can run at the same time without running into processing problems?

Additionally, if other users on the virtual machine are using SNAP, will this affect my processing?

Best wishes,
Harry

There aren’t useful general guidelines, because cluster structure (interconnects, etc.) and workloads vary widely. For example, some tasks are time-critical and need the shortest possible time to compute one result, while others need to maximize throughput when generating thousands of results.
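For the throughput case, one alternative to opening extra terminal sessions is to launch a fixed number of independent `gpt` processes from a single script. The sketch below is a minimal, hypothetical Python example (`run_jobs` is not part of SNAP). It runs each command as its own OS process and uses `max_parallel` to cap how many run at once.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(commands, max_parallel):
    """Run each command (an argv list) as a separate process, at most
    max_parallel at a time, and return the exit codes in input order."""

    def run_one(argv):
        try:
            # Each job is an independent process, like one gpt run
            # started in its own terminal session.
            return subprocess.run(argv).returncode
        except FileNotFoundError:
            # The executable (e.g. gpt) is not on PATH; report it like a shell would.
            print(f"command not found: {argv[0]}", file=sys.stderr)
            return 127

    # The threads only wait on child processes. The real work happens in the
    # children, so the pool size is the cap on concurrent instances.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

A call might look like `run_jobs([["gpt", "graph.xml", "-Pinput=" + f] for f in scenes], max_parallel=2)`, where `graph.xml`, its `${input}` parameter and the `scenes` list are hypothetical placeholders. One way to find the point where throughput stops improving is to raise `max_parallel` step by step while watching CPU and disk usage.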

You need to learn to use the monitoring tools available on your platform. Tuning software and hardware configurations for a particular workload can be a big job, and it is harder still if you can’t control competing workloads.
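As a first, very coarse check on a Linux VM, you can compare the system load average with the number of cores. The hypothetical helper below does this with Python's standard library. It assumes a Unix system, because `os.getloadavg` is not available on Windows. A load per core that stays above about 1.0 usually means more runnable processes than cores (and on Linux, processes blocked on disk I/O are counted too), so starting more SNAP instances is unlikely to help. The load average also includes other users' jobs, so it reflects competing workloads.

```python
import os


def cpu_pressure():
    """Return the core count and the 1-, 5- and 15-minute load averages
    divided by that count (Unix only: uses os.getloadavg)."""
    cores = os.cpu_count() or 1
    load1, load5, load15 = os.getloadavg()
    return {
        "cores": cores,
        # Values persistently above ~1.0 suggest the machine is oversubscribed.
        "load_per_core": (load1 / cores, load5 / cores, load15 / cores),
    }
```

Dedicated tools such as `top` or your provider's monitoring dashboards give a more detailed picture. This is only a quick sanity check you can log alongside each batch.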

Many remote sensing workloads are I/O-intensive and don’t do well on hardware designed for heavy numerical workloads. My experience with clusters has been on shared systems intended for numerical models. All of those systems had limited I/O capacity, and most of our remote sensing workloads would saturate it with just a few CPUs. CREODIAS is a much newer hardware environment (our systems used arrays of rotating disks), but some workloads could quickly fill a local NVMe store and then run into bottlenecks moving data to long-term storage.
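To see whether storage is the limit, it helps to know how much free space a working directory has and roughly how fast it accepts writes. The hypothetical helper below is a crude sketch, not a proper benchmark. It writes `test_mib` MiB of random data to a temporary file, forces it to disk with `fsync`, times the write, and then deletes the file.

```python
import os
import shutil
import tempfile
import time


def disk_check(directory, test_mib=64):
    """Return (free GiB, approximate sequential write rate in MiB/s) for directory."""
    free_gib = shutil.disk_usage(directory).free / 2**30
    block = os.urandom(2**20)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(test_mib):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reached the device
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return free_gib, test_mib / elapsed
```

Running it against the local scratch disk and against the long-term storage mount (for example, hypothetical paths such as `/mnt/scratch` and `/mnt/archive`), both while idle and while several gpt jobs are running, shows whether adding instances is simply competing for the same disk.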

You should review “SNAP GPF: configuring the JAI TileCache and TileScheduler”.
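When several gpt instances share one machine, each one is a separate JVM with its own tile cache and thread pool. The settings therefore need to be divided between the instances instead of each instance assuming it has the whole machine. The sketch below shows one way to do the arithmetic. It assumes you pick a total memory budget for all gpt JVMs combined (leaving the rest for the OS and other users) and an assumed `cache_fraction` of each JVM's share for the tile cache. The results feed gpt's `-q` (maximum parallelism) and `-c` (tile cache size) options. Each JVM's maximum heap (typically `-Xmx` in SNAP's `gpt.vmoptions`) must be at least its memory share, or the cache will not fit. The helper names and the `${input}` graph parameter are hypothetical.

```python
def per_instance_settings(total_cores, heap_budget_mib, instances, cache_fraction=0.5):
    """Split cores and a memory budget among concurrent gpt instances.

    Returns (parallelism, cache_mib) for gpt's -q and -c options.
    cache_fraction is an assumed share of each instance's memory for the tile cache.
    """
    if instances < 1:
        raise ValueError("instances must be >= 1")
    parallelism = max(1, total_cores // instances)  # threads per instance
    mem_per_instance = heap_budget_mib // instances  # each JVM's share of the budget
    cache_mib = int(mem_per_instance * cache_fraction)
    return parallelism, cache_mib


def gpt_command(graph, input_path, parallelism, cache_mib):
    """Build a gpt argv list; the graph file and its ${input} parameter are hypothetical."""
    return ["gpt", graph, "-q", str(parallelism), "-c", f"{cache_mib}M",
            f"-Pinput={input_path}"]
```

For example, four instances on a 16-core VM with a 32 GiB (32768 MiB) budget give 4 threads and a 4096 MiB tile cache each. Whether that is a good split depends on the workload, so treat it as a starting point for the monitoring described above.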