How to parallelise SNAP

I am running SNAP on a 48-core, 128 GB RAM computer, but I still get the same speed as on my 4-core, 16 GB computer. The first machine is at least eight times as powerful as my PC, so is there a way to make the processing parallelise on one computer?

Did you already increase the memory available to SNAP?
Instructions on how to do that are given here: I’m getting the error “Cannot construct DataBuffer”, “GC overhead limit exceeded” or “Java Heap Space”. What can I do?

Given that you have 128 GB RAM, you can try setting 80G or even 90G.
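
As a minimal sketch, assuming a standard SNAP installation: the maximum Java heap is set with an -Xmx option in the gpt.vmoptions file (the exact path and the value that suits your system may differ):

# in $INSTALL_DIR/bin/gpt.vmoptions, one option per line
-Xmx80G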

In general SNAP does the parallelization automatically, but it does so on a tile basis. The data is split into tiles and each tile is calculated on a separate thread. But if your data has only 2, 4 or 8 tiles, then only that number of threads is used. The number of tiles depends on the size of your input images and also on how the tiling is defined by the data format.
Besides this, there are algorithms that prevent a meaningful tile-based parallelization, for example because they need the full image as input.
You can change the default tiling.
When using gpt you can set this property:

-Dsnap.jai.defaultTileSize=100
The default tile size is 512.
You can also put this into the gpt.vmoptions file

You can also change the value for this property in snap.properties ($INSTALL_DIR/etc/), then also the tiling in SNAP Desktop is changed.
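
To make the three options concrete, here is a sketch of where the same property can go; file names in the gpt call are placeholders and the paths assume a typical SNAP installation:

# per run, on the gpt command line (my_graph.xml, input.dim and output.dim are placeholders)
gpt my_graph.xml -Dsnap.jai.defaultTileSize=100 -t output.dim input.dim

# for every gpt run, in gpt.vmoptions (one option per line)
-Dsnap.jai.defaultTileSize=100

# for gpt and SNAP Desktop, in $INSTALL_DIR/etc/snap.properties (property name without the -D prefix)
snap.jai.defaultTileSize = 100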

Thank you. I am waiting to talk to my professor about downloading Notepad++ to change those; the computer can't open the files somehow.

Thank you for your reply. Would it be strange if I said I didn't understand? Is changing snap.properties alone enough, or do I also need to add the -Dsnap.jai.defaultTileSize=100 line to the gpt.vmoptions file? Basically, what I want to ask is: is doing only one of the options necessary, or is doing both simultaneously advisable?

Changing snap.properties is enough. This affects gpt on the command line as well as SNAP Desktop.
But sometimes you only want to change it for a certain processing run or only for the command line. Then you can go for the other options.

It is possible that disk I/O speed is a bottleneck. I used to run batch processing on a 24-core system with a single hard disk, borrowed from a numerical modelling project. Using GNU parallel and the NASA IOCCG processing system, more than 6 processes did not increase throughput. For numerical models a single hard disk was adequate, but for remote sensing workloads disk I/O speed is often more critical than the number of cores.
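
For illustration, a GNU parallel call along these lines caps the batch at a fixed number of simultaneous gpt processes while you experiment to find the sweet spot (the graph name, output naming and input pattern are placeholders):

# run at most 6 gpt jobs at a time over a set of input products
parallel -j 6 gpt my_graph.xml -t {.}_out.dim {} ::: *.zip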

Is there any way to circumvent it that you can advise ?

Performance optimization is complicated, but worthwhile if you have thousands of batch tasks to run. There are different RAID types if you can afford multiple drives, and large SSDs are a whole different price category and performance level. A typical configuration uses "compute servers" with striped internal drives for fast "scratch" storage and a separate high-end, high-reliability storage system with replication to an offsite data centre. Each task will have a different point of diminishing returns for the number of parallel jobs, which you have to determine by experimentation and use of monitoring tools.

Intel's multi-core CPUs with NUMA memory often have counter-intuitive performance characteristics due to cache usage. If the system is used for other tasks, it can be difficult to get reproducible performance. Some systems reduce CPU speed to control CPU temperatures when running CPU-intensive tasks on many cores, so you may need to monitor CPU core temperatures while trying to tune performance. There are also scheduling systems that will run batch tasks during off-peak hours. A system that is used for interactive visualization (e.g., quality-control checks) during the day can work through a list of batch jobs at night.

Thank you for your reply. But I am using a remote PC to run them; I can't even change software characteristics, let alone hardware characteristics. Your suggestion seems to have high potential, but within my current limitations it is not possible. I guess I just have to wait for the normal processing to take its time.

Thank you all, it seems I will just wait for it.

Don’t give up yet. You should be able to use OS monitoring tools to document the performance bottlenecks. Remote systems often have job management software with remote monitoring of processing status and resource usage, and “RAM” disk (using some of the memory as fast file storage) that can be used to store intermediate files that are going to be discarded at the end of the processing chain. Once you know where the bottlenecks are you can get help here and from Java forums to tweak SNAP’s Java configuration settings for your use case.
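
On a Linux system, for example, a quick first look at whether the cores or the disk are the limiting factor can be had with standard tools (availability and exact options depend on the operating system):

top            # per-process CPU and memory usage; check whether the cores are actually busy
iostat -x 5    # extended disk statistics every 5 seconds; a consistently high %util suggests an I/O bottleneck
free -h        # how much RAM is in use and how much remains for file caching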

That’s a much easier way, and I will try this tomorrow. Thanks a lot.