Multi-Threading and Performance of snappy / jpy

I’m creating a pre-processing chain for SAR products based on snappy, using mainly a Terrain Correction operator.
Performance is poor compared to using SNAP or gpt from the shell. The reason is apparently that calling the Java methods via jpy from Python does not use multi-threading, at least in my case (Ubuntu 16.04 in a VM with 32 GB, Python 3.4): during the execution of

ProductIO.writeProduct(terrain_corrected_product,
                       terrain_corrected_product_file_path,
                       'GeoTIFF-BigTIFF')

the running script occupies only one of the (in this case four) CPU cores. This is of course much slower than the same routine called from SNAP.
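
For context, here is a minimal sketch of what such a chain looks like on my side; the input path, DEM name and pixel spacing below are placeholders, not the exact settings I use:

# Minimal sketch of the pre-processing chain described above. The paths, DEM
# name and pixel spacing are placeholders / assumptions, not real settings.
from snappy import ProductIO, GPF, jpy

HashMap = jpy.get_type('java.util.HashMap')

source_product = ProductIO.readProduct('/path/to/S1_product.zip')   # hypothetical input

parameters = HashMap()
parameters.put('demName', 'SRTM 3Sec')            # assumed DEM choice
parameters.put('pixelSpacingInMeter', 10.0)       # assumed target pixel spacing

terrain_corrected_product = GPF.createProduct('Terrain-Correction',
                                              parameters,
                                              source_product)

# This is the step that keeps only one core busy in my case.
ProductIO.writeProduct(terrain_corrected_product,
                       '/path/to/output',         # hypothetical output path
                       'GeoTIFF-BigTIFF')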

According to the thread Performance of snap desktop and snappy, however, it should (perhaps):

The called Java code should actually be executed multithreaded, at least for the GPF calls and the execution of operators. The number of threads is determined by the number of cores available.

I know that (C)Python is a bit awkward regarding parallelism, but has anyone found a solution to this problem? Calling gpt in a subprocess could of course be one, but not the most elegant one, I would say.

Thanks in advance.

Hi,

I have attached a snappy example that performs the steps for interferometry using multithreading. This script was produced by the Alaska Satellite Facility. It might be of interest to you.

snappy_topsar_insar_share.py (8.4 KB)


Thanks a lot.

Just tested it (on Python 3.4, with minimal modifications) to be sure, and the problem persists.
Actually, I have tried this before: the Python multiprocessing and threading modules do not solve it, and the jpy calls do not distribute threads across the CPU cores.

My current solution, before switching to Java, is just to call gpt in a subprocess (a minimal sketch follows below). This is not optimal for obvious reasons, but significantly faster, and it works for my use case.
One could also use a distributed approach such as celery (http://www.celeryproject.org/) and let different nodes run on different images in parallel (actually, this works even locally, tested yesterday), but this causes other trouble and does not solve the basic problem. A switch to Java would definitely be a better solution then.
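
The subprocess workaround looks roughly like this; the graph file, parameter names and paths are placeholders for whatever the actual graph expects:

# Rough sketch of calling gpt in a subprocess. Graph file, -P parameter names
# and paths are placeholders, not taken from a real setup.
import subprocess

cmd = [
    'gpt',                                       # SNAP graph processing tool on the PATH
    '/path/to/terrain_correction_graph.xml',     # hypothetical processing graph
    '-Pinput=/path/to/S1_product.zip',
    '-Poutput=/path/to/output.tif',
    '-q', '4',                                   # parallelism: let gpt use 4 threads
]

subprocess.check_call(cmd)                       # raises CalledProcessError on failure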

I just wondered whether someone else has found another possibility within Python, or does not face the problem at all.

Anyway - thanks for your help.

Calling gpt from Python using subprocess is a bit slow (I hope you do not use a Jupyter notebook to run the process; it gets really slow). I wanted to automate the steps for doing InSAR and realized that using subprocess in Python is a bit slow.
When I ran the same steps with gpt on the command line (providing the graph I created), it was faster.
Hence, I was a bit disappointed that the process is slow in Python. We all know that Python is slow, unfortunately.


No Jupyter, it is pure Python here. I have also noticed that a gpt subprocess is slower than calling gpt from the shell, but the difference is acceptable. Using snappy for bulk pre-processing, however, is just really slow.
I don't think we should blame Python here for being generally slower. It probably is, but the problem seems to come mainly from the fact that snappy/jpy cannot expose the SNAP Java library with its full (or at least nearly full) power.
I reckon this might be a fundamental conceptual problem of the way CPython implements threads and extensions, so the people who wrote jpy and snappy had no chance; it is just not possible.
I fear that eventually I will have to switch to Java to resolve this.

The code will be faster.
I will be slower…

Thanks again!

I just kind of gave up on using snappy for batch processing due to the speed issues and switched to running gpt from the shell.
I have not managed to parallelize the processing steps in Python and make it faster. It's a nightmare.

Is there any update on the snappy speed issues?
I have some code to automate batch processing, written fully in Python, and I need to optimize it for speed. I have servers with plenty of cores and RAM available.

Hello, I am also interested in this issue.
Has anything changed since November 2017, or is it still the same?

This problem is understood, and while we would like to find a solution, it has not been easy.

The problem is that SNAP is a Java application. Java source code is compiled to what is called Java bytecode, which gets executed by a Java Virtual Machine (JVM). Python source code is also converted to bytecode, but to its own version (transparently or not) by the Python interpreter (think of it as a Python VM, or PVM). In order to use snappy, one thus has to run the PVM, which needs to start and communicate with a JVM running SNAP code. That is what jpy is supposed to do. As you can imagine, this is not as straightforward as it sounds.
Considering that CPython (the most used PVM) has a global lock at its core (i.e. each instance of the PVM only runs one Python instruction at a time), while the JVM is fully parallelised, this adds the need for very complex logic to avoid problems in the communication.
The end result is something that, although working, is not really performant.
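
To illustrate the global lock mentioned above, here is a small, SNAP-unrelated sketch: two CPU-bound Python threads take roughly as long as running the same work twice sequentially, because only one thread executes Python bytecode at a time.

# Toy demonstration of the CPython global interpreter lock (GIL): the two
# threads do not run the pure-Python loop in parallel on separate cores.
import threading
import time

def busy_work(n=10000000):
    total = 0
    for i in range(n):
        total += i
    return total

start = time.time()
threads = [threading.Thread(target=busy_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('two threads :', time.time() - start)

start = time.time()
busy_work()
busy_work()
print('sequential  :', time.time() - start)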

So, assuming that this snappy-based processing is sufficiently important (i.e. do we need it, or is a wrapper around gpt sufficient? That is the first question!), can we find anyone with enough coding skills in the Python world available to help with this, i.e. to find better ways to build this bridge between the JVM and the PVM?
SNAP is open-source, so contributions are always welcome…


Processes and threads are independent sequences of execution; the typical difference is that threads run in a shared memory space, while processes run in separate memory spaces.

  • A process has a self-contained execution environment. A process generally has a complete, private set of basic run-time resources; in particular, each process has its own memory space.

  • Threads exist within a process — every process has at least one. Threads share the process’s resources, including memory and open files. This makes for efficient, but potentially problematic, communication.

An example keeping the average person in mind:

On your computer, open Microsoft Word and a web browser. We call these two processes.

In Microsoft Word, you type something and it gets saved automatically. Here you have observed editing and saving happening in parallel: editing on one thread and saving on another thread.
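
The same distinction can be sketched in a few lines of Python (a toy example, not SNAP-related): a thread changes the parent's memory directly, while a separate process only changes its own copy.

# Threads share the process's memory; a child process gets its own copy.
import threading
import multiprocessing

counter = {'value': 0}

def bump():
    counter['value'] += 1

if __name__ == '__main__':
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print('after thread :', counter['value'])    # 1 - the shared dict was changed

    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print('after process:', counter['value'])    # still 1 - the child changed only its own copy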