Outsource preprocessing

fkessler · May 4, 2020, 8:56am

Hey guys,
I am currently redoing an earlier project that includes preprocessing a lot of S1 scenes. As this takes quite a while for each scene, do any of you use cloud infrastructure or something similar to preprocess the data?

My preprocessing consists of:

Apply Orbit-File
Thermal Noise Removal
Remove GRD Border noise
Calibration
Speckle-Filter
Band-Maths
TC -> 10m Resolution
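
A chain like this is usually saved as a graph XML file and run once per scene with gpt. Below is a minimal Python sketch for batching that, assuming a hypothetical graph file (for example `preprocess.xml`) that declares `input` and `output` parameters. The file name, the parameter names and the `_pre.dim` output naming are illustrative assumptions, not something stated in this thread.

```python
from pathlib import Path
import subprocess


def build_gpt_command(graph, scene, out_dir, gpt="gpt"):
    """Return the gpt argument list for one scene (nothing is executed here)."""
    scene = Path(scene)
    # Assumed naming scheme: <scene stem>_pre.dim in the output directory.
    target = Path(out_dir) / (scene.stem + "_pre.dim")
    return [
        gpt,
        str(graph),
        f"-Pinput={scene}",    # assumes the graph declares ${input}
        f"-Poutput={target}",  # assumes the graph declares ${output}
    ]


def run_batch(graph, scene_dir, out_dir):
    """Process every zipped scene in scene_dir, one after another."""
    for scene in sorted(Path(scene_dir).glob("*.zip")):
        subprocess.run(build_gpt_command(graph, scene, out_dir), check=True)
```

Keeping the command construction separate from the execution makes it easy to print the commands first and check them before starting a long run.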

Additional Question: What is limiting/influencing the time taken the most? Processors, Graphic card or RAM?

mengdahl · May 4, 2020, 9:04am

RAM and hard drive speed are the most limiting factors. SSDs are highly recommended.

fkessler · May 4, 2020, 9:48am

I am using an SSD and have 16 GB of RAM right now. It still takes about 20 minutes per scene using gpt. Can I increase the RAM available to SNAP? Is there a maximum share of my RAM that I should assign to it when the computer is used only for this processing at the time?

ABraun · May 4, 2020, 10:27am

some recommend 70% (here), others report best speed with 50% (here)
It’s worth trying and comparing different settings.
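
To make the two recommendations concrete, here is a small Python sketch that turns a RAM share into a JVM `-Xmx` setting. The helper name `xmx_option` is made up for illustration, and rounding down to whole gigabytes is an assumption, not a SNAP rule.

```python
def xmx_option(total_ram_gb, fraction):
    """Return a JVM max-heap flag using the given share of physical RAM."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    heap_gb = int(total_ram_gb * fraction)  # round down to whole gigabytes
    return f"-Xmx{heap_gb}G"


# Compare the 50% and 70% suggestions for a 16 GB machine.
for share in (0.5, 0.7):
    print(share, xmx_option(16, share))
```

On a 16 GB machine this gives `-Xmx8G` for 50% and `-Xmx11G` for 70%. Timing the same scene with each value is the only reliable way to tell which one is faster on a given system.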

fkessler · May 4, 2020, 10:29am

Thanks for the answer. I am currently changing it using the Config-Optimiser and the Compute button there. Does this also affect the settings used by gpt, such as the Java cache?

gnwiii · May 4, 2020, 11:54am

Quoting fkessler: "What is limiting/influencing the time taken the most? Processors, Graphic card or RAM?"

None of the above (assuming you aren’t so tight on RAM that your system is swapping).

Most remote sensing processing is dominated by I/O. This is quite different from most numerical modelling efforts, and needs a specialized configuration. In particular, parallel processing systems need storage on each node, as opposed to sharing one storage unit across nodes. I/O should make use of solid-state disks and avoid reading from and writing to the same rotating disk. Some workflows use temporary files, which should be put on a separate, fast temporary filesystem in RAM or on solid-state storage media.
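
A rough way to compare candidate locations for input, output and temporary files is to time a sequential write on each of them. The Python sketch below is only a crude indicator under simple assumptions (one sequential writer, a small test file). The function name is made up for illustration and the numbers it produces are not a substitute for timing a real gpt run.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64):
    """Write size_mb of data into directory and return the rate in MB/s."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(size_mb):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # force the data out of the OS cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary test file
    return size_mb / elapsed
```

Calling it once with the directory used for temporary files and once with the output directory shows whether one of them is clearly slower than the other.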

Have a look at calvalus and calvalus2.

fkessler · May 4, 2020, 12:54pm

Thanks for your reply! I will check out calvalus over the next few days.

funkenbeachin · May 4, 2020, 7:04pm

I found, from experience rather than from a full understanding of machine architecture, that SNAP 7 does well with the default resource options. 32 GB of RAM is worth the investment. SNAP makes use of the kind of GPU that appeals to gamers. Clipping to the region of interest saves processing time.

fkessler · May 5, 2020, 8:27am

I am interested in whole scenes, so I cannot subset them. I am using a gaming laptop. Is there a way to use my Nvidia GeForce GTX 1050?

gnwiii · May 5, 2020, 2:53pm

The Java JVM used by SNAP can use a GPU for certain operations, but a) it is most effective for computations that do lots of maths on small data, and b) support for recent GPU models may be limited. Some Java users report that upgrading the GPU actually reduced performance.

There are interesting projects such as Tornado VM that are effective for some types of calculations, but for SNAP there may not be many operations that could benefit and then only for users with specific hardware configurations.

@fkessler – For now, you can monitor GPU temperature – if the GPU is being used temperatures will increase. gpt uses the gpt.vmoptions file.
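
On machines with an NVIDIA driver, temperature and utilization can be read from the command line with `nvidia-smi`. Here is a minimal Python sketch, assuming `nvidia-smi` is installed and on the PATH. The function names are made up for illustration.

```python
import subprocess

# Ask nvidia-smi for plain comma-separated numbers, one line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line):
    """Turn a line such as '54, 3' into (temperature_c, utilization_percent)."""
    temperature, utilization = (int(part) for part in line.split(","))
    return temperature, utilization


def read_gpu_status():
    """Query the first GPU once; requires nvidia-smi on the PATH."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_line(output.stdout.splitlines()[0])
```

Calling `read_gpu_status()` every few seconds while gpt is running, and again while the machine is idle, gives a simple before/after comparison.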

funkenbeachin · May 5, 2020, 7:09pm

I did nothing to enable the Nvidia GeForce GTXxxx processor. I saw that it was in use in the Windows Task Manager > Performance > GPU monitor. Gaming machines seem to be well suited to graphics-intensive GIS applications.

gnwiii · May 5, 2020, 10:12pm

The market doesn’t seem to be big enough for vendors to design machines specific to remote sensing. Gaming machines are designed to deliver high frame rates using relatively small data. You can also buy machines optimized for video editing, which put more emphasis on large data and I/O performance, or machines optimized for numerical modelling.

fkessler · May 6, 2020, 11:55am

Thanks for the response! Can you tell me what my gpt.vmoptions would/should look like if I want to use the GPU?

Right now it looks like this:

# Enter one VM parameter per line
# For example, to adjust the maximum memory usage to 512 MB, uncomment the following line:
# -Xmx512m
# To include another file, uncomment the following line:
# -include-options [path to other .vmoption file]
-Xmx11G
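
For reference, every line starting with `#` in such a file is a comment, so only the last line is active here. A small Python sketch that lists the active options of a .vmoptions text (the helper name is made up for illustration):

```python
def active_vm_options(text):
    """Return the option lines of a .vmoptions file, skipping comments and blanks."""
    options = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            options.append(line)
    return options


# Shortened copy of the file shown above.
sample = """# Enter one VM parameter per line
# -Xmx512m
-Xmx11G
"""
print(active_vm_options(sample))
```

For the file above this leaves only `-Xmx11G`, so the maximum heap size is the only setting currently in effect and nothing in it relates to the GPU.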

gnwiii · May 6, 2020, 9:20pm

Some background:

Historical overview of Java GPU support

Oracle seems to have put their JDK documentation behind a paywall, but the IBM docs are available. I have used the IBM JDK on Linux to run SeaDAS 7 (BEAM GUI), but snappy had a problem with it.

Using GPU’s to achieve massive parallelism in Java 8

IBM SDK, Java Technology Edition, Version 8: Windows User Guide says:

If you have set the -Xjit:enableGPU option, the JIT uses performance heuristics to determine which workloads to send to the GPU for processing.

Related concepts: “Graphics processing unit (GPU)” on page 99
You can improve the performance of your Java applications by offloading certain processing functions from your processor (CPU) to a graphics processing unit (GPU); specific hardware and software requirements apply.

“How the JIT compiler uses a GPU” on page 100
You can enable the Just-In-Time (JIT) compiler to offload certain processing tasks to a general-purpose graphics processing unit (GPU). The JIT determines when to offload these tasks based on performance heuristics.

Related reference: “Writing Java applications that use a graphics processing unit” on page 128

The CUDA4J application programming interface (API) contains many classes for managing operations between the CPU and the graphics processing unit (GPU). The com.ibm.gpu API can be used to sort arrays of primitive types (int, long, float, double) on the GPU instead of the CPU.

It is quite possible that the GPU usage others observed was driven by a single component such as the WorldWind module, so even if SNAP uses the GPU, it may not help GPF processing.
