GPT Benchmark

Hi Everyone,

I thought it might be useful for all of us to share our computation times as benchmarks for the benefit of SNAP team and other users.

So, using a standard gpt run (i.e. without any options to increase cache/tile size etc) , it takes my computer (16GB RAM, 2.93 GHz, 64 Bit Win 10) about 20-27 mins. to do the following steps on a single 1SDV* GRD dataset:

Apply Restituted Orbital File
GRD Border Fix
Thermal Noise Removal
Calibration (sigma nought, beta nought)
Terrain Flattening (SRTM 3 ArcSec)
Speckle Filtering (Gamma, 3x3)
Terrain Correction (SRTM 3 ArcSec)

How long does it usually take for your datasets to process?

Result of my benchmark:

I’m using snap-desktop! (with GUI) not gpt in terminal (can’t do this for some time).
My operating system is OS X El Capitan 10.11.6.
For test I use IW GRDH (VV-VH) product.

Resources used on my macbook by snap:
~50% / 2.8 GHz Intel Core i7
8.5 / 16 GB 1600 MHz DDR3

Here is graph:

And my result is: 273.98334 minutes (366344 B/s 9158 Pixels/s)

Also, many program are running on my computer (like postgresql client, pycharm, chrome, skype:) ). Can’t do clear experiment at this moment. I will try to get better result next time (with no gui and background apps) .

The time taken looks rather a lot. I have discovered that there a few simple things that can be done to improve the processing time dramatically. No guarantees as to which ones have any real impact but I have been testing all of them. My processing time has now gone to a minute to 5 mins.

  1. Use faster hard drives either SSD or at least 7200 RPM HDD.
  2. Use lots of RAM typically 128GB+
  3. Do not install any library (e.g. S2, S3 etc) other than S1TBX if all you want is to process S1 data. It keeps the java class numbers in memory to minimal and startup of gpf will be quicker.
  4. Use the cache size (-c) and multithread (-q) options if you have lots of RAM and CPU resources.
  5. Use JVM switches to increase heap size, garbage collection in the gpt.options file.
  6. If you have lots of RAM then a single chain of operations in a single xml file is much quicker than breaking the chain into multiple xmls with intermediate files - the latter has too much i/o overhead
  7. Pre-downloading the Orbital files and DEM files and keeping them in the auxdata folder will also save some amount of time.
  8. Use a 64-bit machine.
  9. I am unsure whether this makes an impact but I have replaced the built-int JRE of SNAP with the latest version of the 64-bit JRE client that I downloaded from oracle separately.
5 Likes

Can I use subset operator before others?
For example, I have batch graph:

  • Read
  • Apply Orbit File
  • ThermalNoiseRemoval
  • Calibration
  • Multilook
  • Terrain-Correction
  • LinearToFromdB
  • Subset
  • Write
    So, if I use Subset to make small image before Calibration, Terrain-Correction, etc will it be good ?

Actually there should be no difference, except some of the operators are doing something special.
GPF is only computing the data which is necessary. In this case what is needed to write the result to disk.
But you can try it. It should be not much work to change the order.

Here is some results of my investigation:

  1. Subset is last operator before write:
    Read, Apply Orbit File, ThermalNoiseRemoval, Calibration, Multilook, Terrain-Correction , LinearToFromdB , Subset, Write
  2. Subset is first operator after Read:
    Read, Subset, Apply Orbit File, ThermalNoiseRemoval, Calibration, Multilook, Terrain-Correction , LinearToFromdB, Write

    But I got something interesting. Look at result images:
    This is for first graph:

    And this is for second graph.

    Here is visual image from sentinel-2 for better understanding images.

thanks for the comparison. The one with the subset at the beginning reduces the data size early thus reaving less data to process for the remaining steps.

Could the colors result from different minimum/maximum values in the color manipulation tag?

You need to make sure to do the comparison on a “fresh” instance of SNAP so that it is certain that none of the data is residing in caches (skewing the results).

Your processing-speed seems rather slow based on the numbers…

It does not really work that way - SNAP has a “pull-processing architecture” so the writer at the end starts asking for the data from the preceding operators. The resident developer wizards like @marpet & @lveci can explain how this actually works and how the order of operators might or might not affect performance.