externalDEMNoDataValue can greatly change performance

I originally noted this effect in a post about SNAP/S1TBX v8 performance.

Different values of externalDEMNoDataValue affect the performance of both v7 and v8. It is very strange that the DEM no data value can affect the performance so much. I have run many tests to convince myself!

This is the graph. I set externalDEMNoDataValue to the same value for both Terrain-Correction and Terrain-Flattening.

v8
externalDEMNoDataValue =  0.0      real 12 min, 350 min CPU
externalDEMNoDataValue = -9999.0   real 27 min, 550 min CPU
externalDEMNoDataValue = -32768    real 37 min, 532 min CPU

v7
externalDEMNoDataValue =  0.0      real 2.5 min, 36 min CPU
externalDEMNoDataValue = -9999.0   real 3 min, 47 min CPU
externalDEMNoDataValue = -32768    real 6 min, 83 min CPU

You can see the great difference in performance between v7 and v8 here, but that is not the subject of this post. The intent here is just to show the relative difference between different externalDEMNoDataValue settings, and that both v7 and v8 are affected.
Another oddity: in v8, -9999 uses slightly more CPU time than -32768 (550 vs 532 min) in the couple of tests I ran.
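To make the relative differences easier to read, here is a minimal sketch that divides each timing by the 0.0 baseline for the same version. The numbers are copied from the results above; the names `TIMINGS` and `relative_to_baseline` are only illustrative.

```python
# Timings copied from the results above: (real minutes, CPU minutes).
TIMINGS = {
    "v8": {"0.0": (12.0, 350.0), "-9999.0": (27.0, 550.0), "-32768": (37.0, 532.0)},
    "v7": {"0.0": (2.5, 36.0), "-9999.0": (3.0, 47.0), "-32768": (6.0, 83.0)},
}


def relative_to_baseline(timings, baseline="0.0"):
    """Return {nodata: (real_ratio, cpu_ratio)} relative to the baseline setting."""
    base_real, base_cpu = timings[baseline]
    return {
        nodata: (round(real / base_real, 2), round(cpu / base_cpu, 2))
        for nodata, (real, cpu) in timings.items()
    }


if __name__ == "__main__":
    for version, timings in TIMINGS.items():
        for nodata, (real_ratio, cpu_ratio) in relative_to_baseline(timings).items():
            print(f"{version} nodata={nodata:>8}: real x{real_ratio}, CPU x{cpu_ratio}")
```

With these numbers, -32768 costs roughly 3.1x the wall-clock time of 0.0 in v8 and 2.4x in v7.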

This should be checked, @lveci @jun_lu @marpet.

Are you using the same DEM file, or do you have three different DEM files?

Good question. I am using the same external DEM for all tests, and it does not contain any fill values, i.e., no 0, -9999, or -32768. That point was not clear in my post. Thank you @traktor.
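For anyone who wants to confirm the same thing for their own DEM, a check along these lines could be used. It is only a sketch: it assumes the elevation values have already been read into a flat Python sequence (reading the GeoTIFF itself is left out), and `count_fill_values` is a hypothetical helper, not part of SNAP.

```python
CANDIDATE_FILL_VALUES = (0.0, -9999.0, -32768.0)


def count_fill_values(elevations, candidates=CANDIDATE_FILL_VALUES):
    """Count how many pixels are exactly equal to each candidate no-data value."""
    counts = {float(c): 0 for c in candidates}
    for value in elevations:
        value = float(value)
        if value in counts:
            counts[value] += 1
    return counts


if __name__ == "__main__":
    sample = [12.5, 130.0, 87.25, 3.0]  # made-up elevations in metres
    print(count_fill_values(sample))  # all counts are zero for this sample
```

If every count is zero, none of the three candidate values occurs in the DEM.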

Thank you for reporting the problem. A ticket (SITBX-894, "externalDEMNoDataValue can greatly change performance") has been created in JIRA to track it. We will look into it.

IMHO, the 0.0 value is not suitable, since it reflects a real situation on the ground. The question here should be which 'no-data' value is most appropriate to use in SNAP.

The no-data value has been causing problems for years. I've been wondering what technical solution would solve it once and for all. A binary mask instead of a value, perhaps?
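To illustrate the difference between a sentinel value and a separate validity mask, here is a small sketch. It is purely illustrative and says nothing about how SNAP handles no-data internally; both function names and the sample data are made up.

```python
def mean_with_sentinel(elevations, nodata):
    """Average the elevations, treating every pixel equal to `nodata` as missing."""
    valid = [e for e in elevations if e != nodata]
    return sum(valid) / len(valid) if valid else None


def mean_with_mask(elevations, valid_mask):
    """Average the elevations, using a separate boolean mask to mark valid pixels."""
    valid = [e for e, ok in zip(elevations, valid_mask) if ok]
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    # Made-up data: the first 0.0 is real sea level, the last 0.0 is a void.
    elevations = [0.0, 10.0, 20.0, 0.0]
    valid_mask = [True, True, True, False]
    print(mean_with_sentinel(elevations, nodata=0.0))  # sea-level pixel is lost too
    print(mean_with_mask(elevations, valid_mask))      # only the void is excluded
```

With a sentinel of 0.0 the genuine sea-level pixel is discarded along with the void, whereas the mask keeps it.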

I suggest the “most appropriate” no-data values are the standard ones: -9999 and -32768. Both have been used in standard (int16) DEM products. Although, now that some DEMs are not integer, that may no longer be appropriate?

Why do we need a DEM no-data value? Most DEMs are complete and are void-filled.

The DEM no-data value just needs to be something other than 0.0, since that is a valid elevation. We do not want a performance hit when using a value other than 0.0.

Adding TileCache to the v8 graph improves performance, which is now comparable to v7 for the different values of externalDEMNoDataValue. However, this thread is about how different values of externalDEMNoDataValue affect performance, and that effect is still present here.

I used the graph below so that the performance results are directly comparable with the earlier ones in this thread. Note: operationally, ThermalNoiseRemoval should be included before Calibration.

v8
externalDEMNoDataValue =  0.0      real 2.3 min, 36 min CPU
externalDEMNoDataValue = -9999.0   real 3 min, 50 min CPU
externalDEMNoDataValue = -32768    real 5 min, 79 min CPU

The timings above are with nodata undefined in the DEM TIFF file; they are approximately the same when different nodata values are defined in that file.
TileCache_v8.xml (6.7 KB)

I wonder if these durations are constant over several runs or if they are subject to chance. Sorry if you have already written that, but when I tested and compared computation times in my studies, the observed durations were not always reproducible.

I run on a quiescent Linux server and repeat each test several times. This should avoid any variation caused by OS caching of the files or other variables. The values shown are averages of several executions, which generally vary by only a few percent. I use /dev/shm for the SNAP cache since that is real memory. I started by testing different sizes for Java Xmx and found that 16 GB is about the sweet spot in performance for my graphs and image sizes. The execution (real) and CPU times will vary depending on CPU type/clock and file I/O, but they are directly comparable here since I am using the same CentOS 7 machine with 768 GB memory for all my tests:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                72
On-line CPU(s) list:   0-71
Thread(s) per core:    2
Core(s) per socket:    18
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
Stepping:              4
CPU MHz:               3000.000
BogoMIPS:              6000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
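The repeat-and-average procedure described above could be scripted roughly as follows. This is a sketch, not the script actually used for the numbers in this thread: `time_command` is a hypothetical helper, the command in the example is a placeholder, and the child CPU accounting through `os.times()` is only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run `cmd` `runs` times; return (mean real seconds, mean child CPU seconds)."""
    real_total = 0.0
    cpu_total = 0.0
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        real_total += time.perf_counter() - start
        after = os.times()
        # CPU time (user + system) consumed by child processes that have been waited for.
        cpu_total += (after.children_user - before.children_user) + (
            after.children_system - before.children_system
        )
    return real_total / runs, cpu_total / runs


if __name__ == "__main__":
    # Placeholder workload; replace with the real processing command.
    real, cpu = time_command([sys.executable, "-c", "sum(range(10**6))"], runs=3)
    print(f"mean real {real:.2f} s, mean CPU {cpu:.2f} s")
```

In practice `cmd` would be the SNAP command line for the graph being tested; its exact arguments are not given in this thread, so they are left out here.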