GPT performance better on 6-core i7 vs 2x18-core Xeons

I’ve got a lot of S1a scenes that I’m processing with the SNAP toolbox v3.0, using the following steps (a rough sketch of the corresponding graph follows the list):

  1. Radiometric calibration
  2. Terrain flattening
  3. Terrain correction
  4. Linear to dB conversion
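
In graph form, the chain looks roughly like the sketch below. This is a stripped-down, hand-written example, assuming the standard SNAP operator names (Calibration, Terrain-Flattening, Terrain-Correction, LinearToFromdB); the node parameters and the INPUT_SCENE/OUTPUT_SCENE tokens are placeholders rather than my actual settings:

# Write a minimal graph template chaining the four steps.
# Operator names assumed from the SNAP operator list; parameters illustrative.
cat > graph_template.xml <<'EOF'
<graph id="S1a_chain">
  <version>1.0</version>
  <node id="Read">
    <operator>Read</operator>
    <sources/>
    <parameters><file>INPUT_SCENE</file></parameters>
  </node>
  <node id="Cal">
    <operator>Calibration</operator>
    <sources><sourceProduct refid="Read"/></sources>
    <!-- terrain flattening wants beta0 as input -->
    <parameters><outputBetaBand>true</outputBetaBand></parameters>
  </node>
  <node id="TF">
    <operator>Terrain-Flattening</operator>
    <sources><sourceProduct refid="Cal"/></sources>
  </node>
  <node id="TC">
    <operator>Terrain-Correction</operator>
    <sources><sourceProduct refid="TF"/></sources>
  </node>
  <node id="dB">
    <operator>LinearToFromdB</operator>
    <sources><sourceProduct refid="TC"/></sources>
  </node>
  <node id="Write">
    <operator>Write</operator>
    <sources><sourceProduct refid="dB"/></sources>
    <parameters><file>OUTPUT_SCENE</file><formatName>BEAM-DIMAP</formatName></parameters>
  </node>
</graph>
EOF

Each node’s sourceProduct reference chains it to the previous step; gpt resolves the chain and processes it tile by tile.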

Both terrain flattening (TF) and terrain correction (TC) use a subset of a 200 m DEM covering the region of interest, and the TC step also reprojects the data to a polar stereographic projection. The results look good on initial inspection.

For the processing scheme, I’ve created a graph XML file using the SNAP gui. I then use some shell scripting to update this XML file for each of the S1a scenes I have in a specific folder. Then I run GPT for each of the XML files, one at a time.
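
The scripting itself is nothing fancy; a minimal sketch of the idea, with placeholder paths rather than my actual layout:

# For each scene, substitute the input/output paths into the template,
# then run gpt on the resulting graph, one scene at a time.
for scene in /data/s1a/*.zip; do
  out="/data/out/$(basename "${scene%.zip}")_TC_dB.dim"
  sed -e "s|INPUT_SCENE|${scene}|" \
      -e "s|OUTPUT_SCENE|${out}|" graph_template.xml > current_graph.xml
  gpt current_graph.xml
done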

I have tested this on two systems:

  1. Mac OS X with a 6-core Intel i7 processor, 32 GB RAM, and fast RAID storage.
  2. Ubuntu Linux 16.04 with two 18-core Intel Xeon E5 processors (36 cores total), 64 GB RAM, and a fast NVMe SSD.

The Mac is far quicker, despite having weaker specs. The only advantage the Mac has is a higher clock speed.

The problem appears to be in how the CPUs are being utilized. On the Mac, all 12 threads show high CPU usage. On the Linux server, only 36 of the available 72 threads are used: in htop, threads 1-36 all show high usage, while threads 37-72 show little to no usage.
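
One experiment I may try is pinning gpt to a single socket to see whether NUMA is a factor (the mapping of htop thread numbers to sockets and hyperthread siblings varies by machine, so this is just a guess; the graph name below is a placeholder):

# Show the NUMA node / CPU layout, then run gpt bound to node 0 only.
numactl --hardware
numactl --cpunodebind=0 --membind=0 gpt my_graph.xml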

Mac htop output: [screenshot]

Linux htop output: [screenshot]

Any idea what’s going on here? I would expect that even if only one CPU were being stressed on the Linux machine, it would still be faster than a 6-core/12-thread machine.

What about disk I/O? Maybe that’s the bottleneck?

I’d also point out the following post, which seems connected to this one:

It would be nice if you could do a quick test to confirm this.

Two hypotheses come to mind:

  1. The CPUs are waiting on block-by-block reads (i.e., small blocks being read from disk one at a time). In this case the I/O bottleneck is the answer, though this would not explain why using the GUI is faster.
  2. The blocks to be processed are held in RAM, but they are too small and are processed very quickly (in this case the overhead is larger than the processing time).

Then, together with the developers, it would be useful to dig into this problem and eventually fix it by understanding how to properly set gpt.vmoptions.
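
For reference, these are the knobs I have in mind; the file locations and property names are from my reading of the SNAP configuration (so worth double-checking), and the values are purely illustrative:

# Raise the gpt JVM heap (gpt.vmoptions) and adjust the JAI tile cache
# and parallelism (snap.properties). Values are illustrative only.
echo "-Xmx32G" >> "$SNAP_INSTALL_DIR/bin/gpt.vmoptions"
cat >> "$SNAP_INSTALL_DIR/etc/snap.properties" <<'EOF'
snap.jai.tileCacheSize=16384
snap.parallelism=18
EOF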

Thanks

I’ve tested performance on the two systems below. I don’t think I am seeing the same problem noted above. I’m seeing pretty comparable performance in the GUI and command line.

Mac OS X (6-cores):

  • SNAP 3.0 GUI: 3m 56s
  • gpt on XML: 3m 23s

Linux (36-cores):

  • SNAP 3.0 GUI: 8m 54s
  • gpt on XML: 9m 30s

Apparently the LinearTodB operator is not available in the Linux SNAP, only LinearToFromdB (both are available in OS X). This leads me to believe there’s a real difference between the Mac and Linux builds of SNAP, despite both being version 3.0.

In terms of disk performance, the machines should be comparable. The Linux drive I’m using is a Samsung 950 Pro NVMe SSD, rated at roughly 2000 MB/s for sequential read/write. I could do some disk benchmarking though…
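
Something simple like dd would do for a first pass (paths illustrative; the direct flags bypass the page cache so the numbers reflect the drive, not RAM):

# Sequential write then read of a 4 GiB test file.
dd if=/dev/zero of=/data/testfile bs=1M count=4096 oflag=direct
dd if=/data/testfile of=/dev/null bs=1M iflag=direct
rm /data/testfile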

edit: the Linux drive is far faster than the RAID I’m running on the Mac.

The default tiling will process one tile per processor. It could be that with so many cores, the bottleneck is access to some common resource, such as all threads trying to read different parts of the same file. Try forcing the parallelization to be the same on both systems, using the -q option in gpt.

In the gpt -h help it says the default is 8. I wonder if, with this default, it won’t use any more than 8 cores.

I forced gpt to use 18 threads on my Linux machine and the processing time dropped to 7 minutes. This appears to be the optimum number; anything more or less produces slower results. I don’t know what’s going on, but 18 cores should still be faster than 6.

By default, gpt appears to try and use all threads available (12 on my Mac, 72 on the Linux server). So the times in my previous post are “using” all available threads.
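
For what it’s worth, a quick sweep like the one below is how I’d look for the optimum on a given machine (the graph name is a placeholder):

# Time gpt at several -q settings; %E prints elapsed wall-clock time.
for q in 6 12 18 24 36 72; do
  /usr/bin/time -f "q=$q: %E" gpt my_graph.xml -q "$q"
done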

Hi Polar,

I am wondering how you forced gpt to use 18 threads. I tried adding ‘-Dsnap.parallelism=1’ to the script that calls gpt, and adding “snap.parallelism = 1” to the file ${SNAP_INSTALL_DIR}/etc/snap.properties, but neither works. I am on a shared Linux server, so I want to limit the number of threads gpt uses.

Thanks

I was using the gpt command line utility with the -q flag; in the case above, -q 18.

  -q <parallelism>   Sets the maximum parallelism used for the computation,
                     i.e. the maximum number of parallel (native) threads.
                     The default parallelism is ‘12’.
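
So on a shared server, something like this should keep gpt to a few threads; the -c tile-cache size and the graph name are illustrative (check gpt -h on your install for the exact option syntax):

# Cap gpt at 4 parallel threads and (illustratively) a 2 GB tile cache.
gpt my_graph.xml -q 4 -c 2048M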