Poor v8.0 Performance

My personal impression is that many tasks (importing and displaying images, calculating textures) are now much faster with version 8, but I haven’t compared GCP tasks yet.

Thank you Luis. It is not ideal to process full image intermediate products when only a subset is required. We do not necessarily know which subswaths contain the subscene, but we will look at calculating which one or two are required for each subscene. You have previously mentioned splitting up terrain flattening and terrain correction on my graphs. I will test.
It is unfortunate that we were required to switch our production server back to 7.0.2. It is producing ~3x more products than with 8.0.0.

Please report how much improvement you get from splitting the graph in two and using TOPS-split on SNAP 7.x and then if possible do the same test on 8.0.
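A minimal sketch of how the two halves of a split graph could be run and timed back to back. The file names tf_part.xml and tc_part.xml are hypothetical, and GPT is a convenience variable introduced here (not a SNAP setting) so the launcher can be swapped or dry-run:

```shell
# Run each graph file in order and stop at the first failure.
# GPT is an assumption of this sketch; it defaults to the gpt launcher on PATH.
run_graphs() {
    local gpt_cmd="${GPT:-gpt}"
    local graph
    for graph in "$@"; do
        echo "== $graph"
        "$gpt_cmd" "$graph" || return 1
    done
}

# Example (hypothetical file names, bash):
#   time run_graphs tf_part.xml tc_part.xml
```

Running the same wrapper once under SNAP 7.x and once under 8.0 keeps the two measurements comparable.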

Removing terrain flattening from the graph resulted in faster processing time in v8.0.0, with slightly more CPU utilization. Minutes shown:

v8.0.0: 1.5 real, 10.5 user, 3.2 sys. Peak CPU utilization ~3000%
v7.0.2: 1.7 real, 8.3 user, 3.5 sys. Peak CPU utilization ~1000%

The subscene used for the first test was 95% on subswath IW3. So I ran the same original graph, adding TOPSAR-Split of IW3, and measured these times:

v8.0.0: 16.2 real, 184.2 user, 25 sys. Peak CPU utilization ~1700%
v7.0.2: 7.1 real, 89.8 user, 7.1 sys. Peak CPU utilization ~1700%

I started testing the graph split in half, but for some reason 7.0.2 never completed the Terrain-Flattening half. I will post additional results soon.

Processing the same-date GRDH product to terrain-corrected Gamma0 also requires more time and resources on SNAP v8.0.0:

v8.0.0: 13 real, 172 user, 14 sys.
v7.0.2: 7 real, 46 user, 7.4 sys

Putting terrain flattening and terrain correction in the same graph is always going to be suboptimal, as the two handle the DEM differently, which creates a bottleneck.

Processing SLC IW3 through Terrain-Flattening only:

v8.0.0: 25 real, 1107 user, 157 sys. Peak CPU utilization ~6800%
v7.0.2: 23 real, 419 user, 30 sys. Peak CPU utilization ~3000%

@mengdahl Performance summary:
TC without TF: v8 is slightly faster, but uses more CPU time than v7.0.2.
TF only: both versions have about the same processing time, but v8 used ~200% more CPU time (see above).
Adding TOPSAR-Split to the original graph saved ~40% CPU time, but the processing time was about the same as without the split. Even so, v8 still takes twice the time and resources of v7.
GRDH processing takes about the same time as SLC, but SLC uses >2x the CPU resources, and v8 uses >2x what v7 does.
The biggest cost is TF, at >10x the time and >30x the CPU relative to graphs without it. The doubling of TF processing time/cost in v8 is a real hit.
@lveci My results do not appear to point to any benefit from splitting up my graph; it might end up costing more. Smaller graphs could be very important on machines with smaller configurations, but with 72 cores and 768 GB of memory we end up running multiple gpt instances.
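For anyone who wants to redo this kind of comparison from the raw numbers, here is a small sketch that turns two measurements into a ratio. The helper name ratio is made up for illustration; the inputs are the Terrain-Flattening-only minutes posted above, with CPU time taken as user + sys:

```shell
# Ratio of two measurements, printed with one decimal place.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Total CPU minutes (user + sys) for the Terrain-Flattening-only runs.
v8_cpu=$((1107 + 157))
v7_cpu=$((419 + 30))

echo "wall-clock ratio v8/v7: $(ratio 25 23)"
echo "CPU-time ratio v8/v7:   $(ratio "$v8_cpu" "$v7_cpu")"
```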

If I’m not mistaken, more CPU utilization is better, since it implies that less time is wasted waiting for I/O. BTW, I have trouble understanding your numbers: what is the total processing time, a.k.a. wall time?

To clarify for all: the real, user, and sys values I reported in this thread come from the Linux/bash time command, which accumulates those resources during execution. I rounded all the numbers to minutes.
real is wall-clock time, user is CPU time spent in user mode, and sys is CPU time spent in the kernel on behalf of the process.

I am running “time gpt my.xml” from bash, CentOS 7, command line, using the same Sentinel-1 image, vmoptions, etc. for both SNAP 7.0.2 and 8.0.0. (VH disappeared in 7.0.3, which is why I am using 7.0.2)
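A sketch of how such numbers can be collected and rounded to minutes. The to_minutes helper is made up for illustration; TIMEFORMAT is a standard bash variable that controls what the time keyword prints:

```shell
# Convert seconds to minutes with one decimal place.
to_minutes() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 60 }'
}

# Bash-only usage sketch: with this TIMEFORMAT the `time` keyword prints
# real, user and sys as plain seconds on one line (on stderr).
#   TIMEFORMAT='%R %U %S'
#   time gpt my.xml
# Each of the three numbers can then be passed to to_minutes, e.g.
#   to_minutes 426
```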

More CPU utilization might be better if the execution (wall) time were proportionally shorter. My results show greater execution (wall) time, as well as CPU time, indicating poorer performance for v8.0.0.
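One way to put wall time and CPU time on the same footing is the average number of busy cores, (user + sys) / real. The sketch below uses a made-up helper name, avg_cores, and the Terrain-Flattening-only minutes posted earlier in the thread:

```shell
# Average number of busy cores over a run = (user + sys) / real.
avg_cores() {
    LC_ALL=C awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (u + s) / r }'
}

# Arguments are real, user, sys in minutes.
echo "v8.0.0: $(avg_cores 25 1107 157) cores busy on average"
echo "v7.0.2: $(avg_cores 23 419 30) cores busy on average"
```

A higher average with an unchanged wall time means more CPU work for the same result, which is the pattern described above.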

It appears the difference comes primarily from TF in the two versions. The culprit could be anti-meridian, polar, or other support, if any was added for 8.0.0.

Hello,

I have also noticed that version 8.0 is significantly slower than version 7.0. I ran an interferogram with the same data and the same parameters in both v8 and v7 on the same workstation. I used the Linux “time” command to measure processing time: v8 took about 79 min to finish the interferogram processing, while v7 took only 7 min.

Also, I noticed that v8 adds the write step as a separate operator when a single operator is run through gpt (e.g. “gpt Interferogram -t target.dim”). The default value of the “writeEntireTileRows” parameter of the write operator is “false” in v8, whereas it was set to “true” in v7. I don’t know how this write parameter relates to processing time, but there is no doubt that SNAP v8 does a terrible job in terms of efficiency.

Figure below is timing result of version 8.0:

Figure below is timing result of version 7.0:

Your screen dumps are almost unreadable on a laptop screen. It would be much better to cut and paste them as text, or to attach more complete files. With Java code, the first thing to check is differences in memory settings. Have you checked the impact of using the same “writeEntireTileRows” setting?

Finally, it is worth noting that SNAP 8 uses the OpenJDK runtime, a result of Oracle’s recent license changes. The OpenJDK effort has put priority on correctness over performance. I’m not sure if they are interested in reports of performance regressions at this stage. If your organization has a paid Java license you might be able to try Oracle Java. There are also high-quality JDKs from Red Hat and others.

Interesting that v7 is 11x faster, with 1/2 the CPU utilization.

@gnwiii How would you propose switching SNAP to another Java version on Linux to test the performance difference? Maybe SNAP should not include OpenJDK, so users can select one optimized for their systems?

SNAP 8.0 is distributed with: openjdk version “1.8.0_242”
SNAP 7.0 with: java version “1.8.0_202”

I have 4 other versions on my CentOS 7:
java version “1.8.0_121”
openjdk version “1.8.0_275”
java version “14.0.2” 2020-07-14
openjdk version “15.0.1” 2020-10-20

But none of these other Java installations includes a JRE, so I downloaded jre1.8.0_281 from Oracle and replaced the SNAP 8.0 jre with it. I will post a comparison of the results soon.
mv /opt/snap_8/jre /opt/snap_8/jre_dist
ln -s /opt/jre1.8.0_281/ /opt/snap_8/jre
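The same swap can be wrapped in a small function that checks the target first and keeps the bundled runtime around. This is only a sketch: swap_jre is a made-up name, and the jre_dist backup name simply follows the commands above.

```shell
# Point a SNAP installation at a different Java runtime, keeping the
# bundled one as jre_dist so the change can be undone.
# $1 = SNAP install directory, $2 = directory of the replacement runtime.
swap_jre() {
    local snap_dir="$1" new_jre="$2"
    # Refuse to link to a directory that has no java launcher.
    if [ ! -x "$new_jre/bin/java" ]; then
        echo "no executable bin/java in $new_jre" >&2
        return 1
    fi
    if [ -L "$snap_dir/jre" ]; then
        # A link from an earlier swap: just drop it.
        rm "$snap_dir/jre" || return 1
    elif [ -e "$snap_dir/jre" ]; then
        # The bundled runtime: move it aside once, never overwrite a backup.
        if [ -e "$snap_dir/jre_dist" ]; then
            echo "$snap_dir/jre_dist already exists" >&2
            return 1
        fi
        mv "$snap_dir/jre" "$snap_dir/jre_dist" || return 1
    fi
    ln -s "$new_jre" "$snap_dir/jre"
}

# Example, using the paths from the commands above:
#   swap_jre /opt/snap_8 /opt/jre1.8.0_281
```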

You should also publish your graph if possible so we could investigate. On our set of test graphs 8.0 is on average significantly faster than 7.0 (~20%). Performance of most operators was improved but some degraded significantly.

@mengdahl I just reread your post on DEM bottleneck. I use a larger DEM for TF to avoid black holes caused by topography and DEM edges, and a smaller DEM for TC. So bottleneck might not apply.

Your method for changing the JRE should work. As well, many Java applications recognize environment variables (JAVA_HOME or JRE_HOME). You may want to look for a way to add -showversion to the Java command line so you get a record of the Java version.
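Until -showversion is wired into the launcher, a simple way to keep a record is to ask the runtime that SNAP will use for its version and log it next to the timings. A minimal sketch, assuming the /opt/snap_8 install path used earlier in the thread; java_version_string is a made-up helper name:

```shell
# Extract the quoted version string from `java -version` output on stdin.
java_version_string() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage sketch (java prints its version on stderr, hence 2>&1):
#   /opt/snap_8/jre/bin/java -version 2>&1 | java_version_string
```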

There are alternate garbage collection algorithms such as Shenandoah. RHEL 7.4+ ships with OpenJDK 8+ that includes Shenandoah as a Technology Preview. Use -Xlog:gc+stats to examine differences in GC across different JREs.

Long graphs generate more overhead and are usually less efficient, at least according to the studies I’ve seen. Of course, YMMV; one needs to find a good compromise length for one’s particular application and system.

My system was not idle, so the small differences between runs of SNAP 8.0 with different Java versions are probably insignificant, i.e., switching between OpenJDK and Oracle Java did not change SNAP 8.0 performance. I was able to verify which Java version was executing. SNAP 7.0 is still the fastest. I quickly ran these tests on CentOS 7 with an all-in-one Sentinel-1 GRDH to Gamma0 graph:

SNAP 8.0.2 (minutes)                  real   user    sys
openjdk (SNAP 8.0) 1.8.0_242          10.0   77.4   14.5
Java (Oracle download) 1.8.0_281      10.2   79.5   13.0
Java (SNAP 7.0) 1.8.0_202              9.6   78.5   13.3
openjdk 15.0.1                        10.3   77.5   10.8

SNAP 7.0.2: significantly faster, with less than 1/2 the CPU resources.
Java (SNAP 7.0) 1.8.0_202              7.7   34.3    7.7

The mv with ln method of switching Java versions works, but passing --jdkhome to snap is probably better. However, gpt only uses JDK_HOME when …/snap/jre does not exist.

snap --jdkhome /usr/java/jdk-15.0.1
mv /opt/snap/jre /opt/snap/jre_dist ; export JDK_HOME=/usr/java/jdk-15.0.1 ; gpt ......

Using a JDK without a JRE, like jdk-15 above, ran with a couple of SEVERE messages and some warnings. All 8.0.2 Gamma0 products were identical.

I have not yet, because I run gpt with a single operator rather than a graph.xml. I need to build a graph that includes the write operator so that I can test the impact. I will do that later.
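A rough sketch of what such a test graph could look like, generated from the shell so the writeEntireTileRows value is easy to flip between runs. The node ids, the make_ifg_graph helper and the ${input}/${output} placeholders (filled with gpt's -P option) are assumptions of this sketch, and the Interferogram node is left on its default parameters, which may not match the original run:

```shell
# Write a minimal Read -> Interferogram -> Write graph to a file.
# $1 = path of the graph file to create
# $2 = value for writeEntireTileRows (true or false)
make_ifg_graph() {
    cat > "$1" <<EOF
<graph id="Graph">
  <version>1.0</version>
  <node id="Read">
    <operator>Read</operator>
    <sources/>
    <parameters>
      <file>\${input}</file>
    </parameters>
  </node>
  <node id="Interferogram">
    <operator>Interferogram</operator>
    <sources>
      <sourceProduct refid="Read"/>
    </sources>
    <parameters/>
  </node>
  <node id="Write">
    <operator>Write</operator>
    <sources>
      <sourceProduct refid="Interferogram"/>
    </sources>
    <parameters>
      <file>\${output}</file>
      <formatName>BEAM-DIMAP</formatName>
      <writeEntireTileRows>$2</writeEntireTileRows>
    </parameters>
  </node>
</graph>
EOF
}

# Example (hypothetical file names):
#   make_ifg_graph ifg_rows_true.xml true
#   time gpt ifg_rows_true.xml -Pinput=stack.dim -Poutput=target.dim
```

Timing the same graph once with true and once with false on the same SNAP version would isolate the effect of that one setting.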

I use “gpt Interferogram -t target.dim” instead of “gpt graph.xml …” to process data. I don’t know whether a graph would be faster; in my experience, running single operators step by step is always faster than a graph, which is why I chose operators over a graph.