Gpt and Snap performance parameters - exhaustive manual needed!

Dear SNAP community,

As I couldn’t find a single, exhaustive place where the SNAP/gpt performance settings are documented, I’m posting my questions here. I’d be very grateful if the developers or some power users could answer. Maybe this will also help improve the SNAP documentation.

In general there are two ways to set the gpt parameters:
a) settings in gpt.vmoptions file
b) flags when calling the gpt program, like: ‘gpt -e -c 40G -q 12 -J-Xms2G -J-Xmx50G’
Is that right? And do the command-line flags always override the settings in the gpt.vmoptions file?
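For concreteness, the two ways might look like this (the graph file name and all values are just examples from my own setup, not recommendations):

```shell
# a) In <SNAP install dir>/bin/gpt.vmoptions: one JVM option per line, no -J prefix
#        -Xms2G
#        -Xmx50G
# b) As flags on the command line: gpt options directly, JVM options prefixed with -J
gpt MyGraph.xml -e -c 40G -q 12 -J-Xms2G -J-Xmx50G
```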

The meaning of the parameters:
-Xms - the memory initially allocated when starting a SNAP/gpt instance; quite straightforward to understand, no question.
-Xmx - the maximum memory available to the SNAP/gpt instance; again quite straightforward to understand, no question.
“-q Sets the maximum parallelism used for the computation, i.e. the maximum number of parallel (native) threads. The default parallelism is ‘12’.” Again straightforward to understand.

But…
“-c Sets the tile cache size in bytes. Value can be suffixed with ‘K’, ‘M’ and ‘G’. Must be less than maximum available heap space. If equal to or less than zero, tile caching will be completely disabled. The default tile cache size is ‘1,024M’.”
This is already a bit tricky. What is the recommended setting, and how is it meant to be used with respect to the available system memory? As close as possible to the -Xmx value, or something different? Can someone explain the tile caching mechanism in SNAP (how it is implemented), if possible with drawings?

“-x Clears the internal tile cache after writing a complete row of tiles to the target product file. This option may be useful if you run into memory problems.”
Is it then safe to keep the -x option always switched on? Or can it have negative effects?

“-XX:+AggressiveOpts” <- What does setting this parameter actually do?
Is there something else the user needs to know about setting the SNAP performance parameters?

Kaupo

P.S. Using SNAP 5 on Linux with 64 GB of RAM for Sentinel-1 IW SLC processing, but I’m more interested in general principles than just the parameters that would work for me. I’d bet the whole community would benefit from improved documentation.

5 Likes

I would be very interested in answers to these questions as well!

Many of the parameters are for the Java Virtual Machine. There are many web sites devoted to “Java Performance Tuning”, but the official site is a good place to start. Much of this information was published years ago, so it doesn’t cover the complexities of current multi-core NUMA platforms. There have been improvements in garbage collection (GC), so there is a more recent guide devoted to GC tuning.

SNAP users with performance concerns will find it worth spending a few hours viewing this introductory tutorial and then a talk by the author of the Java Performance Tuning web site.

Large organizations devote many person-years to tuning the performance of critical applications, where a gain in performance means they can save big money on hardware. There are courses such as Java Performance Tuning with Mission Control and Flight Recorder for people doing this sort of work.

Applications like SNAP are used in many different ways so it is impossible to have simple recipes that work for everyone. I suspect the vast majority of Java applications are used for databases and transaction processing workloads, so much of the advice you can find on the internet may not apply to SNAP workloads.

SNAP users who have concerns over performance should start by learning a bit about how Java developers approach performance, and how to use the instrumentation available from their OS to provide objective measurements and identify the bottlenecks responsible for performance problems. SNAP is a very complex system and is sure to have some bugs (e.g. memory leaks), so you want to be able to properly identify and report bugs to the developers. javaperformancetuning.com collects performance tweaks for Java, but to use it you need to know much more than just “my calculations take too long”.
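As a starting point, the standard OS and JDK tools already give objective numbers while gpt runs (a sketch; the graph file is a placeholder, and the jps filter may need adjusting to match gpt’s actual main class on your installation):

```shell
# Wall-clock time and peak memory use of a gpt run (GNU time on Linux):
/usr/bin/time -v gpt MyGraph.xml

# In another terminal: GC activity of the running JVM, sampled every 5 s
# (jps and jstat ship with the JDK):
jstat -gcutil "$(jps -l | awk '/gpt/ {print $1; exit}')" 5s
```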

Note that Java 8 won’t get updates after Jan. 2019, and Java 9 has many improvements in performance monitoring.

4 Likes

Thanks, gnwiii. This is definitely very useful information about Java performance tuning. Still I asked some very specific questions about SNAP and I didn’t get the answers:

  1. Do the flags passed when calling gpt override the contents of the gpt.vmoptions file?
  2. The -c flag: what is the recommended tile cache size? How is it meant to be used with respect to the available system memory? As close as possible to the -Xmx setting, or something different?
  3. Can someone explain the tile caching mechanism in SNAP (how it is implemented), if possible with drawings?
  4. Is it safe to keep the -x option always switched on? Or can it have negative effects?
  5. What does setting the “-XX:+AggressiveOpts” parameter actually do?

If someone could answer those questions I would be very grateful.

Kaupo

1 Like

First, your questions are useful because many others will be interested in the answers.

Many of your questions (the possible exception being #3) are in the category of “exercises left for the reader”, or the parable “give a person a fish and that person is fed for a day, but teach a person to fish and that person has food for life”. In other words, the answers are less useful than knowing how to work them out for yourself.

Some insight into the gpt command can be gained by just looking at the gpt script. Here are some examples for a Unix (Linux or macOS) installation:

$ file gpt
gpt: POSIX shell script, ASCII text executable, with very long lines
$ wc -l gpt
441 gpt

The best documentation is usually the source code, but on Windows, we only have gpt.exe. If you are on Windows, look in the snap-installer unix bin directory. At this stage, it is useful to know a bit of POSIX shell scripting language. In practice, the majority of Unix systems use the bash implementation. There are many introductory tutorials for bash (including languages other than English). I recommend Linux Command. Windows 10 users can activate Windows Subsystem for Linux (WSL) and install any of the available (and free!) Linux distros from the “Store” to get bash in Windows 10.

Once you feel comfortable reading simple shell scripts, you can examine the rather complicated gpt script. You should discover a shell function called read_vmoptions and also the following “very long line”:

$INSTALL4J_JAVA_PREFIX exec "$app_java_home/bin/java" [...]
  $INSTALL4J_ADD_VM_PARAMS [...] "$@"

If you are a bit familiar with shell scripts, you can guess the answer to the first question, and devise experiments to check that your guess is correct. In practice, gpt has become very chatty, so it won’t hurt to have it print a bit of extra detail to record the values of the options you are tweaking.
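One such experiment could look like this (a sketch, assuming gpt.vmoptions contains -Xmx8G and the JDK tools jps/jcmd are on the PATH; the jps filter may need adjusting for your installation):

```shell
# Start a long-running job with a conflicting heap flag on the command line:
gpt MyGraph.xml -J-Xmx16G &

# While it runs, ask the live JVM which maximum heap it actually uses:
sleep 30
pid=$(jps -l | awk '/gpt/ {print $1; exit}')
jcmd "$pid" VM.flags | tr ' ' '\n' | grep MaxHeapSize
```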

Question 2. I suspect the appropriate choice for <cache size> depends on your workload and hardware environment, so the answer requires using JRE and host performance monitoring tools and experimentation.

Question 4. If you watched the tutorials, you should have some insights into how Java manages memory. It likes to have ample memory, so for those of us who don’t have ample budgets, I suspect this option gives Java a bit of help for cases where the “proper” fix might be to purchase more RAM. There are many other potential causes for “Memory problems”, so you should also consider Java performance monitoring tools.

Question 5. There is an explanation of what it is meant to do in various online sources, but also have a look at the Java Bug Database to find out what it has actually done in the past (I just checked and found 22 bugs for AggressiveOpts). Many compilers have similar options. Such options give performance improvements, but can also introduce (usually minor) changes in the results that make it harder to compare output on different hardware. One of the key applications I use has consistently given bit-for-bit identical results across Windows and Unix platforms for a test data set over a period approaching 10 years. The results differ slightly when using aggressive optimizations, and the performance gains are small.

2 Likes

Hi @kaupovoormansik and thanks @gnwiii for the exhaustive explanations.

Some more comments from me (one of the developers of SNAP).

The options intended for the Java VM should go into the vmoptions file. For example, the Xms and Xmx options can’t be set on the command line. The specific gpt options like c and q must be given on the command line and cannot be specified in the vmoptions file.
But there are other ways to set the tile cache size and the parallelisation.
In the etc folder of the SNAP installation directory you find the snap.properties file. Inside you find several properties to configure the behaviour of SNAP. The properties can also be specified in the <userdir>/.snap/etc/snap.properties file. The properties in the userdir override the system properties in the installation directory. The same properties can also be specified on the command line when calling gpt, e.g. -DpropertyName=value.
A list of properties can be found in our wiki.
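For example, a gpt call that overrides a property from snap.properties for a single run could look like this (graph and target names are placeholders; the tile cache property takes a value in megabytes):

```shell
# Override snap.jai.tileCacheSize (in MB) for this run only:
gpt MyGraph.xml -Dsnap.jai.tileCacheSize=8192 -t target.dim
```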

From my experience you can increase the cache size to a value of 75% of the max memory (Xmx) value. Depending on the actual processing, the value might vary. In some cases it can be increased up to 90%, and sometimes a value of 60% is already the limit.
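As a trivial shell sketch of that rule of thumb (the heap size is just an example):

```shell
# ~75% of an -Xmx50G heap, expressed in megabytes for gpt's -c option:
xmx_mb=$(( 50 * 1024 ))            # -Xmx50G in megabytes
cache_mb=$(( xmx_mb * 75 / 100 ))  # 75% rule of thumb
echo "$cache_mb"                   # -> 38400, i.e. call: gpt ... -c 38400M
```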

The caching mechanism is pretty simple. If the cache size reaches the limit, the oldest tiles are removed; I think until a fill level of 70% is reached (I’m not sure about this). That’s roughly how it works. The implementation is not ours; we are using JAI.

Regarding the -x option: if you have enough memory to keep a whole row of tiles, it will not have negative effects. But this might change from product type to product type; some are bigger than others.

Regarding -XX:+AggressiveOpts: this only influences how the VM will try to optimise the compiler output. Several parameters of this type are described on this page, but I usually don’t care much about these very special settings.

1 Like

Hi,

I am trying to generate a DEM using SLC data, but I am facing some performance issues with gpt before getting a result. So far, I am trying to run this graph (in gpt) for the second sub-swath in one of the polarizations.

I have been reading on the forum about the different ways to set the configuration of gpt properly, but I am still facing issues. I am using a Linux machine with 15 GB RAM.

Here is my gpt.vmoptions

I have computed the VM parameters with the snap-conf-optimiser and pasted them into gpt.vmoptions to update the settings of gpt. The process starts, but at some point it runs out of memory.

Any idea on how to proceed?
M

Is this still an issue?
To which value have you set the tile cache? (the c option of gpt)
Could be that 15GB is not sufficient for this type of processing.
You could try to split your graph into three: two for applying the orbit files and a third for doing the remaining part.
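Such a split could be sketched like this (all graph and product file names are hypothetical; -S and -t are gpt’s standard source/target options):

```shell
# Steps 1 and 2: apply orbit files to each SLC product separately:
gpt ApplyOrbit.xml -Ssource=S1_IW_SLC_A.zip -t orbit_A.dim
gpt ApplyOrbit.xml -Ssource=S1_IW_SLC_B.zip -t orbit_B.dim

# Step 3: run the remaining part of the original graph on the two results:
gpt Remaining.xml -Smaster=orbit_A.dim -Sslave=orbit_B.dim -t result.dim
```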

We need to review and optimise the memory usage and performance of the S1TBX operators, as they seem to be causing the most problems.

2 Likes

When setting the snap.jai.tileCacheSize in snap.properties,

gpt --diag shows 0 B for most of the settings. I’ve tried multiple different combinations…

e.g. setting snap.jai.tileCacheSize=20000 (which would be 20 GB) shows:

[screenshot: gpt --diag output reporting the tile cache size as 0 B]

Is that just a bug in how gpt displays these settings (with processing actually done using a 20 GB cache), or is the tileCacheSize disabled by mistake?

(Same thing happens when using SNAP GUI Performance tuning for setting the variables.)

When trying different values, gpt --diag sometimes shows e.g. 1.5 GB for snap.jai.tileCacheSize = 22000. I can’t see any pattern in the way the tileCacheSize is set.

Unfortunately, you are right, this is a bug.
Due to a numerical overflow, the cache size is set to zero when using values higher than 2000. Due to multiple overflows for a value of 22000 you got a cache of 1.5GB. This will be fixed with a module update in the coming weeks.
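The reported values are consistent with the megabyte setting being multiplied into bytes in a 32-bit signed int. A small shell sketch of that arithmetic (the wrap-around simulation is my illustration, not SNAP code):

```shell
# Simulate a Java 32-bit signed int computing <MB> * 1024 * 1024 bytes:
to_int32() {
    v=$(( ($1 * 1024 * 1024) & 0xFFFFFFFF ))
    if [ "$v" -ge 2147483648 ]; then v=$(( v - 4294967296 )); fi
    echo "$v"
}
to_int32 20000   # -> -503316480: negative, so the cache is disabled (0 B)
to_int32 22000   # -> 1593835520: about 1.5 GB, as observed
```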

You can work around this by using the -c option from the command line. This way the cache is correctly set.

1 Like

Thanks good to know ;).

Hi @marpet,

Thanks for the information about setting the Java parameters.

You mention that -J-Xmx16G will not override the -Xmx8G from the gpt.vmoptions file. Is this really impossible? My problem is that I’ve installed SNAP on a cluster with two types of machines (one with more RAM than the other). So I would like to launch gpt on each machine with the optimal parameters, but I don’t see how if the -J-Xmx is not taken into account. Is it perhaps possible to specify which gpt.vmoptions file my gpt uses, depending on the machine? Or should I try to install SNAP twice (but the cluster has a single front end)?
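One idea I would like to verify: the gpt launcher script expands the $INSTALL4J_ADD_VM_PARAMS environment variable (it appears in the launch line quoted earlier in this thread), so per-machine options could perhaps be injected without touching gpt.vmoptions. A sketch (untested; whether these parameters actually override the vmoptions values would need checking):

```shell
# Derive a heap size from the machine's physical RAM (Linux) and pass it
# through the environment hook used inside the gpt launcher script:
mem_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
heap_gb=$(( mem_kb / 1024 / 1024 * 3 / 4 ))    # ~75% of physical RAM
INSTALL4J_ADD_VM_PARAMS="-Xmx${heap_gb}G" gpt MyGraph.xml
```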

Hi Marco,

Would you please give your opinion? I’m a bit confused. The previous machine had 16 GB RAM and a 1 TB HDD, and creating the coherence graph of two SLC images with multilooking took from a few seconds to one or two minutes.

Now the machine has 32 GB RAM and a 2 TB SSD.

The same process takes more than 10 minutes. SNAP has been reinstalled, and this is the snap.properties from SNAP.

and the following is the SNAP properties from SNAP\etc


and this is the gpt.vmoptions,

You see me puzzled, too.

One thing I recently noticed is setting the properties

snap.dataio.reader.tileWidth

and

snap.dataio.reader.tileHeight

can slow down the processing.

Delete them in the snap.properties and try again.

Hi Marco,

Thanks a lot for this very precise technical note. The time difference for the same graph below is 19 minutes after deleting those two lines, versus 29 minutes with the two lines in place.

But is this duration (19 min) reasonable with 32 GB RAM and a 2 TB SSD? (using gpt)

I think in your case it will be useful to increase snap.jai.tileCacheSize; perhaps you could try 12000 or 15000 (if you are using gpt, then you can use the -c parameter).

1 Like

Okay, that’s good. But you say processing finished within 2 minutes.
This means it is still much slower.
Are you sure about the timings for your previous memory configuration?
Somehow I doubt that this graph would run that fast, judging from what users have reported.

The 19 minutes seem to be reasonable to me considering the processing graph. It should take almost the same time with the configurations you have mentioned.

And increasing the tile cache might help, as Omar suggested.

1 Like

Now I’m really confused about the timing, but it most probably was a few minutes.

Hi Marco and Omar @obarrilero

Using gpt,

After deleting those two lines and increasing the cache size to 15000, the same graph takes only 3 minutes.

Tremendous thanks.

But Omar, I’m not quite sure I got this point properly; would you please clarify it?

1 Like