GPT batch processing abnormality

Hello,

I’ve got a question regarding processing using GPT.

I’m planning to process some 560 S1 scenes for flood detection. I figured the easiest way would be to apply a pre-built graph containing the processing operators to the S1 scenes.
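
In case it helps, the per-scene call is essentially just gpt with the graph file, roughly like this (graph name and paths are placeholders, not my actual ones):

    REM run the pre-built graph on every S1 scene in a folder
    REM (placeholder names; run from a .bat file, hence the doubled %%)
    for %%F in (D:\S1_scenes\S1A_IW_GRDH_*.zip) do (
        gpt D:\graphs\flood_preprocessing.xml -t D:\processed\%%~nF.dim %%F
    )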

I’ve already successfully managed to process multiple scenes with the following graph:

Soon I realized that the resulting data volume would be too much to handle, so I decided to add the Convert-Datatype operator in order to convert the processed scenes to Int16.

The graph looks as follows:

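Since the screenshots aren’t reproduced here: the addition is essentially one extra node at the end of the chain, roughly like this (operator and parameter names as I remember them from the operator help; the source node id is just a placeholder for the last step of my chain, so worth double-checking):

    <node id="Convert-Datatype">
      <operator>Convert-Datatype</operator>
      <sources>
        <!-- placeholder: reference the last node of the existing chain here -->
        <sourceProduct refid="Terrain-Correction"/>
      </sources>
      <parameters>
        <targetDataType>int16</targetDataType>
        <!-- default scaling, as far as I recall -->
        <targetScalingStr>Linear (between 95% clipped histogram)</targetScalingStr>
      </parameters>
    </node>
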
So that’s where things get interesting for me. I’m working on a Win7 64-bit machine with 128 GB RAM and 6 cores @ 3.50 GHz.

Processing the first graph for a small (~450 MB) S1 scene takes about 1 minute and 10 seconds. When I manually apply the Convert-Datatype operator to the processed result, it takes another 15 seconds or so.

However, when I apply the second graph, which should do everything in a single step, the processing takes 7 minutes or more.

I read the threads here about GPT performance, Java heap size, etc. (especially the “Gpt performance” thread) and have consequently tried everything to set the parameters correctly.

Additionally, I’ve noticed another thing while processing the two graphs:

For the first graph, the processing seems to use all of the available power, with all threads maxed out at 100%.

For the second graph (the one that includes Convert-Datatype), the CPU usage looks more erratic and never really reaches full load.

I’m grateful for any hints on what could be going wrong or how I could streamline the processing. I’m also happy to hear any feedback or suggestions for improving my flood-detection processing graph.

Many thanks,

Val


Val, your graph looks good. I’ve had a quick look into the Convert-Datatype operator. It uses the product’s statistics to determine the min and max values, which could be forcing the whole graph to be processed in a single thread. The same thing happens with the Mosaic operator when normalization is turned on. This can be a problem because the overall product min/max values are needed but are unknown without processing the other steps in the graph.
For now you will be better off doing the Convert-Datatype as a separate graph.
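Something along these lines, i.e. two gpt calls per scene instead of one (file names are just illustrative; check gpt Convert-Datatype -h for the exact parameter name):

    REM step 1: run the preprocessing graph on its own
    gpt flood_preprocessing.xml -t work\scene_proc.dim S1A_scene.zip

    REM step 2: convert the intermediate product to Int16 in a second pass
    gpt Convert-Datatype -PtargetDataType=int16 -t final\scene_int16.dim work\scene_proc.dim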

Thanks @lveci for the answer and sorry for the delayed reply.

I had already thought about this solution, but I wanted to avoid too many I/O operations. If that’s the only way, though, so be it.

Before I go back to the Convert-Datatype operation, I’d be curious about something concerning gpt itself. I’ve read in multiple threads that people have trouble setting up gpt properly (heap space, etc.). I guess it’s not an easy task for people who have never worked with it or generally have little coding experience.
Anyway, I thought I had figured it out and set everything up properly. On the one hand, I ran the configuration optimizer both inside and outside of SNAP, and it seems to have set the parameters correctly;

Additionally, I modified the gpt.vmoptions file in /snap/bin accordingly.
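
For reference, the line I added there looks like this (the heap value is simply what I picked for a 128 GB machine, nothing authoritative):

    # maximum Java heap size for gpt
    -Xmx100G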

It seems that none of these steps worked, and I ended up calling gpt with the -c and -q flags instead. This is obviously not an optimal solution, so I’m wondering whether this is a problem on my end or a known issue that is already being fixed.
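
Concretely, a call with those flags looks roughly like this (cache size and thread count are just what I picked for my 6-core / 128 GB machine, and the paths are placeholders):

    REM -c sets the tile cache size, -q the number of parallel threads
    gpt D:\graphs\flood_preprocessing.xml -c 32G -q 12 -t D:\processed\scene_proc.dim D:\S1_scenes\S1A_scene.zip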

Considering the Convert-Datatype operation:

I tried two ways to see if there’s any difference in performance:

  • Directly call gpt Convert-Datatype
  • Create a graph which reads, converts and writes the file, and call it with gpt (rough sketches of both below)
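
The two calls looked roughly like this (simplified paths; the parameter name is taken from the operator help, and for the graph variant I’m assuming ${input}/${output} placeholders in the Read and Write nodes):

    REM option 1: call the operator directly on the processed product
    gpt Convert-Datatype -PtargetDataType=int16 -t out\scene_int16.dim in\scene_proc.dim

    REM option 2: run a small Read -> Convert-Datatype -> Write graph
    gpt convert_datatype_graph.xml -Pinput=in\scene_proc.dim -Poutput=out\scene_int16.dim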

Only the first option actually worked for me, as I received the following error message for the second option:

I’m not sure what it means or what causes it, but I thought I’d include it here.

So thanks again for the help and sorry for the load of screenshots.

Best,

Val