Trouble with gpt multithreading

ASFStoner · January 26, 2017, 11:56pm

I’m using the -q parameter with a graph that does oil spill detection, based on this thread: Oil Spill Graph - Detection/Cluster Output Looks Wrong

I’ve got an 8-core machine, so I used -q with 64 threads to maximize this box, since it’s all mine and nothing else is running here. When I run this command:

./gpt myGraph.xml -q 64

… it kicks off and nicely spreads the workload across all 8 cores. After a few hours, it goes back to 1 CPU and pegs that 1 CPU for 3 to 4 days before completing. Kinda ruins my day.

Is this a bug or do I need to set another param so my parallelism is kept through the graph process?

Thanks,
Chris

lveci · January 27, 2017, 12:39am

It’s hard to tell what could be going on without debugging the same scenario. The gpt should divide up the tiles from the writer among the available cores. After that, the graph pulls the data from operator to operator as needed. If an operator does something like call statistics on the whole image, then the parallelism gets ruined as one thread tries to do too much work.
This may not be happening here since it appears to be ok for a while and then not. Maybe the VM goes into garbage collection hell.
How many writers are there? Is the memory getting maxed out when it goes to 1 core?
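
To illustrate the point about whole-image operations, here is a minimal Python sketch. It is not SNAP code: the names `process_tile`, `global_mean`, and `run` are made up for illustration, and the "image" is just a list of small lists. It only shows the structure of the work (independent tiles first, then one step that needs every pixel), not real timings.

```python
from concurrent.futures import ThreadPoolExecutor


def process_tile(tile):
    """Per-tile work: depends only on this tile, so tiles can run concurrently."""
    return [value * 2 for value in tile]


def global_mean(tiles):
    """Whole-image statistic: needs every pixel, so one caller walks all tiles."""
    total = sum(sum(tile) for tile in tiles)
    count = sum(len(tile) for tile in tiles)
    return total / count


def run(tiles, workers):
    # Stage 1: independent tiles are handed out to the worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scaled = list(pool.map(process_tile, tiles))
    # Stage 2: the statistic is computed by a single thread over the full
    # image; adding more workers does nothing for this part.
    mean = global_mean(scaled)
    return scaled, mean


if __name__ == "__main__":
    image_tiles = [[1, 2, 3, 4] for _ in range(8)]
    scaled_tiles, mean_value = run(image_tiles, workers=4)
    print(len(scaled_tiles), mean_value)
```

If a graph behaves like stage 2 for most of its runtime, raising the thread count would not be expected to shorten that part.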

ASFStoner · February 10, 2017, 8:14pm

I tried this again on a rather big (and expensive) compute node in AWS. It was a c4.8xlarge, a compute-optimized node with 16 cores and 30 GB of memory, and I ran it like this:

./gpt myGraph.xml -q 128

… and got similar results. I’m not sure when it drops to a single core, but eventually it does (after many hours) and never comes back to fully parallelized. If I add the -c parameter, I almost immediately get a Java heap space error, so I’ve left that alone. But it never really seems to finish what it’s doing. I kick it off using screen, since it runs so long that it drops my connection to the remote host after a few hours. When I reattach to screen, it’s still processing. When I check the output file a full day later, it has not changed, and the file is junk.

Here’s my graph, roughly following @con20or’s post here about Oil Spill:

read > calibration (using VV only) > land-sea-mask (sigma0_vv) > oil spill detection (sigma0_vv) > oil spill cluster > write

I’m using the smaller window size of 61, threshold of 2.0, and small cluster size of 0.1 just to get it to run.

Any suggestions appreciated!
-CS

Jonathan · April 9, 2020, 11:23am

Hi,
I am running into the same problem. Did you manage to identify the error?
(I know it was 3 years ago.)