Make Snappy wait for Java to finish

Hi there,

I have a good bit of Snappy code working, but I’ve noticed that even if I use a queue, I can’t seem to get snappy to wait for the previous Java request to finish before it fires off the next request. So for example I’m trying to do several steps to run an InSAR (ApplyOrbit on 2 files, create a Stack, do Coarse-Fine Coregistration, Resample, etc.)

ApplyOrbit takes quite a while, and seems to still be running when Coregistration is trying to also write to the stack. I used a queue but that doesn’t seem to help as Python just plows through the queue.

Any ideas?

Thanks!
-Chris

Can you show a snippet of your code? How do you invoke the processing requests?

Sure, it’s basically like this (trimmed down to just the relevant parts):

from Queue import Queue
from thread import start_new_thread
from time import sleep

from snappy import GPF, HashMap

q = Queue()

def applyOrbitFile():
    parameters = HashMap()
    target1 = GPF.createProduct("Apply-Orbit-File", parameters, file1)

def DestinationThread():
    while True:
        f, args = q.get()
        f(*args)

# Main
start_new_thread(DestinationThread, tuple())

q.put((applyOrbitFile, ()), True)
sleep(1)



But it both puts and pops whatever I put on the queue almost immediately. I also tried this without the queue and it plows through the calls to Java without really waiting. I can see the debug messages from Java written to the console well after the script has moved past that point. If I don't write anything to the filesystem, it will actually exit before completing anything.

Thanks,
Chris

This is the nature of GPF. The computation is only performed when the data is accessed, e.g. written to disk.
The call GPF.createProduct("Apply-Orbit-File", parameters, file1) just initialises the operator (this can sometimes also take some time), but the actual computation is not done.
If you write the target1 product to disk, the computation will start. Alternatively, you can access the data directly.
The following reads the data of a 100x100 rectangle into r1

import numpy
r1 = numpy.zeros(100 * 100, dtype=numpy.float32)
target1.getBand("MyBand").readPixels(0, 0, 100, 100, r1)

The computation is performed for the tiles which are affected by the data request.
The second time you request the same data it is taken from the cache, if it is still there. If it is no longer in the cache, it is recomputed.
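This deferred behaviour, where creating a product records the work and reading pixels triggers it, can be sketched with a plain-Python analogy (LazyProduct below is a made-up illustration, not part of the snappy API):

```python
class LazyProduct:
    def __init__(self, compute):
        self._compute = compute  # deferred computation, like createProduct
        self._cache = None       # filled on first data access

    def read_pixels(self):
        # first access triggers the computation; later accesses hit the cache
        if self._cache is None:
            self._cache = self._compute()
        return self._cache

calls = []
product = LazyProduct(lambda: calls.append("computed") or [1.0, 2.0, 3.0])

assert calls == []                 # "createProduct": nothing has run yet
data = product.read_pixels()       # "readPixels": computation happens now
assert calls == ["computed"] and data == [1.0, 2.0, 3.0]
product.read_pixels()              # served from the cache, no recompute
assert calls == ["computed"]
```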

Thanks @marpet, but I must still be doing this wrong. If I add the call to numpy below, it fails with "AttributeError: 'NoneType' object has no attribute 'readPixels'".

I agree that GPF hasn't fired; it works only if I put this after a write to disk, which I really don't want to do at each step.


q = Queue()

def applyOrbitFile():
    parameters = HashMap()
    target1 = GPF.createProduct("Apply-Orbit-File", parameters, file1)
    r1 = numpy.zeros(100 * 100, dtype=numpy.float32)
    target1.getBand("MyBand").readPixels(0, 0, 100, 100, r1)

def DestinationThread():
    while True:
        f, args = q.get()
        f(*args)

# Main
start_new_thread(DestinationThread, tuple())

q.put((applyOrbitFile, ()), True)
sleep(1)

In the above example you need to replace the band name 'MyBand' with a band name that actually exists in target1.
If you want to do multiple computation steps you can do it differently.
One way is to setup a graph xml file which you can execute with GPF. The other way is to pass the result of one step directly to the next one.

applyOrbitParams = HashMap()
applyOrbitProduct = GPF.createProduct("Apply-Orbit-File", applyOrbitParams, file1)
terrainCorrParams = HashMap()
terrainCorrProduct = GPF.createProduct("Terrain-Correction", terrainCorrParams, applyOrbitProduct)
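The first option marpet mentions, a graph XML, would be executed with SNAP's gpt tool. A rough sketch of such a graph is below; the node ids, file paths and parameters are illustrative only and need checking against your SNAP version:

```xml
<graph id="Graph">
  <version>1.0</version>
  <node id="Read">
    <operator>Read</operator>
    <parameters>
      <file>input.zip</file>
    </parameters>
  </node>
  <node id="ApplyOrbit">
    <operator>Apply-Orbit-File</operator>
    <sources>
      <sourceProduct refid="Read"/>
    </sources>
  </node>
  <node id="Write">
    <operator>Write</operator>
    <sources>
      <sourceProduct refid="ApplyOrbit"/>
    </sources>
    <parameters>
      <file>output.dim</file>
      <formatName>BEAM-DIMAP</formatName>
    </parameters>
  </node>
</graph>
```

You would then run something like `gpt myGraph.xml` from the command line.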

Thanks @marpet, but that’s where I started.

Before I used a queue, I simply had the calls in sequential order, handing off the target Product from one API call as the provided product for the next call. That didn’t seem to work, because GPF never really fired the process. So doing just what you have there will run really quickly and do nothing unfortunately.

I had it like this:
prods = []

# ApplyOrbitFile
parameters = HashMap()
target1 = GPF.createProduct("Apply-Orbit-File", parameters, file1)
prods.append(target1)

target2 = GPF.createProduct("Apply-Orbit-File", parameters, file2)
prods.append(target2)

# CreateStack
targetStack = GPF.createProduct("CreateStack", parameters, prods)

# CoarseFine
parameters = HashMap()
parameters.put('Test GCPs are on land', True)
targetCoregister = GPF.createProduct("CoarseFine-Coregistration", parameters, targetStack)

It plows through all of these GPF calls, but none of them actually does anything; there are no real calls to Java (with debug turned on in Java). I really don't want to run GPF from the command line with a graph, as I'd rather keep the flexibility of using snappy to make choices and branch if I want different things. I'm tempted to ditch snappy and just use Java so I can control it better. Maybe that's the only viable option other than a graph.
-CS

Actually there should not be a big difference how GPF works between Java and Python/Snappy.

To trigger the processing you need to add a write operation at the end.
Either you do

ProductIO.writeProduct(targetStack, targetFile, "BEAM-DIMAP", False)

or you use GPF also for this.

writeParams = HashMap()
writeParams.put("file", "/output/filepath/file.dim")
GPF.createProduct("Write", writeParams, targetStack)
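Why the earlier sequential chain "plowed through" without doing anything can again be shown with a plain-Python analogy (made-up helpers, not the snappy API): each createProduct-style call is cheap and only wires the steps together, and the final write is the data request that pulls the whole chain.

```python
# Plain-Python analogy: chained lazy steps, pulled by a final "write".
log = []

def make_step(name, source=None):
    # Building a step is cheap; it just remembers what to do.
    def compute():
        if source is not None:
            source()          # pull the upstream step first
        log.append(name)      # the actual work happens here
    return compute

orbit = make_step("Apply-Orbit-File")
stack = make_step("CreateStack", orbit)
coreg = make_step("CoarseFine-Coregistration", stack)

assert log == []  # the whole chain was built, but nothing has run yet

def write_product(final_step):
    final_step()  # the write is the data request that triggers it all

write_product(coreg)
assert log == ["Apply-Orbit-File", "CreateStack", "CoarseFine-Coregistration"]
```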

Too funny. I kept getting a Java array-out-of-bounds error on CoarseFine-Coregistration, so I assumed it was trying to run Coregistration on the previous step's target that hadn't completed, and I kept killing it.

I ignored it and let it run and it successfully completed, successfully writing out my spiffy new InSAR files. I could remove the queue and it would work just as well.

Now I’ll go track down that error! Thanks again @marpet!
-CS

Hi, I tried all of the methods in this topic, but it still doesn't work as I would like.
My situation is like the one ASFStoner had: all GPF commands run very fast, then the script stops at deleting unused temp libraries but never quits. On the bright side, the task manager shows that Python is working, and the output file size grows until it reaches some size and stops increasing (and the file seems OK), but Python keeps running and the script doesn't end.
When I used GPF for writing instead of ProductIO, as marpet wrote, the script runs through all commands and quits, but doesn't write anything.
I hope somebody can advise how to turn this never-ending script into something nice.
If I were processing one file, it would be OK (not good, but workable) to wait until the file size stops growing, but I would like to process a folder of files, so having the process finish properly is important.

Also, as I'm not very experienced with SAR, I'm not sure whether it is correct to put all the processing steps (back geocoding, coherence, TOPSAR deburst, multilook, terrain correction, writing) in one snappy script.
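For the folder-of-files case, here is a sketch of the loop structure (process_one is a stand-in for the real snappy chain, and the ".dim" naming is just an example; the key point is that each iteration ends in a blocking write, so one file finishes before the next starts):

```python
import os
import tempfile

def process_one(in_path, out_path):
    # Stand-in for the real snappy chain: GPF.createProduct(...) steps
    # ending in a blocking write such as ProductIO.writeProduct(...),
    # which returns only once the output is fully written.
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(src.read().upper())

def process_folder(in_dir, out_dir):
    done = []
    for name in sorted(os.listdir(in_dir)):
        out_name = os.path.splitext(name)[0] + ".dim"
        process_one(os.path.join(in_dir, name),
                    os.path.join(out_dir, out_name))
        # by this point the previous file is completely written
        done.append(out_name)
    return done

# tiny usage example with temporary files
in_dir, out_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(in_dir, "scene%d.txt" % i), "w") as f:
        f.write("data")
print(process_folder(in_dir, out_dir))  # ['scene0.dim', 'scene1.dim']
```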

I think I was wrong with the GPF suggestion. What could work is GPF.writeProduct() instead of GPF.createProduct("Write", …).

But with ProductIO it should work too.
Are you sure that the file is OK? Considering the operators you are listing, I think your chain could take a considerable amount of time, up to 2 hours or more. Did you wait that long?
Also, I see that you do terrain correction. This causes the DEM to be downloaded (only once), which takes some time.

Thank you very much. Patience was the right solution.
It took about 1.5 hours for the file size to stop growing and about an hour more for the script to quit correctly.
As for whether the file is OK, I don't know; it looks OK: snowy pattern, higher coherence in swamps and on roads, some croplands, and the terrain correction also looks fine. The processing metadata for the DIM product shows all the steps and parameters. Also, I can see no difference between the interrupted GeoTIFF file and the one whose script reached the end.