Batch Colocation of S3 SYN files

Frankie91 · April 6, 2022, 8:27am

I’m trying to find out whether it is possible to subset and colocate a large number of S3 SYN files in batch mode, writing the output of each subset-and-colocate run to its own NetCDF file.

Right now I’m doing the following:

I create the reference master product by spatially subsetting the first S3 SYN product to my region of interest (using subset from view). This subset is then reprojected to a regular 300 m grid in UTM/WGS 84.
A standard colocation graph is then used to subset and colocate each S3 SYN file to the same region and grid. This works well for generating identical NetCDF files that cover the same region on the same grid, which I can then open and manipulate all at once in Python, very quickly, for the work I need to do.

But the graph takes 2 to 3 minutes to run on my computer, and since I have to select the slave product manually and type the target name (simply ‘month_day.nc’, with the month and day taken from the S3 SYN file) each time before running it, I can’t do much else for an entire day.
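The naming step at least is easy to automate. Below is a minimal Python sketch that derives the ‘month_day.nc’ target name from a product name. It assumes the standard Sentinel-3 naming convention, where the sensing start time appears as _YYYYMMDDTHHMMSS_ in the name; the example file name is made up, and zero-padding the month and day is an assumption about the format you want.

```python
import re
from pathlib import Path

# Assumption: the sensing start time appears as _YYYYMMDDTHHMMSS_ in the name,
# as in standard Sentinel-3 product names (S3A_SY_2_SYN____20220406T093512_...).
SENSING_START = re.compile(r"_(\d{4})(\d{2})(\d{2})T\d{6}_")

def target_name(product_path: str) -> str:
    """Return the 'month_day.nc' target name for one S3 SYN product."""
    name = Path(product_path).name
    match = SENSING_START.search(name)  # first match = sensing start time
    if match is None:
        raise ValueError(f"no sensing date found in {name!r}")
    _year, month, day = match.groups()
    return f"{month}_{day}.nc"

if __name__ == "__main__":
    # Hypothetical product name, used only to illustrate the pattern.
    example = ("S3A_SY_2_SYN____20220406T093512_20220406T093812_"
               "20220407T175030_0179_084_036_2340_LN2_O_NT_002.SEN3")
    print(target_name(example))  # 04_06.nc
```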

I’m therefore trying to find a way to automate this process, at least by running it in one go for the few S3 SYN files I have in each month. The Collocation tool doesn’t really work because it merges all days into a single file, distinguishing them only by a slave number (S1, S2, S3, S4), while I need to keep them separate. I’ve also tried to get the Batch Processing tool to run the graph on multiple S3 SYN files, but the same issue remains.

abruescas · April 6, 2022, 8:55am

You could try using gpt. The Collocate operator is available there, and it will save you the time spent selecting the slaves and naming the outputs.

gnwiii · April 7, 2022, 12:49am

You should describe your problem with the batch processing tool – there may be a simple solution.

Command-line tools let you write a script that loops over a list of files, constructing and running a gpt command for each one. As @abruescas mentions, there is a tutorial for this, but you may need to spend some time getting comfortable with the command-line tools.
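A minimal Python sketch of such a loop, under several assumptions: the colocation graph has been saved from the Graph Builder as colocate_graph.xml (hypothetical name) with the fixed master product and a NetCDF Write node inside it, and the slave input and output file are written in the graph as ${slave} and ${target} placeholders, which gpt fills in from -P<name>=<value> options. The folder names are also hypothetical, and depending on your SNAP version you may need to pass the xfdumanifest.xml inside each .SEN3 folder instead of the folder itself.

```python
import re
import subprocess
from pathlib import Path

GRAPH = "colocate_graph.xml"    # hypothetical: graph saved from the Graph Builder
INPUT_DIR = Path("s3_syn")      # hypothetical: folder with the S3 SYN products
OUTPUT_DIR = Path("colocated")  # hypothetical: folder for the month_day.nc files

def month_day(name: str) -> str:
    # Assumes a standard Sentinel-3 name containing _YYYYMMDDTHHMMSS_.
    m = re.search(r"_\d{4}(\d{2})(\d{2})T\d{6}_", name)
    if m is None:
        raise ValueError(f"no sensing date in {name!r}")
    return f"{m.group(1)}_{m.group(2)}"

def gpt_command(product: Path) -> list[str]:
    # Assumes the graph uses ${slave} and ${target} placeholders, which gpt
    # substitutes from the -P<name>=<value> options on the command line.
    target = OUTPUT_DIR / f"{month_day(product.name)}.nc"
    return ["gpt", GRAPH, f"-Pslave={product}", f"-Ptarget={target}"]

def main() -> None:
    OUTPUT_DIR.mkdir(exist_ok=True)
    for product in sorted(INPUT_DIR.glob("S3*_SY_2_SYN*")):
        cmd = gpt_command(product)
        print("running:", " ".join(cmd))
        # check=True stops the loop at the first failing gpt run.
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    main()
```

Passing the command as a list (rather than one string) avoids quoting problems with paths that contain spaces.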

There are many good tutorials on command-line processing for Windows and Linux. A good strategy is to work out the gpt command line for one file, then construct a loop, but put an “echo” command before the gpt command line so you can quickly check the commands it generates. Some of my colleagues find it easier to generate the list of command lines, either by editing a directory listing or by using a spreadsheet, and so create a script that is just a list of command lines with no need for a loop.
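The directory-listing variant, with the “echo” check built in, could look like this minimal Python sketch. It assumes the listing has one product name per line (for example from dir /b > list.txt on Windows or ls > list.txt on Linux), that the names follow the standard Sentinel-3 convention so the date sits at fixed positions, and that the hypothetical graph colocate_graph.xml reads ${slave} and ${target}.

```python
from pathlib import Path

# Hypothetical graph name; the graph is assumed to read ${slave} and ${target}.
GRAPH = "colocate_graph.xml"

def command_line(product: str) -> str:
    """Build one gpt command line for a product, targeting month_day.nc."""
    # Assumes standard Sentinel-3 names, where characters 16-23 of the
    # file name hold the sensing start date as YYYYMMDD.
    name = Path(product).name
    month, day = name[20:22], name[22:24]
    return f'gpt {GRAPH} "-Pslave={product}" "-Ptarget={month}_{day}.nc"'

def write_script(listing: str, script: str, dry_run: bool = True) -> list[str]:
    """Turn a directory listing (one product per line) into a script of gpt calls.

    With dry_run=True every line starts with "echo", so running the script
    only prints the commands and you can check them before the real run.
    """
    text = Path(listing).read_text()
    products = [ln.strip() for ln in text.splitlines() if ln.strip()]
    prefix = "echo " if dry_run else ""
    lines = [prefix + command_line(p) for p in products]
    Path(script).write_text("\n".join(lines) + "\n")
    return lines
```

Run write_script("list.txt", "run_all.bat") first, check what the script prints, then call it again with dry_run=False to get the script that actually runs gpt.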

Frankie91 · April 7, 2022, 7:54am

Could you explain how one would do the last thing you mention (“using a spreadsheet to create a script that is just a list of command lines without the need for a loop”)?

gnwiii · April 7, 2022, 11:47am

First construct and test one gpt command line. Typically a couple of entries will need to change, usually the time or date component of file names. In a spreadsheet, you create columns for the entries that change, then use them to construct a command line that you can export to a one-column ASCII (CSV?) file. Rename the file (e.g., X.csv → X.bat) and edit out any header or other extraneous characters. Some users just create a .csv file with the variable strings and use an editor to insert the rest of the command line and strip out the commas. If the variable strings are file names, you can use a directory listing to create the spreadsheet so you don’t have to write macros to generate the strings.
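The “insert the rest of the command line and strip out the commas” step can also be done by a short script instead of an editor. A minimal Python sketch, assuming the spreadsheet is exported as a CSV with a header row product,target (hypothetical column names) and that the tested command line uses the hypothetical graph colocate_graph.xml with ${slave} and ${target} parameters:

```python
import csv
from pathlib import Path

# One tested gpt command line with the changing parts replaced by {product}
# and {target}. The graph name and parameter names are hypothetical.
TEMPLATE = 'gpt colocate_graph.xml "-Pslave={product}" "-Ptarget={target}"'

def csv_to_bat(csv_path: str, bat_path: str) -> int:
    """Fill TEMPLATE once per CSV row and write the result as a .bat file.

    The CSV is expected to have a header row with the columns product,target
    (e.g. exported from a spreadsheet). Returns the number of command lines.
    """
    # utf-8-sig also accepts the byte-order mark some spreadsheets add.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.DictReader(f))
    lines = [TEMPLATE.format(product=row["product"], target=row["target"])
             for row in rows]
    # newline="\r\n" writes Windows line endings, the safe choice for cmd.exe.
    with open(bat_path, "w", newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)
```

For example, csv_to_bat("commands.csv", "run_all.bat") writes one gpt line per spreadsheet row; the header row is consumed by csv.DictReader, so there is nothing to edit out by hand.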