SNAP v9: gpt stops after a few runs

by cleaning you mean deleting I suppose?
I am working on Linux.
I have the following folders in my /home/user/.snap
auxdata epsg-database etc graphs product-library system var
shall I delete them?

I do not know what would be the AppData folder for Linux. Could you advise?

On Linux all of these are under
~/.snap

Yes

ok,
I have to run the installation shell script again, right?

No, that folder is created when you start the SNAP application.

I deleted the .snap folder but I am still stuck after the “done” :confused:

I managed to reproduce it on both Windows and Linux; earlier I had forgotten to change the subset extent when changing the product :slight_smile:
What I have also noticed: if you change the formatName to GeoTIFF (and the extension of the output to tif instead of dim), it works.
For example, in your graph, change the Write node to:

> <node id="Write">
>     <operator>Write</operator>
>     <sources>
>       <sourceProduct refid="Subset"/>
>     </sources>
>     <parameters class="com.bc.ceres.binding.dom.XppDomElement">
>       <file>your_path/S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.tif</file>
>       <formatName>GeoTIFF</formatName>
>     </parameters>
>   </node>

@marpet There seems to be a locking problem with the BEAM-DIMAP writer

I am afraid it does not change anything on my side. Still stuck after the “done”.
The tif file is indeed generated, but it stops again.

<graph id="Graph">
  <version>1.0</version>
  <node id="Read">
    <operator>Read</operator>
    <sources/>
    <parameters class="com.bc.ceres.binding.dom.XppDomElement">
      <file>/mount/data_3/prod_data/gbov/EO/UoS/S2L2//BART/2022/S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105.SAFE/MTD_MSIL2A.xml</file>
    </parameters>
  </node>
  <node id="Resample">
    <operator>Resample</operator>
    <sources>
      <sourceProduct refid="Read"/>
    </sources>
    <parameters class="com.bc.ceres.binding.dom.XppDomElement">
      <referenceBand/>
      <targetWidth/>
      <targetHeight/>
      <targetResolution>20</targetResolution>
      <upsampling>Nearest</upsampling>
      <downsampling>Mean</downsampling>
      <flagDownsampling>FlagMedianOr</flagDownsampling>
      <resamplingPreset/>
      <bandResamplings/>
      <resampleOnPyramidLevels>true</resampleOnPyramidLevels>
    </parameters>
  </node>
  <node id="Subset">
    <operator>Subset</operator>
    <sources>
      <sourceProduct refid="Resample"/>
    </sources>
    <parameters class="com.bc.ceres.binding.dom.XppDomElement">
      <sourceBands/>
      <region>0,0,0,0</region>
      <referenceBand/>
      <geoRegion>POLYGON((-71.3186 44.0864,-71.256 44.0864,-71.256 44.0414,-71.3186 44.0414,-71.3186 44.0864))</geoRegion>
      <subSamplingX>1</subSamplingX>
      <subSamplingY>1</subSamplingY>
      <fullSwath>false</fullSwath>
      <tiePointGridNames/>
      <copyMetadata>true</copyMetadata>
    </parameters>
  </node>
  <node id="Write">
    <operator>Write</operator>
    <sources>
      <sourceProduct refid="Subset"/>
    </sources>
    <parameters class="com.bc.ceres.binding.dom.XppDomElement">
      <file>/mount/internal/work-st/projects/jrc-066/1553-gbov/component1/data/EO/CP1yr3/vegsub/S2L2//BART/2022/S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.dim</file>
      <formatName>GeoTIFF</formatName>
    </parameters>
  </node>
  <applicationData id="Presentation">
    <Description/>
    <node id="Read">
      <displayPosition x="37.0" y="134.0"/>
    </node>
    <node id="Resample">
      <displayPosition x="163.0" y="145.0"/>
    </node>
    <node id="Subset">
      <displayPosition x="315.0" y="156.0"/>
    </node>
    <node id="Write">
      <displayPosition x="455.0" y="135.0"/>
    </node>
  </applicationData>
</graph>

You haven’t changed the output extension; it is still .dim (S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.dim). Try changing it to S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.tif

correct for the file name, thanks.
I used to have 2 outputs from this graph:
one dim, now a tif file: S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.tif
one datafile containing all the S2 bands:
S2A_MSIL2A_20220206T154431_N0400_R011_T19TCJ_20220206T202105_resampled_subset.data
the data file is no longer generated, and it is the most important output for my application.

The bands are in the TIF file:

BEAM-DIMAP output consists of a .dim file with metadata, and a .data directory with each band stored as <name>.hdr (metadata) and <name>.img (image data). With GeoTIFF you get one file containing (often incomplete) metadata and all the bands (often with generic names like BAND1, etc.).

OK thanks,
I would need to have all bands in a separate folder, like the DIMAP .data directory.
Would you have any suggestion?

My routine still gets stuck even though I use GeoTIFF as the output format :confused:

It may help to know more about the overall process. Are you using GNU parallel, a looping shell script, …?

So it is not an issue with the output format. You say it works for a “few runs”. I would make a note of the files that fail and try running them. If they fail reliably then something about their contents may be breaking gpt. If they run then we need to consider some sort of resource problem. In principle, each run of gpt is independent of other runs, but in practice there are optimizations that might cause the problem.
One optimization in Linux is to delay removing programs from memory so they can be run in a loop with low overhead. Another is caching read and write data, both by the system and by storage devices. CPUs have been getting much faster but data storage has not kept pace, so you can encounter situations where storage devices can’t keep up, which then requires more data to be cached by the system. Ideally the program generating the data would be notified to slow down, but this is a rare case, so testing may not cover all situations. I would try adding a delay between each gpt invocation (in a shell script loop, sleep can be used).
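
The delay suggestion above can be sketched as a small wrapper. This is only an illustration, not the actual routine from this thread: the graph file name is a placeholder, and it assumes the graph has been parameterised with ${input} and ${output} (hypothetical parameter names) via gpt's -P option.

```shell
#!/bin/sh
# Sketch: process S2 products strictly one at a time, logging start/end
# and status, with a pause between gpt runs so write caches can drain.
# Assumes a graph using ${input}/${output} parameters (hypothetical).
process_all() {
    for f in "$@"; do
        echo "gpt starting on $f at $(date)"
        gpt -e resample_subset_graph.xml \
            -Pinput="$f" -Poutput="${f%.xml}_subset.tif"
        echo "gpt finished with status $? at $(date)"
        sleep 10    # pause before the next run; tune the length as needed
    done
}
# usage: process_all /data/S2/*/MTD_MSIL2A.xml
```

The sleep length is a guess; start with a few seconds and increase it if the hangs persist.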

For this sort of processing I have relied heavily on GNU parallel.

good morning,
I am running a series of S2 images through a bash script that calls gpt in sequence. I am not running in parallel, just in sequence.
here is what I have this morning for a single job launched yesterday evening:


the “database closed” message came up at some point.

Not sure what this means. The GNU parallel program monitors system resources and can run shell scripts in parallel on a large system. Do you use parallel -j 1 to run jobs sequentially?

It is less than helpful to post images, because they often omit key details such as the command line used to run a script. I can’t tell whether gpt is running in foreground or background. It takes more effort to quote sections from an image, and the contents are not searchable, so the next person who encounters the issue may not find this topic using a search.

Did the job produce the expected output?

When a shell script run in “foreground” finishes, you should get back to the command prompt. If you run the script in background (either by adding an ampersand (&) at the end of the command line, or by pressing <Ctrl-z> and then using the bg command), then you have the command prompt while the job is running. In workshops with users new to Linux I have often seen users report problems similar to yours because they had multiple copies of a script in the background, having confused <Ctrl-z> with <Ctrl-c>.

Depending on the Linux distribution, there are many GUI and command-line tools to view the status of running jobs along with memory usage. bpytop is widely available and gives a good view of the system status.

sorry, that was not clear indeed.
So, no, I do not run my jobs in parallel.

Yes, the job created the expected output, but the script did not go on to the next job.
I know the difference between ctrl-z and ctrl-c and this is not the problem here.
how can I progress with that issue?

We need a lot more detail to understand your issue. It may help to post your shell script. Do you run it from the shell prompt in a terminal? Are you running scripts on a remote system or locally? Are the data files stored locally or on a network drive? Did the script return to the command prompt, or was it stalled? Have you checked the output to make sure it is complete? Does your script check the return code ($?) of the gpt process?

When developing a gpt workflow it is useful to include the -e option to get more detailed error reporting.
You may want to surround the gpt invocation with some progress reporting:

echo "gpt starting at $(date)"
gpt -e ...
echo "gpt finished with status $? at $(date)"

If you are running gpt on a remote system (e.g., using ssh from the local system to the remote) the session may be interrupted by network glitches. In such cases the nohup command is useful. It redirects the output of your job to a nohup.out file and will continue processing even if the terminal disconnects. It is good practice to monitor CPU loads, temperatures, memory usage, and disk usage while developing gpt workflows.
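
A minimal sketch of that nohup pattern (run_s2_batch.sh is a placeholder for your own batch script):

```shell
# Start the batch detached from the terminal; all output goes to batch.log.
: > batch.log                        # create/empty the log file up front
nohup ./run_s2_batch.sh >> batch.log 2>&1 &
echo "batch started with PID $!"
sleep 1                              # give the job a moment to start
tail -n 5 batch.log                  # peek at progress without attaching
```

With the output redirected explicitly, nohup will not create its default nohup.out; everything lands in batch.log, which you can follow later with tail -f.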

thanks for the tip.
I launched my script yesterday evening with the “-e” option.
It got stuck, but with no additional information:

INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Incompatible GDAL 3.5.0 found on system. Internal GDAL 3.2.1 from distribution will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: GDAL 3.0.4 found on system. JNI driver will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 3.0.4 set to be used by SNAP.
INFO: org.esa.snap.core.util.EngineVersionCheckActivator: Please check regularly for new updates for the best SNAP experience.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 3.0.4 set to be used by SNAP.
Executing processing graph
INFO: org.esa.s2tbx.dataio.s2.ortho.S2OrthoProductReaderPlugIn: Building product reader - EPSG:32613
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [metadata_level] does not exist
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [bandid] does not exist
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [bandid] does not exist
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [bandid] does not exist
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [bandid] does not exist
WARNING: org.esa.snap.core.metadata.GenericXmlMetadata: Metadata: the path to element [bandid] does not exist
INFO: org.hsqldb.persist.Logger: dataFileCache open start
WARNING: org.esa.s2tbx.dataio.s2.ortho.metadata.S2OrthoMetadata: Warning: missing file /mount/data_3/prod_data/gbov/EO/UoS/S2L2/CPER/2022/S2B_MSIL2A_20221015T174259_N0400_R098_T13TEF_20221016T091111.SAFE/GRANULE/L2A_T13TEF_A029296_20221015T175203/QI_DATA/L2A_T13TEF_20221015T174259_DDV_20m.jp2

WARNING: org.esa.s2tbx.dataio.s2.ortho.metadata.S2OrthoMetadata: Warning: no image files found for band quality_dense_dark_vegetation

18%36%54%....64%...89% done.
INFO: org.hsqldb.persist.Logger: Database closed

Something similar can happen when updating SNAP from the command line.
A user provided a workaround, which is mentioned in a note on this wiki page:
Update SNAP from the command line - SNAP - Confluence (atlassian.net)

snap --nosplash --nogui --modules --update-all 2>&1 | while read -r line; do
    echo "$line"
    [ "$line" =  "updates=0" ] && sleep 2 && pkill -TERM -f "snap/jre/bin/java"
done

Maybe you can adapt it for your needs?
Change the snap call to your gpt call, and change ‘updates=0’ to ‘INFO: org.hsqldb.persist.Logger: Database closed’.
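
For gpt that adaptation could look something like this. It is only a sketch, assuming the stuck process is the same snap/jre/bin/java process the wiki workaround kills; the graph name in the usage comment is a placeholder:

```shell
# Watch gpt's output; once hsqldb reports the database closed, wait a
# moment and then terminate the leftover SNAP java process so a calling
# shell script can continue with the next product.
run_gpt_watched() {
    gpt "$@" 2>&1 | while read -r line; do
        echo "$line"
        case "$line" in
            *"org.hsqldb.persist.Logger: Database closed"*)
                sleep 2
                pkill -TERM -f "snap/jre/bin/java" || true
                ;;
        esac
    done
}
# usage: run_gpt_watched -e my_graph.xml
```

Note that pkill -f matches any process whose command line contains the pattern, so this would also kill other SNAP instances running on the same machine.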

I did some test runs.

  • Running with dimap the first time, it hangs
  • Switching to geotiff, it works
  • Running with dimap again, it works too

clearing S2 Cache

  • It hangs with dimap
  • Running again with dimap it works

clearing S2 Cache

  • It hangs with geotiff
  • It works with dimap

This makes me think that the cause is in the S2 cache creation. When the cached files are created, they might not be properly released.

@kraftek @oana_hogoiu maybe you can have a look in this direction?
