Hi,
I am trying to run multiple GPT processes in an HPC environment using Slurm.
Until recently I had set snap.userdir to the default directory $HOME/.snap but changed it to a custom directory to redirect auxdata and cache files. I am not sure whether this change is related; I did not have the error before, but it could also be related only to the new SNAP 10 version.
Now my processes fail with the following error message:
WARNING: org.esa.snap.dataio.gdal.GDALInstaller: Failed to delete the GDAL distribution folder '.snap/auxdata/gdal'.
SEVERE: org.esa.snap.dataio.gdal.GDALLoader: Failed to initialize GDAL native drivers. GDAL readers and writers were disabled.zip END header not found
It turns out that, although GDAL is already installed under snap.userdir/auxdata/gdal, each process deletes the directory and tries to reinstall it. This leads to a conflict between the parallel processes: one process copies the zip file into the directory and unpacks it while another deletes everything again.
Eventually, one process succeeds with the re-installation and a few of the processes continue successfully.
Any idea how to fix this? I don’t really see why the processes try to reinstall GDAL in the first place. Access rights are set to 770 by default. I changed them to 775 but got the same result, so this does not seem to be the problem.
Okay so I noticed that SNAP sometimes writes some GDAL configuration variables to snap.properties:
- gdal.installer
- gdal.apps.path: introduced in snap-engine commit b92befc
- gdal.distribution.hash: introduced in snap-engine commit 683fa79
Does anybody have an idea why they are there and whether they could have anything to do with my error?
Hello @johntruckenbrodt,
- gdal.installer saves the last SNAP version that installed GDAL. It is used in the code to reinstall GDAL if SNAP has been updated.
- gdal.distribution.hash saves the last installed GDAL distribution hash. It is used to check if the internal GDAL distribution has been updated and needs to be reinstalled.
- gdal.apps.path is no longer used.
Regarding your error, could you delete the gdal folder, run a single GPT process and check if the folder is recreated?
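To see which of the properties described above are currently set, something like the following sketch could help. The function name show_gdal_properties is made up for illustration, and the file location in the usage comment is an assumption (the user-level snap.properties is expected under the etc folder of snap.userdir); adjust it to your setup.

```shell
# Hypothetical helper: print the GDAL-related entries of a SNAP properties file.
# Usage: show_gdal_properties PROPERTIES_FILE
show_gdal_properties() {
    local props_file="$1"
    if [ ! -f "$props_file" ]; then
        echo "no such file: $props_file" >&2
        return 1
    fi
    # Keep only keys that start with "gdal." (e.g. gdal.installer,
    # gdal.distribution.hash). grep returns non-zero if none are present.
    grep -E '^gdal\.' "$props_file"
}

# Example call; the path is an assumption, not confirmed in this thread:
# show_gdal_properties "$HOME/.snap/etc/snap.properties"
```

Comparing the output before and after a parallel run might show whether gdal.installer or gdal.distribution.hash changes between processes.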
Hi @diana_harosa,
thanks a lot for your reply and sorry for taking so long myself.
I can confirm that the gdal folder is recreated when running a single GPT process.
A JIRA ticket (SNAP-3637) has already been created for this issue. The fix will be available in the next update.
With the GDAL distribution already installed, you should now be able to run multiple GPT processes.
Hi @diana_harosa I am sure the issue is related but I think it is not exactly the same.
GDAL is reinstalled when running a single GPT process and subsequent single GPT processes use this installation.
However, when running multiple GPT processes in parallel, the different processes delete the GDAL installation and attempt to reinstall it. Most of the processes fail (see stderr_sar_1584545_0128.log). One can literally watch how the directory is repeatedly emptied, the zip file copied and unpacked, only to be emptied again.
Okay so for the time being I have found a hacky workaround:
DIR_USER=.snap_tmp/$(uuidgen)
mkdir --parents "$DIR_USER"
gpt -J-Dsnap.userdir="$DIR_USER" ...
rm -rf "$DIR_USER"
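For use inside a Slurm job script, the workaround above could be wrapped in a small function so that the temporary directory is removed even when the process fails and the exit status is passed on. This is only a sketch: run_with_private_userdir is a made-up name, and mktemp -d is swapped in for uuidgen to create the unique directory.

```shell
# Hypothetical wrapper: run a SNAP launcher (e.g. gpt) with a private,
# throwaway user directory so that parallel processes never share one.
# Usage: run_with_private_userdir BASE_DIR LAUNCHER [ARGS...]
run_with_private_userdir() {
    local base_dir="$1"
    local launcher="$2"
    shift 2
    local user_dir status

    mkdir -p "$base_dir" || return 1
    # mktemp replaces the trailing X's with random characters, which gives
    # each call its own directory (used here instead of uuidgen).
    user_dir="$(mktemp -d "$base_dir/snap_XXXXXXXX")" || return 1

    # Pass the private directory as snap.userdir, as in the workaround above.
    "$launcher" "-J-Dsnap.userdir=$user_dir" "$@"
    status=$?

    # Clean up regardless of success and hand back the launcher's exit status.
    rm -rf "$user_dir"
    return "$status"
}

# Example call (arguments after gpt are placeholders):
# run_with_private_userdir .snap_tmp gpt ...
```

The price is the same as in the original workaround: every process installs GDAL into its own directory, which costs some time and disk space per run.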
@diana_harosa is there an update on this by any chance?
The issue has been fixed in SNAP 11, which is set to be released soon.