Batch processing with GPT extremely slow, initialization issues?

Hi,
I’m using gpt to batch process thousands of L2 LST products from Sentinel-3. I wrote a very simple Python script that calls gpt via subprocess; here it is:

import os
import subprocess
from time import perf_counter

psg = "4326"  # EPSG code of the target CRS


for entry in os.scandir("./data"):

    if entry.is_dir():

        # compute final tiff filename from product name
        splitname = entry.name.split("____")
        timestamp = splitname[1]
        sat = splitname[0][:3]  # satellite name (e.g. S3A) from the product name
        date = timestamp.split("_")[0]  # sensing start date & time, already a string
        fullname = date + "_" + sat  # full final tiff filename

        # product xml file
        in_file = entry.path + "/" + "xfdumanifest.xml"

        print(fullname)
        start = perf_counter()

        print("subset")
        proc = subprocess.run(["gpt", "subset", "-Ssource=" + in_file, "-PcopyMetadata='false'", "-PsourceBands=LST", "-f", "GeoTIFF-BigTIFF", "-t", "./tiffs/z_" + fullname])
        t1 = perf_counter()

        print("reproject")
        proc = subprocess.run(["gpt", "reproject", "-Ssource=./tiffs/z_" + fullname + ".tif", "-Pcrs=" + psg, "-PnoDataValue=-9999", "-f", "GeoTIFF-BigTIFF", "-t", "./tiffs/" + fullname])
        end = perf_counter()

        print("subset duration ", t1 - start)
        print("reproject duration ", end - t1)
        print("overall duration  ", end - start)
        print("***")

It works, but the process is extremely slow: each time gpt is called, it starts by looking for GDAL and configuring it. The actual processing time is not great either, but it is acceptable. This is the output for two files:

20210105T000155_S3A

subset
INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: GDAL 2.3.3 found on system. JNI driver will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.esa.snap.core.util.EngineVersionCheckActivator: Please check regularly for new updates for the best SNAP experience.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
WARNING: org.esa.s3tbx.dataio.s3.AbstractProductFactory: D:\projects\MAAT\T4\sentinel3_SLSTR\data\S3A_SL_2_LST____20210105T000155_20210105T000455_20210105T020632_0179_067_059_1620_LN2_O_NR_004.SEN3 (Accesso negato)
WARNING: org.esa.s3tbx.dataio.s3.AbstractProductFactory: Could not find ''.
INFO: org.hsqldb.persist.Logger: dataFileCache open start
Executing operator...
20%46%73%99% done.
INFO: org.esa.snap.core.gpf.common.WriteOp: Start writing product Subset_S3A_SL_2_LST____20210105T000155_20210105T000455_20210105T020632_0179_067_059_1620_LN2_O_NR_004.SEN3 to .\tiffs\z_20210105T000155_S3A
Writing...
.10%.INFO: org.esa.snap.dataio.bigtiff.BigGeoTiffProductWriter: writing to output file .\tiffs\z_20210105T000155_S3A.tif
20%46%73%99% done.
INFO: org.esa.snap.core.gpf.common.WriteOp: End writing product z_20210105T000155_S3A to .\tiffs\z_20210105T000155_S3A
INFO: org.esa.snap.core.gpf.common.WriteOp: Time:  3.629 s total,  3.024 ms per line, 0.002016 ms per pixel

reproject
INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: GDAL 2.3.3 found on system. JNI driver will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.esa.snap.core.util.EngineVersionCheckActivator: Please check regularly for new updates for the best SNAP experience.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.hsqldb.persist.Logger: dataFileCache open start
Executing operator...
20%.....30%.....42%....52%....62%....72%...90%88%86%84%82%..... done.
INFO: org.esa.snap.core.gpf.common.WriteOp: Start writing product projected_z_20210105T000155_S3A to .\tiffs\20210105T000155_S3A
Writing...
..10%..INFO: org.esa.snap.dataio.bigtiff.BigGeoTiffProductWriter: writing to output file .\tiffs\20210105T000155_S3A.tif
.21%....31%.....41%....51%....61%....71%....81%....91%.... done.
INFO: org.esa.snap.core.gpf.common.WriteOp: End writing product 20210105T000155_S3A to .\tiffs\20210105T000155_S3A
INFO: org.esa.snap.core.gpf.common.WriteOp: Time: 104.328 s total, 86.868 ms per line, 0.003072 ms per pixel

subset duration  87.1625576
reproject duration  129.3651081
overall duration   216.5276657

***

20210105T000155_S3A

subset
INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: GDAL 2.3.3 found on system. JNI driver will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.esa.snap.core.util.EngineVersionCheckActivator: Please check regularly for new updates for the best SNAP experience.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
WARNING: org.esa.s3tbx.dataio.s3.AbstractProductFactory: D:\projects\MAAT\T4\sentinel3_SLSTR\data\S3A_SL_2_LST____20210105T000155_20210105T000455_20210106T044313_0179_067_059_1620_LN2_O_NT_004.SEN3 (Accesso negato)
WARNING: org.esa.s3tbx.dataio.s3.AbstractProductFactory: Could not find ''.
INFO: org.hsqldb.persist.Logger: dataFileCache open start
Executing operator...
20%73%73%99%126% done.
INFO: org.esa.snap.core.gpf.common.WriteOp: Start writing product Subset_S3A_SL_2_LST____20210105T000155_20210105T000455_20210106T044313_0179_067_059_1620_LN2_O_NT_004.SEN3 to .\tiffs\z_20210105T000155_S3A
Writing...
.10%.INFO: org.esa.snap.dataio.bigtiff.BigGeoTiffProductWriter: writing to output file .\tiffs\z_20210105T000155_S3A.tif
20%46%73%99% done.
INFO: org.esa.snap.core.gpf.common.WriteOp: End writing product z_20210105T000155_S3A to .\tiffs\z_20210105T000155_S3A
INFO: org.esa.snap.core.gpf.common.WriteOp: Time:  2.120 s total,  1.767 ms per line, 0.001178 ms per pixel

reproject
INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: GDAL 2.3.3 found on system. JNI driver will be used.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.esa.snap.core.util.EngineVersionCheckActivator: Please check regularly for new updates for the best SNAP experience.
INFO: org.esa.s2tbx.dataio.gdal.GDALVersion: Installed GDAL 2.3.3 set to be used by SNAP.
INFO: org.hsqldb.persist.Logger: dataFileCache open start
Executing operator...
20%....30%....40%....50%....60%.....70%...82%80%......94%.. done.
INFO: org.esa.snap.core.gpf.common.WriteOp: Start writing product projected_z_20210105T000155_S3A to .\tiffs\20210105T000155_S3A
Writing...
..10%..INFO: org.esa.snap.dataio.bigtiff.BigGeoTiffProductWriter: writing to output file .\tiffs\20210105T000155_S3A.tif
.21%....31%....41%....51%....61%....71%....81%....91%.... done.
INFO: org.esa.snap.core.gpf.common.WriteOp: End writing product 20210105T000155_S3A to .\tiffs\20210105T000155_S3A
INFO: org.esa.snap.core.gpf.common.WriteOp: Time: 99.907 s total, 83.187 ms per line, 0.002942 ms per pixel

subset duration  54.68569120000001
reproject duration  118.79726160000001
overall duration   173.48295280000002

***

Am I missing some configuration steps? Should I use some arguments when calling GPT?

Thanks!
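
On the question of arguments: gpt can also run a graph XML file, which would let the subset and reproject steps share one gpt start per product instead of two, and would avoid writing the intermediate z_ file. This does not make initialization itself faster; it only halves how often it is paid. The following is a minimal sketch, not a tested recipe. The graph file name, the node ids, the template variable names (input, output, crs) and the two helper functions are assumptions made up for illustration; the operator parameters are the ones already used in the script above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical graph: Read -> Subset -> Reproject -> Write.
# ${input}, ${output} and ${crs} are filled in by gpt from -P arguments.
GRAPH_XML = """<graph id="SubsetReproject">
  <version>1.0</version>
  <node id="read">
    <operator>Read</operator>
    <parameters>
      <file>${input}</file>
    </parameters>
  </node>
  <node id="subset">
    <operator>Subset</operator>
    <sources>
      <sourceProduct refid="read"/>
    </sources>
    <parameters>
      <sourceBands>LST</sourceBands>
      <copyMetadata>false</copyMetadata>
    </parameters>
  </node>
  <node id="reproject">
    <operator>Reproject</operator>
    <sources>
      <sourceProduct refid="subset"/>
    </sources>
    <parameters>
      <crs>${crs}</crs>
      <noDataValue>-9999</noDataValue>
    </parameters>
  </node>
  <node id="write">
    <operator>Write</operator>
    <sources>
      <sourceProduct refid="reproject"/>
    </sources>
    <parameters>
      <file>${output}</file>
      <formatName>GeoTIFF-BigTIFF</formatName>
    </parameters>
  </node>
</graph>
"""


def write_graph(path="subset_reproject.xml"):
    """Check that the graph is well-formed XML, then write it to disk."""
    ET.fromstring(GRAPH_XML)  # raises ParseError if the XML is broken
    Path(path).write_text(GRAPH_XML, encoding="utf-8")
    return str(path)


def build_command(graph_file, in_file, out_file, crs):
    """Build one gpt call that runs the whole chain for a single product."""
    return [
        "gpt", graph_file,
        "-Pinput=" + in_file,
        "-Poutput=" + out_file,
        "-Pcrs=" + crs,
    ]
```

In the loop above, the two subprocess.run calls would then become a single subprocess.run(build_command(graph_file, in_file, "./tiffs/" + fullname, psg)), with graph_file written once before the loop.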

I think it’s not related to the code or the operations: if I launch gpt without arguments from the command line, it takes 52 seconds to display the help.

I guess it’s not the standard initialization time, is it?

Windows is at a disadvantage here: many Windows systems have AV software that scans files on access, and many have very long PATH variables that are searched for programs and libraries. Linux has optimizations to speed up repeated use of the same program and maintains a central cache for fast access to system libraries. There is a reason Linux is widely used in data centers.
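
To get a rough idea of whether a long PATH is part of the picture, something like the sketch below could be run on the affected machine. It only counts PATH entries, lists entries that point to folders that no longer exist, and shows which gpt executable is actually picked up. The function name path_report is made up for this example, and a large entry count is a hint, not proof of a problem.

```python
import os
import shutil


def path_report(command="gpt"):
    """Summarize the PATH that is searched when `command` is started."""
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    # Entries pointing to folders that do not exist still cost a lookup.
    missing = [p for p in entries if not os.path.isdir(p)]
    return {
        "entries": len(entries),
        "missing_dirs": missing,
        "resolved": shutil.which(command),  # None if not found on PATH
    }


if __name__ == "__main__":
    report = path_report("gpt")
    print("PATH entries:", report["entries"])
    print("entries that do not exist:", len(report["missing_dirs"]))
    print("gpt resolves to:", report["resolved"])
```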

Having said that, my old Windows laptop with an SSD runs SNAP 8 gpt.exe in 22s after a reboot and in 4s on the second iteration, so your system appears to have a problem. This could be a hardware or software issue, or malware. A low-end Linux desktop with similar specs gave 3s for the first run and under 2s for subsequent iterations. Another, newer laptop of the same model is sometimes slightly faster, but often stalls for periods of 0.5 to several minutes with high fan noise and heat. Checking the Task Manager shows that the McAfee AV is working hard.
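
To compare your machine with these numbers, startup can be timed on its own, without any product processing. The sketch below is one way to do it, assuming gpt is on the PATH: it runs "gpt -h" a few times and prints the wall-clock time of each run, so the first (cold) run can be compared with the later (warm) ones. The function name time_runs and the default of three runs are arbitrary choices for this example.

```python
import subprocess
import sys
from time import perf_counter


def time_runs(cmd, runs=3):
    """Run `cmd` `runs` times and return the wall-clock duration of each run."""
    durations = []
    for _ in range(runs):
        start = perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Assumption: `gpt` is on PATH; with -h it only prints the help text.
    cmd = sys.argv[1:] or ["gpt", "-h"]
    for i, seconds in enumerate(time_runs(cmd), start=1):
        print(f"run {i}: {seconds:.1f} s")
```

If every run takes about as long as the first one, nothing is being cached between runs, which points toward on-access scanning or storage trouble as described here.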

Back in the days of spinning disk drives, disk access would get really slow just before a drive failed. A very full disk will also be slow. Modern storage devices have built-in self-testing. The best tests use the drive vendor’s software, but there are generic tests based on the S.M.A.R.T. “standard”. Once storage issues are eliminated, you can use Windows Task Manager to see how resources are being used. There are a couple of good malware scanners you can get at no cost: Malwarebytes and Microsoft Safety Scanner.
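
The "very full disk" case is the easiest one to rule out from Python. A minimal sketch using only the standard library follows; the function name and the folders checked are assumptions for this example (the current folder and the home folder), so substitute the drives that hold your data and your SNAP installation. It says nothing about drive health, which still needs a S.M.A.R.T. tool.

```python
import os
import shutil


def disk_fill_percent(path="."):
    """Return how full the drive holding `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumed locations: working folder and user home folder.
    for folder in (".", os.path.expanduser("~")):
        print(f"{folder}: {disk_fill_percent(folder):.0f}% full")
```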

@gnwiii @danidot Hi, I’m dealing with the exact same issue, and I work with mundialis/esa-snap:ubuntu as a base Docker image.

If you are running GPT in a docker image it is NOT the same problem, but many of the diagnostics still apply.

You need to mention which GPT operations you are using: some operations require downloading “ancillary” data, so they will appear to stall until the download completes. In that case, the CPU and disk will be idle until GPT has the required data.

I assume you are using Windows. Please tell us whether S.M.A.R.T software shows disk problems, and whether you are hitting resource limits (see Windows Performance Monitor) while GPT is slowly initializing. Do you see the same slowdown if you reboot and run GPT without any other programs running?