Which is the recommended way of working with SNAP in Python: snappy (JPY, GPF) or GPT?

Which is the recommended way of working with SNAP in a Python context?

I have used the NetBeans-based GUI for the Sentinel Toolboxes a couple of times now. While it is nice, it does have some bugs and deficiencies. I would like to implement data retrieval (downloading products) and data preparation (extracting specific bands for a specific subregion of a product) in Python, with the goal of building an example cloud service that relies on Sentinel data. So far I have explored the following two options:

  • snappy - the Python API that comes with the Python setup for SNAP
  • GPT - the command line tool for SNAP

I am using the Sentinel-2 Toolbox, but my guess is that this functionality is also available in the Sentinel-1 and Sentinel-3 Toolboxes.

With snappy there appear to be multiple things going on under the hood. For example, I can get a subset in two ways:

  • Using JPY (Java-Python bridge):

    import snappy
    # Look up the Java class of the Subset operator through the jpy bridge
    SubsetOp = snappy.jpy.get_type('org.esa.snap.core.gpf.common.SubsetOp')
    subset_op = SubsetOp()
    subset_op.setSourceProduct(s2_product)  # product opened earlier, e.g. with ProductIO.readProduct
    subset_op.setCopyMetadata(True)
    subset_op.setGeoRegion(region)          # JTS geometry describing the area of interest
    subset_from_region = subset_op.getTargetProduct()
    
  • Using GPF:

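    import snappy
    # Make sure all operators (including Subset) are registered with GPF
    snappy.GPF.getDefaultInstance().getOperatorSpiRegistry().loadOperatorSpis()
    # Operator parameters are passed as a Java HashMap
    parameters = snappy.HashMap()
    parameters.put('copyMetadata', True)
    parameters.put('geoRegion', region)
    subset_from_region = snappy.GPF.createProduct('Subset', parameters, s2_product)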
    

If I go with GPT, I need to run it in a subprocess, like this (here just printing the usage of the Subset operator):

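import subprocess

# gpt_path is the path to the gpt executable of the SNAP installation;
# '-h Subset' prints the usage of the Subset operator
help_text = subprocess.run(
  [gpt_path, '-h', 'Subset'],
  stdout=subprocess.PIPE,
  universal_newlines=True
).stdout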
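
An actual subset with GPT is the same kind of call with operator parameters and an output file. A minimal sketch, assuming the option syntax printed by `gpt -h` (`-P<name>=<value>` for parameters, `-t` for the target file, `-f` for the output format, source product as a positional argument) and the `Subset` parameter names `geoRegion`, `copyMetadata` and `sourceBands`; check `gpt -h Subset` for your SNAP version. The function names are hypothetical:

```python
import subprocess
from typing import List, Sequence


def build_gpt_subset_command(
    gpt_path: str,
    source: str,
    target: str,
    geo_region_wkt: str,
    bands: Sequence[str] = (),
    output_format: str = "GeoTIFF",
) -> List[str]:
    """Assemble a gpt call that subsets one product and writes the result."""
    cmd = [
        gpt_path,
        "Subset",
        f"-PgeoRegion={geo_region_wkt}",
        "-PcopyMetadata=true",
        "-t", target,          # target (output) file
        "-f", output_format,   # output format name
    ]
    if bands:
        # Comma-separated list of bands to keep
        cmd.append("-PsourceBands=" + ",".join(bands))
    cmd.append(source)         # source product as a positional argument
    return cmd


def run_gpt(cmd: List[str]) -> str:
    """Run gpt, raise CalledProcessError on failure, return its console output."""
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        check=True,
    ).stdout
```

Passing an argument list (no shell) means the spaces inside the WKT need no quoting. Sentinel-2 products mix 10, 20 and 60 m bands, so depending on the SNAP version a `Resample` step may be needed before `Subset`.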

The latter is not very Pythonic, and it involves managing multiple processes. The reason I looked at GPT at all is that snappy does not support recent versions of Python. This makes it increasingly difficult to work with, because other popular libraries drop support for older Python versions over time.

Right now I am creating a separate Docker container just for snappy, since the other dependencies I use to process the extracted band images at a later stage require a more recent Python interpreter. However, I feel that this will break at some point (probably quite soon). I also tried Miniconda, splitting my workflow into three stages: data acquisition (uses sentinelsat to retrieve a product given user-defined parameters, including a polygon describing the location), data preparation (uses snappy to process the acquired product and produce images for specific bands) and data application (e.g. a machine learning algorithm that uses the ordinary image data for enhancement, segmentation, etc.).
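
With separate environments, the stages can be driven from the newer interpreter by running the snappy stage as a child process. A minimal sketch, assuming a conda environment named `snap_env` in which snappy is configured and a hypothetical script `prepare.py` that takes the product path as its only argument; `conda run -n <env>` runs a command inside that environment. All names here are illustrative:

```python
import subprocess
import sys
from typing import List


def run_stage(cmd: List[str]) -> str:
    """Run one pipeline stage in its own process and return its stdout.

    check=True turns a non-zero exit code into subprocess.CalledProcessError,
    so a failing stage stops the pipeline instead of passing on bad output.
    """
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout


def preparation_command(env_name: str, script: str, product_path: str) -> List[str]:
    # 'conda run -n <env>' executes the command inside the named conda
    # environment, i.e. with the older interpreter that snappy supports.
    return ["conda", "run", "-n", env_name, "python", script, product_path]


def application_command(script: str, *args: str) -> List[str]:
    # The application stage runs in the current, newer interpreter.
    return [sys.executable, script, *args]
```

Usage would be along the lines of `run_stage(preparation_command("snap_env", "prepare.py", product_path))` followed by `run_stage(application_command("segment.py", band_file))`, where `segment.py` and `band_file` are again placeholders.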

I am looking for a more future-proof, consistent approach that involves minimal dependency management and is easy to set up.

Both GPF and JPY do the heavy lifting in Java code. JPY lets you import data into Python (e.g., as NumPy arrays) to manipulate it there, but this involves copying data from Java objects into Python objects, so memory management can become problematic. GPF provides many specialized operators for remote sensing. Some operators are simple to code in matrix languages, but others are complex and sensor-dependent.

There are, however, cases where ad-hoc workflows don’t take remote-sensing-specific details into account. One I frequently encounter is mapping optical data stored as a 2D array with matching 2D lon/lat arrays without accounting for the variation in the ground footprint of the sensor’s pixels. The footprint is larger near the edges of the pass, and the quality of edge pixels is reduced by longer atmospheric path lengths, so “binning” algorithms are needed to ensure that these lower-quality, large-footprint pixels are not over-represented in the mapped image.
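
The binning idea can be illustrated with a weighted-mean binner. This is a simplified sketch, not the SeaDAS/OCSSW binning algorithm: it uses a plain lon/lat grid (production binning schemes usually use equal-area bins), and the per-pixel `weights` are a hypothetical quality weight, e.g. lower toward the swath edge:

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bin_observations(
    lons: Iterable[float],
    lats: Iterable[float],
    values: Iterable[float],
    weights: Iterable[float],
    cell_size_deg: float,
) -> Dict[Tuple[int, int], float]:
    """Weighted-mean binning of swath pixels onto a regular lon/lat grid.

    Each observation goes into the single bin containing its centre, so a
    large edge-of-swath pixel is counted once instead of being copied into
    every output cell its footprint covers. The (hypothetical) quality
    weights further reduce the influence of low-quality pixels.
    """
    weighted_sum: Dict[Tuple[int, int], float] = defaultdict(float)
    weight_sum: Dict[Tuple[int, int], float] = defaultdict(float)
    for lon, lat, value, weight in zip(lons, lats, values, weights):
        key = (math.floor(lon / cell_size_deg), math.floor(lat / cell_size_deg))
        weighted_sum[key] += weight * value
        weight_sum[key] += weight
    # Bins whose total weight is zero are left out rather than divided by zero
    return {k: weighted_sum[k] / weight_sum[k] for k in weighted_sum if weight_sum[k] > 0}
```

For example, two pixels in the same 1° cell with values 10 and 20 and weights 1 and 3 give a bin value of (10 + 60) / 4 = 17.5.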

NASA’s SeaDAS group has adopted SNAP, adding the OCSSW processing tools, which include Sentinel-2 support.
OCSSW programs can be run from the SeaDAS GUI, but they are command-line programs (for Linux and macOS; on Windows you can use WSL) that are well suited to large-scale batch processing.
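
Because these are ordinary command-line programs, batch processing from Python reduces to launching one process per input file. A minimal sketch; `make_cmd` is a hypothetical callable that maps an input file to the argument list of whatever tool you run (OCSSW programs such as `l2gen` take `key=value` arguments like `ifile=...` and `ofile=...`, but check each program's help for its exact parameters):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def run_batch(
    inputs: Sequence[str],
    make_cmd: Callable[[str], List[str]],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run one command-line job per input file, several at a time.

    Threads are enough here: each job is a separate OS process, and the
    threads only wait for those processes to finish.
    Returns the exit code of each job, keyed by its input file.
    """

    def run_one(path: str) -> int:
        return subprocess.run(make_cmd(path)).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        exit_codes = list(pool.map(run_one, inputs))
    return dict(zip(inputs, exit_codes))


if __name__ == "__main__":
    # Demo: the current Python interpreter stands in for a real batch program.
    print(run_batch(["a.nc", "b.nc"], lambda p: [sys.executable, "-c", "pass", p]))
```

Returning exit codes instead of raising on the first failure lets a long batch finish and report which inputs need a rerun.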

If a Python wrapper is suitable for your needs, take a look at snapista. It will be a fully supported part of SNAP in the near future.

SNAPISTA (snap-contrib.github.io)

Since snapista calls gpt, performance should be considerably better than what snappy can deliver in most cases.
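
The principle of driving gpt with a processing graph can be shown without snapista: build a graph XML in the format SNAP's Graph Builder saves and pass the file to `gpt`. This is not snapista's API; it is a minimal sketch assuming the `Read`/`Subset`/`Write` parameter names used in saved SNAP graphs (`file`, `geoRegion`, `copyMetadata`, `formatName`), with hypothetical function names:

```python
import xml.etree.ElementTree as ET
from typing import Dict


def _add_node(graph: ET.Element, node_id: str, operator: str,
              params: Dict[str, str], source: str = "") -> None:
    """Append one <node> (operator, optional source, parameters) to the graph."""
    node = ET.SubElement(graph, "node", id=node_id)
    ET.SubElement(node, "operator").text = operator
    if source:
        sources = ET.SubElement(node, "sources")
        ET.SubElement(sources, "sourceProduct", refid=source)
    parameters = ET.SubElement(node, "parameters")
    for name, value in params.items():
        ET.SubElement(parameters, name).text = value


def subset_graph_xml(source_file: str, target_file: str, geo_region_wkt: str) -> str:
    """Build a Read -> Subset -> Write graph to be run with `gpt graph.xml`."""
    graph = ET.Element("graph", id="Graph")
    ET.SubElement(graph, "version").text = "1.0"
    _add_node(graph, "Read", "Read", {"file": source_file})
    _add_node(graph, "Subset", "Subset",
              {"geoRegion": geo_region_wkt, "copyMetadata": "true"}, source="Read")
    _add_node(graph, "Write", "Write",
              {"file": target_file, "formatName": "GeoTIFF"}, source="Subset")
    return ET.tostring(graph, encoding="unicode")
```

Write the returned string to e.g. `graph.xml` and run it with `subprocess.run([gpt_path, "graph.xml"], check=True)`; a whole chain of operators then executes in a single gpt process.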