Introducing snapista, a GPT wrapper for Python

fabricebrito · March 22, 2021, 2:14pm

Hello!

Let me introduce snapista, a GPT wrapper for Python. The goal is to provide an easy and pythonic way (to some extent) to write and run SNAP graphs using Python (a script or a Jupyter Notebook).

To create a graph, one would write:

from snapista import Graph
from snapista import Operator
g = Graph()
g.add_node(operator=Operator('Read'), 
           node_id='read_1')

calibration = Operator('Calibration')

calibration.createBetaBand = 'false'

g.add_node(operator=calibration, 
           node_id='calibration', 
           source='read_1')

It is documented here: https://snap-contrib.github.io/snapista/ and the software repo is on GitHub here: https://github.com/snap-contrib/snapista

There’s a demo on Binder (linked in the software repo README) and a set of examples in the documentation.

You’ll also find other elements that may support your activities in the GitHub organization https://github.com/snap-contrib, e.g.:

SNAP packaged with conda
a docker container with SNAP and Python
how to use Visual Studio Code with remote containers
a Jupyter/Theia docker image with SNAP and Python

Feedback and comments welcome!

falahfakhri · March 22, 2021, 3:44pm

I’d like to ask about a graph that was already created in SNAP: for instance, how to find a node, edit it, and make it processable with Python.
An example:
file path = Process.xml
find the Read node
change it so that it sits within a for loop in Python, in order to process multiple tiles in a directory.

Please give an example if snapista can do this.
Thanks a lot.

fabricebrito · March 22, 2021, 4:38pm

Hello @falahfakhri,

The goal here is to get away from XML and instead write code to create and process graphs.
So although snapista can be used to serialize a snapista.Graph object to XML, it is not meant to deserialize an XML graph and create snapista objects (Graph and Operator).
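
If you need to stay with an existing XML file, the edit itself can be done outside snapista. Here is a minimal sketch using only the Python standard library. It assumes the usual graph layout (a node element with operator and parameters/file children); the inline graph and the file names are placeholders, not taken from a real Process.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal graph, standing in for the contents of Process.xml.
GRAPH_XML = """<graph id="Graph">
  <version>1.0</version>
  <node id="Read">
    <operator>Read</operator>
    <sources/>
    <parameters>
      <file>placeholder.dim</file>
    </parameters>
  </node>
</graph>"""


def set_read_file(graph_xml: str, product_path: str) -> str:
    """Return a copy of the graph XML whose Read node points at product_path."""
    root = ET.fromstring(graph_xml)
    for node in root.iter("node"):
        if node.findtext("operator") == "Read":
            file_elem = node.find("parameters/file")
            if file_elem is None:
                raise ValueError("Read node has no <file> parameter")
            file_elem.text = product_path
            return ET.tostring(root, encoding="unicode")
    raise ValueError("no Read node found in graph")


# One graph per input product (placeholder names).
graphs = {p: set_read_file(GRAPH_XML, p) for p in ["tile_a.dim", "tile_b.dim"]}
```

Each resulting string could then be written to its own file and passed to gpt.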

Fabrice

florian.beyer · March 23, 2021, 6:08am

Hi @fabricebrito,
Sounds awesome! I’ll try it.
Is your wrapper using the snappy package behind the scenes?
I ask because, after finishing a complex Python script with snappy, it was very painful to see that the processing was very slow, much slower than using the command line (pure Java).
I would love to see a way of using Python for my large batch operations with comparable speed…

falahfakhri · March 23, 2021, 7:42am

I tried every way of installing it (Anaconda, Python 3.8.8, Windows 10), but all of them failed.

fabricebrito · March 23, 2021, 4:36pm

Hello @florian.beyer,

It uses snappy to get the information about the operators but relies on gpt for the execution, through a system call. So the processing performance is that of gpt.
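
To illustrate the idea of delegating execution to gpt through a system call, here is a rough sketch. It is not snapista's actual implementation: the function names are made up for the example, and it assumes a gpt executable is on the PATH.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_gpt_command(graph_path: str,
                      extra_args: Sequence[str] = (),
                      gpt: str = "gpt") -> List[str]:
    """Assemble the argument list for running a graph file with gpt."""
    return [gpt, graph_path, *extra_args]


def run_gpt(graph_path: str,
            extra_args: Sequence[str] = (),
            gpt: str = "gpt") -> subprocess.CompletedProcess:
    """Run gpt as a child process and capture its output (requires SNAP)."""
    cmd = build_gpt_command(graph_path, extra_args, gpt)
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Only build and display the command here; run_gpt() needs a SNAP installation.
cmd = build_gpt_command("graph.xml", ["-q", "4"])
print(shlex.join(cmd))  # gpt graph.xml -q 4
```

Because the processing happens in the gpt child process, Python only pays the cost of starting it.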

fabricebrito · March 23, 2021, 4:40pm

Hello @falahfakhri, snapista relies on SNAP packaged with conda, and that package targets Linux.
For Windows and macOS, there’s guidance on how to use VS Code remote containers to run a Linux-based container for development activities.
See https://snap-contrib.github.io/snapista/installation/

Hope this helps

mdelgado · March 27, 2021, 12:36pm

Hi @fabricebrito!
This sounds really good! Thanks for the efforts and sharing with the community.

I have seen that it relies on SNAP 8.0.0, and I am wondering whether it automatically checks for plugin updates and installs them once they are available.

Another question: is there an easy way to make it compatible with previous SNAP versions, or newer ones?

Thanks!

fabricebrito · March 31, 2021, 3:05pm

Hello @mdelgado,

It does rely on snap 8.0.0, so when you install snapista you get snap 8.0.0 installed.
For updating it, you need to follow the same approach as for snap headless updates.

Here’s an example for installing idepix for Sentinel-3:

${PREFIX}/snap/bin/snap --nosplash --nogui --modules --install org.esa.snap.idepix.core
${PREFIX}/snap/bin/snap --nosplash --nogui --modules --install org.esa.snap.idepix.olci

For older or newer versions, one would have to do the equivalent of https://github.com/snap-contrib/snap-conda for these versions.

I hope you have all the info requested

mengdahl · March 31, 2021, 3:09pm

Great contribution, Fabrice! I was wondering whether we could automate generating new versions of snapista with each SNAP module update, so that novice users would not need to deal with conda.

fabricebrito · April 1, 2021, 7:03am

Hello @mengdahl,
snapista does not have to change when SNAP gets an update. What changes is its underlying dependency, SNAP packaged with conda (snap-conda). So we don’t have to rebuild snapista when that happens.
snap-conda will follow major updates of SNAP. For additional modules or their updates, users (novice users included) have to update the SNAP installation following the headless update procedure with

/bin/sh -c "snap --nosplash --nogui --modules --update-all"

which for a snap installed with conda is:

${PREFIX}/snap/bin/snap --nosplash --nogui --modules --update-all
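
The same headless commands can be assembled from Python, for example in a notebook. This is only a sketch: the helper name is invented for the example, the prefix is whatever ${PREFIX} stands for in your environment, and the flags are the ones from the commands quoted in this thread.

```python
import subprocess
from typing import List, Optional


def snap_modules_command(prefix: str, install: Optional[str] = None) -> List[str]:
    """Build the headless SNAP module command for a conda-installed SNAP.

    With `install` set, the command installs that single module;
    otherwise it updates all installed modules.
    """
    cmd = [f"{prefix}/snap/bin/snap", "--nosplash", "--nogui", "--modules"]
    if install:
        cmd += ["--install", install]
    else:
        cmd.append("--update-all")
    return cmd


# Example with a placeholder prefix; uncomment the last line to actually run it.
update_cmd = snap_modules_command("/opt/conda/envs/snap")
# subprocess.run(update_cmd, check=True)
```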

fabricebrito · April 2, 2021, 8:07am

Hello @falahfakhri

There’s a new update in snapista, version 0.1.3. You can now load a graph from a local path or a remote URL on an HTTP(S) server and update it:

from snapista import Operator
from snapista import read_file

g = read_file("https://gist.githubusercontent.com/fabricebrito/fe7df152e9f0df3a3ff6d3974b87e9e2/raw/294b5d8fec9b2b1d4fdc7468611c9bb7756f9e7a/graph.xml")

g.view()

g.add_node(
        operator=Operator(
            "Read",
            formatName="DIMAP",
            file='a file',
        ),
        node_id="read",
    )

g.view()

I hope this helps!

falahfakhri · April 2, 2021, 9:56am

Dear @fabricebrito

Thanks a lot for this update. I’d like to raise a question here.

Let’s say we have created the following graph:

g = read_file("https://gist.githubusercontent.com/fabricebrito/fe7df152e9f0df3a3ff6d3974b87e9e2/raw/294b5d8fec9b2b1d4fdc7468611c9bb7756f9e7a/graph.xml")

g.view()

g.add_node(
        operator=Operator(
            "Read",
            formatName="DIMAP",
            file='a file',
        ),
        node_id="read",
    )

g.add_node(
        operator=Operator("Apply-Orbit-File"),
        node_id="apply_orbit",
        source="read",
    )

g.view()

How could I access the Read node from my Python script and wrap it in a for loop, in order to read multiple files and apply the operator or operators to each of them?

And the second question:

Do you have a cheat sheet, or might you create one, for updating the operators of the *.xml file without going back to SNAP?

fabricebrito · April 7, 2021, 8:55am

@falahfakhri
I’d go for something like this:

from snapista import read_file
from snapista import Operator

g = read_file("https://gist.githubusercontent.com/fabricebrito/fe7df152e9f0df3a3ff6d3974b87e9e2/raw/294b5d8fec9b2b1d4fdc7468611c9bb7756f9e7a/graph.xml")

for myfile in ['filea', 'fileb']:

    g.add_node(
            operator=Operator(
                "Read",
                formatName="DIMAP",
                file=myfile,
            ),
            node_id="read",
        )

    g.view()
    
    g.run()

A second suggestion is to use CWL and Docker as explained here: https://github.com/snap-contrib/cwl-snap-graph-runner

This approach is convenient for batch processing files against an existing graph file. It’s completely detached from snapista, as it uses a SNAP Docker image. This allows running SNAP against local EO data without installing SNAP.

falahfakhri · April 7, 2021, 9:51am

@fabricebrito

Thanks a lot for the clarifications. Since I, and probably other colleagues, have many questions, and I’m not sure many researchers know about this great code, I have some suggestions that I hope you have time to take into account.

What do you think about holding two or three hours of webinars, for instance one hour each week, covering the following:

First: snapista installation under Windows 10
snapista installation under Linux

Second: create a virtual processing example with different data
Read the data, apply some operators, write the data

I think this is the best way to reduce silly questions like mine and to give people plenty of room to offer their suggestions. It would also serve as a reference for your script.

You could take and simulate any two projects from RUS Copernicus, for instance one about S1 and a second about S2.

I hope you are able to find time in your schedule to implement this suggestion.

pavithra · April 23, 2021, 6:43pm

Do you have a cheat sheet, or might you create one, for updating the operators of the *.xml file without going back to SNAP?

fabricebrito · April 26, 2021, 8:05am

Hello @pavithra

Here’s the link to the documentation: https://snap-contrib.github.io/snapista/gettingstarted/#load-an-existing-graph

mdelgado · June 10, 2021, 9:14pm

@fabricebrito, the snapista package looks very promising! Kudos for such nice work!

I have been using it recently and it makes life easier!
Looking forward to seeing its full potential!

AriJeannin · June 23, 2021, 9:10am

@fabricebrito, very glad to see a promising solution for combining SNAP and Python, looking forward to trying it!

I was just wondering about the performance settings: how are they set within snapista?

Thanks for your contribution!

PS: I’m using something like:

    gpt_cli = ['gpt',
               graph_path,
               '-q', str(MAX_CORES),  # Maximum parallelism
               '-J-Xms2G',  # Initially allocated memory
               '-J-Xmx{}'.format(bytes2snap(MAX_MEM)),  # Maximum allocated memory
               '-J-Dsnap.log.level=WARNING',
               '-J-Dsnap.jai.defaultTileSize={}'.format(TILE_SIZE),  # Tile size, set to 4096 or lower for ESD operator
               '-J-Dsnap.dataio.reader.tileWidth={}'.format(TILE_SIZE),
               '-J-Dsnap.dataio.reader.tileHeight={}'.format(TILE_SIZE),
               '-J-Dsnap.jai.prefetchTiles=true',
               '-c', bytes2snap(0.75 * MAX_MEM),  # Tile cache, up to 75% of max memory
               # '-x', # Clears the internal tile cache after writing a complete row to the target file
               *other_args]
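
The bytes2snap helper is not shown in the snippet. A plausible stand-in, purely as an assumption about what such a helper does, converts a byte count into a size string with an M suffix, a form that both the JVM -Xmx option and gpt's -c option accept:

```python
def bytes2snap(n_bytes: float) -> str:
    """Format a byte count as a whole number of megabytes, e.g. '2048M'.

    Hypothetical reconstruction: the original helper is not shown in the thread.
    """
    megabytes = int(n_bytes // (1024 ** 2))
    if megabytes < 1:
        raise ValueError("memory size must be at least 1 MiB")
    return f"{megabytes}M"


MAX_MEM = 8 * 1024 ** 3  # 8 GiB, an example value

print(bytes2snap(MAX_MEM))         # 8192M
print(bytes2snap(0.75 * MAX_MEM))  # 6144M
```

Rounding down to whole megabytes keeps the string valid for both options even when the input is a float, as in the 75% tile cache case.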

mengdahl · June 23, 2021, 9:12am

Snapista is a Python wrapper and has the same performance as the Java gpt.