A StatisticsOp graph

This post explains how to set up a StatisticsOp graph and what its parameters mean. An example graph is attached. Unlike other SNAP operators, the StatisticsOp does not create an output product; instead, it writes statistics to either a .csv file or an ESRI shapefile. Note that this graph works with SNAP 6.0.3. If you work with earlier versions, some of the parameters are not available.

First, the StatisticsOp offers two ways to specify source products: either by setting sourceProducts explicitly or by setting sourceProductPaths. sourceProductPaths is convenient if you want to process multiple products that are located in the same directory. sourceProducts can be set with the following call:

gpt statistics-graph.xml -SsourceProducts=first_product.dim second_product.dim

You can retrieve statistics for specific regions by providing an ESRI shapefile. Statistics will be derived for each geometry in that shapefile separately.
You can also restrict the data to a certain time range by setting a start and an end date (in the form year-month-day hour_of_day:minute:second). Products before the start date or after the end date will not be considered. Statistics will be aggregated over the whole time period unless you specify a time interval with the interval parameter. There you may set the unit (days, weeks, months or years) and the number of temporal units. So, if you choose “2 days”, statistics will be aggregated separately for each interval of that length. This parameter only has an effect when start and end dates are set.
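To make the date form concrete, here is a small Python sketch that formats a start and an end date as year-month-day hour_of_day:minute:second. The helper name `format_snap_date` and the example dates are made up for this illustration and are not part of SNAP.

```python
from datetime import datetime

# Pattern for the form described above:
# year-month-day hour_of_day:minute:second
DATE_PATTERN = "%Y-%m-%d %H:%M:%S"


def format_snap_date(moment: datetime) -> str:
    """Format a datetime as a startDate/endDate string (hypothetical helper)."""
    return moment.strftime(DATE_PATTERN)


# Example values only: the first and the last second of 2018.
start_date = format_snap_date(datetime(2018, 1, 1, 0, 0, 0))
end_date = format_snap_date(datetime(2018, 12, 31, 23, 59, 59))
print(start_date)
print(end_date)
```

The two strings can then be used as the values of startDate and endDate in the graph.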

With bandConfigurations you can specify which bands you are interested in. You need to specify at least one bandConfiguration. A bandConfiguration must contain at least either the name of a band (sourceBandName) or an expression from which a band can be computed (expression). You can also set a validPixelExpression to mask out pixels. If the band is an integer band, you can set the parameter retrieveCategoricalStatistics to true. In that case you do not retrieve mean, max, or min from the band; instead, the number of pixels per class is counted and the classes with the most and second-most members are determined.
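As a rough illustration of what categorical statistics mean, the following Python sketch counts pixels per class and picks the two largest classes. It only mirrors the idea described above; it is not how SNAP implements it, and the function name and sample values are invented for the example.

```python
from collections import Counter


def categorical_summary(pixel_classes):
    """Count pixels per class and return the two classes with the most members."""
    counts = Counter(pixel_classes)
    ranked = counts.most_common(2)
    majority = ranked[0][0] if len(ranked) > 0 else None
    second = ranked[1][0] if len(ranked) > 1 else None
    return dict(counts), majority, second


# Made-up class values of an integer band.
pixels = [1, 1, 2, 3, 3, 3, 3, 2, 1, 3]
counts, majority_class, second_class = categorical_summary(pixels)
print(counts, majority_class, second_class)
```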
You can then set the name of a shapefile to which the retrieved statistics are written (outputShapefile) and the name of an ASCII text file to which the statistics are written in .csv format (outputAsciiFile). Note that if you set neither, you will not get any results. For both the output shapefile and the output ASCII file, an accompanying metadata file is created which documents the meaning of the attribute names.
When retrieving quantitative measures, you can ask for values at certain percentiles (percentiles) and set the accuracy (accuracy) with which values are retrieved. A lower accuracy makes the operator run faster.
Finally, if you retrieve both quantitative and qualitative/categorical measures, you can choose to write these to separate output files (writeDataTypesSeparately). This option helps to keep the files manageable. If it is set, the file names will be extended with the terms categorical and quantitative, respectively.
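For readers who prefer to generate such a graph from a script, here is a minimal Python sketch that assembles the parameters described above with `xml.etree.ElementTree`. The parameter names come from this post; the surrounding graph/node/operator/sources/parameters layout, the empty sources element, the comma-separated form of percentiles, and the function name are assumptions and are not copied from the attached file, so please compare the result with the attachment before relying on it.

```python
import xml.etree.ElementTree as ET


def build_statistics_graph(band_names, output_csv, start_date=None,
                           end_date=None, percentiles=None):
    """Return a StatisticsOp graph as an XML string (layout is an assumption)."""
    graph = ET.Element("graph", {"id": "statistics-graph"})
    ET.SubElement(graph, "version").text = "1.0"
    node = ET.SubElement(graph, "node", {"id": "statistics"})
    ET.SubElement(node, "operator").text = "StatisticsOp"
    # Left empty here; source products are assumed to be passed on the command line.
    ET.SubElement(node, "sources")
    params = ET.SubElement(node, "parameters")

    if start_date and end_date:
        ET.SubElement(params, "startDate").text = start_date
        ET.SubElement(params, "endDate").text = end_date

    # At least one bandConfiguration is required.
    configs = ET.SubElement(params, "bandConfigurations")
    for name in band_names:
        config = ET.SubElement(configs, "bandConfiguration")
        ET.SubElement(config, "sourceBandName").text = name

    if percentiles:
        # Assumption: the values are given as a comma-separated list.
        ET.SubElement(params, "percentiles").text = ",".join(
            str(p) for p in percentiles)

    ET.SubElement(params, "outputAsciiFile").text = output_csv
    return ET.tostring(graph, encoding="unicode")


xml_text = build_statistics_graph(
    ["Sigma0_IW2_VV"], "stats.csv",
    start_date="2010-01-01 00:00:00", end_date="2018-08-02 23:59:59",
    percentiles=[25, 50, 75])
print(xml_text)
```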
statistics-graph.xml (1.2 KB)

Thanks TonioF for a well-written description.
Problem solved; StatisticsOp is a powerful function for analysis.
I will also try the ‘interval’ parameter.
Important aspect is

You’re welcome.

Hello, I’m new to the forum, but I’m running into a similar problem. If someone could help me, I would appreciate it.

The question is: I’m trying to extract some statistics from a few (81) tifs, and I’m very interested in doing it in batch mode :joy: I wrote a graph as an XML file, but when it runs, it generates an empty .txt file, and the error message that appears in the cmd window is: Error: org.esa.snap.core.gpf.OperatorException

Can someone help me, please?

@marpet could you help me, please?

I’ve attached the XML I use.

statistics-graph.xml (760 Bytes)

That’s a bug. The problem is that the GeoTiff files don’t have time information, and this is not handled correctly (SNAP-931).
I think you can work around this issue by adding startDate and endDate as parameters.

Something like:

<startDate>2010-01-01 00:00:00</startDate>
<endDate>2018-08-02 23:59:59</endDate>

Thank you for the reply and for reporting the bug. I also have a suggestion regarding the tool’s export to a CSV file: when the export is done, it does not include the equivalent number of looks and the coefficient of variation.

I tried what you suggested, and now there is another error message :tired_face:

But I’ll keep trying. Thank you again.

statistics-graph.xml (846 Bytes)

Unfortunately, you found another issue (SNAP-934), which is related to the first one.

If you are able to write Python scripts, you could read the data with snappy, parse the dates from the file names, set them on the products, and then write the products in, e.g., BEAM-DIMAP format.
How to use the SNAP API from Python
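A rough, untested sketch of that workaround could look like the following. The file naming scheme (an eight-digit YYYYMMDD date somewhere in the file name) and both function names are assumptions made for this example; adapt the pattern to your own files. The snappy import sits inside the function so that the date parsing can be tried without a SNAP installation.

```python
import os
import re
from datetime import datetime

# Assumed naming scheme: the file name contains a date as YYYYMMDD.
DATE_IN_NAME = re.compile(r"(\d{8})")


def date_from_file_name(path):
    """Extract the acquisition day from a file name (hypothetical scheme)."""
    match = DATE_IN_NAME.search(os.path.basename(path))
    if match is None:
        raise ValueError("no YYYYMMDD date found in " + path)
    return datetime.strptime(match.group(1), "%Y%m%d")


def write_with_time(tif_path, dim_path):
    """Read a GeoTiff, set start/end time from its name, write BEAM-DIMAP."""
    from snappy import ProductData, ProductIO  # requires a configured snappy

    day = date_from_file_name(tif_path)
    pattern = "yyyy-MM-dd HH:mm:ss"
    start = ProductData.UTC.parse(day.strftime("%Y-%m-%d 00:00:00"), pattern)
    end = ProductData.UTC.parse(day.strftime("%Y-%m-%d 23:59:59"), pattern)

    product = ProductIO.readProduct(tif_path)
    product.setStartTime(start)
    product.setEndTime(end)
    ProductIO.writeProduct(product, dim_path, "BEAM-DIMAP")
    product.dispose()
```

Calling `write_with_time` once per GeoTiff would give products that carry time information, which can then be passed to the StatisticsOp.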

Thank you for the reply. I’ll take a look at that alternative.

Hi there! Here we go again, after a year, haha. I’m sorry to resurrect this topic, but I’m facing a new and probably simple problem.

I’m trying to run StatisticsOp from Python through gpt, using subprocess.call, and I couldn’t find a solution to this problem. Here is the part of my code that is keeping me up at night.

    # StatisticsOp.
    arquivo_3 = area_arquivos_processados + "\\" + arquivo_1 + "_stats"
    subprocess.call([
        "gpt", "StatisticsOp", f"-Ssource={arquivo_2}",
        "-PbandConfigurations=Sigma0_IW2_VV",
        "-t",  f"-PoutputAsciiFile={arquivo_3}"
    ])

I don’t know how to set the -PbandConfigurations parameter. I’ve tried it in different ways, like:

        "-PbandConfigurations=", "-PbandConfiguration=", "-PsourceBandName=Sigma0_IW2_VV"

and I have not been successful.

The error message that appears is:

Error: no converter defined for value ‘bandConfigurations’

Does anybody know how to proceed?

Cheers! I’ve solved it. In case someone needs to know: I solved my original problem, which was to generate a single file (*.csv) with image statistics for each image I have been working on, by using the adapted XML file that is posted here (An exmaple gpt command or graph/xml to run StatisticsOp?). In my Python code I made a tiny modification, shown below.

    # StatisticsOp.
    arquivo_3 = area_arquivos_processados + "\\" + arquivo_1 + ".csv"
    subprocess.call([
        "gpt", "-e", "directory_here\\statistics-graph.xml", f"-Ssource={arquivo_2}",
        f"-Pout_stats={arquivo_3}"
    ])

These parameters

        "-PbandConfigurations=", "-PbandConfiguration=", "-PsourceBandName=Sigma0_IW2_VV"

I put inside the *.xml file.

04_stats_vv.xml (523 Bytes)
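For anyone who wants to run this over many images, here is a small Python sketch that builds one gpt command per input file, following the call above. It assumes the graph reads its input from a variable named source and writes to a variable named out_stats, as the -Ssource and -Pout_stats arguments in my call suggest; the function name and the example paths are made up.

```python
from pathlib import Path


def build_gpt_commands(graph_file, sources, output_dir):
    """Build one gpt command per source file, writing <name>.csv for each."""
    commands = []
    for source in sources:
        target = Path(output_dir) / (Path(source).stem + ".csv")
        commands.append([
            "gpt", "-e", str(graph_file),
            f"-Ssource={source}",
            f"-Pout_stats={target}",
        ])
    return commands


# Example with made-up file names; nothing is executed here.
commands = build_gpt_commands(
    "statistics-graph.xml", ["a.tif", "b.tif"], "stats")
for command in commands:
    print(command)
```

Each command can then be passed to `subprocess.call` (or `subprocess.run(command, check=True)` to stop on the first failure).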

I’m interested in some of the functionality of the StatisticsOp and I found some information which might be useful to other users as well. Perhaps @TonioF and @marpet can confirm that what I write is true.
The StatisticsOp, which you can use only through gpt, and Statistics, which is available as a function in the SNAP GUI (the Sigma/sum symbol in the Analysis drop-down menu), are not the same operators. Their functionality is quite different.
The StatisticsOp is very similar to the TemporalPercentile operator; however, the latter has a broader scope: it can interpolate values where pixel information is missing, and it needs at least two products as input.
Also, the TemporalPercentile operator can be called through gpt or in the Graph Builder in the GUI, and thus it can be used in the GUI batch processing tool. The StatisticsOp is callable only through gpt, and the GUI does not know about it (except for the help menu).

The StatisticsOp and the Statistics Tool use the same code base (so, e.g., mean, min, and max values are derived in the same way). They serve different purposes, though. With the Statistics Tool in SNAP Desktop you can view information about a single product (and maybe a mask), whilst the StatisticsOp is designed to retrieve information for multiple products, time steps, and areas.

The TemporalPercentileOperator does not use that code base. It is used to create time series per pixel and allows interpolation in case of missing values. A major difference is that it creates an output product, while the StatisticsOp writes its results to .csv files or ESRI shapefiles.

This is correct.

this was quite helpful indeed, thanks a lot!!

Hi,

I am trying to use StatisticsOp to extract the statistics inside a polygon (shapefile) from each of the 264 bands of a NetCDF file and write them to a txt file (and then to Excel). I especially need the mean, the median, and the 25th and 75th percentiles.

Here is the script I used. I get no error message, but also no result.

I would appreciate your help.

Kind regards,
Sim

Is the polygon stored in the same coordinate reference system as the raster stack? You can make sure by importing it into the SNAP GUI once.
If so, does the tool in the GUI display correct statistics for a band?

Hi,

Thank you for your rapid response. Yes, both the nc file and the shapefile are in the WGS84 datum. I have used the GUI to retrieve statistics band by band, and the computed statistics seem to be correct.

Kind regards,