Using StatisticsOp operator with Python


I’m trying to do some zonal statistics over the cloud confidence band from Sentinel-2 by using the StatisticsOp operator. I have the following code:

class CheckClouds:

    def __init__(self, sat_img_path):
        self.sat_img_path = sat_img_path

    def read_cloud_band(self):
        ''' Read the cloud confidence band. '''
        product = ProductIO.readProduct(self.sat_img_path)
        band_names = product.getBandNames()
        # print (list(band_names))
        b_qcc = product.getBand('quality_cloud_confidence')
        # print (b_qcc)
        HashMap = jpy.get_type('java.util.HashMap')

        BandConfiguration = jpy.get_type('org.esa.snap.statistics.BandConfiguration')
        bandConfiguration = BandConfiguration()
        # print(dir(bandConfiguration))
        bandConfiguration.sourceBandName = 'quality_cloud_confidence'
        bandConfigurations = jpy.array('org.esa.snap.statistics.BandConfiguration', 1)
        bandConfigurations[0] = bandConfiguration
        parameters = HashMap()

        parameters.put('bandConfigurations', bandConfigurations)
        parameters.put('sourceProductPaths', str(self.sat_img_path))
        parameters.put('shapefile', '..\\..\\config\\test.shp')
        parameters.put('outputAsciiFile', 'D:\\test_file.csv')
        # parameters.put('writeDataTypesSeparately', True)
        result = GPF.createProduct('StatisticsOp', parameters, product)
        return result

new_sat_imag = CheckClouds('D:\\sentinel_2_img_dir\\')


I get no output (no file or product is generated) and no error after running the script. Does anyone have an idea what I'm doing wrong? Is the parameter setup OK for this operator?

Is it always necessary to use ProductIO.writeProduct? How can I get only the values from the exported CSV file?



Somehow I figured out how to use the StatisticsOp operator, but there are some issues, I believe, or I'm missing something.

If I use the raw image format (downloaded from the Sentinel-2 data server), the script can't process the 'quality_cloud_confidence' band (I believe it is not a physical band). The band is only recognized if I create a subset from the raw image; in the subset there is a physical quality_cloud_confidence.img file (the subset contains *.img files for all existing bands). If I use a band name like 'B4', the script runs fine in both situations.

Does anyone know something about this issue?

My loop stops unexpectedly without an error; it doesn't loop through all files in the folder. Does someone know how it can be fixed?

from snappy import ProductIO, GPF, jpy

import os, glob

class CheckClouds:

    def __init__(self, sat_img_path):
        self.sat_img_path = sat_img_path

    def read_cloud_band(self):
        ''' Read the cloud confidence band. '''
        S2CacheUtils = jpy.get_type('org.esa.s2tbx.dataio.cache.S2CacheUtils')
        product = ProductIO.readProduct(self.sat_img_path)
        band_names = product.getBandNames()
        name = product.getName()
        # print (name)
        # print (list(band_names))
        # b_qcc = product.getBand('quality_cloud_confidence')
        # print (b_qcc)

        # Build the parameter configuration for StatisticsOp
        BandConfiguration = jpy.get_type('org.esa.snap.statistics.BandConfiguration')
        bandConf_1 = BandConfiguration()
        # print(dir(bandConf_1))
        bandConf_1.sourceBandName = 'quality_cloud_confidence'
        # print (bandConf_1.sourceBandName)
        bandConfigurations = jpy.array(BandConfiguration, 1)
        # print (dir(bandConfigurations))
        bandConfigurations[0] = bandConf_1
        HashMap = jpy.get_type('java.util.HashMap')
        parameters = HashMap()

        parameters.put('bandConfigurations', bandConfigurations)
        parameters.put('sourceProductPaths', self.sat_img_path)
        parameters.put('shapefile', '..\\..\\config\\roi_area.shp')
        parameters.put('outputAsciiFile', '..\\..\\data_output\\quality_check\\' + name + '_test_file.csv')
        parameters.put('writeDataTypesSeparately', True)
        # print (parameters)
        result = GPF.createProduct('StatisticsOp', parameters, product)

        # print (result)

        ProductIO.writeProduct(result, '..\\..\\data_output\\quality_check\\' + name + '_qa.dim', 'BEAM-DIMAP')
        # ProductIO.writeProduct(result, '', '')

        # print (dir(result))
        print("\nDone!\n")

file_list = glob.glob('../../data_output/subsets/*.dim')

for file_name in file_list:
    print(file_name)
    CheckClouds(file_name).read_cloud_band()

The quality_cloud_confidence is a physical band.
It is located in the QI_DATA folder, not in IMG_DATA. It is named L2A___CLD_60m.jp2.
It is not available at 10 m resolution.
If the band is present after a subset, it must exist beforehand.
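A small guard before building the BandConfiguration makes the missing-band case explicit instead of failing silently. This is a sketch; `has_band` is a helper name introduced here, and it relies only on `Product.getBand` returning None (Java null via jpy) when the band is absent:

```python
def has_band(product, band_name):
    """Return True if the product exposes the given band.

    Product.getBand returns None when the band does not exist,
    e.g. 'quality_cloud_confidence' on a raw product that has
    not been subset yet.
    """
    return product.getBand(band_name) is not None
```

Inside `read_cloud_band()` one could then bail out early with a clear message when `has_band(product, 'quality_cloud_confidence')` is False, rather than letting StatisticsOp run against a band that isn't there.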

Yes, the CSV data can only be accessed via the written file.
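To pull only the values back into Python, the written ASCII file can be parsed with the standard csv module. A minimal sketch, assuming the output is a delimited text file with a header row (the tab delimiter is an assumption; adjust it to whatever your SNAP version actually writes):

```python
import csv

def read_statistics_file(path, delimiter='\t'):
    """Read a StatisticsOp ASCII output file into a list of dicts,
    one per data row, keyed by the header columns.

    Assumes a delimited text file with a single header row.
    """
    with open(path, newline='') as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))
```

Each row is then a plain dict, so individual statistics can be read with e.g. `rows[0]['maximum']` (column names depend on the configured statistics).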

I don't know why your loop stops. Are the remaining products processed if you start the processing again after the first chunk is done?

Regarding the loop stop issue…
The way I wrote the script is this: I generate a list of file paths and loop through it, instantiating the class for each *.dim file in the list.

If you skip over the already processed ones, does it fail at the same product?
You could move the already processed data somewhere else. Maybe one of them is corrupted?
Or does it stop at different products?
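One way to implement the "skip the already processed ones" suggestion is to check for the expected output file before running a product. A sketch; the `_test_file.csv` naming mirrors the script above, but note that the script names outputs after the product name, which may differ from the .dim file name, so this is an assumption to verify:

```python
import os

def already_processed(dim_path, out_dir):
    """True if an output CSV for this .dim file already exists.

    Assumes the output is named '<base>_test_file.csv', where
    <base> matches the .dim file's base name.
    """
    base = os.path.splitext(os.path.basename(dim_path))[0]
    return os.path.exists(os.path.join(out_dir, base + '_test_file.csv'))
```

The loop then becomes `for f in file_list: if not already_processed(f, out_dir): ...`, which also lets a crashed run resume where it stopped.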

It stops at different products. I don't have problems with the products themselves; I use them to calculate various indexes.
I reversed the loop, etc. Usually it runs for about 20 products and then stops unexpectedly with no error message.

I should mention that I have to loop through 47 *.dim files. After splitting the images into 4 different folders, the script worked as it should. I believe that some operator or other snappy-instantiated object is overloading the memory.
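Since splitting the data into four folders worked, the same effect can be had without moving files by running each chunk in a fresh Python process, so whatever JVM memory snappy accumulates is released when the child process exits. A sketch; `process_chunk.py` is a hypothetical worker script that would run the class over the paths passed as arguments:

```python
import subprocess
import sys

def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_in_chunks(file_list, size=12):
    # Each chunk runs in its own interpreter, so snappy/JVM memory
    # is fully released between chunks when the child exits.
    for chunk in chunks(file_list, size):
        subprocess.check_call([sys.executable, 'process_chunk.py', *chunk])
```

With `size=12`, the 47 files would be processed in 4 batches, matching the four-folder workaround but without reorganizing the data on disk.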

Can 'parameters' be written as a nested dictionary in this version? That way it would be possible to limit class initialization (object creation).

I don’t think so.
If I remember correctly, there is no conversion implemented from a Python dictionary to a Java (Hash)Map.
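There is indeed no automatic conversion, but a small recursive helper can do the copy manually. A sketch; `dict_to_hashmap` is a name introduced here, it takes the jpy HashMap type as an argument, and non-dict values (e.g. the BandConfiguration array) must already be in a form jpy can hand to Java:

```python
def dict_to_hashmap(d, HashMap):
    """Recursively copy a Python dict into a java.util.HashMap.

    HashMap is the jpy type, e.g. jpy.get_type('java.util.HashMap').
    Nested dicts become nested HashMaps; all other values are passed
    through unchanged, so Java objects (arrays, booleans, strings)
    must already be Java-compatible.
    """
    m = HashMap()
    for key, value in d.items():
        if isinstance(value, dict):
            value = dict_to_hashmap(value, HashMap)
        m.put(key, value)
    return m
```

The parameters could then be kept as a plain Python dict in one place and converted once, just before the `GPF.createProduct` call.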