Sentinel-2 resample file sizes SNAP versus r-stats - not equal

mthompson · April 28, 2022, 4:42pm

When I process a Sentinel-2 product in SNAP and export the bands as GeoTIFF at a resampled resolution (e.g., resampling all bands to Band 2 at 10 m), the output looks great and I can work with it in r-stats. However, someone helped me batch process the data, and we noticed that some of the granules were coming out at 60 m resolution despite being run through the Graph Builder. Perhaps we made a mistake on our end, but this prompted me to write an option for resampling granules in r-stats. The r-stats jp2-to-GeoTIFF GDAL procedure I used is described here. The output granules from the r-stats GDAL conversion are all the same size, 4 GB (see the first 9 in the image). However, the resampled SNAP granules (those after the first 9 in the image) are mostly less than 2 GB and vary considerably in size. They contain the same bands and are all resampled to 10 m resolution. Does anyone know why this would occur?

gnwiii · April 28, 2022, 5:13pm

The differences in size could be due to compression or to different data types (byte versus float32). You can use gdalinfo or similar tools to verify the dimensions and data type (I don’t recall whether gdalinfo gives information about compression, but then I often don’t recall where I left my coffee cup).
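To check whether data type alone could explain a gap like 4 GB versus under 2 GB, a back-of-the-envelope estimate of the uncompressed size helps. The sketch below is a minimal Python illustration, assuming a 10 m Sentinel-2 tile of 10980 x 10980 pixels and a hypothetical band count of 13. The function name and data-type table are illustrative, not part of GDAL, and real files also carry headers, tiling, and possibly overviews.

```python
# Bytes per sample for a few common GDAL data type names.
BYTES_PER_SAMPLE = {"Byte": 1, "UInt16": 2, "Int16": 2, "UInt32": 4, "Float32": 4, "Float64": 8}


def uncompressed_size_bytes(width: int, height: int, bands: int, dtype: str) -> int:
    """Return width * height * bands * bytes-per-sample (ignores headers and overviews)."""
    return width * height * bands * BYTES_PER_SAMPLE[dtype]


if __name__ == "__main__":
    # Assumption: a 10 m Sentinel-2 tile is 10980 x 10980 pixels.
    width = height = 10980
    for dtype in ("Byte", "UInt16", "Float32"):
        # Hypothetical band count of 13; adjust to the bands actually exported.
        size = uncompressed_size_bytes(width, height, bands=13, dtype=dtype)
        print(f"{dtype}: {size / 1024**3:.2f} GiB uncompressed")
```

With the same dimensions, a Float32 file is four times the size of a Byte file before any compression is applied. Comparing these estimates with the dimensions and data type that gdalinfo reports shows whether compression must also be contributing to the difference.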

mthompson · April 28, 2022, 5:30pm

Thanks, that is helpful. I think you are correct about byte versus float32 and will look into this further, but the discussion here makes me realize that this is a huge topic. In my case, I am going to process everything with GDAL. However, it would be helpful to have a comparison of the compression settings used by SNAP and by GDAL, so that matching output can be produced from r-stats.
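Matching compression settings may still not give matching file sizes, because the compressed size depends heavily on the pixel content (for example, how much of a granule is no-data). The following Python sketch uses the standard-library `zlib` module, which implements DEFLATE, a compression method GeoTIFF files often use. The band generator and its coverage fraction are hypothetical and are not a model of real Sentinel-2 data.

```python
import random
import struct
import zlib


def make_band(n_pixels: int, fill_fraction: float, seed: int = 0) -> bytes:
    """Pack a hypothetical little-endian UInt16 band: noisy valid pixels, then no-data zeros."""
    rng = random.Random(seed)
    n_valid = int(n_pixels * fill_fraction)
    values = [rng.randint(1, 10000) for _ in range(n_valid)] + [0] * (n_pixels - n_valid)
    return struct.pack(f"<{n_pixels}H", *values)


if __name__ == "__main__":
    for label, fraction in (("fully covered", 1.0), ("30% covered", 0.3)):
        raw = make_band(100_000, fraction)
        fast = zlib.compress(raw, 1)  # fastest DEFLATE setting
        best = zlib.compress(raw, 9)  # strongest DEFLATE setting
        # Both settings are lossless: decoding either one restores the exact input bytes.
        assert zlib.decompress(fast) == zlib.decompress(best) == raw
        print(f"{label}: raw={len(raw)} level1={len(fast)} level9={len(best)} bytes")
```

Both bands have the same uncompressed size, but the partly covered one compresses far smaller. The two compression levels also generally produce different byte counts for identical pixels. Comparing decoded pixel values (or checksums of them) is a more robust way to test whether two outputs match.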

gnwiii · April 28, 2022, 6:32pm

Trying to compare files by size is very fragile. Lossless compression does not guarantee that two files compressed with different software will be identical (sometimes the only difference is the length of embedded strings such as version numbers or processing timestamps). I suggest looking for ways to track the conditions that result in missing or invalid pixels. For my own production workflows (most built in the 1990s around a Fortran library that processes one pixel at a time), I use a local version of the SLATEC xerror routine to record “errors” as each pixel is processed and print a summary table at the end. That gives me counts of the various reasons for missing data in the output files, but this approach only tells you about the first issue encountered for each pixel. NASA’s msl12/l2gen software instead creates a mask band that can record multiple error conditions for a given pixel.
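The mask-band idea can be sketched with bit flags. Each check sets its own bit, so a single pixel can carry several conditions at once instead of only the first error encountered. This is a minimal Python illustration. The flag names, thresholds, and functions are hypothetical and do not reproduce the actual msl12/l2gen flag definitions or the SLATEC xerror interface.

```python
from collections import Counter
from enum import IntFlag
from typing import Iterable


class PixelFlag(IntFlag):
    # Hypothetical flag bits; NOT the actual msl12/l2gen flag definitions.
    NONE = 0
    NO_DATA = 1
    SATURATED = 2
    CLOUD = 4
    HIGH_ZENITH = 8


# Every single-bit condition, listed explicitly so the iteration order is fixed.
CHECKS = (PixelFlag.NO_DATA, PixelFlag.SATURATED, PixelFlag.CLOUD, PixelFlag.HIGH_ZENITH)


def classify(value: int, zenith_deg: float, cloud: bool) -> PixelFlag:
    """Run every check and OR together all conditions that apply (thresholds are illustrative)."""
    flags = PixelFlag.NONE
    if value == 0:
        flags |= PixelFlag.NO_DATA
    if value >= 65535:
        flags |= PixelFlag.SATURATED
    if cloud:
        flags |= PixelFlag.CLOUD
    if zenith_deg > 70.0:
        flags |= PixelFlag.HIGH_ZENITH
    return flags


def summarize(mask: Iterable[PixelFlag]) -> Counter:
    """Count pixels per condition; a pixel with several flags is counted under each one."""
    counts: Counter = Counter()
    for flags in mask:
        for flag in CHECKS:
            if flag in flags:
                counts[flag.name] += 1
    return counts


if __name__ == "__main__":
    # (value, zenith angle in degrees, cloud flag) for three made-up pixels.
    pixels = [(0, 75.0, True), (500, 30.0, False), (65535, 30.0, True)]
    mask = [classify(v, z, c) for v, z, c in pixels]
    print(summarize(mask))
```

Because the flags are stored per pixel rather than only tallied, the mask can also be written out as an extra band and inspected alongside the data.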