Python batch download of NCEP and AURA/OMI aux data for C2RCC processing with Snappy

Hello,

I need to apply water-specific atmospheric correction to a large number of Sentinel-2 L1C images using the C2RCC processor in Snappy on a processing server. For this I am trying to automate the retrieval of the necessary auxiliary data. To my understanding, I need:

  • Surface pressure from NCEP/NCAR Reanalysis 1
  • Total column ozone from Aura OMI TOMS

So far I understand that:

  • I cannot rely on the automatic download of aux data from my C2RCC server script with Snappy: the log shows that the processor falls back to default values (1000 hPa for surface pressure and 300 DU for ozone) when I do not provide the data myself.

I have a couple of questions:

  • Is there a recommended or existing Python-based way, and a source, to download and store these datasets in bulk (ideally via FTP or HTTPS)? I have no clue whether this note is still valid, and even if it is, it does not specify the required file naming for the NCEP data (I sketch a guess after the script below).
  • I have a script that successfully batch-downloads from OB.DAAC (see below), but the repository only contains Aura OMI data up to 2022.

I have reviewed the SNAP and C2RCC documentation, but I am still unsure which workflow C2RCC/Snappy accepts and which aux data sources are correct and complete. Many of the few posts I found seem outdated.

Thanks a lot for your help!

Stefanie

import os

import requests
from tqdm import tqdm

# OB.DAAC file-retrieval endpoint (HTTPS; requires Earthdata login)
base_url = 'https://oceandata.sci.gsfc.nasa.gov/getfile/'
user = 'username'
pw = 'password'
output_path = "../Testpath"

start_year = 2017
end_year = 2023

def download_data(url, user, pw, out_file):
    with requests.Session() as session:
        session.auth = (user, pw)
        # The first request resolves the Earthdata login redirect; requests
        # drops the auth header when redirected to another host, so the final
        # URL is fetched again with explicit credentials.
        r1 = session.get(url)
        r = session.get(r1.url, auth=(user, pw))
        if r.status_code == 200:
            with open(out_file, 'wb') as f:
                f.write(r.content)
        else:
            raise ValueError(f"Unexpected status code: {r.status_code}")

os.makedirs(output_path, exist_ok=True)
for year in range(start_year, end_year + 1):
    print(f"Working on year {year}")
    year = str(year)
    os.makedirs(os.path.join(output_path, year), exist_ok=True)
    # Day-of-year 1..366; day 366 simply fails (and is caught) in non-leap years.
    for day in tqdm(range(1, 367)):
        day = f"{day:03}"
        os.makedirs(os.path.join(output_path, year, day), exist_ok=True)
        dataName = f"N{year}{day}00_O3_AURAOMI_24h.hdf"
        try:
            url = f"{base_url}{dataName}"
            out_path = os.path.join(output_path, year, day, dataName)
            download_data(url, user, pw, out_path)
        except Exception as e:
            print(e, f"\nCould not download file {dataName}")
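For the NCEP data, if the met files on OB.DAAC follow the same naming convention as the ozone files, the inner day loop above could be extended along these lines. The N{year}{day}{hour}_MET_NCEPR2_6h.hdf pattern and the 6-hourly steps are my assumption and would need to be verified against the OB.DAAC file search:

        # Hypothetical extension inside the day loop for 6-hourly NCEP met
        # files; the filename pattern is an unverified assumption -- check
        # it against the OB.DAAC file search before relying on it.
        for hour in ("00", "06", "12", "18"):
            metName = f"N{year}{day}{hour}_MET_NCEPR2_6h.hdf"
            try:
                download_data(f"{base_url}{metName}", user, pw,
                              os.path.join(output_path, year, day, metName))
            except Exception as e:
                print(e, f"\nCould not download file {metName}")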

I think the auxdata download has not been used for quite some time.
It would probably be best to go without the auxdata. I don't know where to get data that is still compatible, and the benefit is also limited; I think most current operational processing chains don't use it.
If you really need it, it would be best to get the data from some source and provide it as a parameter to the processing, as sketched below. The auxdata support should either be removed from the processor in an upcoming version or replaced with a working implementation.
I’m sorry for the effort you have already put into this.
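For instance, here is a minimal sketch of supplying the values directly as operator parameters via snappy. The operator alias 'c2rcc.msi' and the parameter names 'ozone' and 'press' are taken from the C2RCC operator's published parameters; verify them with gpt -h c2rcc.msi before use, and note that the input path and values are placeholders:

# Sketch: pass per-scene aux values as C2RCC parameters instead of relying
# on the (non-functional) auxdata download. Input path is hypothetical.
from snappy import GPF, HashMap, ProductIO

product = ProductIO.readProduct('S2A_MSIL1C_example.SAFE')

params = HashMap()
params.put('ozone', 350.0)    # total column ozone [DU] for this scene/date
params.put('press', 1013.25)  # air pressure at sea level [hPa]

result = GPF.createProduct('c2rcc.msi', params, product)
ProductIO.writeProduct(result, 'c2rcc_output.dim', 'BEAM-DIMAP')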

@abruescas Ana, can you help further? Anything to add?


Thank you so much for your quick reply.
In the past, I either looked up the information manually when using the SNAP GUI, or I extracted the aux data time series for the locations in question and inserted the values automatically from a data table in my Python script (roughly as in the sketch below). But this is not practical at the scale I am aiming for now.
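For reference, that lookup was roughly along these lines (a simplified sketch; the CSV layout and column names are illustrative, not my actual files):

import csv
import datetime

# Illustrative aux table with one row per date, e.g.
# "date,ozone_du,pressure_hpa" -- the layout is hypothetical.
def load_aux_table(path):
    table = {}
    with open(path, newline='') as f:
        for row in csv.DictReader(f):
            table[row['date']] = (float(row['ozone_du']),
                                  float(row['pressure_hpa']))
    return table

aux = load_aux_table('aux_values.csv')
ozone, press = aux[datetime.date(2021, 7, 14).isoformat()]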
So, if compatibility and usefulness are questionable, I will probably stick to the default values and hope this will be addressed in a future version.