Fast way to download Sentinel-2

In December 2015 I reported to EOSupport (CDS-3622) that the “corrupt” file issue seen with software that resumes downloads was caused by a Windows server in their download path inserting an error message into the download stream when an error occurs. When the download is resumed at the next byte, which lands after the Windows error message, the file ends up corrupt.

My solution was to set retries to 0 so the software I was using to download (wget/curl) would not automatically resume. I initially assumed that the bytes of a Windows error message had been appended to my file, but that was not always the case. So instead, I truncate a number of bytes from the end of the file and then resume the download. This is implemented in a loop, so every failure results in a truncation followed by a resume. I use an 8192-byte truncation, which is much larger than the appended error message. On Linux this is: truncate -s -8192 ${SAFE}.zip
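As a rough sketch of that loop in Python (the helper names are illustrative, and `run_download` stands in for whatever wget/curl invocation you use, configured with retries disabled):

```python
import os

TRUNCATE_BYTES = 8192  # larger than the injected Windows error message

def truncate_tail(path, nbytes=TRUNCATE_BYTES):
    """Drop the last `nbytes` of the file (the Linux equivalent is
    `truncate -s -8192 file`), so the resume restarts from a point
    before any injected error message."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.truncate(max(0, size - nbytes))

def download_with_truncating_resume(url, path, run_download, max_attempts=20):
    """Call `run_download(url, path)` (your wget/curl wrapper returning
    True on success); on each failure, truncate the possibly corrupt
    tail before letting the next attempt resume."""
    for _ in range(max_attempts):
        if run_download(url, path):
            return True
        truncate_tail(path)  # discard the tail, then resume
    return False
```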

The error message appended to the file is always similar to:
<?xml version='1.0' encoding='UTF-8'?><error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"><code /><message xml:lang="en" /></error>

After implementing the "truncate before resume" workaround, the file corruption rate dropped from 75% to less than 5%.

I now download only from Amazon, which is much faster and ~100% reliable: http://sentinel-pds.s3-website.eu-central-1.amazonaws.com/

Google also has a mirror: see https://cloud.google.com/storage/docs/public-datasets/sentinel-2

Hi @unnic,

the script you mentioned (by author Max König) has an issue: it reads the whole SAFE.zip into memory before finally writing it to a file on disk. This can be a real concern, as you can see from the comments section, where user Sunny reports:
MemoryError: out of memory
The issue becomes especially relevant when dealing with the huge Sentinel-2 multi-tile packages (about 8 GB per SAFE.zip).

@cgo

Yes, I noticed there was a RAM issue but didn’t have the time to look into it (and had plenty of RAM to spare). Thanks for pointing out the source of the problem.
Do you know a way to solve it? I’m not very knowledgeable about the methods used.

Sorry for my late answer here (I missed your reply).

To avoid the memory issue, the HTTP response has to be read block-wise. In vanilla Python 2 (without any third-party libraries), one option to accomplish this is shutil.copyfileobj(), as explained here: https://stackoverflow.com/questions/1517616/stream-large-binary-files-with-urllib2-to-file .
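A minimal sketch of that approach (written in Python 3 syntax here; in Python 2 you would use urllib2.urlopen, and copyfileobj behaves the same way):

```python
import shutil
from urllib.request import urlopen  # urllib2.urlopen in Python 2

def stream_to_disk(rsp, path, block_size=16 * 1024):
    """Copy a response (any readable file-like object) to disk in
    fixed-size blocks instead of loading the whole body into memory."""
    with open(path, "wb") as f:
        # copyfileobj reads and writes block_size bytes at a time
        shutil.copyfileobj(rsp, f, block_size)

# usage sketch (product_url is a placeholder):
# stream_to_disk(urlopen(product_url), "product.SAFE.zip")
```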

As an alternative, especially if you want to do extra work in the course of reading the response (such as calculating an MD5 sum or updating a progress notification), you can simply iterate over read(BLOCK_SIZE) manually, as implemented here: https://github.com/EsriDE/ArcGIS-Sentinel2-Download-Tools/blob/master/lib/sensub.py#L226

for block in iter(lambda: rsp.read(8192), ""):  # urllib.urlretrieve() uses an 8 KiB block size by default, shutil.copyfileobj() 16 KiB
    f.write(block)
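As a sketch of the MD5 variant mentioned above (Python 3 here, so the loop sentinel is b"" rather than ""; the function name is illustrative):

```python
import hashlib

def copy_with_md5(rsp, f, block_size=8192):
    """Stream `rsp` (a readable file-like response) into `f` block-wise
    while computing the MD5 checksum on the fly."""
    md5 = hashlib.md5()
    for block in iter(lambda: rsp.read(block_size), b""):
        md5.update(block)   # extra work per block: update the checksum...
        f.write(block)      # ...while still writing straight to disk
    return md5.hexdigest()
```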

(When using Python 3 there may be other options; I have not yet investigated this topic with regard to Python 3.)

If you are not too keen on using GAFA’s clouds, downloading from PEPS is now faster! PEPS is the French collaborative ground segment, and it provides all Sentinel data globally. To allow that at a moderate cost and with low power consumption, most of the data is stored on tape, except for a couple of petabytes on disk, which are used as a cache. A new version of the peps_download.py tool speeds up downloads from PEPS by staging the reads from tape while the products already on disk are being downloaded:

http://www.cesbio.ups-tlse.fr/multitemp/?p=12964

Hi all, would you know if the sentinelsat package has a TCI-only download option for Sentinel-2?

Hi there

Just for information.

Fortunately, it was acknowledged at many levels that SciHub did not provide enough performance and functionality, and the EU Commission has now procured five DIAS platforms that give faster access and more functionality for Sentinel data.

Some of them are very fast. These are the links I could find:
1) EUMETSAT, ECMWF and Mercator Océan: http://wekeo.eu
2) ATOS Integration (consortium includes T-SYSTEM International, DLR, eGEOS, EOX, GAF, Sinergise Ltd, Spacemetric, and Thales Alenia Space): http://mundiwebservices.com
3) Airbus Defence and Space (consortium includes Orange SA, Airbus Defence and Space, Geo SA, Capgemini Technology Services SAS, CLS and VITO): http://sobloo.eu
4) Serco Europe, OVH, Gael Systems and Sinergise Ltd.
5) Creotech Instruments, Cloud Ferro, Sinergise Ltd, Geomatis SAS, Outsourcing Partner Sp. z o.o., Wroclaw Institute of Spatial Information and Artificial Intelligence Sp. z o.o.: http://creodias.eu

