In December 2015 I reported to EOSupport (CDS-3622) that the “corrupt” file issue seen with software that resumes downloads is caused by a Windows server in their download path, which inserts an error message into the download stream when an error occurs. When the download is resumed at the next byte, which is after the Windows error message, the file ends up corrupt.
My solution was to set retries to 0 so that the software I was using to download (wget/curl) would not automatically resume. On each failure I assumed that bytes of a Windows error message had been appended to my file (which was not always the case), truncated a number of bytes from the end of the file, and then resumed the download. This is implemented in a loop, so every failure results in a truncation and a resume. I used an 8192-byte truncation, which is much larger than the appended error message. In Linux this is: truncate -s -8192 ${SAFE}.zip
The error message appended to the file is always similar to: <?xml version='1.0' encoding='UTF-8'?>error xmlns=“http://schemas.microsoft.com/ado/2007/08/dataservices/metadata”>code />message xml:lang=“en” />/error>
(note that I had to remove some of the XML formatting to get the message to display on the Forum)
After implementing the "truncate before resume" workaround, the file corruption rate dropped from 75% to less than 5%.
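For anyone who would rather do this in Python than in a shell loop around wget/curl, here is a rough sketch of the same truncate-before-resume idea. It is not the script I actually ran: the names truncate_tail, fetch_from and download are made up for illustration, authentication is left out, and it assumes the server honours HTTP Range requests.

```python
import http.client
import os
import urllib.request

TRUNCATE_BYTES = 8192  # same margin as above; much larger than the error message


def truncate_tail(path, nbytes=TRUNCATE_BYTES):
    """Drop the last nbytes of the file (or empty it if it is shorter)."""
    if not os.path.exists(path):
        return 0
    new_size = max(0, os.path.getsize(path) - nbytes)
    os.truncate(path, new_size)
    return new_size


def fetch_from(url, path, offset, chunk_size=1024 * 1024):
    """Append the bytes of url from offset onwards to path (illustrative helper)."""
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as resp:
        # A resumed request must be answered with 206 Partial Content,
        # otherwise the server is sending the whole file again.
        if offset and resp.status != 206:
            raise OSError("server ignored the Range header")
        with open(path, "ab") as out:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)


def download(url, path, max_attempts=20, fetch=fetch_from):
    """No automatic resume: on every failure, truncate the tail, then resume."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    for _ in range(max_attempts):
        try:
            fetch(url, path, offset)
            return True
        except (OSError, http.client.HTTPException):
            # Assume an error message may have been appended and cut it off.
            offset = truncate_tail(path)
    return False
```

Like the shell version, this truncates on every failure whether or not an error message was really appended, so it re-downloads up to 8192 bytes more than strictly necessary each time.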
The script you mentioned (by Max König) has an issue: it loads the whole SAFE.zip into memory before finally writing it to a file on disk. This might be a concern, as you can see from the comments section, where user Sunny reports:
MemoryError: out of memory
This issue becomes especially relevant when dealing with those huge Sentinel-2 multi-tile packages (about 8 GB per SAFE.zip).
Yes, I noticed there was a RAM issue but didn’t have the time to look into it (and had plenty of RAM to blast). Thanks for pointing out the source of the problem.
Do you know a way to solve it? I’m not very knowledgeable about the methods used.
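One common way around this kind of MemoryError is to read the response in fixed-size chunks and write each chunk to disk straight away, so that only one chunk is in memory at a time. A minimal sketch, assuming plain urllib; the function names are hypothetical, this is not the code of the script discussed above, and the login that SciHub requires is left out:

```python
import urllib.request

CHUNK = 1024 * 1024  # 1 MiB per read; memory use stays around this size


def stream_to_file(src, dest_path, chunk_size=CHUNK):
    """Copy a readable binary stream to dest_path chunk by chunk.

    Returns the number of bytes written.
    """
    written = 0
    with open(dest_path, "wb") as out:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    return written


def download_to_disk(url, dest_path):
    """Stream url to dest_path without holding the whole file in memory."""
    with urllib.request.urlopen(url) as resp:
        return stream_to_file(resp, dest_path)
```

With this pattern an 8 GB SAFE.zip should need roughly the same amount of RAM as a small one, since the chunk size and not the file size determines the memory footprint.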
If you are not too keen on using GAFA’s clouds, downloading from PEPS is now faster! PEPS is the French collaborative ground segment, and it provides all Sentinel data globally. To do that at moderate cost and with low electricity consumption, most of the data is stored on tape, except for a couple of petabytes on disk, which are used as a cache. A new version of the peps_download.py tool speeds up downloads from PEPS by staging the reads from tape while downloading the products that are already on disk.
Fortunately, it was acknowledged at many levels that the SciHub did not provide enough performance and functionality, and the EU Commission has now procured 5 DIAS that give faster access to Sentinel data and more functionality.
Some of them are very fast. These are the links that I could find:
1) EUMETSAT, ECMWF and Mercator Océan http://wekeo.eu
2) ATOS Integration, consortium includes T-SYSTEM International, DLR, eGEOS, EOX, GAF, Sinergise Ltd, Spacemetric, and Thales Alenia Space. http://mundiwebservices.com
3) Airbus Defence and Space, consortium includes Orange SA, Airbus Defence and Space, Geo SA, Capgemini Technology Services SAS, CLS and VITO http://sobloo.eu
4) Serco Europe, OVH, Gael Systems and Sinergise Ltd.
5) Creotech Instruments, Cloud Ferro, Sinergise Ltd, Geomatis SAS, Outsourcing Partner Sp. z o.o., Wroclaw Institute of Spatial Information and Artificial Intelligence Sp. z o.o. http://creodias.eu