Hi @marpet, thank you for looking into this issue. It seems that while the servers were under load, the Terrain Correction module kept requesting data from the SRTM server in a loop. The requests were continuous and did not time out when the data was unavailable, so they would presumably have continued indefinitely.
I have reproduced this behaviour by disconnecting my machine from the internet and attempting to process a product whose SRTM tiles are not in the local cache. Please see the attached video, which shows the frequency of server requests in real time.
We were unaware of this behaviour over the weekend and did not know that the processing chain would keep retrying. This seems to have caused our Amazon node to be reported for a suspected DDoS attack on the ESA server during this time, probably because of the frequency of requests generated by SNAP.
Could this module of SNAP be updated to handle failed requests better, for example by giving up after a certain period of time or number of attempts? It might also help to have a longer delay between requests, so as not to overload the SRTM server in times of high demand.
Otherwise, are there any other steps we can take to ensure that processing does not simply wait indefinitely for the SRTM server to become available during future downtime, and to reduce the load this might place on the ESA systems?
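To illustrate the kind of handling suggested above (a limit on attempts plus a growing delay between requests), here is a minimal Python sketch. It is not SNAP code and does not reflect how the Terrain Correction module is implemented; `fetch_with_backoff`, its parameters and the `fetch` callable are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it succeeds or max_attempts is reached.

    Hypothetical helper for illustration only. The wait doubles after
    each failure (capped at max_delay), so an unavailable server sees
    fewer and fewer requests instead of a tight loop.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            # Network failures (ConnectionError, timeouts) are OSError subclasses.
            if attempt == max_attempts:
                raise  # give up and report the failure instead of looping forever
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

With the defaults, a request that never succeeds is attempted five times with waits of 1, 2, 4 and 8 seconds in between, and then the error is raised so the processing chain can fail cleanly rather than hang.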
===
EDIT:
I have discovered that the Terrain Correction procedure contacts the tile server regardless of whether the tile already exists in the local cache. This is presumably to compare a checksum of the local file against the one on the server and re-download if necessary.
Can you please confirm that this is the expected behaviour? As it stands, processing will still halt if the external server is down, even if the tiles are available on the machine.
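For comparison, this is roughly the fallback I would have expected: if the server cannot be reached, use the cached tile rather than halting. The Python sketch below is only an illustration under my own assumptions. The checksum comparison is my guess at why the server is contacted, and `resolve_tile`, `sha256_of`, `fetch_remote_checksum` and `download` are hypothetical names, not part of SNAP.

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file (hash choice is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def resolve_tile(
    local_path: Path,
    fetch_remote_checksum: Callable[[], str],
    download: Callable[[Path], None],
) -> Path:
    """Return a usable tile, preferring the local cache when the server is down.

    Hypothetical helper for illustration only:
    - cached tile, server unreachable: use the cached tile as-is
    - cached tile, checksum matches: use the cached tile
    - otherwise: download (errors propagate if the server is down)
    """
    if local_path.exists():
        try:
            remote_checksum = fetch_remote_checksum()
        except OSError:
            # Server unreachable: trust the cache instead of halting.
            return local_path
        if remote_checksum == sha256_of(local_path):
            return local_path
    download(local_path)
    return local_path
```

Combined with a bounded number of attempts for the download itself, this would let processing continue from the cache during server downtime and fail quickly only when a tile is genuinely missing.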