Cache filling up very fast

Hi, I am running out of space for cache storage. I have 3k S2 images to process, but after only 223 images I have filled my C drive, which had 372 GB of available space (I don't have the luxury of a dedicated cache drive; it would need to be around 4 TB). The cache actually fills up faster than the output location for the processed images :open_mouth:

I am just running S2 L1C through S2Resampler (20 m) and then C2RCC in a multiprocessing script calling gpt, e.g. there are 4 instances of: "/home/btera/snap/bin/gpt " + xml_file + " -f NetCDF4-CF -q 10 -c 60G -x -Ssource=" + y + " -t " + z
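
A minimal sketch of how such a script might look, assuming Python's multiprocessing.Pool is used to keep four gpt instances running at a time (the gpt path, graph options and output format are copied from the command above; process_scene, the scene list and the argument tuples are hypothetical):

import subprocess
from multiprocessing import Pool

GPT = "/home/btera/snap/bin/gpt"

def process_scene(args):
    # One gpt call: graph XML, input L1C product, output path.
    xml_file, source, target = args
    cmd = [
        GPT, xml_file,
        "-f", "NetCDF4-CF",      # output format
        "-q", "10",              # parallelism within one gpt instance
        "-c", "60G",             # in-memory tile cache size
        "-x",                    # clear internal tile cache after each node
        "-Ssource=" + source,
        "-t", target,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    scenes = []  # fill with (xml_file, input, output) tuples for the ~3000 L1C products
    with Pool(processes=4) as pool:  # four gpt instances at a time
        pool.map(process_scene, scenes)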

I have the S2 cache setting set to daily deletion, but it actually only takes ~1 h to fill the 372 GB of space, so the deletion is not fast enough.

Options > Performance > Cache Size (MB) is set to 1024 (I'm never sure whether this affects gpt or not).

My other question is: does calling gpt count as a "start up", so that I could set "max time in cache" to delete on each start-up?

I figure I probably have something wrong in some settings somewhere. Anyone's experience would be helpful here, thanks.

Hi,
yes, you are right. For mass production the cache does not work very well.
We have improvements already on our roadmap.
[SIITBX-277] Add maximum size option in S2TBX cache - JIRA (atlassian.net)

@MartinoF It might be good to address this for the next release.

I think setting the S2 cache to be deleted at every start should fix your issue.

Thanks @marpet, that fixes it for me.
I had seen discussion of that maximum size option in the forums in 2017, and your request for it to be implemented, so I didn't know whether it had been implemented and I just couldn't find the option.


Hi @marpet, sorry to re-open a closed thread, but this solution is not viable in this instance.

It took me a while to realise, as technically the process runs fine and produces all the outputs. However, since enabling deletion on start-up I get incomplete images, with a random number of blocks processed within them (not all images, only some).

My theory (totally untested) is that, because I'm running multiple instances, the start-up of one clears the cache for another running instance, causing it to think it has finished while it is in the middle of processing a tile. Or it could be something completely unrelated. Thoughts?

Oh, I'm sorry to hear this. I hadn't thought of this. Yes, you are probably right with your assumption. When you have multiple processes running, they can interfere.

@FlorianD So this should be changed in the near future. Maybe by a GB limit which is enforced, or by allowing a separate cache directory to be specified for each gpt call/process. Or by some other good idea.

@goofydude Not sure what I can suggest to you as a short-term solution.
You could modify your current script so that it waits for the four instances to end and then deletes the cache (<USER_DIR>\.snap\var\cache). The S2TBX cache setting in SNAP you would then change to "never".
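
A rough sketch of that approach, assuming the scenes are processed in batches of four and the cache lives at the Linux equivalent of the path above (clear_snap_cache and CACHE_DIR are hypothetical names, not SNAP API):

import shutil
from pathlib import Path

CACHE_DIR = Path.home() / ".snap" / "var" / "cache"

def clear_snap_cache():
    # Remove the whole SNAP cache directory once no gpt instance is still running.
    if CACHE_DIR.exists():
        shutil.rmtree(CACHE_DIR, ignore_errors=True)

# e.g. call clear_snap_cache() after pool.map(...) has returned for a batch of four scenes.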

Thanks for the response @marpet. I should imagine it is not a problem experienced by many, as for most processes in SNAP I can get away with "older than 1 day", and you have to be running multiple instances to cause it.

@FlorianD Thanks for those suggestions. I am no programmer nor an expert, but I have also made some suggestions:

A GB limit: even with oldest-first deletion you would have to know that all currently running threads had enough space to complete their jobs.
Separate directories per thread: possible, but by the time you are running 60 instances of SNAP (which I have achieved with light tasks) the combined size would still be considerable.
Increased granularity: have "older than 1 h", "2 h" and "12 h" options so that only older items in the cache (hopefully not in use) are deleted on start-up.
Turn the cache off: I have no idea whether the cache is mandatory for processes to function.
Lock items in use: if SNAP is reading a file, lock it against deletion when delete-on-start-up is triggered.

@marpet thanks for your short-term solution suggestion. It would slow down processing quite a bit due to the wait times for "uneven" images, but it would work as a stop-gap. I will see if I can get a large drive to dedicate to cache storage so I can hold one day's worth.

Just wanted to add, in the interim before any update beyond SNAP 8.0.2:

I have just been running the bash script below, which fixes the issue (adjust the times so that a single iteration of your processing doesn't take longer than the age threshold, and so that you can cope with the cache size created during the sum of those two times) (Linux only).

#!/bin/bash
# Every 30 minutes, delete S2 L1C reader cache entries older than 60 minutes.
for i in {1..1000}
do
    find ~/.snap/var/cache/s2tbx/l1c-reader/8.0.0 -type d -mmin +60 -exec rm -r {} \;
    echo "cleanup number $i"
    sleep 30m
done
