Sen2cor and SLURM


#1

Hi,

I’m having trouble getting sen2cor to run in an HPC environment. I basically loop over ~170 files, each time calling sen2cor with sbatch. This all works until multiple files are assigned to the same node. For simplicity I generally book just 1 core per job, so a particular node might get selected multiple times to process different scenes (memory is not the issue).
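For readers who want to reproduce the setup, here is a minimal sketch of such a submission loop. It is not the script used above: the function names, the log directory and the job-name scheme are made up for illustration, and it assumes that L2A_Process is on the PATH of the compute nodes and takes the L1C product directory as its argument. With dry_run=True nothing is submitted; the commands are only built so they can be inspected.

```python
import shlex
import subprocess
from pathlib import Path


def build_sbatch_command(scene_dir, log_dir):
    """Return the sbatch argument list for one L1C scene (one task, one core)."""
    scene = Path(scene_dir)
    log_file = Path(log_dir) / f"{scene.name}.log"  # hypothetical log naming
    return [
        "sbatch",
        "--ntasks=1",
        "--cpus-per-task=1",
        f"--job-name=s2c_{scene.name[:20]}",
        f"--output={log_file}",
        # --wrap lets sbatch run a single command line without a script file.
        f"--wrap=L2A_Process {shlex.quote(str(scene))}",
    ]


def submit_all(scene_dirs, log_dir, dry_run=True):
    """Build one sbatch command per scene and submit them unless dry_run is set."""
    commands = [build_sbatch_command(s, log_dir) for s in scene_dirs]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling submit_all(scenes, "logs") returns the command lists without touching the scheduler, which makes it easy to check what would be submitted before setting dry_run=False.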

So, my log files look somewhat like this:

Sentinel-2 Level 2A Prototype Processor (Sen2Cor), 2.2.1, created: 2016.04.29 started …
Traceback (most recent call last):
  File "/home/rhagensi/opt/anaconda2/bin/L2A_Process", line 11, in <module>
    load_entry_point('sen2cor==2.2.1', 'console_scripts', 'L2A_Process')()
  File "/home/rhagensi/opt/anaconda2/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_Process.py", line 221, in main
    result = config.readPreferences()
  File "/home/rhagensi/opt/anaconda2/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_Config.py", line 3258, in readPreferences
    xp.export()
  File "/home/rhagensi/opt/anaconda2/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_XmlParser.py", line 161, in export
    objectify.deannotate(self._root, xsi_nil=True, cleanup_namespaces=True)
  File "src/lxml/lxml.objectify.pyx", line 1731, in lxml.objectify.deannotate (src/lxml/lxml.objectify.c:24548)
  File "src/lxml/cleanup.pxi", line 49, in lxml.etree.strip_attributes (src/lxml/lxml.etree.c:150273)
  File "src/lxml/apihelpers.pxi", line 63, in lxml.etree._rootNodeOrRaise (src/lxml/lxml.etree.c:15695)
TypeError: Invalid input object: NoneType
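With ~170 jobs it helps to find out quickly which logs contain this error. The following is a rough sketch, assuming each job writes its output to its own *.log file in one directory; the function name and the file pattern are illustrative, and only the final line of the traceback is used as the signature.

```python
from pathlib import Path

# Last line of the traceback shown above, used as the search signature.
ERROR_SIGNATURE = "TypeError: Invalid input object: NoneType"


def find_failed_logs(log_dir, pattern="*.log"):
    """Return the sorted log files under log_dir that contain the error signature."""
    failed = []
    for log_file in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has undecodable bytes.
        text = log_file.read_text(errors="replace")
        if ERROR_SIGNATURE in text:
            failed.append(log_file)
    return failed
```

The returned paths can then be mapped back to scenes, for example to see whether the failed jobs shared a node.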

Did anyone run into similar troubles already? It’s hard for me to judge whether this is an issue with sen2cor or just SLURM being set up unfavorably.

Also, just to reiterate: I do not aim to parallelize sen2cor processing via the config file. For me, the ideal solution would be to process many different scenes, each on its own core. Everything works fine except when multiple scenes are processed on the same node. Any ideas?

Thanks in advance,
Ron


#2

Hi rhagensi,

I have nearly the same problem running in an HPC environment.

Sentinel-2 Level 2A Prototype Processor (Sen2Cor), 2.2.1, created: 2016.04.29 started …
Traceback (most recent call last):
  File "/dpc/app/hosted/L2A_Process/02.02.01/install/anaconda/bin/L2A_Process-02.02.01", line 9, in <module>
    load_entry_point('sen2cor==2.2.1', 'console_scripts', 'L2A_Process-02.02.01')()
  File "/dpc/app/hosted/L2A_Process/02.02.01/install/anaconda/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_Process.py", line 221, in main
    result = config.readPreferences()
  File "/dpc/app/hosted/L2A_Process/02.02.01/install/anaconda/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_Config.py", line 3488, in readPreferences
    xp.export()
  File "/dpc/app/hosted/L2A_Process/02.02.01/install/anaconda/lib/python2.7/site-packages/sen2cor-2.2.1-py2.7.egg/sen2cor/L2A_XmlParser.py", line 161, in export
    objectify.deannotate(self._root, xsi_nil=True, cleanup_namespaces=True)
  File "src/lxml/lxml.objectify.pyx", line 1728, in lxml.objectify.deannotate (src/lxml/lxml.objectify.c:24541)
  File "src/lxml/cleanup.pxi", line 49, in lxml.etree.strip_attributes (src/lxml/lxml.etree.c:150093)
  File "src/lxml/apihelpers.pxi", line 63, in lxml.etree._rootNodeOrRaise (src/lxml/lxml.etree.c:15691)
TypeError: Invalid input object: NoneType

For the time being, I have reconfigured my HPC setup to run only 1 instance of L2A on the full product (not by tile).
I have just updated L2A_GIPP.xml to enable L2A’s internal parallelism (e.g. <Nr_Processes>8</Nr_Processes>).
Be careful with this parameter: I tried AUTO (to use all of my node’s resources) and ran out of memory (70 GB of memory).
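If you want to script this change rather than edit the file by hand, something like the sketch below could work. Only the element name Nr_Processes comes from this post; the function name is made up, and the code makes no assumption about where the element sits inside L2A_GIPP.xml. Note that ElementTree drops XML comments when it parses a file, so by default the result is written to a separate output path instead of overwriting the original.

```python
import xml.etree.ElementTree as ET


def set_nr_processes(gipp_path, value, out_path=None):
    """Set every <Nr_Processes> element to value and return how many were changed.

    The result is written to out_path, or to gipp_path + ".new" if out_path
    is not given, so the original file (and its comments) is left untouched.
    """
    tree = ET.parse(gipp_path)
    changed = 0
    for elem in tree.getroot().iter("Nr_Processes"):
        elem.text = str(value)
        changed += 1
    if changed:
        target = out_path if out_path is not None else gipp_path + ".new"
        tree.write(target, encoding="UTF-8", xml_declaration=True)
    return changed
```

A fixed number such as 8 is used in the same way as in the example above; given the out-of-memory experience with AUTO, it seems safer to start small and increase the value step by step.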

Regards
Christophe


#3

Hi Christophe,

Thank you for your reply. It’s good to see I’m not alone with that. It appears there is no other way except this workaround, though I wonder if the initial error shouldn’t be easy to fix.

I have now observed that, although the error occurs on the HPC, I do not experience any of these problems when I run multiple instances of sen2cor locally on my PC, which I find very surprising.

Thanks again,
Ron


#4

Dear all,
I have the same issue on a CentOS 7 platform. Sen2cor was working well, then I tried to run it using the GNU parallel command, and now I get the same error message as Ron, even when I don’t run it in parallel.

I guess some xml file has been corrupted, but the error message does not tell which one.
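One way to narrow this down might be to try parsing every XML file under a directory and list the ones that are empty or not well-formed. This is only a sketch: which directory to scan (the sen2cor installation, its configuration folder, or the L1C product) is an assumption left to the reader, the function name is made up, and a file that is well-formed but has wrong content will not be detected.

```python
import os
import xml.etree.ElementTree as ET


def find_broken_xml(root_dir):
    """Return (path, reason) pairs for XML files that are empty or fail to parse."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.lower().endswith(".xml"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) == 0:
                broken.append((path, "empty file"))
                continue
            try:
                ET.parse(path)
            except ET.ParseError as exc:
                # The message includes the line and column of the first error.
                broken.append((path, str(exc)))
    return broken
```

Printing the returned pairs gives a short list of candidate files to compare against a fresh installation.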
Best regards,
Olivier


#5

Hi Olivier,

I haven’t made any progress over the last weeks. I found some ways to reduce these errors, but I finally switched to F-MASK (as for me the cloud mask was more important than the atmospheric correction itself).

  1. If you reinstall sen2cor and overwrite everything, it should work again.
  2. I tried a lot of things to circumvent the error, including installing hundreds of separate conda + sen2cor setups, all linked exactly via variables. Problems still occurred (though the error message differed and the problems were fewer).
  3. I observed that the files get corrupted not only when one instance of sen2cor is executed from different nodes, but also when there is not enough time between process starts (maybe a filesystem issue?). For example, if one job finishes and another job immediately continues with the next scene, the problem may also occur. Even workarounds such as sync && sleep 10 minutes didn’t help.

So I gave up in the end, as it was just a very painful hassle. If cloud masks are your main concern too, I would highly recommend using F-MASK, which I could run for hundreds of jobs in parallel.

Cheers,
Ron


#6

Thanks Ron,
Yes, reinstalling works; at least I have a correct configuration now.
I’ll parallelize across different computers.

I have my own cloud mask, MACCS; I just wanted to make some comparisons.

Dear Sen2cor developers, did you register this issue as a bug?


#7

Dear All,
I encountered no difficulty in installing Sen2cor 2.3.0 on a Linux CentOS 7 computer. Good job to the whole Sen2cor team!

Do you know if this new version solves the issue described above with running sen2cor in parallel?
Best regards,
Olivier


#8

I haven’t tried it yet. I don’t expect any changes, but I will let you know when I try to run it.


#9

Hi all,
Sen2cor used to fail when it was launched in parallel, for instance using the GNU parallel command. Do you know if this issue has been fixed in version 2.4.0?
Thanks,
Olivier


#10

Hi Olivier & all

I assume that the error you describe was caused by a rewriting of the configuration file. It was recorded as Bug SIIMPC-599 and, according to my records, this issue has been fixed since Sen2Cor 2.3.0, as described in the current release note. Sorry for not mentioning this earlier; the forum is sometimes a little bit confusing.

cheers,
Uwe

SIIMPC-599
Rewriting of L1C tiles metadata: the xml parser used a prettifier which failed when L1C data are blocked for writing or write protected. This has been removed so that L1C data can be read only without affecting the execution.


#11

Thanks Uwe,
I can confirm it works!
Olivier