A fatal Java error occurs when running Snappy scripts

anthony.scarth · April 28, 2016, 10:15am

Hi all,

A custom script that uses Snappy to apply orbit files to many Sentinel 1 products sometimes crashes with a fatal JRE error, such as the following:

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007f14eff62f1e, pid=1443, tid=139727907419968

JRE version: Java™ SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
Java VM: Java HotSpot™ 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
Problematic frame:
C [libpython2.7.so.1.0+0xadf1e] type_dealloc+0xfe

Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

An error report file with more information is saved as:
/home/envsys/anthony/libenvsys/python/hs_err_pid1443.log

If you would like to submit a bug report, please visit:
http://bugreport.java.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code.
See problematic frame for where to report the bug.

Aborted (core dumped)

The script always fails on the same products, provided the same set of inputs is given. However, it is unlikely that these products are faulty: they process correctly in the SNAP GUI, and they even run through the script correctly if a different set of inputs is given. I do not believe that the script is at fault either, as it processes products successfully with certain inputs.

e.g. The error occurs during or after reading S1A_IW_GRDH_1SDV_20150206T173846_20150206T173911_004506_00587B_70ED.SAFE.zip, after an orbit file has been applied to multiple other products. However, running this file through the script by itself, or with fewer additional products, does not result in this crash.

Additionally, the problem does not appear to result from a missing orbit file, and it is not related to a lack of memory.

Does anyone know why this might be occurring?

marpet · April 28, 2016, 1:41pm

No idea why this happens. You are still using Python 2.7, right? Have you already tried a more recent one?
Do you have the /home/envsys/anthony/libenvsys/python/hs_err_pid1443.log at hand? Can you upload it here?

anthony.scarth · April 28, 2016, 2:15pm

Thanks, marpet. I have attached the relevant log. Yes, I am using Python 2.7 on Linux.

hs_err_pid1443.log (124.2 KB)

marpet · May 1, 2016, 5:18am

Also from looking at the code I can’t really say what the reason is.
What I saw is the PrivilegedActionException. I don’t know if this causes the problem, but it might indicate that you are trying to access a resource you don’t have the rights for, maybe by writing to a directory.
And it seems that nearly all memory is consumed. This might be another reason.
I saw that several S1 products are accessed. Depending on what you are doing with them they might consume a lot of memory.

anthony.scarth · May 9, 2016, 9:23am

Thank you very much for taking the time to look into this issue.

Yes, the script aims to read each GRD product in turn and apply a precise orbit file. This works successfully for many of the products, but will always crash on the same product if processed with the same set of inputs. The script shows that the crash occurs at some point when the orbit file is being applied, after the product has been read successfully.

Monitoring the memory usage indicates that the script never uses more than 3 GB (there is 24 GB on this machine). I also tried to run the same script with root privileges, but in this case the memory quickly filled up until Java crashed with an OutOfMemoryError.
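
For reference, one way to log peak memory from inside the script is sketched below. It assumes Linux (where `ru_maxrss` is reported in kilobytes) and that the JVM started by snappy lives inside the Python process, so its heap counts toward the figure. `peak_rss_gb` and `log_peak` are hypothetical helper names, not part of snappy.

```python
import resource


def peak_rss_gb():
    """Return the peak resident set size of this process in GB (Linux: ru_maxrss is in KB)."""
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak_kb / (1024.0 * 1024.0)


def log_peak(label):
    """Print and return the peak memory seen so far, tagged with a label such as a product name."""
    peak = peak_rss_gb()
    print("%s: peak RSS %.2f GB" % (label, peak))
    return peak
```

Calling `log_peak(product_name)` after each product would show whether memory grows steadily toward the crash or stays flat.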

marpet · May 12, 2016, 9:18am

So maybe this is more related to the S1 processing. I’m not very familiar with this.

@lveci Do you have an idea what’s wrong here?

TinoHH · July 12, 2017, 3:17am

Hi all,

I assume that by now, you have figured out the reason why your snappy batch processing of S1 crashes on the same file.

I am also processing large numbers of S1 GRD files using snappy on a server and got exactly the same error/crash message as you, typically on the same GRD files. I did some trials and realized that the file is not the reason. Instead, I found that it always crashes after invoking snappy 23 times in a row. If the code is written so that already processed GRD files are skipped, it crashes 23 files later than before.
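
The resume behaviour described above could look roughly like the sketch below. It assumes each product writes one output file named after its input folder; `pending_products`, `out_dir` and the `.dim` suffix are illustrative choices, not taken from the script discussed here.

```python
import os


def pending_products(safe_folders, out_dir, suffix=".dim"):
    """Return the input folders that have no matching output file in out_dir yet."""
    todo = []
    for folder in safe_folders:
        # Derive the expected output name from the input folder name.
        name = os.path.basename(os.path.normpath(folder))
        target = os.path.join(out_dir, name + suffix)
        if not os.path.exists(target):
            todo.append(folder)
    return todo
```

Looping over `pending_products(...)` instead of the full list means a restart picks up where the previous run stopped.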

Do you guys have an idea on why this could happen and how I could avoid this crash?

Thanks in advance.

marpet · July 12, 2017, 10:02am

Is it possible that you stumbled over this issue (https://github.com/bcdev/jpy/issues/74)?
Do you invoke Python n times, or do you loop over the files and call the processing function for each of them?
If this is the problem, you can move the get_type call to __init__.py.

TinoHH · July 13, 2017, 2:47am

Hi Marpet,

Thanks a lot for your prompt reply.

I invoke python only once and then loop over the S1 files within the script. At the beginning of the code, I initialize snappy using:

import snappy
from snappy import ProductIO
from snappy import HashMap as hash
from snappy import GPF as GPF
from snappy import ProductUtils
from snappy import jpy
GPF.getDefaultInstance().getOperatorSpiRegistry().loadOperatorSpis()
snappy.HashMap = snappy.jpy.get_type('java.util.HashMap')

After that, I loop through all the files and within each loop, I execute snappy functions such as:
for folder in S1_Safe_Folders:
    S1image = ProductIO.readProduct(folder + "/manifest.safe")

It crashes on the 23rd image/loop.

Do you think moving the get_type call to __init__.py would resolve this issue?

Thanks

marpet · July 13, 2017, 1:25pm

If you don’t use get_type within the loop, moving it to __init__.py will not help.

What kind of error do you see in the logs?

As a side note:

snappy.HashMap = snappy.jpy.get_type('java.util.HashMap')

You don’t need this in your script. You already do:

from snappy import HashMap as hash

So you can directly use hash

Also this is not necessary anymore since version 5.0:

GPF.getDefaultInstance().getOperatorSpiRegistry().loadOperatorSpis()

MarkWilliamMatthews · November 17, 2017, 3:06pm

I have had the same problem. I fixed it by commenting out the below two lines:
HashMap = jpy.get_type('java.util.HashMap')
parameters = HashMap()

and adding:
from snappy import HashMap as hash
parameters = hash()

I also commented out
GPF.getDefaultInstance().getOperatorSpiRegistry().loadOperatorSpis() # Load all available operators.

It doesn’t crash any more. Thanks @marpet!

MarkWilliamMatthews · November 21, 2017, 3:18pm

Just a short update. I also had to remove multiple calls to jpy.get_type() in my code. I now only call it in __init__, and it no longer crashes. If you are looping over many files, calling jpy.get_type() repeatedly will cause this error.
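
A minimal sketch of that pattern: resolve each Java class once and reuse it inside the loop. To keep the example runnable without a SNAP installation, the lookup function is passed in; with snappy it would be `jpy.get_type`. `TypeCache` is a hypothetical helper, not part of snappy or jpy.

```python
class TypeCache(object):
    """Resolve each Java class name once and hand back the cached result afterwards."""

    def __init__(self, get_type):
        # get_type is the lookup callable, e.g. jpy.get_type when using snappy.
        self._get_type = get_type
        self._types = {}

    def get(self, class_name):
        # Only call the underlying lookup the first time a name is requested.
        if class_name not in self._types:
            self._types[class_name] = self._get_type(class_name)
        return self._types[class_name]
```

With snappy, the cache would be created once at the top of the script, for example `types = TypeCache(jpy.get_type)`, and the loop body would then use `types.get('java.util.HashMap')()` to build a parameter map.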

hajar · May 16, 2019, 9:57am

I’m using the latest version of SNAP via snappy and calling functions in a Python script to process Sentinel-1 GRD images. I got the same fatal Java error, but already at the very beginning. I followed the suggestions of MarkWilliamMatthews but still got the same error.

INFO: org.esa.snap.core.gpf.operators.tooladapter.ToolAdapterIO: Initializing external tool adapters
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=1859, tid=0x00007f18bc735740
#
# JRE version: Java(TM) SE Runtime Environment (8.0_102-b14) (build 1.8.0_102-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  0x0000000000000000
#
# Core dump written. Default location: /mnt/geodata/core or core.1859
#
# An error report file with more information is saved as:
# /mnt/geodata/hs_err_pid1859.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted (core dumped)

marpet · May 20, 2019, 10:01am

Do you still have the hs_err_pid1859.log file or can you recreate it and post it? Without this file it is hard to tell what’s going wrong.
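
If the full log is too large to post, the key line can be pulled out with something like the sketch below. It only relies on the header layout visible in the crash output above (a `Problematic frame:` line followed by the frame itself); `problematic_frame` is a hypothetical helper name.

```python
def problematic_frame(log_text):
    """Return the frame line that follows 'Problematic frame:' in an hs_err log, or None."""
    # Drop the leading '#' that the JVM puts in front of each header line.
    lines = [line.lstrip("#").strip() for line in log_text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("Problematic frame:"):
            # The frame is the next non-empty line after the marker.
            for candidate in lines[i + 1:]:
                if candidate:
                    return candidate
    return None
```

A frame starting with `C` points at native code, and the library name in brackets (when present) shows which module to look at first.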

hajar · May 20, 2019, 11:03am

Dear marpet, thank you for your reply. After a long investigation, it turned out that the problem is in the OGR module (which I’m using inside my Python script); that is what triggers the Java error. As snappy calls Java classes from within Python, it simply crashes whenever a problem occurs in another module.
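
One way to keep such a native crash from taking down a whole batch is to run each product in its own Python process and check the exit status. This is only a sketch of that idea, not what was done here; `run_isolated` is a hypothetical helper, and the worker code passed to it is up to the caller. On POSIX systems a negative return code means the child was killed by that signal number, for example -6 for SIGABRT, which matches the "Aborted" line in the output above.

```python
import subprocess
import sys


def run_isolated(code, args=()):
    """Run a snippet of Python in a child interpreter and report how it ended.

    Returns (ok, returncode). On POSIX a negative returncode means the child
    was terminated by a signal, e.g. -6 for SIGABRT.
    """
    cmd = [sys.executable, "-c", code] + list(args)
    returncode = subprocess.call(cmd)
    return returncode == 0, returncode
```

The parent process survives a crashing child, so it can log the failed product and continue with the next one.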

kameshvinjamuri · October 4, 2022, 12:02pm

Hi hajar,
How did you solve it? I got a similar issue while reading L1 SLSTR data.
thanks in advance