SNAP and HDFS

Hi there,

I was wondering whether anyone has managed to use the SNAP Engine with Hadoop. I’m trying to access files via HDFS and open them with ProductIO.readProduct(), but I’m having no luck. The method takes a Java File object, and I’m struggling to get from HDFS -> File -> SNAP.

Many thanks for any help :slight_smile:
Ciaran

We use Hadoop on our Calvalus cluster too.
Most of our readers can’t use an InputStream, and that’s what you get from HDFS, so I understand your problem.
So we have a little utility class which copies the file from HDFS into a temporary directory on the local node, and then we feed this local copy into SNAP. When the processing is done we clean up the temporary directory.
Works pretty well.
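
Roughly, such a helper could look like this (just a sketch, not our actual Calvalus code; readFromHdfs and all the names in it are made up for illustration):

import java.io.File
import java.nio.file.Files
import org.apache.commons.io.FileUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.esa.snap.core.dataio.ProductIO

def readFromHdfs(hdfsPath: String): Unit = {
  val fs = FileSystem.get(new Configuration())
  val src = new Path(hdfsPath)
  // keep the original file name - several readers rely on it
  val tempDir = Files.createTempDirectory("snap-input").toFile
  val localCopy = new File(tempDir, src.getName)
  fs.copyToLocalFile(src, new Path(localCopy.getAbsolutePath))
  try {
    val product = ProductIO.readProduct(localCopy)
    // ... process the product ...
    product.dispose()
  } finally {
    // clean up the temporary directory afterwards
    FileUtils.deleteDirectory(tempDir)
  }
}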

Interesting. I was wondering whether to just do a File.createTempFile() and go from there. Do I need to save it in a specific format?

I was getting a null return from ProductIO.readProduct(); unfortunately it’s a bit of a pain debugging from IntelliJ onto our cluster…

A specific format is not needed, but the file name and its extension are important for several readers. Therefore the createTempFile method will not work as-is (it inserts random digits into the name, so the readers may no longer recognise it).

If the read method returns null, in most cases it means that the format is not supported, perhaps because a plugin is missing. Here it is probably because of the temp file name.


Ah, that would explain it. I just tried this:

import java.io.{File, InputStream}
import org.apache.commons.io.FileUtils
import org.esa.snap.core.dataio.ProductIO

// copy the product zip from a stream into a temp file, then read it with SNAP
val inputStream: InputStream = FileUtils.openInputStream(new File("/home/ciaran/scala/resources/" +
  "S1A_IW_GRDH_1SDV_20161221T174100_20161221T174125_014481_017804_91CB.zip"))
val productFile: File = File.createTempFile("S1A_IW_GRDH_1SDV_20161221T174100_20161221T174125_014481_017804_91CB",
  ".zip")
FileUtils.copyInputStreamToFile(inputStream, productFile)
val product = ProductIO.readProduct(productFile)
productFile.delete()
println(product.getDisplayName)

This prints out the correct display name :slight_smile: (Also, apologies, it’s technically /not/ Java; I’m messing around with Scala atm.)

Thanks again @marpet!

@marpet, sorry for all the questions!

I’ve managed to get past the file issue; I’m now experiencing a similar issue to before, where the operators I need are not available, yet locally it runs OK.

Do you do anything to ensure the operators are available up in Hadoop ‘world’?

No problem. It’s good to see that SNAP is in use.

You can try the following:

GPF.getDefaultInstance().getOperatorSpiRegistry().loadOperatorSpis()

Actually this is done automatically since SNAP 5.0.
But you can investigate which operators are available:

GPF.getDefaultInstance().getOperatorSpiRegistry().getOperatorSpis()
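
For example, to see their aliases you could dump them like this (a quick Scala sketch, not taken from our code):

import scala.collection.JavaConverters._
import org.esa.snap.core.gpf.GPF

// list the alias of every registered operator SPI
val spis = GPF.getDefaultInstance.getOperatorSpiRegistry.getOperatorSpis.asScala
spis.foreach(spi => println(spi.getOperatorAlias))
println(s"${spis.size} operators registered")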

As far as I know (I’m not one of the main developers of our cluster) all jars are in one directory and we put them on the class path. So it’s actually no different from the local use case.

Interesting, it’s strange that they don’t seem to be available as soon as I run it on Hadoop. For reference, I had to do this to get them working locally; I didn’t manage to get it happening automatically.

def provisionSentinel1ProductReader(): Unit = {
  val plugins = ProductIOPlugInManager.getInstance().getAllReaderPlugIns
  var hasSentinel1ProductReader = false
  // check whether the Sentinel-1 reader plug-in is already registered
  while (plugins.hasNext) {
    val plugin = plugins.next()
    if (plugin.isInstanceOf[Sentinel1ProductReaderPlugIn]) {
      hasSentinel1ProductReader = true
    }
  }
  // if not, register the reader and writer plug-ins manually
  if (!hasSentinel1ProductReader) {
    val manager = ProductIOPlugInManager.getInstance()
    manager.addReaderPlugIn(new Sentinel1ProductReaderPlugIn)
    manager.addWriterPlugIn(new BigGeoTiffProductWriterPlugIn)
  }
}

And here are the dependencies I’m using:

<dependency>
    <groupId>org.esa.snap</groupId>
    <artifactId>snap-core</artifactId>
    <version>${snap.version}</version>
</dependency>
<dependency>
    <groupId>org.esa.s1tbx</groupId>
    <artifactId>s1tbx-io</artifactId>
    <version>${s1tbx.version}</version>
</dependency>
<dependency>
    <groupId>org.esa.s1tbx</groupId>
    <artifactId>s1tbx-op-sar-processing</artifactId>
    <version>${s1tbx.version}</version>
</dependency>
<dependency>
    <groupId>org.esa.snap</groupId>
    <artifactId>snap-bigtiff</artifactId>
    <version>${snap.version}</version>
</dependency>

Where ${snap.version} == 5.0.3 and ${s1tbx.version} == 5.0.0

Glad that you’re glad it’s being used; we’re doing some really interesting stuff with it!

@marpet, I just ran getOperatorSpis() locally and got a set of size 60 back; looking inside it, I can see that it contains the ‘Calibration’ operator.

However, if I run the same code on the cluster, I get a set of size 12. If I run GPF.getDefaultInstance.getOperatorSpiRegistry.loadOperatorSpis() and check the size I still only get 12 operators on the cluster.

The operators I get on the cluster are:
Write
Resample
Subset
WriteRGB
BandMaths
Read
Merge
Reproject
Import-Vector
PassThrough
Mosaic
ProductSet-Reader

These 12 operators are only those contained in the snap-gpf module. So somehow the S1 modules are missing.

I thought that might be the case, but I’m unsure how this is happening; the S1 modules are available when run locally… The code is exactly the same when run on the cluster. Any idea of a workaround for this? Or is this possibly something to do with how I’m running it?

Appreciate the help :slight_smile:

Yes. The cause for this must be how you instantiate the VM and provide the class path. It’s not in the source.
I assume that you also can’t create an instance of an S1 class like CalibrationOp.
Try

Operator tempOp = new org.esa.s1tbx.calibration.gpf.CalibrationOp();

I guess you will get a NoClassDefFoundError.
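
In your Scala code that quick check could look something like this (just a sketch; adjust the output to whatever logging you use):

try {
  // if the S1TBX jars are missing from the classpath, this throws NoClassDefFoundError
  val tempOp = new org.esa.s1tbx.calibration.gpf.CalibrationOp()
  println("CalibrationOp is on the classpath: " + tempOp.getClass.getName)
} catch {
  case e: NoClassDefFoundError => println("S1TBX classes not found: " + e.getMessage)
}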

Thanks, I will give this a go. I’m a bit of a novice at this, so there might be a few more questions to follow :’)

Hi @marpet

I tried creating the operator as you suggested, and annoyingly it ran fine and created it!

I’ve looked at the compiled jar I’m running on the cluster and it definitely contains CalibrationOp.class etc.

Here’s a screenshot to show:

I’m at a loss now; I can access them, but the OperatorSpis just aren’t being populated…

If I manually add the operator using:

val calibrationSpi: OperatorSpi = new org.esa.s1tbx.calibration.gpf.CalibrationOp().getSpi
GPF.getDefaultInstance.getOperatorSpiRegistry.addOperatorSpi(calibrationSpi)

I then get:

Exception in thread "main" org.esa.snap.core.gpf.OperatorException: Operator 'CalibrationOp': Value for 'Source Band' is invalid: 'Intensity_VH'

I don’t want to do this manually though, as locally I already have that operator available. I know, however, that those parameters for Source Band are correct, as again it runs fine locally.

For reference, we now believe that the call to getService(serviceName) in DefaultServiceRegistry (in com.bc.ceres.core) is not returning the Operator; while debugging we can see that the services HashMap indeed only contains 12 entries. I’m not sure how this gets populated and whether it is system dependent?


Meanwhile I have a guess. It seems that you compile the classes differently than we do.
Do you also include the files in resources/META-INF/services?
That’s where the OperatorSpis are defined.
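
Such a services file is just a plain text file named after the SPI interface, listing implementation classes one per line. Roughly like this (the entry shown is only an example and may not match the real S1TBX module exactly):

# src/main/resources/META-INF/services/org.esa.snap.core.gpf.OperatorSpi
org.esa.s1tbx.calibration.gpf.CalibrationOp$Spi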

Really do appreciate the help @marpet

For reference, here’s the whole pom.xml for the code I’m running; any pointers as to where I should change it are appreciated :slight_smile:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.mint</groupId>
    <artifactId>sar-imagery</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.tools.version>2.11</scala.tools.version>
        <scala.version>${scala.tools.version}.8</scala.version>
        <spark.version>2.0.0.cloudera1</spark.version>
        <geotrellis.version>1.1.0-RC2</geotrellis.version>
        <spark.scope>compile</spark.scope>
        <snap.version>5.0.3</snap.version>
        <s1tbx.version>5.0.0</s1tbx.version>
    </properties>

    <dependencies>
        <!-- GIS -->
        <dependency>
            <groupId>org.locationtech.geotrellis</groupId>
            <artifactId>geotrellis-spark_${scala.tools.version}</artifactId>
            <version>${geotrellis.version}</version>
            <scope>${spark.scope}</scope>
        </dependency>
        <dependency>
            <groupId>com.vividsolutions</groupId>
            <artifactId>jts-core</artifactId>
            <version>1.14.0</version>
            <scope>${spark.scope}</scope>
        </dependency>
        <!-- Scala -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
            <scope>${spark.scope}</scope>
        </dependency>
        <!-- Spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.tools.version}</artifactId>
            <version>${spark.version}</version>
            <scope>${spark.scope}</scope>
        </dependency>
        <!-- SNAP -->
        <dependency>
            <groupId>org.esa.snap</groupId>
            <artifactId>snap-core</artifactId>
            <version>${snap.version}</version>
        </dependency>
        <dependency>
            <groupId>org.esa.s1tbx</groupId>
            <artifactId>s1tbx-io</artifactId>
            <version>${s1tbx.version}</version>
        </dependency>
        <dependency>
            <groupId>org.esa.s1tbx</groupId>
            <artifactId>s1tbx-op-calibration</artifactId>
            <version>${s1tbx.version}</version>
        </dependency>
        <dependency>
            <groupId>org.esa.s1tbx</groupId>
            <artifactId>s1tbx-commons</artifactId>
            <version>${s1tbx.version}</version>
        </dependency>
        <dependency>
            <groupId>org.esa.s1tbx</groupId>
            <artifactId>s1tbx-op-sar-processing</artifactId>
            <version>${s1tbx.version}</version>
        </dependency>
        <dependency>
            <groupId>org.esa.snap</groupId>
            <artifactId>snap-bigtiff</artifactId>
            <version>${snap.version}</version>
        </dependency>
        <!-- Test -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.11</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.specs2</groupId>
            <artifactId>specs2-core_${scala.tools.version}</artifactId>
            <version>3.7.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.specs2</groupId>
            <artifactId>specs2-junit_${scala.tools.version}</artifactId>
            <version>3.7.2</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest_${scala.tools.version}</artifactId>
            <version>3.0.0-M15</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <!-- see http://davidb.github.com/scala-maven-plugin -->
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <scalaCompatVersion>${scala.tools.version}</scalaCompatVersion>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.2</version>
                <configuration>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

        </plugins>
    </build>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
        <repository>
            <id>snap-repo-public</id>
            <name>Public Maven Repository for SNAP</name>
            <url>http://nexus.senbox.net/nexus/content/repositories/public/</url>
            <releases>
                <enabled>true</enabled>
                <checksumPolicy>warn</checksumPolicy>
            </releases>
            <snapshots>
                <enabled>true</enabled>
                <checksumPolicy>warn</checksumPolicy>
            </snapshots>
        </repository>
    </repositories>

    <profiles>
        <profile>
            <id>cluster</id>
            <properties>
                <spark.scope>provided</spark.scope>
            </properties>
        </profile>
    </profiles>

</project>

I think you will need this transformer:
https://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer

It merges multiple META-INF/services files into a single one.
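
For the pom above, that would mean adding something like this inside the maven-shade-plugin <configuration> (a sketch based on the shade plugin docs linked above):

<transformers>
    <!-- merges the META-INF/services files of all jars, so the SNAP/S1TBX OperatorSpi entries survive shading -->
    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>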


Brilliant! I’m now past where it was breaking; many thanks for that, @marpet!

I really appreciate you taking the time to help; I understand these aren’t the usual SNAP/snappy questions, haha!