SNAP and HDFS

@marpet With your cluster, do you know if you do a similar thing for writing as you do for reading? E.g. ProductIO.writeProduct() to a temporary file and then add that to HDFS? It doesn't look like I can write a product to a path in HDFS directly from the SNAP classes.

I get a java.io.IOException: failed to create data output directory: <my directory in hdfs>

It's either a similar issue to the one I had with reading, or possibly permissions? Although I'm running the code as the owner of the folders in HDFS.

Yes, we do the same for writing. First write it locally and then copy to HDFS.
Just in case you are curious: this is the about page of our cluster. A bit outdated.
The disk space is now around ~1.5 PB.

Hi @marpet

When trying to write the file locally and copy it to HDFS, I'm having an issue with saving as BEAM-DIMAP. It appears that writing to a temporary file with ProductIO.writeProduct() does not produce the .data file, only the .dim. How would you go about doing this, given that ProductIO.writeProduct() only takes in one File object to write to?

I think I'll drop out as the man-in-the-middle for now and delegate you to my colleague @mzuehlke. He works more with the cluster.


No worries! Many thanks for the help again!

@mzuehlke Could you offer any advice as to my above comment?

http://forum.step.esa.int/t/snap-and-hdfs/5250/23?u=ciaranevans

Hi @CiaranEvans,

The DIMAP writer tries to write the .data directory into the same directory that you specify for the .dim file.
Could it be that this directory is not writable? It may be better to create a temporary directory first and then write the .dim into that directory.

If I'm pointing in the wrong direction, could you share the lines of code that involve the writing?

Cheers,
Marco

I think I'm taking too much of a shortcut in my temporary file writing:

// Hadoop configuration and HDFS handle
val configuration: Configuration = new Configuration()
val fileSystem: FileSystem = FileSystem.get(configuration)
// Target path in HDFS
val filepath = productBaseDir + filename + fileExtension
// Write the product to a single local temp file...
val fileToSave: File = File.createTempFile(filename, fileExtension)
ProductIO.writeProduct(product, fileToSave, fileType, false)
// ...then copy it into HDFS, deleting the local copy afterwards
FileUtil.copy(fileToSave, fileSystem, new Path(filepath), true, configuration)

FileUtil is from org.apache.hadoop.fs.FileUtil, the file extension is ".dim", and the rest are just file paths relating to my folder structure.

I tried using .tif and this worked, but I assume that's because it writes one file to one temp file, whereas .dim actually produces two outputs.

I'm writing Java code below; my Scala is currently read-only :wink:

// Create a temporary directory and write the DIMAP product into it
File tmpDir = Files.createTempDirectory("dummy").toFile();
File dimFile = new File(tmpDir, "my_dimap_product.dim");
ProductIO.writeProduct(product, dimFile, fileType, false);

After that, your tmpDir should contain a .dim file and a .data directory.
You should then open an OutputStream to HDFS and wrap it in a ZipOutputStream:

OutputStream os = fileSystem.create(new Path(filepath));  // FSDataOutputStream backed by HDFS
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(os));

And then recursively write all files and directories into the zip stream, for example addDirToZipArchive(zos, tmpDir); a sketch of such a helper is below.
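addDirToZipArchive is not a ready-made SNAP or Hadoop method, just a name for a helper you write yourself. A minimal sketch of one possible implementation with plain java.util.zip and java.nio (untested against your setup):

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Recursively adds a directory (e.g. the temp dir holding the .dim and .data) to a zip stream.
// java.nio.file.Path is fully qualified to avoid a clash with org.apache.hadoop.fs.Path.
static void addDirToZipArchive(ZipOutputStream zos, File dir) throws IOException {
    addToZip(zos, dir, dir.toPath());
}

private static void addToZip(ZipOutputStream zos, File file, java.nio.file.Path root) throws IOException {
    if (file.isDirectory()) {
        File[] children = file.listFiles();
        if (children != null) {
            for (File child : children) {
                addToZip(zos, child, root);
            }
        }
    } else {
        // Entry names are relative to the temp dir, e.g. "my_dimap_product.data/band_1.img"
        String entryName = root.relativize(file.toPath()).toString().replace(File.separatorChar, '/');
        zos.putNextEntry(new ZipEntry(entryName));
        Files.copy(file.toPath(), zos);  // stream the file contents into the current zip entry
        zos.closeEntry();
    }
}

Don't forget zos.close() at the end; closing the ZipOutputStream also flushes and closes the underlying HDFS stream.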

Hope this helps.

Hadoop can be tricky.


Brilliant! Will give that a go, no worries! I’m a student so even Java isn’t my best! Luckily Scala and Java are very compatible :slight_smile:

Many thanks for the pointers!

Hello, CiaranEvans,
I coded it like you described above to create a temp file, and the file can be loaded from HDFS, but when I use product.getBands() I get an exception: java.io.IOException: Stream closed at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy. Can you help me solve this?

Best wishes!
Xiaojian.

@Xiaojian_Gan Could you possibly show me the code you’re using to do this?

I can't really help much with what you've given me right now; the most I can offer is that it seems your file system was closed before you tried to access it.
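As a guess: SNAP reads band data lazily, so if the product is still backed by an HDFS stream when product.getBands() (or a later raster read) runs and that stream or the FileSystem has already been closed, you get exactly this exception. One way around it is to copy the product from HDFS into a local temp directory first and let SNAP read the local copy. A rough sketch (the HDFS path and the .tif file name are made up; Configuration, FileSystem and Path are the Hadoop classes from earlier, Product, Band and ProductIO are from SNAP):

// Copy the product out of HDFS to a local temp directory, then read it with SNAP.
Configuration configuration = new Configuration();
FileSystem fileSystem = FileSystem.get(configuration);

File tmpDir = Files.createTempDirectory("snap_read").toFile();
File localCopy = new File(tmpDir, "my_product.tif");  // hypothetical single-file product
fileSystem.copyToLocalFile(new Path("/data/my_product.tif"), new Path(localCopy.getAbsolutePath()));

Product product = ProductIO.readProduct(localCopy);
Band[] bands = product.getBands();  // raster data now comes from the local file, not an HDFS stream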


@CiaranEvans Thank you for your reply and help. I have now solved the problem.

Best wishes!
Xiaojian.