Issue with Supervised random forest classification

Good day,

I’m evaluating the Random Forest classification for crop and land cover analysis based on Sentinel-2 imagery.
The aim is to apply a multi-temporal approach including all 13 spectral bands and, in addition, NDVI, LAI, FAPAR and FCOVER.

For now I use Level-1C products and select more or less cloud-free scenes.

The Random Forest classification works fine for single image analysis – the trouble starts when I want to use my cloud-masked imagery.

In those cases the Random Forest classification also runs without any error message, but after a few minutes it produces a result in which the LabeledClasses band does not contain any classification results (for subsets that are not cloud-masked, the processing takes more than 2 h).

Because my non-cloud-masked subsets are processed nicely, I suspect that I am missing something when I apply my cloud mask workflow.

Any advice is highly appreciated.

My workflow is as follows:

1: I create products with a) Idepix cloud masks and b) biophysical indices:

Running Random Forest classifications on those new products works fine.

(At that stage the Idepix masks are just masks in the Masks folder, but they are not yet applied to the various bands, as far as I understand)

2: I combine all Idepix cloud and snow masks into one cloud_mask band:


3: To get a smoother cloud mask I apply a buffer via Raster > Filtered Band… e.g. Maximum 5x5 :


4: I apply the newly created smoothed cloud_mask via Band Maths to all bands I want to include in the Random Forest classification:

The following example shows that all clouds now have the no-data value (the image below also includes some training area vectors)

5: The moment I execute the Random Forest classification on cloud-masked products, I get empty results like this:


Any advice is highly appreciated!
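
For readers who want to reproduce steps 2 to 4 outside SNAP, here is a minimal NumPy sketch of the same idea: combine the flag masks, grow the result with a 5x5 maximum filter, and set the flagged pixels of a band to NaN. The arrays, the helper name buffer_mask and the 8x8 scene are made up for illustration; this is not what SNAP runs internally.

```python
import numpy as np

def buffer_mask(mask: np.ndarray, size: int = 5) -> np.ndarray:
    """Grow a boolean mask with a size x size maximum filter (size must be odd)."""
    pad = size // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    rows, cols = mask.shape
    grown = np.zeros_like(mask, dtype=bool)
    # OR together every shifted copy of the mask that falls inside the window.
    for dy in range(size):
        for dx in range(size):
            grown |= padded[dy:dy + rows, dx:dx + cols]
    return grown

# Hypothetical 8x8 scene: one reflectance band and two Idepix-style flag masks.
band = np.full((8, 8), 0.2, dtype=np.float32)
cloud = np.zeros((8, 8), dtype=bool)
snow = np.zeros((8, 8), dtype=bool)
cloud[3, 3] = True
snow[7, 7] = True

cloud_mask = cloud | snow                        # step 2: combine the masks
smoothed = buffer_mask(cloud_mask, size=5)       # step 3: 5x5 maximum filter
masked_band = np.where(smoothed, np.nan, band)   # step 4: apply to the band
```

Every band that goes into the classifier would need the same treatment, so that a pixel is either valid in all bands or invalid in all of them.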

Hello everyone,

I’ve performed RF classification on an S2 L2A subset with good results, and now I want to run the same classifier (with the same training vectors) on the same subset but with some places masked out. For this, I created a vector file of the areas I want to mask out (urban areas and golf courses) and used Land/Sea Mask to obtain a new product.

But RF fails with this masked image. I’ve tried modifying the pixel values before classification (NaN, 0, -1, and un-selecting the no-data value) without success. I always get an empty image.

Does anyone have an idea how to solve this?

Thanks in advance,

You could use the vector in the valid pixel expression of the S2 bands to declare these pixels invalid before running the RF classifier.

E.g. if the vector is named golf, you enter !=golf in the properties of every band to have those pixels removed.

Hello Andreas,
I tried the expressions !vector_name and B1!=vector_name for each S2 band before classifying, but it still classifies the pixels inside the vectors…

From this old post (Missing vectors on vector training box - #31 by Sara.Aparicio) I understand they had a similar problem, and it has not been resolved yet either. Maybe it is a software problem.

I solved it by creating new bands in the original S2 image with the simple expression:
if vector then 0 else Band
So the pixels inside the polygons I do not want to classify get the value 0. Then the classifier worked fine, although there is still an issue because the “masked” pixels are assigned to an existing class.
It was not possible to solve it using the Valid-Pixel Expression in the band properties…
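
In array terms, the Band Maths expression above does roughly the following. This is a NumPy sketch with made-up values; inside_vector stands for a hypothetical rasterised version of the polygons.

```python
import numpy as np

# Hypothetical 2x2 band and a rasterised polygon mask (True = inside the vector).
band = np.array([[0.10, 0.20], [0.30, 0.40]])
inside_vector = np.array([[False, True], [False, False]])

# Equivalent of: if vector then 0 else Band
new_band = np.where(inside_vector, 0.0, band)
```

Since 0 is an ordinary value and not a no-data marker here, a classifier fed with new_band still sees those pixels and gives them a class, which matches the behaviour described above.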

Could you then remove the classified pixels with the vector at the end?

I could do that, but I guess if the classifier includes those pixels, the results will differ. That is why I wanted to remove them before classification.

If they are not part of the training, the results will not change because the classifier only applies the learned rules to the untrained pixels. If some of them don’t contain valid results, you can simply remove them in a post-classification step.
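
A small sketch of that argument, using scikit-learn rather than SNAP (so it only illustrates the principle, not SNAP's implementation). The data, the exclude mask and the NODATA value are all made up. One fitted model is used in two ways: classify everything and blank out the excluded pixels afterwards, or classify only the pixels that are kept.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical data: 200 training pixels and a 20x20 image, 4 features each.
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
image = rng.normal(size=(20, 20, 4))

# Pixels to exclude (e.g. a rasterised polygon); none of them are used for training.
exclude = np.zeros((20, 20), dtype=bool)
exclude[5:10, 5:10] = True

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

NODATA = -1  # hypothetical marker for "not classified"

# Variant A: classify every pixel, then remove the excluded ones afterwards.
labels_all = clf.predict(image.reshape(-1, 4)).reshape(20, 20)
labels_post = np.where(exclude, NODATA, labels_all)

# Variant B: classify only the pixels that are kept.
labels_pre = np.full((20, 20), NODATA)
labels_pre[~exclude] = clf.predict(image[~exclude])

# The trained rules are applied pixel by pixel, so both variants agree.
same = np.array_equal(labels_post, labels_pre)
```

This holds for one and the same fitted model. Two separately trained forests can still differ from each other because of the random sampling during training.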

I think they do differ; look at these two results. The one on the left is the classification of the complete image; on the right you can see the classification of the image with pixels masked before classification. You can notice some differences between them at a glance…

I don’t think that the difference is caused by the masking. Technically, every random forest produces different results, even if you run it twice on the exact same inputs. This is because both subsets of the training data and the raster data are randomly shuffled in each iteration to construct the ruleset. Accordingly, the thresholds to separate the classes can change unless you use a really large number of trees.

How many trees did you use to compute the results?

The default, 10.
And yes sure, you are right about the randomness of the random forest :sweat_smile:

You can run the same configuration (with masks) twice to estimate how large the variation between two runs is due to the random sampling. If it is too high for you, simply increase the number of trees, which makes the class thresholds more robust.
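
To get a feeling for this outside SNAP, here is a rough scikit-learn sketch that trains the same configuration twice with different random seeds and reports the share of pixels that receive the same label in both runs. The synthetic data, the seeds and the tree counts (10 and 500) are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training samples and the pixels to classify.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, y_train, X_pixels = X[:300], y[:300], X[300:]

def agreement(n_trees: int) -> float:
    """Share of pixels that get the same label in two runs with different seeds."""
    runs = [
        RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        .fit(X_train, y_train)
        .predict(X_pixels)
        for seed in (1, 2)
    ]
    return float(np.mean(runs[0] == runs[1]))

for n_trees in (10, 500):
    print(n_trees, "trees:", round(agreement(n_trees), 3))
```

The agreement is usually higher with more trees, but the exact numbers depend on the data, so it is worth checking on your own training set.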