Issue with Supervised random forest classification

Your training areas look really small. As stated here and in the SNAP help, the default number of input pixels per tree is 5000. That means that if you have fewer than 5000 training pixels in total, every tree is built from the same set of pixels, which makes the random forest approach quite ineffective.
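
To make the point concrete, here is a minimal Python sketch. It assumes that each tree draws its training subset without replacement and that the draw is capped at the number of available pixels. That is an assumption for illustration, not a description of SNAP's internal sampling code, and the function name draw_tree_subset is hypothetical.

```python
import random

def draw_tree_subset(pixel_ids, samples_per_tree, rng):
    """Draw one tree's training subset without replacement.

    Assumption: the draw is capped at the number of available pixels.
    """
    k = min(samples_per_tree, len(pixel_ids))
    return set(rng.sample(pixel_ids, k))

rng = random.Random(0)
small = list(range(800))      # fewer pixels than the per-tree setting
large = list(range(20000))    # more pixels than the per-tree setting

small_subsets = [draw_tree_subset(small, 5000, rng) for _ in range(10)]
large_subsets = [draw_tree_subset(large, 5000, rng) for _ in range(10)]

# With only 800 pixels, every tree receives the identical set of 800.
all_same_small = all(s == small_subsets[0] for s in small_subsets)
# With 20000 pixels, the subsets differ from tree to tree.
all_same_large = all(s == large_subsets[0] for s in large_subsets)
```

Under these assumptions the trees in the small case can only differ through whatever other randomisation the classifier applies, not through their training pixels.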

Thank you, sir. I will increase the training sets and try again.

If I may use this thread for issues with Random Forest classification, please let me describe one I have with a classification I am currently trying to do.
I’ve applied a land/sea mask to an MSI I have, to restrict it to the ROI of an administrative region. I want to classify this region, and I tested both MLC and the RF classifier. The MLC results all fall inside the region of interest, with no attribute/class outside it, but this did not happen for RF, even though I used the same image bands and training areas.
Here are two images with the results for MLC and RF:



Is there any way to solve this issue? What could be happening with the results of RF or its inputs?

Another question, which applies to both classifiers: why are the percentages for the no-data values 0.000%? In the RF labeled classes there are a lot of pixels without any information (if you look closely, most of the dark pixels carry no data).

Can anyone please help me, or does anyone know what is happening with these results?
Thank you.

Hm, is it an option to just exclude the class? What happens when you open the band properties of the classified raster and check “No-Data value used” and define 0 as no data?
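
As a rough illustration of what declaring 0 as no data does to the class statistics, here is a small Python sketch. The label values and counts are made up, and class_percentages is a hypothetical helper, not a SNAP function.

```python
from collections import Counter

def class_percentages(labels, no_data=None):
    """Return {class_value: percent} over all pixels that are not no_data."""
    valid = [v for v in labels if v != no_data]
    if not valid:
        return {}
    counts = Counter(valid)
    return {cls: 100.0 * n / len(valid) for cls, n in sorted(counts.items())}

# Hypothetical flattened label raster: 0 = pixels without a class, 1..3 = classes.
labels = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10

with_zero = class_percentages(labels)                # 0 is counted like a class
without_zero = class_percentages(labels, no_data=0)  # 0 is excluded first
```

Note that any real class stored with the value 0 is excluded in the same way, so it disappears from the statistics together with the empty pixels.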



Wow!! Something really happened. I think you solved it!! Thanks a lot!! The results are now similar to the MLC. I have to analyse this later, but comparing the percentages for both classifiers, this seems to be the right direction. (In the process, though, I have lost the water class, shown in cyan, which has class value 0. Can anything be done about it?)

RF Classification results:

MLC results:

Maybe the problem is just that the RF classifier never actually classified the water in the first place.
I applied a band maths expression to highlight class 0 in the results of the two classifiers:



Thank you so much! :slight_smile:


The random forest leaves 0 where the level of confidence is not reached (see links below). But of course, if water was in your training data, it should be included in the resulting classification. You can try masking out the water areas again by creating a no-data mask before the classification and then applying it to the classified RF image.
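
The idea of leaving 0 where the confidence is not reached can be sketched in Python as follows. The threshold of 0.5, the example values and the function name label_with_confidence are assumptions for illustration only; check the band properties of your own product for the rule that is actually applied.

```python
UNCLASSIFIED = 0  # assumption: 0 marks pixels that were left without a class

def label_with_confidence(predicted, confidence, threshold=0.5):
    """Keep the predicted class only where confidence reaches the threshold."""
    return [cls if conf >= threshold else UNCLASSIFIED
            for cls, conf in zip(predicted, confidence)]

# Hypothetical per-pixel predictions (classes 1..3) and their confidence.
predicted = [1, 2, 2, 3, 1]
confidence = [0.9, 0.4, 0.75, 0.5, 0.1]

labels = label_with_confidence(predicted, confidence)
```

In this sketch the classes start at 1. If a real class is also stored as 0, it can no longer be told apart from the low-confidence pixels.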


I will take a look at your advice and the links to see how it works out.
Thank you so much ABraun for your time and attention.
You have been a helpful friend to me, I wish you all the best!!

Dear ABraun,

The default number of input pixels per tree is 5000 (for the Random Forest algorithm). So we can’t use single pixels as training data, right? I am a bit confused, because the training data I collected via GPS are points, and I also use 10-fold cross-validation.

  • Should I turn the training data into polygons (based on the field data collection)?
  • Is it acceptable for publication to do that? (Sorry, some of these questions stray from the forum topic.)

Thank you very much

You can use point samples, but as you say, if you have fewer than 5000, each tree is generated from all available points and the advantage of permutation in a random forest classifier is partly lost (it also shuffles the input rasters).
You can create polygons around your samples to increase the number of potential samples. Your points collected in the field are a good basis for this, and I think it is scientifically sound to extend these points to a representative area around them.

If you have enough samples, you can also leave some of them out of the classification process and use them later for validation (did the classifier predict the correct class at the sampled location?).
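
A minimal Python sketch of such a hold-out validation is given below. The 30 % validation share, the seed and the helper names split_samples and overall_accuracy are assumptions chosen for illustration.

```python
import random

def split_samples(samples, validation_fraction=0.3, seed=0):
    """Shuffle the samples and split them into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def overall_accuracy(reference, predicted):
    """Share of validation samples whose predicted class matches the reference."""
    correct = sum(1 for r, p in zip(reference, predicted) if r == p)
    return correct / len(reference)

# Hypothetical sample IDs standing in for GPS points.
samples = list(range(100))
training, validation = split_samples(samples)

# Hypothetical reference classes and classifier output at four validation points.
accuracy = overall_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])
```

Only the training part is used to build the classifier; the classified raster is then read out at the validation locations and compared with the field classes.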


Thank you very much bro ABraun.

Bro ABraun,

  • If I have only 800 samples (points) and I decrease the input parameter “Number of training samples” to 800 or 700, would that be enough to run the classification?

Lots of thanks

You either create polygons around your points (depending on the radius and pixel size, this already multiplies your input values) or you decrease the number of samples. To make it effective, I would reduce it to about 2/3 of the whole data, so around 550.
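
To get a feeling for both options, here is a small Python sketch. It assumes each point sits exactly on a pixel centre and counts the pixel centres that fall inside a circular buffer. The 30 m radius and 10 m pixel size are example values, and pixels_in_buffer is a hypothetical helper.

```python
def pixels_in_buffer(radius_m, pixel_size_m):
    """Count pixel centres within radius_m of a point located on a pixel centre."""
    r = radius_m / pixel_size_m
    n = int(r)
    return sum(1
               for dx in range(-n, n + 1)
               for dy in range(-n, n + 1)
               if dx * dx + dy * dy <= r * r)

points = 800

# Option 1: buffer every point, e.g. 30 m radius on 10 m pixels.
per_point = pixels_in_buffer(30, 10)
potential_samples = points * per_point

# Option 2: keep the points and use about two thirds of them per tree.
two_thirds = round(points * 2 / 3)
```

With these example values each point yields 29 pixels, so 800 points give 23200 potential samples (overlapping buffers would reduce that), while two thirds of 800 is 533, close to the figure suggested above.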


Why does the image not load after reprojection?

Good day,

I’m evaluating the Random Forest classification for crop and land cover analysis based on Sentinel-2 imagery.
The aim is to apply a multi-temporal approach including all 13 spectral bands and, in addition, NDVI, LAI, FAPAR and FCOVER.

For now I use Level-1C products and select more or less cloud-free scenes.

The Random Forest classification works fine for single-image analysis; the trouble starts when I want to use my cloud-masked imagery.

In those cases the Random Forest classification also runs without any error message, but after a few minutes a result is created in which the LabeledClasses band does not contain any classification results (for subsets that are not cloud-masked, the processing takes more than 2 h).

Since my non-cloud-masked subsets are processed nicely, I suspect that I am missing something in my cloud-mask workflow.

Any advice is highly appreciated.

My work flow is as follows:

1: I create products with a) Idepix cloud masks and b) biophysical indices:

Running Random Forest classifications on those new products works fine.

(At that stage the Idepix masks are just masks in the Masks folder, but they are not yet applied to the various bands, as far as I understand.)

2: I combine all Idepix cloud and snow masks into one cloud_mask band:


3: To get a smoother cloud mask I apply a buffer via Raster > Filtered Band… e.g. Maximum 5x5 :


4: I apply the newly created, smoothed cloud_mask via Band Maths to all bands I want to include in the Random Forest classification:

The following example shows that all clouds now have the no-data value (the image below also includes some training area vectors):

5: As soon as I run the Random Forest classification on the cloud-masked products, I get empty results like this:


Any advice is highly appreciated !!!
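
For reference, steps 2 to 4 of the workflow above can be sketched in plain Python on a tiny grid. This only mimics the logic (combine masks, grow them with a 5x5 maximum filter, set masked pixels to NaN). It does not call SNAP, the function names are hypothetical, and the way the filter treats the image border is an assumption.

```python
import math

def combine_masks(*masks):
    """Step 2: logical OR of several 2D masks (1 = cloud or snow)."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(cols)]
            for r in range(rows)]

def maximum_filter(mask, size=5):
    """Step 3: each pixel takes the maximum of its size x size neighbourhood.

    Assumption: windows are simply clipped at the image border.
    """
    half = size // 2
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = max(
                mask[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            )
    return out

def apply_mask(band, mask):
    """Step 4: replace masked pixels by NaN, keep all other values."""
    return [[math.nan if m else v for v, m in zip(band_row, mask_row)]
            for band_row, mask_row in zip(band, mask)]

size = 7
cloud = [[0] * size for _ in range(size)]
cloud[3][3] = 1                                  # a single cloudy pixel
snow = [[0] * size for _ in range(size)]         # no snow in this example
band = [[1.0] * size for _ in range(size)]       # hypothetical reflectance band

cloud_mask = combine_masks(cloud, snow)
buffered = maximum_filter(cloud_mask, size=5)
masked_band = apply_mask(band, buffered)

masked_count = sum(sum(row) for row in buffered)
valid_count = sum(1 for row in masked_band for v in row if not math.isnan(v))
```

Counting the valid pixels per band after masking, as done at the end, is a cheap check on how much data is actually left for the classifier.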

Hello everyone,

I’ve performed an RF classification on an S2 L2A subset with good results, and now I want to run the same classifier (with the same training vectors) on the same subset but with some places masked out. For this, I created a vector file of the areas I want to mask out (urban areas and golf courses) and used the land/sea mask to obtain a new product.

But RF fails with this masked image. I’ve tried modifying the pixel values before classification (NaN, 0, -1, and deselecting the no-data value) without success. I always get an empty image.

Does anyone have an idea how to solve this?

Thanks in advance,

You could use the vector in the valid pixel expression of the S2 bands to declare these pixels invalid before running the RF classifier.

e.g. if the vector is named golf you enter !=golf in all band properties to have them removed.

Hello Andreas,
I tried the expressions !vector_name and B1!=vector_name for each S2 band before classifying, but it still classifies inside the vectors…

From this old post, Missing vectors on vector training box - #31 by Sara.Aparicio, I understand they had a similar problem, and it has not been resolved yet either. Maybe it is a software problem.

I solved it by creating new bands in the original S2 image, with the simple expression:
if vector then 0 else Band
So the pixels inside the polygons I do not want to classify get the value 0. Then the classifier worked fine, although there is still an issue because the “masked” pixels are assigned to an existing class.
It was not possible to solve it using the valid-pixel expression in the band properties…

Could you then remove the classified pixels again with the vector at the end?
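
The two steps discussed here, zeroing the band inside the vector before classification and clearing the labels inside the vector afterwards, can be sketched in Python like this. The pixel values, the label values and the no-data value -1 are made up for illustration, and both function names are hypothetical.

```python
def mask_band(band, in_vector, fill=0):
    """Pixel-wise equivalent of the expression 'if vector then 0 else Band'."""
    return [fill if inside else value
            for value, inside in zip(band, in_vector)]

def clear_labels(labels, in_vector, no_data=-1):
    """After classification: overwrite the class of every pixel inside the vector."""
    return [no_data if inside else label
            for label, inside in zip(labels, in_vector)]

# Hypothetical flattened band and a flag telling which pixels lie inside the vector.
band = [0.12, 0.30, 0.25, 0.08]
in_vector = [False, True, True, False]

masked_band = mask_band(band, in_vector)

# Hypothetical classifier output: the zeroed pixels still received a class.
labels = [1, 2, 2, 3]
cleaned = clear_labels(labels, in_vector)
```

Using a no-data value that is not also a class value keeps the removed pixels distinguishable from real classes in the final map.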