Issue with Supervised random forest classification

I will take a look at your advice and the links to see how it works out.
Thank you so much, ABraun, for your time and attention.
You have been a helpful friend to me. I wish you all the best!!

Dear ABraun,

The default number of input pixels per tree is 5000 (for the Random Forest algorithm). Therefore, we couldn’t use single pixels as training data, right? I am quite confused because the training data I collected via GPS are points, and I also use 10-fold cross validation.

  • Should I turn the training data into polygons (based on the field data collection)?
  • Is it acceptable for publication to do that? (Sorry, some questions may be a bit off-topic for this forum.)
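For the 10-fold cross validation mentioned above, here is a minimal sketch of how point samples could be split into folds, assuming the points are identified only by an index (800 hypothetical points in the example). If polygons or pixel windows are later built around the points, splitting by point rather than by pixel keeps pixels from the same field site from landing in both the training and the validation fold.

```python
import random

def k_fold_indices(n_points, k=10, seed=42):
    """Split point indices 0..n_points-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i
    return [indices[i::k] for i in range(k)]

def train_validation_pairs(n_points, k=10, seed=42):
    """Yield (train, validation) index lists; each fold is used for validation exactly once."""
    folds = k_fold_indices(n_points, k, seed)
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# With 800 points, each round trains on 720 points and validates on 80
for train, validation in train_validation_pairs(800):
    print(len(train), len(validation))
```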

Thank you very much

You can use point samples, but as you say, if you have fewer than 5000, each tree is generated from all available points and part of the advantage of the random forest classifier's permutation is lost (it also shuffles the input rasters).
You can create polygons around your samples to increase the number of potential samples. Your points collected in the field are a good basis for this, and I think it is scientifically sound to extend these points to a representative area around them.
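A minimal sketch of extending each field point to a small square window of pixels, assuming the GPS points have already been converted to raster row/column positions and that a window of the chosen radius really is representative of the class. The names `window_pixels` and `expand_samples` are hypothetical; in SNAP itself this would be done by drawing polygons in the training vector, not by code.

```python
def window_pixels(row, col, radius, n_rows, n_cols):
    """All (row, col) pixels in a square window of the given radius around a
    sample point, clipped to the raster extent."""
    return [
        (r, c)
        for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
        for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
    ]

def expand_samples(points, radius, n_rows, n_cols):
    """Expand labelled point samples (row, col, label) into labelled pixel samples."""
    samples = []
    for row, col, label in points:
        for r, c in window_pixels(row, col, radius, n_rows, n_cols):
            samples.append((r, c, label))
    return samples

# A radius of 1 turns an interior point into a 3x3 block of 9 pixels
print(len(expand_samples([(5, 5, "crop")], radius=1, n_rows=10, n_cols=10)))
```

Windows of points that lie close together can overlap, so in practice it is worth checking that overlapping windows do not carry different labels.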

If you have enough samples, you can also leave some of them out of the classification process and later use them for validation (did the classifier predict the correct class at the sampled location?).
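The hold-out idea can be sketched as follows: shuffle the samples, keep a fraction aside, and later compare the predicted class at each held-out location with the class observed in the field. The one-third validation fraction and the function names are assumptions for illustration.

```python
import random

def split_holdout(samples, validation_fraction=1 / 3, seed=0):
    """Return (training, validation) lists that do not share any sample."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def overall_accuracy(reference, predicted):
    """Fraction of validation locations where the predicted class equals the field class."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    matches = sum(1 for ref, pred in zip(reference, predicted) if ref == pred)
    return matches / len(reference)

training, validation = split_holdout(range(800))
print(len(training), len(validation))
print(overall_accuracy(["crop", "forest", "crop"], ["crop", "forest", "water"]))
```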


Thank you very much bro ABraun.

Bro ABraun,

  • If I have only 800 samples (points) and I decrease the input parameter "Number of training samples" to 800 or 700, would that be enough to make a classification?

Lots of thanks

You can either create polygons around your points (depending on the radius and pixel size, this already multiplies your input values) or decrease the number of samples. To make it effective, I would reduce it to 2/3 of the whole data, so around 550.
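To illustrate why the requested number of samples should stay below the number available, here is a sketch that assumes each tree draws min(requested, available) samples without replacement. This is an assumption about the sampling scheme, not a description of SNAP's internals. When the request exceeds the pool, every tree sees exactly the same samples; with about 2/3 of the pool, the training sets of two trees overlap only partly.

```python
import random

def draw_tree_samples(n_available, n_requested, seed):
    """Indices one tree would train on (assumed: drawn without replacement)."""
    k = min(n_available, n_requested)
    return set(random.Random(seed).sample(range(n_available), k))

def mean_pairwise_overlap(n_available, n_requested, n_trees=10):
    """Average Jaccard overlap between the training sets of all pairs of trees."""
    sets = [draw_tree_samples(n_available, n_requested, seed) for seed in range(n_trees)]
    overlaps = [
        len(a & b) / len(a | b)
        for i, a in enumerate(sets)
        for b in sets[i + 1:]
    ]
    return sum(overlaps) / len(overlaps)

two_thirds = round(800 * 2 / 3)  # 533 samples per tree
print(mean_pairwise_overlap(800, 5000))        # 1.0: all trees share the same 800 samples
print(mean_pairwise_overlap(800, two_thirds))  # roughly 0.5
```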



Good day,

I’m evaluating Random Forest classification for crop and land cover analysis based on Sentinel-2 imagery.
The aim is to apply a multi-temporal approach including all 13 spectral bands and, in addition, NDVI, LAI, FAPAR and FCOVER.

For now I use Level-1C products and select more or less cloud-free scenes.

The Random Forest classification works fine for single-image analysis; the trouble starts when I want to use my cloud-masked imagery.

In those cases the Random Forest classification also runs without any error message, but after a few minutes it produces a result in which the LabeledClasses band contains no classification results (for subsets that are not cloud-masked, the processing takes more than 2 h).

Since my non-cloud-masked subsets are processed nicely, I suspect I am missing something when I apply my cloud-mask workflow.

Any advice is highly appreciated.

My workflow is as follows:

1: I create products with a) Idepix cloud masks and b) biophysical indices:

Running Random Forest classifications on those new products works fine.

(At that stage the Idepix masks are just masks in the Masks folder; as far as I understand, they are not yet applied to the various bands.)

2: I combine all IDEPIX cloud and snow masks to one cloud_mask band:
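Combining several masks into one cloud_mask band is a logical OR: a pixel is flagged if any of the input masks flags it. A minimal NumPy sketch of that step, assuming each mask is available as a boolean array of the same shape (in SNAP this would be a Band Maths expression joining the mask names with `||`; the mask names below are hypothetical).

```python
import numpy as np

def combine_masks(*masks):
    """True wherever at least one of the input masks is True."""
    return np.logical_or.reduce([np.asarray(m, dtype=bool) for m in masks])

# Hypothetical cloud and snow masks for a 2x2 subset
cloud = np.array([[True, False], [False, False]])
snow = np.array([[False, False], [False, True]])
print(combine_masks(cloud, snow))
```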


3: To get a smoother cloud mask, I apply a buffer via Raster > Filtered Band…, e.g. Maximum 5x5:
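A Maximum 5x5 filter on a 0/1 mask acts as a buffer: every pixel within two pixels of a flagged pixel becomes flagged too. A NumPy-only sketch of that effect, assuming pixels outside the image count as unflagged (SNAP's own edge handling may differ).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def maximum_filter(mask, size=5):
    """Max over a size x size window; grows flagged areas by size // 2 pixels."""
    pad = size // 2
    padded = np.pad(np.asarray(mask, dtype=np.uint8), pad, mode="constant", constant_values=0)
    windows = sliding_window_view(padded, (size, size))  # shape (rows, cols, size, size)
    return windows.max(axis=(-2, -1)).astype(bool)

mask = np.zeros((9, 9), dtype=bool)
mask[4, 4] = True  # a single cloudy pixel
print(maximum_filter(mask).sum())  # grows into a 5x5 block of flagged pixels
```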


4: I apply the newly created smoothed cloud_mask via Band Maths to all bands I want to include into the Random forest classification:

The following example shows that all clouds now have the no-data value (the image below also includes some training area vectors).
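Applying the mask to a band sets the flagged pixels to no-data, similar to a Band Maths expression like `if cloud_mask then NaN else B4` (band name assumed for illustration). The sketch also counts where a pixel is still finite in every band, because a pixel only forms a complete feature vector if none of its bands is masked. Checking this at the training-vector pixels is one possible diagnostic after masking; whether it explains the empty LabeledClasses band is not established in this thread.

```python
import numpy as np

def apply_cloud_mask(band, cloud_mask):
    """Float copy of `band` with masked pixels set to NaN."""
    out = np.asarray(band, dtype=np.float32).copy()
    out[np.asarray(cloud_mask, dtype=bool)] = np.nan
    return out

def valid_in_all_bands(bands):
    """True where every band holds a finite value (a complete feature vector)."""
    return np.logical_and.reduce([np.isfinite(np.asarray(b, dtype=np.float32)) for b in bands])

mask = [[True, False], [False, False]]
b4 = apply_cloud_mask([[1, 2], [3, 4]], mask)
b8 = apply_cloud_mask([[5, 6], [7, 8]], mask)
print(valid_in_all_bands([b4, b8]).sum())  # pixels still usable for training or prediction
```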

5: The moment I execute Random Forest classification on cloud-masked products, I get empty results like this:


Any advice is highly appreciated !!!

Hello everyone,

I’ve performed RF classification on an S2 L2A subset with good results, and now I want to run the same classifier (with the same training vectors) on the same subset, but with some areas masked out. For this, I created a vector file of the areas I want to mask out (urban areas and golf courses) and used the Land/Sea Mask to obtain a new product.

But RF fails with this masked image. I’ve tried modifying pixel values before classification (NaN, 0, -1, and unchecking the no-data value) without success. I always get an empty image.

Does anyone have an idea how to solve this?

Thanks in advance,

You could use the vector in the valid pixel expression of the S2 bands to declare these pixels invalid before running the RF classifier.

e.g. if the vector is named golf you enter !=golf in all band properties to have them removed.

Hello Andreas,
I tried the expressions !vector_name and B1!=vector_name for each S2 band before classifying, but it still classifies inside the vectors…

In this older post, Missing vectors on vector training box - #31 by Sara.Aparicio, I understand they had a similar problem and have not resolved it yet either. Maybe it is a software problem.

I solved it by creating new bands in the original S2 image, with the simple expression:
if vector then 0 else Band
So the pixels inside the polygons I do not want to classify get the value 0. Then the classifier worked fine, although there is still an issue because the “masked” pixels are assigned to an existing class.
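The Band Maths expression `if vector then 0 else Band` corresponds to a simple `np.where` over the band, assuming the vector has been rasterised into a boolean array marking pixels inside the polygons (`inside` below is a hypothetical name).

```python
import numpy as np

def replace_inside_vector(band, inside, fill=0.0):
    """Equivalent of `if vector then 0 else Band`: fill pixels inside the polygons."""
    return np.where(np.asarray(inside, dtype=bool), fill, np.asarray(band, dtype=np.float32))

band = np.array([[0.12, 0.30], [0.25, 0.40]])
inside = np.array([[False, True], [False, False]])
print(replace_inside_vector(band, inside))
```

A random forest returns a class for every input feature vector, so an all-zero vector is still routed down each tree to some leaf. This is why the filled pixels end up in an existing class rather than staying empty.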
It was not possible to solve it using the Valid-Pixel Expression in band properties…

Could you then remove the classified pixels with the vector afterwards?

I could do that, but I guess if the classifier includes those pixels, the results will differ. That is why I wanted to remove them before classification.

If they are not part of the training, the results will not change because the classifier only applies the learned rules to the untrained pixels. If some of them don’t contain valid results, you can simply remove them in a post-classification step.
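The post-classification step can be sketched as overwriting the class labels inside the vector with a no-data value and then counting the remaining class areas. The no-data label `-1` and the rasterised `inside_vector` array are assumptions for illustration.

```python
import numpy as np

NO_DATA = -1  # hypothetical label for removed pixels

def mask_classification(labels, inside_vector, no_data=NO_DATA):
    """Copy of the class map with pixels inside the vector set to no-data."""
    out = np.asarray(labels).copy()
    out[np.asarray(inside_vector, dtype=bool)] = no_data
    return out

def class_pixel_counts(labels, no_data=NO_DATA):
    """Number of pixels per class, ignoring no-data pixels."""
    labels = np.asarray(labels)
    values, counts = np.unique(labels[labels != no_data], return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))

classes = np.array([[0, 1], [1, 2]])
golf = np.array([[False, False], [False, True]])
print(class_pixel_counts(mask_classification(classes, golf)))
```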

I think they do differ; look at these two results. The one on the left is the classification of the complete image; on the right you can see the classification of the image with the pixels masked before classification. You can notice some differences between them at a glance…

I don’t think that the difference is caused by the masking. Technically, every random forest produces different results, even if you run it twice on the exact same inputs. This is because both subsets of the training data and the raster data are randomly shuffled in each iteration to construct the ruleset. Accordingly, the thresholds to separate the classes can change unless you use a really large number of trees.

How many trees did you use to compute the results?

The default, 10.
And yes, sure, you are right about the randomness of the random forest 😅

You can run the same configuration (with masks) twice to estimate how large the variation between two runs is based on the random sampling. If it is too high for you, simply increase the number of trees which makes the class thresholds more robust.
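To put a number on the variation between two runs, one option is the share of pixels that receive the same class in both outputs, ignoring pixels that are no-data in either run. A sketch, assuming both class maps are loaded as integer arrays of the same shape and that `-1` marks no-data (both assumptions).

```python
import numpy as np

def run_agreement(labels_a, labels_b, no_data=-1):
    """Fraction of valid pixels that get the same class in two classification runs."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    if a.shape != b.shape:
        raise ValueError("both class maps must have the same shape")
    valid = (a != no_data) & (b != no_data)
    if not valid.any():
        raise ValueError("no pixels are valid in both runs")
    return float(np.mean(a[valid] == b[valid]))

run1 = np.array([[1, 2], [3, -1]])
run2 = np.array([[1, 2], [1, 3]])
print(run_agreement(run1, run2))  # 2 of the 3 comparable pixels agree
```

Comparing the agreement of two masked runs with the agreement between a masked and an unmasked run shows whether the difference is larger than the run-to-run noise; repeating this with more trees should push the agreement up if the thresholds become more stable.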