Random Forest Satellite Image Classification

Hi, I have grasped the fundamental theory of how random forest works. However, random forest classification seems to work differently on SAR and optical images. How does it classify the image? What is the working principle of random forest on satellite imagery? What parameters/attributes does it use to classify the image? I would appreciate it if someone could enlighten me. Thank you

Possible reasons why results based on optical (multispectral) images work better than those based on radar images (only two bands, VV and VH): Classification of GRD product
Reasons for low accuracy: Random Forest result
On the parameters: Number of training samples at Random forest classifier

Please specify your questions if this information is not sufficient.
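
To illustrate the general principle outside SNAP: a random forest treats every pixel as a feature vector (for SAR, e.g. the VV and VH backscatter; for optical imagery, the spectral bands), and each decision tree votes on the pixel's class, with the majority vote becoming the label. Here is a minimal scikit-learn sketch of that per-pixel idea, assuming hypothetical band arrays and training pixels; it is illustrative only, not SNAP's implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical SAR scene: two bands (VV, VH), each 100 x 100 pixels.
# For optical imagery the same idea applies, just with more bands.
vv = np.random.rand(100, 100)   # stand-in for real VV backscatter
vh = np.random.rand(100, 100)   # stand-in for real VH backscatter

# Stack the bands so every pixel becomes one feature vector [VV, VH].
features = np.stack([vv, vh], axis=-1).reshape(-1, 2)

# Hypothetical training pixels: flat indices and class labels
# (0 = water, 1 = non-water).
train_idx = np.array([0, 1, 2, 5000, 5001, 5002])
train_lab = np.array([0, 0, 0, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(features[train_idx], train_lab)

# Each tree votes; the majority vote labels every pixel of the image.
classified = rf.predict(features).reshape(100, 100)
```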

Hi, I am going to use RF classification on S-1 images, and I would like to confirm a few things about how RF works in SNAP. From the posts I have read,

  1. the Evaluate Classifier step reports training accuracy, not an accuracy assessment of the classified image,
  2. so accuracy assessment has to be done in other software,
  3. the number of training samples is the number of pixels used from each vector I have imported (when "Train on vectors" is selected) to train the algorithm,
  4. RF trains each tree on roughly 2/3 of the training data and keeps the remaining 1/3 for an internal cross-validation (the out-of-bag samples; see the sketch after this list).
    Regarding the number of training samples: is the number I type the 2/3 or the full 3/3 of the available pixels of each trained class?
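
On point 4, the 2/3 vs 1/3 behaviour corresponds to bootstrap sampling with out-of-bag (OOB) validation: each tree is trained on a random sample drawn with replacement and checked on the pixels it did not see. A hedged scikit-learn sketch of that principle, with placeholder data (not SNAP's code):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: 1,000 pixels with 2 features (e.g. VV, VH).
X = np.random.rand(1000, 2)
y = np.random.randint(0, 2, size=1000)

# bootstrap=True: each tree draws its samples with replacement, which
# leaves roughly one third of the pixels "out of bag" for that tree.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            oob_score=True, random_state=0)
rf.fit(X, y)

# Internal validation accuracy computed on the out-of-bag samples.
print("OOB accuracy:", rf.oob_score_)
```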

You can still do it within SNAP; an example is given in this tutorial: Landcover classification with Sentinel-1 GRD

The number of pixels used per class for training is shown in the .txt file that pops up after running the RF classifier.

Thank you very much for the tutorial! I will try it this way, too!

Regarding the number of training samples: I want to classify into water and non-water. The total numbers of pixels for water and non-water are 4,500 and 12,000, respectively. What number should I type into the "number of training samples" box?

Having only 4,500 pixels available in the water training vector, when the default is 5,000, what should I do if I cannot increase that number?

Would you advise decreasing the number to 2/3 of the 4,500 pixels (i.e. 3,000 in the "number of training samples"), so that the remaining 1/3 of the 4,500 (= 1,500 pixels) is used to compute the training accuracy?
I apologize for asking so many questions! Thank you very much for your time!

The number of training samples is a fixed value, but it is not limited to one particular set of pixels: if you set it to 3,000 and compute 10 trees, every tree is constructed from 3,000 randomly selected pixels.
If you really want to split the analysis into two fully independent sets of training and validation areas, you have to reduce the training vectors by one third, so that this third is never imported into SNAP for training.
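
A hedged sketch of that preparatory split, done outside SNAP before any vectors are imported; the pixel counts match the water/non-water example above, but the arrays themselves are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labelled pixels: 4,500 water (class 0), 12,000 non-water (1).
pixel_ids = np.arange(16500)
labels = np.concatenate([np.zeros(4500, int), np.ones(12000, int)])

# Hold out one third per class (stratified). Only the training part is
# imported as vectors into SNAP; the held-out third stays outside for an
# independent validation later.
train_ids, valid_ids = train_test_split(
    pixel_ids, test_size=1/3, stratify=labels, random_state=0)

print(len(train_ids), "pixels for SNAP,", len(valid_ids), "for validation")
```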

OK, thank you very much. So I split the dataset into training (2/3) and validation (1/3), and the validation dataset I do not import into SNAP. Suppose now that, after the split, the training set contains 5,000 pixels. To keep the RF classifier effective, I have to decrease the number again to
2/3 × 5,000 ≈ 3,333 pixels (training samples), because otherwise every tree trains on the same pixels and the principle of the RF classifier is lost. Is that right? Thank you very much!!

Sounds right to me, at least :slight_smile:

OK, thank you very much for your quick response!!

Does this mean we don't have to use testing data separate from the training data for accuracy assessment? Is the .txt file produced after RF classification, once the metrics provided in it are calculated, enough to prove the classification accuracy?

Thank you.

No, the random component is only used to improve the classifier. The accuracy measures tell you how well the RF is able to predict the classes of the training samples; they do not tell you how well it performs on untrained data, so I recommend adding an external validation.
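
A hedged sketch of such an external validation, comparing the classified map against independent reference pixels that were never imported into SNAP (the arrays below are hypothetical stand-ins; metrics via scikit-learn):

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             cohen_kappa_score)

# Hypothetical arrays: the predicted class per validation pixel (read
# from the classified image) and the independent reference label for
# the same pixel locations.
predicted = np.array([0, 0, 1, 1, 0, 1, 1, 0])
reference = np.array([0, 0, 1, 0, 0, 1, 1, 1])

print(confusion_matrix(reference, predicted))
print("Overall accuracy:", accuracy_score(reference, predicted))
print("Kappa:", cohen_kappa_score(reference, predicted))
```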


Dear @ABraun,

Is there any way to identify the feature importance if we forgot to select Evaluate Feature Power Set when running the Random Forest classifier? I ran it on time-series input images. ;(

Maybe you can check whether there is something in the metadata record (processing graph or history).
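
If the metadata does not record it, one possible fallback, outside SNAP and therefore only a hedged sketch with hypothetical band names and arrays, is to export the training pixels and the same feature stack and re-derive importances with scikit-learn's impurity-based measure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical time-series feature stack exported from SNAP:
# one row per training pixel, one column per input band/date.
band_names = ["VV_t1", "VH_t1", "VV_t2", "VH_t2"]
X = np.random.rand(500, len(band_names))
y = np.random.randint(0, 2, size=500)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importance: how much each band contributes to the splits.
for name, imp in sorted(zip(band_names, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```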

In the text file produced after RF classification (with Evaluate Feature Power Set selected), I want to ask what "cv" means in:

newClassifier20.792: cv 86.43 % VV_Entropy, …,

and also whether the last line of the text file,

TOP Classifier = newClassifier20.204 at 90.87 %

gives the set of best features for running the algorithm.
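
For intuition, here is a hedged sketch of what a feature power-set evaluation generally looks like (not SNAP's actual implementation; data and feature names are placeholders): one classifier is trained per feature subset, "cv" would then be that subset's cross-validated accuracy, and the TOP classifier is the subset with the highest score.

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical training pixels with four backscatter/texture features.
features = ["VV", "VH", "VV_Entropy", "VH_Entropy"]
X = np.random.rand(300, 4)
y = np.random.randint(0, 2, size=300)

best = (0.0, None)
for r in range(1, len(features) + 1):
    for subset in itertools.combinations(range(len(features)), r):
        # One classifier per feature subset, scored by cross-validation.
        rf = RandomForestClassifier(n_estimators=25, random_state=0)
        cv = cross_val_score(rf, X[:, list(subset)], y, cv=3).mean()
        names = [features[i] for i in subset]
        print(f"cv {cv * 100:.2f} % {', '.join(names)}")
        if cv > best[0]:
            best = (cv, names)

print(f"TOP classifier at {best[0] * 100:.2f} % with {best[1]}")
```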