Ground-truthing RF

Hey all

I am doing a supervised classification of vegetation types on an L2A product using RF. I have previously collected field data (GPS points and vegetation attributes), which I have as a shapefile. I understand how to perform a supervised classification by digitising training areas as geometries, but I have no idea how to integrate the field data I have collected into the RF training and validation data. Can anyone with experience in ground-truthing please give me a basic step-by-step outline of how this should be approached?

Many thanks in advance to anyone who can help


When you collect field data for an application (in this case, classifying vegetation types), there are two ways you can use them:

  1. Training the algorithm: Because field data are very accurate in showing the right land type class (if they have been collected properly), they make a very good training set. One problem that might come up is whether you have enough field data to train the algorithm. If the field data are quite sparse, there is a high chance that the algorithm will perform poorly. In that case, you would need to create your own training data manually.

  2. Validating the classification results: Once the classification results have been produced, you need to validate them and see whether the algorithm classified each land type correctly. Here you can use your field data to check how well the algorithm performed.
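
If you want to use the field points for both purposes, one option is to split them per class so that no point is used for training and validation at the same time. Below is a minimal sketch in plain Python, assuming the points have already been read from the shapefile into a list of dictionaries. The attribute name `veg_class`, the coordinates, and the 70/30 split are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(points, train_fraction=0.7, seed=0):
    """Split labelled field points into training and validation sets, class by class."""
    # Group the points by their recorded vegetation class.
    by_class = defaultdict(list)
    for point in points:
        by_class[point["veg_class"]].append(point)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, validation = [], []
    for members in by_class.values():
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        # With two or more points in a class, keep at least one on each side.
        if len(members) >= 2:
            n_train = min(max(n_train, 1), len(members) - 1)
        train.extend(members[:n_train])
        validation.extend(members[n_train:])
    return train, validation


# Hypothetical field points (ids, coordinates and classes are invented).
field_points = [
    {"id": i, "x": 500000 + 10 * i, "y": 4100000 + 10 * i, "veg_class": cls}
    for i, cls in enumerate(["grassland"] * 10 + ["shrubland"] * 6 + ["forest"] * 4)
]

train, validation = stratified_split(field_points)
print(len(train), "training points,", len(validation), "validation points")
```

Splitting per class rather than over the whole set keeps rare vegetation types from ending up only in one of the two sets.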


thank you @johngan
I think my training data is quite sparse, so I will use a combination of both.
However, I don’t understand how to incorporate this data into the training. Is it simply a matter of creating a vector overlay and drawing polygons over my points (and attributing them to the class I recorded in the field)? Or is there a much more complex way of doing this?

Or, if I don’t have enough for both validation and training, could I just create my own training data and use the field survey data for validation only? Again, in this case, would it just be a matter of placing validation pins where I have field survey vegetation type data?

So, your field data are points and not polygons.
A few sparse points are not enough to perform a classification with good accuracy.
I would suggest that you manually digitize all the different classes as polygons (you can use Google Earth as guidance to make sure you are digitizing the correct land type). The more training data, the better.

Once the classification is finished, overlay your points on top of the classified map and check whether each classified pixel matches the class recorded in your field data. If it does, that particular pixel has been classified correctly.
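
The same check can be scripted instead of done by eye. The sketch below is only an illustration in plain Python: it assumes a north-up classified map held as a small grid, with a known upper-left corner and pixel size, and points in the same coordinate system. The grid, coordinates, and class names are invented, and the function names are hypothetical rather than part of any toolbox.

```python
from collections import Counter


def sample_class(class_grid, origin_x, origin_y, pixel_size, x, y):
    """Return the classified value of the pixel containing map coordinate (x, y)."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)  # rows count downwards from the top edge
    if 0 <= row < len(class_grid) and 0 <= col < len(class_grid[0]):
        return class_grid[row][col]
    return None  # the point falls outside the classified map


def assess(class_grid, origin_x, origin_y, pixel_size, points):
    """Count (field class, mapped class) pairs and compute overall accuracy."""
    confusion = Counter()
    for x, y, field_class in points:
        mapped = sample_class(class_grid, origin_x, origin_y, pixel_size, x, y)
        if mapped is None:
            continue  # skip points outside the map
        confusion[(field_class, mapped)] += 1
    total = sum(confusion.values())
    correct = sum(n for (field, mapped), n in confusion.items() if field == mapped)
    accuracy = correct / total if total else float("nan")
    return confusion, accuracy


# Toy 3 x 3 classified map: upper-left corner at (0, 30), 10-unit pixels.
classified = [
    ["forest", "forest", "grass"],
    ["forest", "shrub", "grass"],
    ["shrub", "shrub", "grass"],
]
# Hypothetical validation points: (x, y, class recorded in the field).
validation_points = [
    (5, 25, "forest"),
    (15, 15, "shrub"),
    (25, 5, "grass"),
    (25, 25, "forest"),  # the map says "grass" here, so this counts as an error
    (40, 5, "grass"),    # outside the map, ignored
]

confusion, accuracy = assess(classified, 0, 30, 10, validation_points)
print(f"Overall accuracy: {accuracy:.2f}")
for (field, mapped), n in sorted(confusion.items()):
    print(f"field={field:7s} mapped={mapped:7s} count={n}")
```

The pair counts show which classes get confused with each other, which is more informative than the single accuracy figure.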