Supervised Classification (Random Forest, Maximum Likelihood, KNN)

najamsyed · November 17, 2020, 2:32am

I am doing supervised classification and have run into a problem with the classifier evaluation.
When I use a small number of training samples, it gives a complete evaluation report, but whenever I increase the number of training samples, the report is incomplete.
Two reports for maximum likelihood, one with 500 samples and one with 5000 samples, are attached for reference.
The same issue occurs with RF and KNN. Previously it was working well…
newClassifier500TrainingSample.txt (2.6 KB)
newClassifier5000TrainingSample.txt (567 Bytes)

ABraun · November 17, 2020, 6:09am

Could it be that your training areas are smaller than the sample size?
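One way to test this idea is to count how many training points each class actually provides and compare that with the requested sample size. The snippet below is only a rough sketch in plain Python: `find_shortfalls`, the class names and the counts are all made up for illustration, and it says nothing about how the classifier itself draws its samples.

```python
from collections import Counter

def find_shortfalls(labels, requested_per_class):
    """Return {class: available_count} for every class that has
    fewer training points than the requested sample size.

    `labels` is a hypothetical list with one class label per
    available training point (e.g. exported from the vector data).
    """
    counts = Counter(labels)
    return {cls: n for cls, n in counts.items() if n < requested_per_class}

# Illustrative data only: 600 points of one class, 300 of another.
labels = ["water"] * 600 + ["land"] * 300

# Asking for 500 samples per class cannot be satisfied for "land".
print(find_shortfalls(labels, 500))
```

If the returned dictionary is empty, every class has at least as many points as requested, and the cause is probably somewhere else.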

najamsyed · November 24, 2020, 8:53am

Dear ABraun,
I am still stuck on the supervised classification issue with the different algorithms (RF, Maximum Likelihood, KNN).
I have 25000 points for each sample class, and these points are well distributed over the image.
Whenever I increase the number of training samples above 500, the classifier does not work properly.

ABraun · November 24, 2020, 9:56am

What does this mean? Can you please specify?

najamsyed · November 28, 2020, 4:38am

Thanks for the reply.
I am using Random Forest, Maximum Likelihood and KNN. The training samples are vector points of two classes, each with 25000 points. I also have ten (10) feature classes, which are included in the classification.
Normally it gives a classification report that includes Accuracy, Precision, RMSE, etc. But sometimes I do not get the complete report, mostly when I increase the number of samples above 500.
Yesterday I processed it with some new data and it works now. If the problem comes back, I will post the error here.
At the moment my issue is resolved.
Thanks

anthoulap · February 21, 2021, 6:56pm

Hi, I would like to ask something about the classifier evaluation when running Random Forest.

I noticed in other posts that, in the "Testing feature importance score" section, tp and accuracy are different numbers for each rank. However, in my case (txt file at the bottom), tp and accuracy are the same number, and this happens in every RFC I try. Any idea why this is happening?
Also, I can’t understand the meaning of a negative error rate and cost, and I’m not sure about Correlation and GainRatio in each feature rank.
I found information about Correlation here:
Understanding Random Forest. How the Algorithm Works and Why it Is… | by Tony Yiu | Towards Data Science (we want relatively uncorrelated models (trees), so that the ensemble predictions are more accurate than any of the individual predictions).
For GainRatio, I found this: https://www.researchgate.net/publication/228919572_Comparative_study_of_attribute_selection_using_gain_ratio_and_correlation_based_feature_selection (the attribute with the highest gain ratio is selected as the splitting attribute). So I take this to mean that rank 2, feature 7: VV+VH, was used as the splitting attribute.
Could you please help me make this clear?
Thank you in advance!
S1B_20190612_Combin_pol_RF_10_2505.txt (2.4 KB)
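
For reference, the textbook gain ratio (from Quinlan's C4.5) is the information gain of a feature divided by the entropy of the feature's own split. The snippet below is a minimal sketch of that definition only: the function names and the toy data are made up, the feature is assumed to be already binned into discrete values, and whether the report computes GainRatio in exactly this way is an assumption.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its split information."""
    total = len(labels)
    # Group the class labels by the feature value they occur with.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Toy data: four samples, two classes, two binned feature values.
labels = ["a", "a", "b", "b"]
separating = ["lo", "lo", "hi", "hi"]    # lines up with the classes
uninformative = ["lo", "hi", "lo", "hi"]  # tells nothing about the class

print(gain_ratio(separating, labels))
print(gain_ratio(uninformative, labels))
```

Under this definition, a feature that separates the classes well gets a higher gain ratio than one that does not, so a ranking by gain ratio orders features by how useful each would be as a split, taken one at a time.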