How can I find the ratio of training to testing samples in the Random Forest classifier?

swadhina_koley · October 21, 2020, 10:03am

Random Forest is one of the most widely used classifiers for supervised classification. It reports accuracy and precision based on training and test samples. However, when training the classifier in SNAP, the dialog only asks for the number of training samples (5000 by default) and the number of trees (10 by default). What does this number of training samples mean?

Also, when the classification is run, SNAP writes a text file reporting the classifier's accuracy and its precision for each class. I assume this report is based on a specific number of training and test samples. How can I find out how many training and test samples the Random Forest classifier uses in SNAP? Does SNAP apply a default ratio? If so, how can I see that ratio, and how can I change it if required?

Any ideas on this would be helpful. Thanks in advance.

gnwiii · October 21, 2020, 11:29am

SNAP uses the Java Machine Learning Library (Java-ML):
Abeel, T.; Van de Peer, Y. & Saeys, Y. Java-ML: A Machine Learning Library, Journal of Machine Learning Research, 2009, 10, 931-934

Also: source for java-ml random forest classifier usage in SNAP
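
For readers new to the terms in the question, here is a minimal Python sketch of a generic hold-out split and of how overall accuracy and per-class precision are computed on the held-out samples. It is not SNAP's or Java-ML's code, and it says nothing about the ratio SNAP actually uses. The function names (holdout_split, accuracy, precision_per_class) and the 0.7 training fraction are assumptions chosen only for illustration.

```python
import random
from collections import Counter


def holdout_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into (train, test) lists.

    train_fraction is a hypothetical parameter; it is NOT a SNAP setting.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def precision_per_class(true_labels, predicted_labels):
    """For each predicted class: correct predictions / all predictions of it."""
    predicted_counts = Counter(predicted_labels)
    true_positives = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {c: true_positives[c] / n for c, n in predicted_counts.items()}


if __name__ == "__main__":
    # 5000 stands in for the number of collected samples, as in the question.
    train, test = holdout_split(range(5000), train_fraction=0.7)
    print(len(train), len(test))
```

With such a split, the classifier would be fitted on the first list only, and the accuracy and precision figures would be computed by comparing its predictions on the second list against the known labels.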

giangnv · October 28, 2020, 5:17pm

@gnwiii,
Could you please explain in more detail how this answers @swadhina_koley's questions? We could not understand much from looking at the code.
Thank you in advance!