Maximum Likelihood Classifier results differ each time it is re-run on the same training data

I am using the Maximum Likelihood Classifier (MLC) in SNAP and am getting some results that I was not expecting. What I’m finding is that I get a different classification map every time I train the MLC on the same training data.

I’ve attached some screenshots here: [Screenshot 1] [Screenshot 2] [Screenshot 3]

I assume that a random number generator must be involved at some point in the algorithm. Otherwise, shouldn’t I expect to get the same classification map if I use the same training data (polygons)?

If someone could provide me with technical detail on the SNAP implementation of MLC (there wasn’t a lot in the Help menu) or explain what’s going on here, that would be greatly appreciated.

Thanks!

Hi,
I think there are two reasons that could explain the different outputs:

  • In the classifier you select the number of training samples; if you select a number smaller than the number available, then the samples actually used can differ from run to run (see the sketch after this list).
  • Internally, SNAP divides the training samples into two datasets, one for training (50%) and one for testing/validation (50%), and as far as I know this split is made randomly.
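
To make the first point concrete, here is a minimal sketch of a Gaussian maximum likelihood classifier in Python. This is not SNAP’s actual implementation (SNAP is written in Java), and the class distributions, sample counts, and test pixel are invented for illustration. The point is that when you request fewer samples than are available, each run trains on a different random subset, so the estimated class statistics, and sometimes the label of a borderline pixel, change between runs:

```python
import numpy as np

rng = np.random.default_rng()  # deliberately unseeded: each run differs

def train_mlc(samples_per_class):
    """Estimate mean and covariance of a Gaussian per class (the MLC model)."""
    params = {}
    for cls, X in samples_per_class.items():
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        params[cls] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params

def classify(x, params):
    """Assign x to the class with the highest Gaussian log-likelihood."""
    def log_likelihood(p):
        mu, cov_inv, logdet = p
        d = x - mu
        return -0.5 * (logdet + d @ cov_inv @ d)
    return max(params, key=lambda cls: log_likelihood(params[cls]))

# Two overlapping classes with 500 available training pixels each (made up)
available = {
    1: rng.normal([0.0, 0.0], 1.0, size=(500, 2)),
    2: rng.normal([1.5, 1.5], 1.0, size=(500, 2)),
}

pixel = np.array([0.75, 0.75])  # a pixel near the decision boundary

# Request only 100 of the 500 available samples: every run draws a different
# random subset, so the trained statistics (and sometimes the label) vary.
for run in range(1, 4):
    subset = {cls: X[rng.choice(len(X), size=100, replace=False)]
              for cls, X in available.items()}
    print(f"run {run}: pixel classified as class {classify(pixel, train_mlc(subset))}")
```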

Ah I see.

I’ve done some testing and the effect is mostly due to your first point: when I increase the requested number of training samples beyond the number available, I get more consistency and the classification maps are almost indistinguishable. However, there are still (very minor) differences, which can be seen by comparing the frequencies of each class, i.e.:

First Run Frequencies (%)
Class 1 = 27.197
Class 2 = 22.559
Class 3 = 13.824
Class 4 = 36.421

Second Run Frequencies (%)
Class 1 = 27.206
Class 2 = 22.685
Class 3 = 13.848
Class 4 = 36.262

I assume these remaining differences are due to your second point.
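
If the 50/50 split really is drawn at random, it would explain exactly this kind of residual variation: even when every sample is selected, which half lands in the training set changes per run, so the fitted statistics shift slightly. A small sketch of that effect (the split logic and the numbers are assumptions for illustration, not SNAP’s code):

```python
import numpy as np

rng = np.random.default_rng()  # unseeded, so the split differs per run

# A fixed pool of training pixels for one class (identical on every run)
pool = np.random.default_rng(seed=42).normal(0.0, 1.0, size=(200, 2))

# A random 50/50 train/validation split: even though the pool is the same,
# the half used for training -- and hence the estimated class mean -- moves
# slightly from run to run, which nudges the decision boundary.
for run in range(1, 4):
    idx = rng.permutation(len(pool))
    train, validation = pool[idx[:100]], pool[idx[100:]]
    print(f"run {run}: estimated class mean = {train.mean(axis=0)}")
```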

One more thing: the output includes a “confidence” variable. I couldn’t find any information on how this is calculated or what it actually means. Is there a user guide that you could point me to?
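
My own guess, and it is only a guess since I couldn’t find it documented, is that it is something like the likelihood of the winning class normalized over the likelihoods of all classes, i.e. a posterior probability under equal priors. Purely for illustration, that convention would look like this (the class statistics here are invented):

```python
import numpy as np
from scipy.stats import multivariate_normal

# One COMMON convention for a per-pixel confidence -- not confirmed to be
# what SNAP does: the winning class's likelihood normalized over all classes,
# i.e. a posterior probability under equal class priors.
def confidence(x, class_params):
    likelihoods = np.array([multivariate_normal.pdf(x, mean=mu, cov=cov)
                            for mu, cov in class_params])
    return likelihoods.max() / likelihoods.sum()

class_params = [
    (np.array([0.0, 0.0]), np.eye(2)),  # invented class 1 statistics
    (np.array([1.5, 1.5]), np.eye(2)),  # invented class 2 statistics
]

print(confidence(np.array([0.0, 0.0]), class_params))    # near a centre: high
print(confidence(np.array([0.75, 0.75]), class_params))  # on the boundary: ~0.5
```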

Thanks again

Dear all,

@obarrilero mentioned that “Internally, SNAP divides the training samples into two datasets, one for training (50%) and one for testing/validation (50%)…”

Could anyone provide a reference document that supports this statement?

Thank you very much!

I would be interested in that as well. If this is really the case, the reported percentage would indeed give the accuracy of the classified map.