Random Forest Classification with Sentinel 2

If you have the capacity to run 200 trees, you can do it. There is usually a saturation point beyond which the quality no longer increases, but generally, the more trees you use, the more you benefit from the randomization. This also depends on the number of input layers. If you have only 2 bands, there is no way to randomize and recombine them in multiple ways, so the only random component is the subset of training pixels.

The same applies to the samples: if you have digitized 700 pixels, setting the number of training samples to 2000 won’t bring any improvement. The number of training samples should be smaller than the total number of pixels you have digitized, because then a different subset is selected with each realization (each new tree). In this way, the algorithm successively finds the rasters and thresholds with the highest impact.
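
To illustrate the point about the number of training samples, here is a minimal Python sketch. It is not how any particular software implements it: I simply assume that each tree draws its pixels without replacement and that the draw is capped at the number of available pixels. The helper names are made up for this example.

```python
import random


def draw_training_subset(pixel_ids, n_requested, rng):
    """Draw the training subset for one tree (hypothetical helper)."""
    # Assumption: sampling without replacement, capped at the available pixels.
    n = min(n_requested, len(pixel_ids))
    return frozenset(rng.sample(pixel_ids, n))


def count_distinct_subsets(n_pixels, n_requested, n_trees, seed=0):
    """Count how many different training subsets a forest of n_trees sees."""
    rng = random.Random(seed)
    pixel_ids = list(range(n_pixels))
    subsets = {
        draw_training_subset(pixel_ids, n_requested, rng)
        for _ in range(n_trees)
    }
    return len(subsets)


# 700 digitized pixels, 2000 requested: every tree gets the same 700 pixels.
distinct_when_oversized = count_distinct_subsets(700, 2000, n_trees=10)
# 700 digitized pixels, 500 requested: the subsets differ from tree to tree.
distinct_when_smaller = count_distinct_subsets(700, 500, n_trees=10)
print(distinct_when_oversized, distinct_when_smaller)
```

With the oversized request there is only one possible subset, so the trees can only differ through the random selection of bands.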

I still recommend using vectors as training polygons, but if you manage to create rasterized training samples (with NoData at all non-training areas), it should work as well.

Generally, random forest works best with large training inputs (many bands and lots of samples), because only then is the randomization effective. Using only the VH band is technically possible, but it largely defeats the purpose of the RF algorithm. Or did I misunderstand your point, and you are using a series of VH bands from different dates? That would make a lot of sense, because the temporal variation then becomes part of the feature space.
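
What I mean by temporal variation becoming part of the feature space can be sketched as follows. This is only an illustration in plain Python: the dates and backscatter values are made up, and `stack_dates` is a hypothetical helper, not a function of any toolbox.

```python
def stack_dates(vh_by_date):
    """Turn {date: [VH value per pixel]} into one feature vector per pixel."""
    dates = sorted(vh_by_date)
    n_pixels = len(vh_by_date[dates[0]])
    features = [[vh_by_date[d][i] for d in dates] for i in range(n_pixels)]
    return dates, features


# Made-up VH backscatter values (dB) for three pixels on three dates.
vh_by_date = {
    "2020-04-01": [-18.0, -15.5, -21.0],
    "2020-05-01": [-16.5, -15.0, -20.5],
    "2020-06-01": [-13.0, -15.2, -20.8],
}

dates, features = stack_dates(vh_by_date)
for pixel, vector in enumerate(features):
    print(pixel, vector)
```

Each pixel now has three features instead of one, so the classifier has several layers to choose from when it builds each tree.
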
I wonder why you use a classification method to model a continuous variable such as the LAI. Wouldn’t a regression be more suitable?
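
A small sketch of why this matters, assuming (purely for illustration) that the LAI is binned into classes of width 1.0 and that a predicted class is mapped back to its class centre. The LAI values and both helper functions are made up.

```python
def lai_to_class(lai, width=1.0):
    """Bin a continuous LAI value into an integer class (hypothetical scheme)."""
    return int(lai // width)


def class_to_lai(cls, width=1.0):
    """Map a class back to the LAI at the centre of its bin."""
    return (cls + 0.5) * width


# Made-up LAI samples.
lai_samples = [0.3, 0.9, 1.1, 2.4, 2.6, 4.95]

recovered = [class_to_lai(lai_to_class(v)) for v in lai_samples]
errors = [abs(a - b) for a, b in zip(lai_samples, recovered)]

for lai, rec, err in zip(lai_samples, recovered, errors):
    print(f"LAI {lai:.2f} -> class centre {rec:.2f} (error {err:.2f})")
```

Even a perfect classifier cannot distinguish values within the same bin (0.3 and 0.9 end up in the same class here), whereas a regression predicts the value itself.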