I think the main difference between the two is that the sklearn RF offers more options to control how the samples and input features are randomized, for example the fraction of the training samples drawn for each tree, the required purity of nodes, the minimum number of samples within a node, etc. In SNAP all of these are predefined and hidden from the user, which could be one reason for its faster performance (no or less randomization, subsetting, bootstrapping of samples).
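To make the sklearn side concrete, here is a minimal sketch of how those options map to `RandomForestClassifier` parameters. The data is synthetic (generated with `make_classification` as a stand-in for pixel samples) and the parameter values are arbitrary choices for illustration, not recommendations and not what SNAP uses internally.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for pixel samples: 200 samples, 4 "bands" (assumed data).
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

clf = RandomForestClassifier(
    n_estimators=50,            # number of trees
    bootstrap=True,             # draw a bootstrap sample for each tree
    max_samples=0.5,            # fraction of the samples drawn per tree
    max_features="sqrt",        # number of features considered at each split
    min_samples_leaf=5,         # minimum number of samples in a leaf node
    min_impurity_decrease=0.0,  # impurity reduction required to split a node
    random_state=0,             # fixed seed so the run is repeatable
)
clf.fit(X, y)
print(len(clf.estimators_), "trees fitted on", clf.n_features_in_, "features")
```

Lowering `max_samples` or raising `min_samples_leaf` makes each tree cheaper to build, which is one way such settings can affect run time.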
Besides that, I think it is rather pointless to generate 500 trees based on only one input raster, because you will extract samples from the same source over and over again. Your random forest basically becomes a plain forest, and the only thing that changes between trees is the subset of sample points. You would get far more out of a random forest if you offered at least a couple of rasters as input features, so that they can be randomly selected as well.
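A small sketch of why a single raster removes the feature randomization. It only imitates the feature-subsampling step (drawing a random subset of features as split candidates); the helper `distinct_split_candidates` is hypothetical, and the subset size of roughly the square root of the number of features is an assumption that mirrors the common default for classification, not something taken from SNAP.

```python
import math
import random

def distinct_split_candidates(n_features, n_draws=500, seed=0):
    """Count how many different feature subsets appear over n_draws splits."""
    rng = random.Random(seed)
    # Assumed subset size: about sqrt(n_features), but at least one feature.
    k = max(1, int(math.sqrt(n_features)))
    seen = set()
    for _ in range(n_draws):
        subset = tuple(sorted(rng.sample(range(n_features), k)))
        seen.add(subset)
    return len(seen)

one_band = distinct_split_candidates(1)   # only one possible subset
six_bands = distinct_split_candidates(6)  # many possible subsets
print("distinct candidate sets with 1 band:", one_band)
print("distinct candidate sets with 6 bands:", six_bands)
```

With one band every split sees the same single feature, so the trees differ only through the sampled points. With several bands the candidate features vary from split to split, which is where most of the decorrelation between trees comes from.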