They ‘consider’ that the activation maps obtained in the intermediate (hidden) layers follow a Gaussian Mixture Model (GMM).
Notation: they say the target mixture model contains S Gaussians.
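To make the notation concrete, this is how I read the assumption. The symbols are my own labels and may not match the paper's: a is the activation at the chosen hidden layer, and π_s, μ_s, Σ_s are the weight, mean and covariance of the s-th Gaussian.

$$
p(a) = \sum_{s=1}^{S} \pi_s \, \mathcal{N}(a \mid \mu_s, \Sigma_s), \qquad \pi_s \ge 0, \quad \sum_{s=1}^{S} \pi_s = 1
$$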
So far, that is fine with me.
Then, in Section IV (Functional Watermarking), they say that in practice they select S to be the number of classes in the classification task.
This is where I get confused. If you can already have one Gaussian for each class at an intermediate layer, why would you need any additional layers? (Alternatively, if you cannot have a Gaussian for each class at the given layer, then such a selection looks too restrictive.)
Am I missing something?