The expert annotation is used by clip_grads and DDP.
This is currently implemented only for a single process with a single expert.
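To make the roles concrete, here is a minimal, framework-free sketch of how such an annotation might be consumed. It rests on several assumptions not stated above. The annotation is modeled as a boolean attribute named `expert` on a hypothetical `Param` class. The helpers `mark_expert` and `dp_allreduce_params`, and this particular `clip_grads` signature, are illustrative only. The sketch also assumes that a DDP-style wrapper skips data-parallel all-reduce for expert parameters, since experts are not replicated across ranks. In `clip_grads`, dense and expert squared norms are accumulated separately. With multiple processes, each partial sum would be reduced over a different process group. With one process and one expert, they are simply added, which matches the current limitation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Param:
    """Hypothetical stand-in for a framework parameter (only its gradient matters here)."""
    grad: List[float]
    # Hypothetical annotation: True if this parameter belongs to an expert.
    expert: bool = False


def mark_expert(params: List[Param]) -> None:
    """Annotate every parameter of an expert so later passes can tell them apart."""
    for p in params:
        p.expert = True


def dp_allreduce_params(params: List[Param]) -> List[Param]:
    """Return the parameters a DDP-style wrapper would all-reduce.

    Assumption: expert parameters are not replicated across data-parallel
    ranks, so they are excluded from the data-parallel all-reduce.
    """
    return [p for p in params if not p.expert]


def clip_grads(params: List[Param], max_norm: float) -> float:
    """Scale gradients in place so their global L2 norm is at most max_norm.

    Returns the norm measured before clipping. Dense and expert squared norms
    are summed separately; in a multi-process setup each sum would be reduced
    over its own process group, but with one process they are just added.
    """
    dense_sq = sum(g * g for p in params if not p.expert for g in p.grad)
    expert_sq = sum(g * g for p in params if p.expert for g in p.grad)
    total_norm = math.sqrt(dense_sq + expert_sq)
    scale = max_norm / (total_norm + 1e-6)
    if scale < 1.0:
        for p in params:
            p.grad = [g * scale for g in p.grad]
    return total_norm


# Usage: one dense parameter and one expert parameter in a single process.
dense = Param(grad=[3.0, 0.0])
expert_param = Param(grad=[0.0, 4.0])
mark_expert([expert_param])
params = [dense, expert_param]

norm = clip_grads(params, max_norm=1.0)   # pre-clip norm is sqrt(9 + 16) = 5.0
to_reduce = dp_allreduce_params(params)   # only the dense parameter
```

Keeping the two squared-norm sums apart costs nothing in the single-process case, but it marks the point where a multi-process version would need separate reductions for dense and expert gradients.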