Add a hint for DGDP synchronization
Fix GShard gate test
Fix smart schedule with older PyTorch
Fix type mismatch, shape mismatch, and missing condition in FasterMoE's expert shadowing
Fix bug: skip computeFn when batch is empty
Documentation for FasterMoE
FasterMoE Expert Shadowing
Faster Scheduling
Faster topo gate
Fix documentation for Megatron