[zero] adapt zero for unsharded parameters (#561)
* support existing sharded and unsharded parameters in zero
* add unit test for moe-zero model init
* polish moe gradient handler