HELSON authored
* support existing sharded and unsharded parameters in zero
* add unit test for moe-zero model init
* polish moe gradient handler
e6d50ec1