[zero] polish ShardedOptimV2 unittest (#385)
* place params on CPU after zero init context
* polish code
* bucketized CPU/GPU tensor transfer
* find a bug in sharded optim unittest
* add offload unittest for ShardedOptimV2
* polish code and make it more robust