[feat] Sharded DDP - small refactor and new features (#97)
- rename oss_ddp to ShardedDataParallel
- some refactoring
- ShardedDataParallel owns the sharded optimizer, exposed if need be
- some small perf bumps