[DeepSpeed ZeRO3] Fix performance degradation in sharded models (#18911)
* [DeepSpeed] Fix performance degradation in sharded models
* style
* polish
Co-authored-by: Stas Bekman <stas@stason.org>