DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760)
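Below is a minimal sketch of the pattern the title describes. It assumes a PyTorch environment. The `torch.arange` call pins an integer dtype (`torch.int64` is our choice here) and only afterwards converts with `.float()`, so the values are exact integers before any floating-point conversion happens. The assumption behind the fix is inferred from the title and not stated in it: under DeepSpeed's initialization, a `torch.arange` left to pick its own floating-point dtype may come out in reduced precision, for example `torch.float16`. The helper names `positions_fixed`, `positions_in_half` and `rotary_inv_freq` are hypothetical. The half-precision case is only simulated with an explicit cast.

```python
import torch


def positions_fixed(n: int) -> torch.Tensor:
    # Hardcode an integer dtype, then convert explicitly to float32.
    # The integers are exact before the conversion, regardless of which
    # default floating-point dtype is active.
    return torch.arange(0, n, dtype=torch.int64).float()


def positions_in_half(n: int) -> torch.Tensor:
    # Hypothetical simulation of an arange that ended up in float16.
    # float16 represents integers exactly only up to 2048, so larger
    # odd positions get rounded to a neighbouring even value.
    return torch.arange(0, n, dtype=torch.int64).to(torch.float16).float()


def rotary_inv_freq(dim: int, base: float = 10000.0) -> torch.Tensor:
    # Hypothetical rotary-embedding style use of the same pattern:
    # the exponents 0, 2, 4, ... are built as int64, then divided in float32.
    exponents = torch.arange(0, dim, 2, dtype=torch.int64).float() / dim
    return 1.0 / (base ** exponents)


exact = positions_fixed(4096)
lossy = positions_in_half(4096)
print(exact[2049].item())           # 2049.0
print(lossy[2049].item())           # 2048.0
print(int((exact != lossy).sum()))  # 1024 (every odd position from 2049 to 4095)
print(rotary_inv_freq(8))
```

Because the values are built as integers, the explicit `.float()` is the only floating-point step, and it always yields `torch.float32`. This is why the pattern does not depend on whatever default dtype is in effect while the model is being initialized.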