[PyTorch] Fix pipeline parallel execution by using cloned scale inverse tensors (#659)
Use cloned scale_inv for fp8 cast
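
A minimal sketch of the aliasing hazard this change appears to address, not the actual Transformer Engine code. It assumes that with pipeline parallelism a later microbatch's forward pass can update `scale_inv` in place before an earlier microbatch's backward pass reads it. `Fp8Meta`, `forward`, `update_scale_inv`, and `run` are hypothetical names, and a plain Python list stands in for the tensor.

```python
class Fp8Meta:
    """Hypothetical stand-in for per-module FP8 scaling state."""

    def __init__(self):
        # A one-element list stands in for a tensor that is updated in place.
        self.scale_inv = [1.0]


def forward(meta, clone):
    # Save the scale_inv that the matching backward pass will read later.
    # With clone=True a private copy is saved; otherwise the shared buffer is.
    return list(meta.scale_inv) if clone else meta.scale_inv


def update_scale_inv(meta, value):
    # In-place update (assumption: this is what a later microbatch's forward does).
    meta.scale_inv[0] = value


def run(clone):
    meta = Fp8Meta()
    saved = forward(meta, clone)   # microbatch 0 forward
    update_scale_inv(meta, 0.5)    # microbatch 1 forward changes the scaling
    return saved[0]                # value seen by microbatch 0 backward
```

Without the copy, `run(clone=False)` observes the updated value; with it, `run(clone=True)` still sees the value that was current during its own forward pass.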
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>