Zhongbo Zhu authored
* pipeclean, fix nvfp4 padding of 32 alignment

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* numerical test passed

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix CI failure with test_cast_master_weights_to_fp8 (in a hacky way)

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* found CUDA mis-aligned address error in training in multi-swizzle, hack the vec_load_size to 1 to unblock

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* leave comments about alignment issue

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fused bulk alloc nvfp4

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix RHT sign mask CPU overhead

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* fix

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* resolve comments

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>

* Remove incorrect logic that treats 0-D tensor as uninitialized

Tensor shape logic still requires treating 0-D tensor as uninitialized.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Fix invalid conversion from tensor to int

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
b4a1d4d6