[PyTorch][MoE] MXFP8 Support to Reduce CPU Overhead By Fuse Torch Empty Calls (#1934)

* functional passed
* before zero padding in mxfp8 swizzle, use torch zeros to malloc for now
* format
* lint

Signed-off-by: zhongboz <zhongboz@nvidia.com>