-
Xin Yao authored
* GroupedGEMM via multi-stream cublas * fix A/B is nullptr while D is not nullptr * add fp8 grouped gemm * register with TorchScript * add the GroupedLinear layer --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Jiang Shao <jiangs@nvidia.com> Co-authored-by:
Qi Zhang <qizhang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
a4e95e86