"vscode:/vscode.git/clone" did not exist on "66b108d1428eaee089bf2a4cb41b1299d150fd93"
Increase number of FP8 tensors per GEMM (#22)
* Increase number of FP8 tensors per GEMM Signed-off-by:Vasudevan Rengasamy <vrengasamy@nvidia.com> * Enable FP8 output tensor for fp8_gemm Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * [BERT FP8] Initial TE review comments Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Temporary fix for cuda graph non convergence Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Address review comments-2 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Review comments-3 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Change for New API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Remove unnecessary clone for D_scale, D_amax Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Avoid Roll for AMAX history size = 1 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Update onnx_te_gemm API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint errors Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
Showing
Please register or sign in to comment