- 20 Mar, 2025 2 commits
- 18 Mar, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Do not apply bias when apply_bias is False Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Bwd fix for LNMLP and tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for the dbias calculation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Improve tests and cleaning the logic Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Tightened test tolerances a little Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "Tightened test tolerances a little" This reverts commit 2e20a92c884a84759006541adc1d638ab91dde62. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Update tests/pytorch/test_numerics.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> * Fix the Gelu Aux type Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove use_fc1_bias option Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 17 Mar, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Do not create multiple cublas handle Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for multiple GPUs per thread Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix multithreaded execution Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix from conlfict Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Mar, 2025 1 commit
-
-
Li Tao authored
* support tp-comm-overlap in Current Scaling recipe Signed-off-by:
Li Tao <lit@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean Signed-off-by:
Li Tao <lit@nvidia.com> * fix test recipe argument to generalize to MXFP8 Signed-off-by:
Li Tao <lit@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Reduce duplicated transpose in certain cases Signed-off-by:
Li Tao <lit@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use per_tensor_scaling() to judge DS or CS Signed-off-by:
Li Tao <lit@nvidia.com> * minor fixes Signed-off-by:
Li Tao <lit@nvidia.com> * change comment description Signed-off-by:
Li Tao <lit@nvidia.com> * add multi-layer unit test for tp overlap Signed-off-by:
Li Tao <lit@nvidia.com> * support test case that run for several times Signed-off-by:
Li Tao <lit@nvidia.com> * avoid save ub tensor in prepare_for_saving Signed-off-by:
Li Tao <lit@nvidia.com> * fix Signed-off-by:
Li Tao <lit@nvidia.com> * switch to a simple fix Signed-off-by:
Li Tao <lit@nvidia.com> * formatting Signed-off-by:
Li Tao <lit@nvidia.com> * simply test cases; avoid additional clone() Signed-off-by:
Li Tao <lit@nvidia.com> * fall back to get_buffer in layernormmlp Signed-off-by:
Li Tao <lit@nvidia.com> * use 2 layers for fp8 tpoverlap multi-layer test for better tolerance, limit max gpus for test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Li Tao <lit@nvidia.com> Signed-off-by:
zhongboz <zhongboz@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
zhongboz <zhongboz@nvidia.com>
-
- 07 Feb, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 Aug, 2024 1 commit
-
-
NVJiangShao authored
* Add permutation functions * Add permutation ops * Remove the dependency on cutlass * Move permutation.py out of module dir * Rewrite the unit test and enable skipping if FP8 is unavailable * Rename exposed C++ API and reorder its parameters + take NVTETensor as inputs * Use Float8Tensor for FP8 input * Move dtype to ctx --------- Signed-off-by:
Jiang Shao <jiangs@nvidia.com> Co-authored-by:
Qi Zhang <qizhang@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 21 Aug, 2024 1 commit
-
-
Tim Moon authored
* Perform scale-inv update in cast-transpose kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Perform scale-inv update in cast and activation kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Perform sclae-inv update in LayerNorm and RMSNorm kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Perform scale-inv update after FP8 GEMMs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fuse casts and scale-inv updates in linear module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fuse casts and scale-inv updates in layernorm-linear module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Simplify kernel to update FP8 scale-inv Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix typos Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug amax update in layernorm kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug ONNX export Use quantization scaling factor in ONNX quantize op. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestion from @ptrendx Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Debug mismatched dtypes Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 01 Aug, 2024 1 commit
-
-
Xin Yao authored
* fix workspaces and unfused bias in multi-stream cuBLAS * Expose num_streams via pybind * Fix C-compatibility * rm importing packaging in test_fused_attn.py --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 25 Jun, 2024 1 commit
-
-
Xin Yao authored
* GroupedGEMM via multi-stream cublas * fix A/B is nullptr while D is not nullptr * add fp8 grouped gemm * register with TorchScript * add the GroupedLinear layer --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Jiang Shao <jiangs@nvidia.com> Co-authored-by:
Qi Zhang <qizhang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 22 May, 2024 1 commit
-
-
Phuong Nguyen authored
* added alignment requirements for CuBLAS heuristics Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * minor rewords Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added unit test for gemm with unaligned inputs Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added pytest skip if fp8 is not available Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * changed offset so that it has alignment with 128 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 24 Oct, 2023 1 commit
-
-
Tim Moon authored
* Do not include logging macros in installed C headers Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug logging macros Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug C++ tests Use Google style for header includes. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update CUDA driver macros Incorporating changes from #389. Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use core error checking macros in PyTorch extensions Hack to get around macro redefinition warning. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix missing arg when getting CUDA driver error string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse logging header in frameworks Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com>
-
- 05 Oct, 2023 1 commit
-
-
vasunvidia authored
* Initial commit Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Repro for RS output mismatch with Single GEMM + Split pipelined RS Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * minor changes for AG->GEMM pipelined overlap Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Add Atomic Gemm cublasApi attributes and initial implementation of AG->Atomic GEMM Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * AtomicGemm+RS functional with workaround Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * add amax update to layernorm_linear for FP8 unit test accuracy Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Enable reducescatter2_userbuff_strided variants Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Bug fix Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * AG+AtomicGemm overlap functional but gemm doesnt overlap with comm Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Add userbuffers_sendrecv kernel variants Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * TransformerLayer API changes to enable AtomicGemm+RS overlap Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Code cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Code cleanup2 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * [UB] AllGather Atomic GEMM overlap using userbuffer_sendrecv kernels Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Code cleanup + bug fix for multiatomic sendrecv kernel Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Bug fixes Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * [UB] Add shuffling for better AG AtomicGEMM overlap Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Bug fix for AG AtomicGemm overlap Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Bug fix for multiAtomicAG and singleAtomicAG Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Use chunk_i+1 as recv_chunk for multiatomic_AG with shuffling Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Launch AtomicGEMM after first-chunk AG Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Rebase to main Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Revert "Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional" This reverts commit 80a47a76355440cd5fb4314c96fe9fda632d87f9. Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Add support for NVLS-MC and FP8 Reduce Scatter Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Bug fix Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Atomic and Multiatomic FP8 RS functional Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Remove debug print Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * UB comm initialization hang fix Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Code cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Create new GEMM API for Atomic GEMM Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * CI ready Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * more fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * license Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Bug fix Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Revert NVLS-MC Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Check cu* versions for running atomic gemms Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Add experimental warning Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better wording Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add warning to c api Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix wording Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 26 Jun, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 Apr, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* Port initial changes Co-authored-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd FA include for PyTorch Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Re-enable sm_70 + cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * LICENSE, cleanup header Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * 5k -> 173 errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * license and fixes in userbuffers-host Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * next round fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * final cpp cleanup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * pylinting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix from linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Turn off default async amax reduction (#148) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove unused code path Signed-off-by:
Sangkug Lym <slym@nvidia.com> * cleanup Macros Signed-off-by:
Sangkug Lym <slym@nvidia.com> * fix conflict resolution bug Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Fix gencode flags in setup (#145) * Fix gencode flags based on cuda version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review suggestions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert append_nvcc_threads change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change overlap config dict error message Signed-off-by:
Sangkug Lym <slym@nvidia.com> * simplify ub initialization Signed-off-by:
Sangkug Lym <slym@nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix sanity imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * cpplint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix TensorFlow build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix TE macros in public header Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * compiles with and w/o MPI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes for python side annotations for conditional compile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * link gdrAPI only when MPI found Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix comments for dummy var Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix linking Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * load MPI before TE Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add Py side argument checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove unused code and catch silent failures Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix find_lib path for tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
- 28 Mar, 2023 1 commit
-
-
vasunvidia authored
* Add support for fp8 GEMM BIAS AUX GELU fusion Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint error Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
- 03 Feb, 2023 1 commit
-
-
vasunvidia authored
* Increase number of FP8 tensors per GEMM Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Enable FP8 output tensor for fp8_gemm Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * [BERT FP8] Initial TE review comments Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Temporary fix for cuda graph non convergence Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Address review comments-2 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Review comments-3 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Cleanup Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Change for New API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Remove unnecessary clone for D_scale, D_amax Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Avoid Roll for AMAX history size = 1 Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Update onnx_te_gemm API Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Fix Lint errors Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
- 12 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
* Add NVTX to TE modules Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix pylint Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix NVTX in _prepare_backward Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to C API Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cpplint and link nvToolsExt Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Add NVTX to GeGlu Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 03 Jan, 2023 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 08 Dec, 2022 1 commit
-
-
Przemyslaw Tredak authored
* Move the amax/scale/scale_inv into the TE Tensor struct. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Handle multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Changed softmax to new Tensor Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * First pass at the cpp tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Round of fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix multi_cast_transpose Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix cast_to_fp8 Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Sep, 2022 1 commit
-
-
Przemek Tredak authored
Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-