- 12 Nov, 2025 1 commit
-
-
wenjh authored
-
- 16 Oct, 2025 1 commit
-
-
yuguo authored
-
- 15 Oct, 2025 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
yuguo authored
-
- 18 Sep, 2025 1 commit
-
-
alan yang authored
* feat: add cutlass group gemm support Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor: refactor multi tensor gemm interface Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor: refactor nvte_multi_stream_cublas_gemm func and add license info Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: add unit test for cutlass group gemm Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: add cutlass support type protect Signed-off-by:
Min Yang <min.yang@shopee.com> * add tests and fix lint Signed-off-by:
Xin Yao <xiny@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: fix unit tests error Signed-off-by:
Min Yang <min.yang@shopee.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat: refactor host workspace malloc Signed-off-by:
Min Yang <min.yang@shopee.com> * update cutlass Signed-off-by:
Xin Yao <xiny@nvidia.com> * update cutlass Signed-off-by:
Xin Yao <xiny@nvidia.com> * further relex threshold and add a env var to warn fall back Signed-off-by:
Xin Yao <xiny@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Min Yang <min.yang@shopee.com> Signed-off-by:
Xin Yao <xiny@nvidia.com> Signed-off-by:
alan yang <89962857+cassiewilliam@users.noreply.github.com> Co-authored-by:
Min Yang <min.yang@shopee.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 27 Aug, 2025 1 commit
-
-
yuguo authored
-
- 26 Aug, 2025 1 commit
-
-
Vladimir Cherepanov authored
* Pick up cuBLASMp during build Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Change lib order to fix link error Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Context creation, incomplete... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Test fixure Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * A sanity AgGemm test, failing... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix axes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Take care of uneven distribution Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use MPI to get position of local matrices Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Refactor Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Refactor & fixes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Gemm-RS Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Gemm-AR, not working... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fixes Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Setting all-reduce epilogue for gemm-ar Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use supported shapes for GEMM-AR Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak tolerance Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * First shot at fp8 Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use TensorHolder in tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More test configs Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Support comm_sm_count Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Parametrize dtypes for A, B and D separately Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak scaling Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Amax ptr Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Flags parity with cublas_gemm, saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Cleanup Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Bias tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix bias test Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Aux, saving... Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * aux_ld Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * A fix Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Use test::Tensor Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Set scale inv Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove unsupported test configs Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Replace libcal with NCCL Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Add NVTX markers to API functions Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Tweak GemmAr tests Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More test config Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix merge fallout Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove MPI dependency, comment API, add algo parameter Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix nvshmem dependency Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix nvshmem build Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Excluse CommGemm tests from L0_cppunittest Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Add cpp_distributed sh file for CI Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Adapt tp TensorAllocator Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Skip GemmAr test on unsupported HW Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Oversibscribe is needed on some clusters Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Fix incomplete libcal removal Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Move CI tests to L1 Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Rename context to include NVTE prefix Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove leftover code Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * NVTE_WITH_CUBLASMP off by default Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More detailed NVTE_CHECK diag Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Comment API Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Include stdbool header for legacy C compilers Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Remove now unused argument Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * Abstract away cuBLASMp algo behind our own enum Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * More detailed shape diag messages Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/include/transformer_engine/comm_gemm.h Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com> * Add license Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> --------- Signed-off-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> Signed-off-by:
Vladimir Cherepanov <56651474+mk-61@users.noreply.github.com> Co-authored-by:
Vladimir Cherepanov <vcherepanov@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
- 21 Aug, 2025 1 commit
-
-
yuguo authored
-
- 20 Aug, 2025 1 commit
-
-
Tim Moon authored
* Fix incorrect version checks for atomic GEMM Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix typo Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 15 Aug, 2025 1 commit
-
-
Jan Bielak authored
* Add `nvte_cublas_gemm_scaled` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `tex.generic_gemm` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `general_gemm` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `BasicLinear._functional_forward` and `BasicLinear._functional_backward` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Add `ForwardLinearScaleAdd` fusion Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Add `BackwardLinearScale` fusion Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Apply suggestions from code review Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Remove calls to `validate_gemm_scale` from `BasicLinear` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> --------- Signed-off-by:
Jan Bielak <jbielak@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 06 Aug, 2025 1 commit
-
-
yuguo authored
-
- 16 Jul, 2025 1 commit
-
-
yuguo authored
-
- 14 Jul, 2025 1 commit
-
-
Tim Moon authored
* Add run-time version checks in cuBLAS GEMM wrapper Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add run-time version logic for multicast Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix namespace error Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 09 Jul, 2025 1 commit
-
-
yuguo authored
-
- 08 Jul, 2025 1 commit
-
-
yuguo authored
-
- 19 Jun, 2025 1 commit
-
-
yuguo authored
-
- 18 Jun, 2025 1 commit
-
-
yuguo authored
-
- 12 Jun, 2025 3 commits
-
-
Phuong Nguyen authored
* Implemented GroupedDense and TestGroupedDense for BF16, FP16, and FP8 * Fix GroupedGemmFFI cuBLAS workspace alignment bug Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com>
-
Phuong Nguyen authored
Revert "[JAX] GroupedDense v.2 without dynamic shape (#1721)" This reverts commit 5d01ef21 . Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* Implemented GroupedDense and TestGroupedDense for BF16, FP16, and FP8 * Fix GroupedGemmFFI cuBLAS workspace alignment bug Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 06 Jun, 2025 1 commit
-
-
Phuong Nguyen authored
* refactor the multi_stream utils + implement nvte_multi_tensor_quantize in TE/Common * implement GroupedQuantizer and grouped_quantize in jaxx * fix logical_axes_names for transpose tensor in ScaledTensor Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Ming Huang <mingh@nvidia.com>
-
- 29 May, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Changed the Tensor allocation strategy Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Disable debug flag Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the double free error Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed pyTorch recipe extension Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Hide TensorAllocator and fix the usage in LayerNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix permutation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 27 May, 2025 1 commit
-
-
yuguo authored
-
- 22 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Build support for cuda 13 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix build for cudnn 8.9*; cuda 12.1 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd include Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2025 1 commit
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 1 commit
-
-
wenjh authored
Add rules of cuda_runtime.h, cuda_driver.h and cuda_nvml.h to hip. Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 May, 2025 1 commit
-
-
yuguo authored
-
- 07 May, 2025 2 commits
- 06 May, 2025 1 commit
-
-
yuguo authored
-
- 23 Apr, 2025 1 commit
-
-
yuguo authored
-
- 21 Apr, 2025 1 commit
-
-
Sudhakar Singh authored
* rtx5090 arch fix support Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * apprend `nvte` to the function name so that its visible in framework specific dirs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix typo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add filter for nvte_is_supported_nontn_fp8_gemm Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * properly expose the api Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feedback from PR Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move the function to apt header/c files Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more info Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 08 Apr, 2025 1 commit
-
-
Tim Moon authored
* Minor stylistic tweaks and typo fixes Review suggestions from @ptrendx Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix incorrect col strides for MXFP8 matrices Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Apr, 2025 1 commit
-
-
kwyss-nvidia authored
* Add GEMM logic for blockwise quantized tensors. GEMM test cases included in pytorch integration. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update NVTE_BLOCK_SCALING for GEMM. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Gate feature on CUDA 12.9 Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Gemm typo. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove unecessary type converter change. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reflect epilogue availability and test supported epilogues. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * GEMM simplifications from recipe branch. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Format py code. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update GEMM DGelu tests to match support depending on output dtype. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Force pow2Scales in GEMM Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add GEMM test to pytorch test suite. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add copyright to GEMM test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update import for GEMM test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add license. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update test gemm supported predicate. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use sgemm like interfaces and naming. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Rewrite GEMM comment. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR Feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Refactor GEMM param canonicalization Configure A and B matrices separately. Have separate code path for each scaling mode. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Prune number of tests. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> --------- Signed-off-by:
Keith Wyss <kwyss@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 04 Apr, 2025 1 commit
-
-
kwyss-nvidia authored
* Blockwise float8 quantizer and quantized tensor class. The classes are configurable for 128x128 blocksize and 1x128 blocksize via setting block_scaling_dim == 2,1 respectively. Scale tensors are stored in a format emenable for matrix multiplication, however the integration of matmul is deferred as a separate story. Fusions of quantization and DBIAS or activation functions are not yet implemented, and the dequantization is currently implemented in torch. Tests for quantization are included in C++ and pytorch layers, with exact comparison to reference quantizer behavior as well as an attempt to hit interesting branches through the API such as tensor creation in pytorch and CPP and dequantization of row and columnwise usage. Two CUDA kernels for quantization are included, and are direct ports of equivalents in the kitchen repository, where a subchannel recipe has been used for end to end training. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Apply linting changes. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Alignment for 1D scaling for GEMM edge case. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Change API name. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix merge conflict with name change. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use common tensor map API. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Change API to use two scaling mode enums. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix typo. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update some call sites. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Tests for torch tensor API surface. Since the quantized tensor is a tensor subclass, these tests exercise torch hooks. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reuse scale calculation between quantizer refs. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Save memory by dropping reference to saved tensors. Issues previously observed are solved. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove constexpr parameters from kernel. Code size is reduced with fewer constexpr params. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Merge conflict from rebase. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add shape implementations for block scaling. nvte_shape was added upstream. Logic added for block scaled fp8. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Move benchmark to te_playground Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove amax_epsilon and pow_2_scales from tensor. Hardcodes the default values. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Lint changes. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fixup MR changes that broke. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Safer ifdef in kernel. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Documentation prose. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reuse compute_scale function from Current Scaling. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Bugfix on inf_value scale refactor. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove qopt calls from test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update pytest list. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add copyright to reference scale calc. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use ptx.cuh functions instead of cde. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update shape logic with allocation and reuse shape. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Usage defaults MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Copyright and header guard. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Updating torch dispatch code. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix exception type. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use TypeInfo Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update CS scale update test to use updated ref impl Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update JAX scaling mode enum Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Skip tests on Lovelace Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Keith Wyss <kwyss@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 01 Apr, 2025 1 commit
-
-
Phuong Nguyen authored
* refactor + mxfp8 * added grouped gemm * rename linear to dense * added cublas init phase for groupedGemm * relax the tol of test encoder multiprocessing mxfp8 by 0.001 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 27 Mar, 2025 1 commit
-
-
yuguo authored
-
- 20 Mar, 2025 1 commit
-
-
yuguo authored
-
- 18 Mar, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Do not apply bias when apply_bias is False Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Bwd fix for LNMLP and tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for the dbias calculation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Improve tests and cleaning the logic Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Tightened test tolerances a little Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "Tightened test tolerances a little" This reverts commit 2e20a92c884a84759006541adc1d638ab91dde62. Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Update tests/pytorch/test_numerics.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> * Fix the Gelu Aux type Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove use_fc1_bias option Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 17 Mar, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Do not create multiple cublas handle Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix for multiple GPUs per thread Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix multithreaded execution Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix from conlfict Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-