- 27 Aug, 2025 1 commit
-
-
yuguo authored
-
- 23 Aug, 2025 1 commit
-
-
yuguo authored
-
- 21 Aug, 2025 1 commit
-
-
yuguo authored
-
- 15 Aug, 2025 1 commit
-
-
Jan Bielak authored
* Add `nvte_cublas_gemm_scaled` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `tex.generic_gemm` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `general_gemm` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Support use of `alpha` and `beta` in `BasicLinear._functional_forward` and `BasicLinear._functional_backward` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Add `ForwardLinearScaleAdd` fusion Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Add `BackwardLinearScale` fusion Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Apply suggestions from code review Signed-off-by:
Jan Bielak <jbielak@nvidia.com> * Remove calls to `validate_gemm_scale` from `BasicLinear` Signed-off-by:
Jan Bielak <jbielak@nvidia.com> --------- Signed-off-by:
Jan Bielak <jbielak@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 13 Aug, 2025 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 12 Aug, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 11 Aug, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 Aug, 2025 2 commits
- 06 Aug, 2025 3 commits
- 18 Jul, 2025 1 commit
-
-
yuguo authored
-
- 16 Jul, 2025 1 commit
-
-
yuguo authored
-
- 14 Jul, 2025 1 commit
-
-
Tim Moon authored
* Add run-time version checks in cuBLAS GEMM wrapper Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add run-time version logic for multicast Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix namespace error Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 09 Jul, 2025 1 commit
-
-
yuguo authored
-
- 08 Jul, 2025 1 commit
-
-
yuguo authored
-
- 19 Jun, 2025 1 commit
-
-
yuguo authored
-
- 18 Jun, 2025 1 commit
-
-
yuguo authored
-
- 12 Jun, 2025 3 commits
-
-
Phuong Nguyen authored
* Implemented GroupedDense and TestGroupedDense for BF16, FP16, and FP8 * Fix GroupedGemmFFI cuBLAS workspace alignment bug Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com>
-
Phuong Nguyen authored
Revert "[JAX] GroupedDense v.2 without dynamic shape (#1721)" This reverts commit 5d01ef21 . Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* Implemented GroupedDense and TestGroupedDense for BF16, FP16, and FP8 * Fix GroupedGemmFFI cuBLAS workspace alignment bug Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 06 Jun, 2025 1 commit
-
-
Phuong Nguyen authored
* refactor the multi_stream utils + implement nvte_multi_tensor_quantize in TE/Common * implement GroupedQuantizer and grouped_quantize in jaxx * fix logical_axes_names for transpose tensor in ScaledTensor Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Ming Huang <mingh@nvidia.com>
-
- 29 May, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Changed the Tensor allocation strategy Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Disable debug flag Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the double free error Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed pyTorch recipe extension Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Hide TensorAllocator and fix the usage in LayerNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix permutation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 27 May, 2025 1 commit
-
-
yuguo authored
-
- 26 May, 2025 1 commit
-
-
wenjh authored
Use ocp fp8. Workaround: test_cast_float8blockwise.cu link wrong std::max Signed-off-by:wenjh <wenjh@sugon.com>
-
- 22 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Build support for cuda 13 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix build for cudnn 8.9*; cuda 12.1 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd include Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2025 1 commit
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 1 commit
-
-
wenjh authored
Add rules of cuda_runtime.h, cuda_driver.h and cuda_nvml.h to hip. Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 May, 2025 2 commits
-
-
yuguo authored
-
wenjh authored
Default use of hipMallocAsync rather than hipMalloc in rocblas_gemm and add support of fp16_fp16_fp32 in rocblas_gemm. Signed-off-by:wenjh <wenjh@sugon.com>
-
- 07 May, 2025 2 commits
- 06 May, 2025 2 commits
-
-
yuguo authored
-
wenjh authored
Fix launch params are larger than launch bounds(256) for kernels in rocm_gemm.cu Signed-off-by:wenjh <wenjh@sugon.com>
-
- 30 Apr, 2025 1 commit
-
-
wenjh authored
Signed-off-by:
wenjh <wenjh@sugon.com> [RocblasGemm] Provide support of AB(bf16)D(fp32) Signed-off-by:
wenjh <wenjh@sugon.com>
-
- 25 Apr, 2025 1 commit
-
-
yuguo authored
-
- 23 Apr, 2025 1 commit
-
-
yuguo authored
-
- 21 Apr, 2025 1 commit
-
-
Sudhakar Singh authored
* rtx5090 arch fix support Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * apprend `nvte` to the function name so that its visible in framework specific dirs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix typo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add filter for nvte_is_supported_nontn_fp8_gemm Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * properly expose the api Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feedback from PR Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move the function to apt header/c files Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more info Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-