- 23 Feb, 2026 2 commits
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
- 23 Jan, 2026 1 commit
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
- 21 Jan, 2026 1 commit
maxiao3 authored
Signed-off-by: maxiao3 <maxiao3@sugon.com>
See merge request dcutoolkit/deeplearing/TransformerEngine!71
- 30 Oct, 2025 2 commits
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
tabuchixiangcai3 authored
Signed-off-by: Tangao <2205747538@qq.com>
- 23 Oct, 2025 2 commits
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
- 20 Oct, 2025 1 commit
tabuchixiangcai3 authored
[DCU] Fix MPI root support, enable int8 simulation and batched_linear to access non-existent main_grad
Signed-off-by: Tangao <2205747538@qq.com>
- 16 Oct, 2025 2 commits
- 13 Oct, 2025 1 commit
yuguo authored
- 28 Sep, 2025 1 commit
dongchl authored
- 24 Sep, 2025 1 commit
dongcl authored
- 19 Sep, 2025 2 commits
- 18 Sep, 2025 3 commits
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
yuguo authored
- 02 Sep, 2025 1 commit
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
- 28 Aug, 2025 2 commits
Charlene Yang authored
* disable determinism for sm100+ and cudnn<9.14
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix remaining CI failures
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* revert some changes
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert more changes
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove sm100 from determinism table
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
yuguo authored
- 27 Aug, 2025 2 commits
Kshitij Lakhani authored
Signed-off-by: Kshitij Lakhani <klakhani@nvidia.com>
Paweł Gadziński authored
* code drop
  Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix
  Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
* fix
  Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 26 Aug, 2025 3 commits
Tim Moon authored
* Return dummy wgrad tensors when requested by Mcore
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Apply suggestions from code review
  Co-authored-by: Jan Bielak <janekb04@icloud.com>
  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Jan Bielak <janekb04@icloud.com>
Tim Moon authored
Avoid garbage collection when capturing a CUDA Graph
Signed-off-by: Tim Moon <tmoon@nvidia.com>
yuguo authored
- 25 Aug, 2025 1 commit
yuguo authored
- 23 Aug, 2025 2 commits
- 21 Aug, 2025 2 commits
- 19 Aug, 2025 1 commit
evt_fugx1 authored
- 18 Aug, 2025 1 commit
Xin Yao authored
* check if the given recipe is supported in fp8_autocast
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* resolve comments
  Signed-off-by: Xin Yao <xiny@nvidia.com>
* check only when enabled
  Signed-off-by: Xin Yao <xiny@nvidia.com>
---------
Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 16 Aug, 2025 1 commit
jomitchellnv authored
fix: fixes multi head attention for context parallel: rotary embedding to use padded cu_seq_lens (#2077)
fix: fixes mha to use padded cu_seq_lens during cp
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
- 15 Aug, 2025 1 commit
Jan Bielak authored
* Add `nvte_cublas_gemm_scaled`
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Support use of `alpha` and `beta` in `tex.generic_gemm`
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Support use of `alpha` and `beta` in `general_gemm`
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Support use of `alpha` and `beta` in `BasicLinear._functional_forward` and `BasicLinear._functional_backward`
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Add `ForwardLinearScaleAdd` fusion
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Add `BackwardLinearScale` fusion
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Apply suggestions from code review
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Remove calls to `validate_gemm_scale` from `BasicLinear`
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
---------
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 14 Aug, 2025 4 commits
Kirthi Shankar Sivamani authored
Add launch bounds to swizzle kernel, use empty scale inv
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Tim Moon authored
* Unfused impl for dbias-quantize
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Unfused impl for dact-dbias-quantize
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Disable fused bgrad-quantize for unsupported recipes
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove unfused dbias-quantize impls
  Not supported in the core lib.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Support unfused impls in tex functions
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Tweaks
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Remove unused imports
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Tim Moon authored
Avoid registering FP8 recipe update in ops without backward pass
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Tim Moon authored
* Register weight/bias params in linear op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Tweak docs
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Make sure linear op checkpoint is backward-compatible
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix linter warning
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Check for invalid case before setting bias
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>