- 15 Dec, 2025 1 commit
-
-
wenjh authored
-
- 26 Nov, 2025 2 commits
- 12 Nov, 2025 2 commits
- 08 Nov, 2025 1 commit
-
-
wenjh authored
-
- 03 Nov, 2025 8 commits
-
-
zhaochao authored
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
zhaochao authored
Signed-off-by: zhaochao <zhaochao1@sugon.com>
-
- 31 Oct, 2025 1 commit
-
-
wenjh authored
[DCU] Fix memory overflow and test-distributed in L1_pytorch_distributed_unittest. See merge request dcutoolkit/deeplearing/TransformerEngine!49
-
- 17 Oct, 2025 4 commits
-
-
tabuchixiangcai3 authored
Signed-off-by: Tangao <2205747538@qq.com>
-
yuguo authored
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into release_v2.8
-
yuguo authored
Update activation offload code to align with the official version. See merge request dcutoolkit/deeplearing/TransformerEngine!52
-
dongcl authored
-
- 16 Oct, 2025 4 commits
-
-
yuguo authored
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into release_v2.8
-
yuguo authored
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.8
-
yuguo authored
-
tabuchixiangcai3 authored
Signed-off-by: Tangao <2205747538@qq.com>
-
- 15 Oct, 2025 8 commits
-
-
yuguo authored
Merge branch 'release_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into release_v2.8
-
yuguo authored
-
wenjh authored
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
yuguo authored
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into release_v2.8
-
yuguo authored
-
yuguo authored
Merge branch 'develop_v2.8' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into release_v2.8
-
yuguo authored
-
- 13 Oct, 2025 3 commits
-
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
wenjh authored
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 11 Oct, 2025 2 commits
- 09 Oct, 2025 2 commits
- 02 Oct, 2025 1 commit
-
-
Tim Moon authored
* Make sure to set usages for linear op quantizers before forward
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Avoid unsupported case for fused dbias+quantize kernel
  Hopper does not support dbias + FP8 cast without FP8 transpose.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
- 01 Oct, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Fix the cublas workspace alignment
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
-