- 26 Aug, 2025 1 commit
  - yuguo authored
- 23 Aug, 2025 1 commit
  - yuguo authored
- 21 Aug, 2025 1 commit
  - yuguo authored
- 13 Aug, 2025 2 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 12 Aug, 2025 1 commit
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 11 Aug, 2025 1 commit
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 08 Aug, 2025 2 commits
- 06 Aug, 2025 3 commits
- 22 Jul, 2025 2 commits
- 18 Jul, 2025 2 commits
  - yuguo authored
  - Charlene Yang authored
    * update cudnn-frontend to 1.13.0
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * disable 9.11 for a bug
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * fix selection logic
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 17 Jul, 2025 2 commits
  - Charlene Yang authored
    * optimize kv_cache reindex and copy kernels
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * avoid reindexing from python side
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * rename variable from previous commit
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * minor fix
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    * minor fix
      Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  - yuguo authored
- 16 Jul, 2025 2 commits
  - Tim Moon authored
    * Add dtype checks in multi-tensor Adam
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * Avoid throwing exception in destructor
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * Debug test failures
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  - yuguo authored
- 15 Jul, 2025 1 commit
  - yuguo authored
- 14 Jul, 2025 2 commits
  - Tim Moon authored
    * Add run-time version checks in cuBLAS GEMM wrapper
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * Add run-time version logic for multicast
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * Fix namespace error
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  - Autumn1998 authored
    * fix nondeterministic problem in CI
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * fix bug on mbs>1
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * fix bug on sm dispatcher
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * fix CI initial values
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    Signed-off-by: tongliu <tongliu@nvidia.com>
    Co-authored-by: tongliu <tongliu@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
    Co-authored-by: Xin Yao <xiny@nvidia.com>
- 12 Jul, 2025 1 commit
  - Jan Bielak authored
    * Fix clearing tensor data in backward removing is_first_op
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Misc fixes
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Use Linear weight dtype and device for compute consistently
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Add backward dbias + quantize fusion
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Pass recipe to OperationFuser to allow recipe-dependent fusions
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Remove redundant view from activations
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Add bias activation backward fusion
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * Apply suggestions from code review
      Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
      Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    Signed-off-by: Jan Bielak <jbielak@nvidia.com>
    Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 10 Jul, 2025 1 commit
  - Autumn1998 authored
    * add router fusion
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * fix ci
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * fix ci with cuda 12.3
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * Review suggestions
      Signed-off-by: Tim Moon <tmoon@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * Fix CI sm89/80
      Signed-off-by: tongliu <tongliu@nvidia.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    Signed-off-by: tongliu <tongliu@nvidia.com>
    Signed-off-by: Tim Moon <tmoon@nvidia.com>
    Co-authored-by: tongliu <tongliu@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Xin Yao <xiny@nvidia.com>
    Co-authored-by: Tim Moon <tmoon@nvidia.com>
- 09 Jul, 2025 1 commit
  - yuguo authored
- 08 Jul, 2025 1 commit
  - yuguo authored
- 03 Jul, 2025 1 commit
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 02 Jul, 2025 1 commit
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 01 Jul, 2025 1 commit
  - wenjh authored
    * Add env to choose blocklen of blockwise quantize.
      Signed-off-by: wenjh <wenjh@sugon.com>
    * Fix pytest of blockwise error
      Signed-off-by: wenjh <wenjh@sugon.com>
    * Resolve new api in int8 gemm test
      Signed-off-by: wenjh <wenjh@sugon.com>
    * Fix incorrect launch parm
      Signed-off-by: wenjh <wenjh@sugon.com>
    * Fix 1D blockwise(64) acc error
      Signed-off-by: wenjh <wenjh@sugon.com>
- 26 Jun, 2025 1 commit
  - xiaoxi-wangfj authored
    * [PyTorch|common] Implement unpadding kernel for FP8
      1. Add multi-tensor unpadding kernel
      2. Replace split+cat with unpadding kernel in Fp8Padding and Fp8Unpadding
      3. Add unpadding with padding unit tests
      Signed-off-by: xiaoxi-wangfj <690912414@qq.com>
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
      for more information, see https://pre-commit.ci
    * add license
      Signed-off-by: Xin Yao <xiny@nvidia.com>
    * Update padding.cu
      Signed-off-by: Xin Yao <xiny@nvidia.com>
    Signed-off-by: xiaoxi-wangfj <690912414@qq.com>
    Signed-off-by: Xin Yao <xiny@nvidia.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Xin Yao <xiny@nvidia.com>
- 20 Jun, 2025 1 commit
  - yuguo authored
- 19 Jun, 2025 2 commits
- 18 Jun, 2025 5 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 17 Jun, 2025 1 commit
  - yuguo authored