- 21 Jul, 2025 1 commit
Charlene Yang authored
* exclude 9.10.0/.1 for certain configs
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix kv_channels
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add get_backend to tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add init files
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix numerics and cuda graph tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix jax tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove prints
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor changes after renaming
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix import structure and rename get_attention_backends
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix docs and benchmarks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix get backend calls
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "fix get backend calls"
  This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc.
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Revert "fix docs and benchmarks"
  This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9.
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix docs, benchmarks and pre-commit ci
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix dpa/mha flash attn selection
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix rng states
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix ModelConfig
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix backend selection on Ampere
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix issues from last merge
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Update tests/pytorch/utils.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* remove initialization of rng_states to None
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* redefine ModelConfig
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix typo
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix ModelConfig
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix seed for CP tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* Update tests/pytorch/test_sanity.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* move fixture from utils to individual tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix CI
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 09 Jul, 2025 1 commit
Zhongbo Zhu authored
* functional passed
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* before zero padding in mxfp8 swizzle, use torch zeros to malloc for now
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* format
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* lint
  Signed-off-by: zhongboz <zhongboz@nvidia.com>

---------

Signed-off-by: zhongboz <zhongboz@nvidia.com>
- 26 Jun, 2025 1 commit
Zhongbo Zhu authored
* finish python ref impl for bulk alloc
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* c++ bulk alloc worked, still draft version
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* resolve rebase conflict
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add license
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* use shared_ptr to auto manage reference count
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attempt to fix misc training error
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attempt to handle case where experts get zero token
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* updated with fused C++ function calls
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* experiment with reducing py object construction time
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix seg fault bug in inference mode
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix lint
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fuse torch split into bulk alloc
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* rebase to latest main
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix unit test failure
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix lint error
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* refactor create_tensor to use get_scale_shape
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* refactor quantize to call quantize_cpp
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Implement separate functions for multi-tensor quantize and split + multi-tensor quantize
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update grouped linear module with fused split+quantize func
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Move multi-tensor quantize func to cast.cpp
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Do not expose quantizer helper function externally
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix linter warnings
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Revert cuDNN frontend commit
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* fix corner cases with zero tokens
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* add comments
  Signed-off-by: zhongboz <zhongboz@nvidia.com>

---------

Signed-off-by: zhongboz <zhongboz@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 23 Aug, 2024 1 commit
Charlene Yang authored
* WIP: add fa3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: clean up
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* WIP: add benchmarks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* differentiate func/varlen_func
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: add FP8 fwd support
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add FA3 FP8 fwd code and test
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix assert for FA3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix FA3 FP8 logic and add tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* update FA2 to <=2.6.3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* tweak unit tests for base/mask
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* set constraints for FA3 for sm90 and causal_bottom_right
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert debug changes in benchmark script
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 13 Aug, 2024 1 commit
Charlene Yang authored
* update example/benchmark scripts
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix head_dim after MLA
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update notebook
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 14 Jun, 2024 2 commits
Kirthi Shankar Sivamani authored
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

---------

Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Charlene Yang authored
* add attention docs
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak to first draft
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up pictures
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft for review
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add logging info/debug
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fix of an SWA message
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use subprocess instead of os.sys
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up benchmark script
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add example script and update notebook
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweaks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix Jax/Paddle related comments
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* rerun H100 benchmark
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* restrict fp8 tests to sm90+
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* move get_cudnn_version from common to pytorch utils
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>

---------

Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>