- 09 Jul, 2025 1 commit
Zhongbo Zhu authored
* functional passed
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* before zero padding in mxfp8 swizzle, use torch zeros to malloc for now
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* format
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* lint
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
---------
Signed-off-by: zhongboz <zhongboz@nvidia.com>
-
- 26 Jun, 2025 1 commit
Zhongbo Zhu authored
* finish python ref impl for bulk alloc
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* c++ bulk alloc worked, still draft version
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* resolve rebase conflict
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add license
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* use shared_ptr to auto manage reference count
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attempt to fix misc training error
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* attempt to handle case where experts get zero token
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* updated with fused C++ function calls
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* experiment with reducing py object construction time
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix seg fault bug in inference mode
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix lint
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fuse torch split into bulk alloc
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* clean up
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* rebase to latest main
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix unit test failure
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* fix lint error
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* refactor create_tensor to use get_scale_shape
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* refactor quantize to call quantize_cpp
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* Implement separate functions for multi-tensor quantize and split + multi-tensor quantize
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update grouped linear module with fused split+quantize func
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Move multi-tensor quantize func to cast.cpp
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Do not expose quantizer helper function externally
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix linter warnings
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Revert cuDNN frontend commit
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* fix corner cases with zero tokens
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
* add comments
  Signed-off-by: zhongboz <zhongboz@nvidia.com>
---------
Signed-off-by: zhongboz <zhongboz@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
-
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 Aug, 2024 1 commit
Charlene Yang authored
* WIP: add fa3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: clean up
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* WIP: add benchmarks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* differentiate func/varlen_func
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix parsing keyword for FA3 and remove bshd->thd conversion for flash_attn_func
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: add FP8 fwd support
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* add FA3 FP8 fwd code and test
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix assert for FA3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix FA3 FP8 logic and add tests
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* update FA2 to <=2.6.3
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* tweak unit tests for base/mask
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* set constraints for FA3 for sm90 and causal_bottom_right
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* revert debug changes in benchmark script
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 13 Aug, 2024 1 commit
Charlene Yang authored
* update example/benchmark scripts
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix head_dim after MLA
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* update notebook
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 14 Jun, 2024 2 commits
Kirthi Shankar Sivamani authored
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Apply formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Charlene Yang authored
* add attention docs
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attn doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* WIP: update attention doc
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak to first draft
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up pictures
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* first draft for review
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fixes
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add logging info/debug
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor fix of an SWA message
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* use subprocess instead of os.sys
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* clean up benchmark script
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* add example script and update notebook
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweak
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* minor tweaks
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix lint
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* fix Jax/Paddle related comments
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* rerun H100 benchmark
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* restrict fp8 tests to sm90+
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
* move get_cudnn_version from common to pytorch utils
  Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
---------
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-