- 21 Oct, 2024 1 commit
-
-
Charlene Yang authored
remove one FA version in the L3 test Signed-off-by:Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 18 Oct, 2024 3 commits
-
-
Tim Moon authored
Remove PyTorch L0 distributed test Forgot to remove in #1255. Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Tim Moon authored
* Debug wheel test for PaddlePaddle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix typo Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
Tim Moon authored
* Reorganize PyTorch L1 tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move ONNX tests to L1 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move FA version test to L3 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Limit parallel build jobs in FA version test Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 17 Oct, 2024 4 commits
-
-
Xiaowei Ren authored
fix seq_dim in CP implementation Signed-off-by:Xiaowei Ren <xren@nvidia.com>
-
Phuong Nguyen authored
* register CmdBufferCompatible traits via C++ API * renamed FFI_Traits * use register_ffi_target() --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
Xin Yao authored
* fix bias for 0-dim tensor Signed-off-by:
Xin Yao <xiny@nvidia.com> * add check Signed-off-by:
Xin Yao <xiny@nvidia.com> * use numel() instead of nullptr Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com>
-
Xin Yao authored
Fix wgrad for GroupedLinear when weights doesn't require grad Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 Oct, 2024 6 commits
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemyslaw Tredak <ptredak@nvidia.com>
-
Kirthi Shankar Sivamani authored
Fix FP8 activation recompute Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Upgrade pylint and first round formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * round 2 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * round 3 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Format and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Paddle lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Reviews Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * FIxes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * More linting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Run formatter Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Paddle lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Charlene Yang authored
* WIP: make FA2 optional Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: fix logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor tweak Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add L1 test to test all supported FA versions Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update version to 2.1.1 and trim L1 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update onnxruntime version Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove onnxruntime from L1 FA versions tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Md Fahim Faysal Khan authored
fixed assertion bug for SWA Signed-off-by:
Md Fahim Faysal Khan <mdfahimfaysa@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
Tim Moon authored
* Build custom ORT ops before running ONNX tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove ONNX from context parallelism tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Export ONNX ops that do compute in FP32 Matches internal impl of TE kernels. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add build script for custom ORT ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 15 Oct, 2024 2 commits
-
-
Santosh Bhavani authored
* Create README.md added all PyT examples Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> * Update README.md - Added JAX, PaddlePaddle, and third-party examples - Fixed DL framework links - Removed issue request for new PRs Signed-off-by:
Santosh Bhavani <sbhavani@nvidia.com> --------- Signed-off-by:
Santosh Bhavani <santosh.bhavani@live.com> Signed-off-by:
Santosh Bhavani <sbhavani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Michael Goldfarb authored
Update test to check support for context parallel attention. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 14 Oct, 2024 1 commit
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 12 Oct, 2024 1 commit
-
-
Xin Yao authored
* Let Fused RoPE support THD with CP Signed-off-by:
Xin Yao <xiny@nvidia.com> * add comment Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Xiaowei Ren <103958965+xrennvidia@users.noreply.github.com>
-
- 11 Oct, 2024 2 commits
-
-
Xiaowei Ren authored
* fa2 function import renaming Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * refine fa_fwd_kwargs and fa_bwd_kwargs Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * import FA3 fucntions for CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix output of FA3 fwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix rng_state in a2a implementation with FA3 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * hack lse correction for packed lse format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make CP thd out correction work with packed lse Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix for packed softmax_lse Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix softmax_lse shape Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change lse_packed to constexpr Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
李金梁 authored
* Fix bug in torch compile and seqdim is integer Signed-off-by:
李金梁 <975761915@qq.com> * Update attention.py change the jit_fuser to torch.compile on flash_attn_fwd_out_correction Signed-off-by:
李金梁 <975761915@qq.com> * Annotate fused functions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
李金梁 <975761915@qq.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 10 Oct, 2024 2 commits
-
-
Przemyslaw Tredak authored
* Fixes to Float8Tensor Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Hua Huang authored
* Expose JAX sliding window attn API Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No SWA in context parallel; fix RNG seed in test Signed-off-by:
Hua Huang <huah@nvidia.com> * Handle SAW API discrepancy in cuDNN and Python Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add SAW API for flax, all tests passed Will update tests/jax/test_praxis_layers.py next Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_praxis_layers.py for SWA, test passed Signed-off-by:
Hua Huang <huah@nvidia.com> * Use tuple window_size; update for PR #1212 Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add and adjust some pytest.skip Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revised following Reese Wang's comments Still need further debugging: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: These errors does not exist in the previous commit Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix no-SWA test case errors in previous commit Signed-off-by:
Hua Huang <huah@nvidia.com> * Add Padding mask w/ sliding windows sanity tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use float32 for the reference code softmax calculation Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Reese Wang <rewang@nvidia.com>
-
- 09 Oct, 2024 3 commits
-
-
Charlene Yang authored
* improve get_attention_backend logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * polish logic and wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove redundant comment Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Charlene Yang authored
* add extra_state change description for different TE versions Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add FAQ page Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FAQ page Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix extra_state tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Tim Moon authored
* Handle Float8Tensor when casting module dtype Keep data in Float8Tensor and only change nominal dtype. Monkey-patch PyTorch module casting functions to handle Float8Tensor. Add tests. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Respect autocast dtype in linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Suppress linter warning Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Suppress linter warning Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak comments Review suggestion from @ptrendx Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 08 Oct, 2024 1 commit
-
-
Charlene Yang authored
* add qkv descales to FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix sbhd shapes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * force the same dtype when comparing FA3 and cuDNN FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "force the same dtype when comparing FA3 and cuDNN FP8" This reverts commit 19e7f877026a19a32d2f02c6c9de20df4ae2e064. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force the same dtype when comparing FA3 and cuDNN FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add try/except for FA3 when custom qkv descales are not supported Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace FA3 installation warning with a debug logging message Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * avoid varlen_func for FP8 and improve messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add SWA support for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change preference reason for FP8 logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Oct, 2024 3 commits
-
-
Charlene Yang authored
* adjust window size to (i-window_size_left,i] for cuDNN Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * reduce the window to make any errors more pronouced Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Xiaowei Ren authored
* change API for hierarchical CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * move fp8 code before qkv reshape Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * try to insert A2A for hierarchical CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make fwd work Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove a redundant sync Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * make bwd of hierarchical CP work Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix dout a2a in bwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix q_f16 with fp8 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * assert hierarchical CP implementation does not support THD format Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * assert hierarchical CP does not support attn bias Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add unit test for hierarchical CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix cp_comm_type in unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug fix and code cleaning Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * minor change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * an assert info change Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * dout shape fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move function definitions to the front of the first call Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix tensor view comments Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * refine CP unit test Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * save cp_size_a2a and rank_a2a in fwd Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add more explainations of cp_group in doc_string Signed-off-by:
Xiaowei Ren <xren@nvidia.com> --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* Tests for distributed Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added the test to the qa script Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Changed qa Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix to test_numerics file Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * pr fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@pgadzinski-mlt.client.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/pytorch/distributed/run_numerics.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@pgadzinski-mlt.client.nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@pgadzinski-mlt.client.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 06 Oct, 2024 1 commit
-
-
Emmanuel Ferdman authored
Signed-off-by:
Emmanuel Ferdman <emmanuelferdman@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Oct, 2024 1 commit
-
-
Tim Moon authored
* CPU perf optimization in linear autograd function Avoid enable_grad context when possible in cast function. Cache distributed group properties. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * CPU perf optimization in prepare_forward function Avoid torch.nn.Module impl of __setattr__. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid module import in TE module forwards Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use fast getter for params Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse tensor dims in linear autograd func Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply optimizations to grouped linear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid deepcopy in tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move _fast_setattr logic to __setattr__ method Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 03 Oct, 2024 1 commit
-
-
Charlene Yang authored
move block_table arg to varlen_func section Signed-off-by:Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 01 Oct, 2024 3 commits
-
-
Przemyslaw Tredak authored
* Removing the unused options from GroupedLinear docs and fixing the bug with offsets Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * offsets -> fp8_meta_offsets Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
Fix Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Add pool argument to make_graphed_callable Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 27 Sep, 2024 3 commits
-
-
Xiaowei Ren authored
skip FP8 CP tests if hardware does not support FP8 Signed-off-by:Xiaowei Ren <xren@nvidia.com>
-
Charlene Yang authored
* fix detection of 3 in 3hd/h3d layouts Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * error out when invalid layout group is provided Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* Docs fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * docs fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * docs fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 25 Sep, 2024 1 commit
-
-
Sangkug Lym authored
* fix NVTE_UB_WITH_MPI read Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Add default value Signed-off-by:
Sangkug Lym <slym@nvidia.com> --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 24 Sep, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Add new users to CI Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-