- 06 Apr, 2024 (2 commits)

Sangkug Lym authored
Fix the default userbuffer communicator init settings.
Signed-off-by: Sangkug Lym <slym@nvidia.com>

Jaemin Choi authored
* Enable DGRAD RS overlap
* Fix lint; apply suggestions
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 04 Apr, 2024 (3 commits)

Sangkug Lym authored
* Userbuffer FP8 reduction support for individual overlap
* Clean up loading of the ub_cfg dict values
* Cleanup
* Remove unnecessary fence from producer (from @erhoo82)
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Kirthi Shankar Sivamani authored
* Args can be None
* Fix other arg types
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Pavel Shamis (Pasha) authored
* Fix a potential integer overflow in the sequence counter; the current implementation may potentially cause hangs or data corruption
* Fix typo in comments, addressing reviewer comments
Signed-off-by: Pasha (Pavel) Shamis <pasharesearch@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 03 Apr, 2024 (4 commits)

Sangkug Lym authored
* Atomic GEMM for TP-AR and TP-RS overlap with P2P exchanges
* FP8 reduction for atomic TP-RS with P2P exchange
* Fix
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Sangkug Lym authored
* Do not store input activations when not computing weight gradients
* Fix the userbuffer TP comm overlap case
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

vasunvidia authored
Fix license and sign off everything.
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>

Kirthi Shankar Sivamani authored
This reverts commit 965803c9.

- 29 Mar, 2024 (2 commits)

Kirthi Shankar Sivamani authored
* Fix backward compatibility with the checkpoint API
* Address review comments and fix lint
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Tim Moon authored
Perform FP8 cast on gathered layernorm output in LayerNormLinear.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

- 22 Mar, 2024 (1 commit)

Jaemin Choi authored
* Enable TP-AG overlap with return_layernorm_output
* Use ub_overlap_ag
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>

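For context, a minimal sketch of how these flags surface on a module (it assumes a tensor-parallel job where the userbuffer communicators have already been set up via initialize_ub; exact keyword names can differ between TE releases):

```python
import torch
import transformer_engine.pytorch as te

# Sketch only: ub_overlap_ag needs tensor parallelism plus a prior
# te.module.base.initialize_ub(...) call; shown here purely for illustration.
qkv_proj = te.LayerNormLinear(
    4096,                          # hidden size (illustrative)
    3 * 4096,                      # fused QKV output size (illustrative)
    return_layernorm_output=True,  # also return the layernorm output
    ub_overlap_ag=True,            # overlap the all-gather with the GEMM
    ub_name="qkv",                 # which userbuffer communicator this layer uses
)

x = torch.randn(2048, 2, 4096, device="cuda")
out, ln_out = qkv_proj(x)          # second value is the layernorm output
```
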
- 21 Mar, 2024 (2 commits)

Sangkug Lym authored
* TP-RS overlap with send/recv:
  - Atomic GEMM based TP-RS overlap with send/recv
  - Specify the userbuffer overlap method of each overlap instance
  - P2P TP-RS overlap with FP8 GEMM outputs
  - Fix TP-RS overlap with send/recv
* Cleanup
* Linting
* Fix typo
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Kite0011 authored
[PyTorch] Update the context-parallel softmax LSE correction function.
Signed-off-by: kitefang <kitefang@tencent.com>
Co-authored-by: kitefang <kitefang@tencent.com>

- 20 Mar, 2024 (1 commit)

Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 15 Mar, 2024 (1 commit)

Rachit Garg authored
* Fix the perf regression caused by constantly polling device properties
* Fix lint error
Signed-off-by: Rachit Garg <rachitg@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Rachit Garg <rachitg@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>

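The general shape of such a fix is to query device properties once and reuse the result instead of polling them on every forward call; a generic illustration (not the actual TE change):

```python
import functools
import torch

@functools.lru_cache(maxsize=None)
def cached_device_properties(device_index: int):
    """Query CUDA device properties once per device and cache the result."""
    return torch.cuda.get_device_properties(device_index)

def is_hopper_or_newer(device_index: int = 0) -> bool:
    # Cheap after the first call; repeated polling in a hot path is what
    # caused the regression described above.
    props = cached_device_properties(device_index)
    return props.major >= 9
```
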
- 13 Mar, 2024 (1 commit)

Rachit Garg authored
Add an environment variable for the SM margin in GEMM.
Signed-off-by: Rachit Garg <rachitg@nvidia.com>
Co-authored-by: Rachit Garg <rachitg@nvidia.com>

- 07 Mar, 2024 (1 commit)

Hongbin Liu authored
* Add dtype for userbuffers (add_dtype_for_userbuf)
* Update transformer_engine/pytorch/module/base.py
* Fix syntax
* Fix lint
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 06 Mar, 2024 (2 commits)

Oleg Goncharov authored
* Modified MHA and DPA logic to use causal softmax and FlashAttention for inference
* Adjusted unfused attention and softmax logic for inference
* Cleaned up the code per pylint
* Added test cases to evaluate numerics of incremental decoding
* Applied suggestions from code review (sequence start/end, inference_params offset update)
* Fixed a bug in the KV-cache indices and updated the test suite
* Added the inference_params description and applied suggestions from the code review
* Adjusted absolute tolerances in numerics tests
* Cleaned up the files per pylint
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

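For context, incremental decoding in TE's PyTorch API goes through an InferenceParams object that holds the KV cache; a minimal sketch (shapes and the manual offset update are illustrative assumptions, not code from this PR):

```python
import torch
import transformer_engine.pytorch as te

layer = te.TransformerLayer(
    hidden_size=1024,
    ffn_hidden_size=4096,
    num_attention_heads=16,
).cuda().eval()

# KV cache sized for the whole generation.
inference_params = te.InferenceParams(max_batch_size=4, max_sequence_length=128)

# One decode step: hidden states for the current token, [seq=1, batch, hidden].
hidden = torch.randn(1, 4, 1024, device="cuda")
with torch.no_grad():
    out = layer(hidden, inference_params=inference_params)

# Advance the cache offset before the next step (assumption: the caller
# manages this, as in Megatron-style inference loops).
inference_params.sequence_len_offset += 1
```
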
Chen Cui authored
* First draft of return_layernorm_output_gathered
* Explain the use case more thoroughly in the docstring
* Add the same option in LayerNormMLP
* Update transformer_engine/pytorch/module/layernorm_linear.py
* Address review comments and linter errors
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

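A minimal sketch of the new option (the gathered variant only changes behavior under sequence parallelism, where the returned layernorm output is the all-gathered tensor; without it the two flags behave the same):

```python
import torch
import transformer_engine.pytorch as te

layer = te.LayerNormLinear(
    1024, 1024,
    return_layernorm_output=True,           # also return the normalized input
    return_layernorm_output_gathered=True,  # return it gathered across the SP group
).cuda()

x = torch.randn(8, 2, 1024, device="cuda")
out, ln_out = layer(x)  # without sequence parallelism, ln_out is simply the layernorm output
```
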
- 05 Mar, 2024 (1 commit)

Jaemin Choi authored
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>

- 04 Mar, 2024 (1 commit)

Kirthi Shankar Sivamani authored
Update the checkpoint API doc.
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 01 Mar, 2024 (1 commit)

Kirthi Shankar Sivamani authored
* Avoid updating real during param cast
* Address review comments
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 29 Feb, 2024 (1 commit)

Tim Moon authored
Tweak the error message for invalid FP8 GEMM dims.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

- 28 Feb, 2024 (1 commit)

cyanguwa authored
* Added support for arbitrary bias shapes for fused_attn
* Fixed linting
* Added b1ss/bhss/11ss bias shapes when not requiring dBias
* Added bias_b/bias_h to the plan cache
* Fixed compile errors after the PR653 merge
* Updated JAX unit tests for the new bias shapes
* Fixed mismatched mask-type checking and corrected skip conditions
* Fixed the selection logic for A100s
* Resolved test issues; neg_inf with float16 is still problematic with JAX
* New bias shapes pass TE JAX CI for seqlen <= 512, seq_q == seq_kv, and h_q == h_kv
* TE/JAX fused attention tests for the new bias shapes pass with neg_inf=-2**27 for bfloat16 and -2**15 for float16
* Code style fixes and test parameter ID cleanup
* Fixed an incorrect skip condition for the backward fused attention test
Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Alp Dener <adener@nvidia.com>

- 24 Feb, 2024 (1 commit)

Alp Dener authored
* Added non-reentrant mode support to TE checkpoint
* Renamed the get_cuda_rng_tracker kwarg to get_rng_state_tracker for consistency with the rest of the TE API
* Docstring cleanup
* Added a mechanism to disable bias_gelu_nvfusion in LayerNormMLP when checkpointing in non-reentrant mode
* Refactored checkpoint and recompute hook names to match the PyTorch implementation
* Fixed an incorrect reference before assignment
* Fixed an argument error when calling the native PyTorch checkpoint
* Fixed linting errors for missing docstrings; fixed lint
* Made bias-GELU fusion consistent between the checkpoint test and the reference comparison
Signed-off-by: Alp Dener <adener@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

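A minimal sketch of the non-reentrant path added here (the keyword names follow the PR description; treat the exact signature as release-dependent):

```python
import torch
import transformer_engine.pytorch as te

mlp = te.LayerNormMLP(hidden_size=1024, ffn_hidden_size=4096).cuda()
x = torch.randn(8, 2, 1024, device="cuda", requires_grad=True)

# use_reentrant=False selects the hook-based (non-reentrant) recompute path,
# mirroring torch.utils.checkpoint; get_rng_state_tracker would supply a
# tensor-parallel RNG tracker if one is in use.
out = te.checkpoint(mlp, x, use_reentrant=False)
out.sum().backward()
```
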
- 17 Feb, 2024 (1 commit)

Alp Dener authored
* Added QuickGELUActivation from HuggingFace Transformers to common and PyTorch
* Removed 'qgelu' from the double-size activations list in LayerNormMLP
* Indent fix
Signed-off-by: Alp Dener <adener@nvidia.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>

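The new activation is selected by name; QuickGELU approximates GELU as x * sigmoid(1.702 * x). A minimal sketch:

```python
import torch
import transformer_engine.pytorch as te

# 'qgelu' selects QuickGELU for the activation between the MLP's two GEMMs.
mlp = te.LayerNormMLP(
    hidden_size=1024,
    ffn_hidden_size=4096,
    activation="qgelu",
).cuda()

x = torch.randn(8, 2, 1024, device="cuda")
y = mlp(x)
```
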
- 15 Feb, 2024 (2 commits)

Przemyslaw Tredak authored
* Use the fused implementation of RoPE in MultiHeadAttention
* Fix freqs dtype
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

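For context, the fused RoPE path is also reachable directly through the attention helpers; a minimal sketch (module paths and the fused keyword are assumptions based on the TE API of this period):

```python
import torch
from transformer_engine.pytorch.attention import (
    RotaryPositionEmbedding,
    apply_rotary_pos_emb,
)

seq, batch, heads, dim = 128, 2, 16, 64
rope = RotaryPositionEmbedding(dim)
freqs = rope(seq).cuda()  # rotary frequencies for the full sequence

q = torch.randn(seq, batch, heads, dim, device="cuda")
# fused=True dispatches to the fused CUDA kernel rather than the unfused
# PyTorch implementation.
q_rot = apply_rotary_pos_emb(q, freqs, tensor_format="sbhd", fused=True)
```
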
Tim Moon authored
* Add an option to avoid updating the transpose cache when possible
* Fix typo
* Use a string kwarg for FP8 transpose caching
* Remove unused attr
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

- 14 Feb, 2024 (1 commit)

Jaemin Choi authored
* Pass knobs for TP comm overlap instead of env vars
* Comment out debugging print
* Remove docstring
* Remove debugging output
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>

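A sketch of what the programmatic knobs look like (the initialize_ub signature and the ub_cfgs keys shown here are assumptions about the userbuffer API of this period, not taken from the PR):

```python
import torch
import transformer_engine.pytorch as te

seq_len, batch_size, hidden_size, tp_size = 2048, 2, 4096, 8

# Per-GEMM overlap knobs passed as a config dict instead of NVTE_* env vars.
ub_cfgs = {
    "qkv_fprop": {"method": "ring_exchange", "num_sm": 1},
    "fc1_fprop": {"method": "pipeline", "num_splits": 4, "num_sm": 16},
}

# Must run inside an initialized tensor-parallel job.
te.module.base.initialize_ub(
    shape=[seq_len * batch_size, hidden_size],  # global input buffer shape (illustrative)
    tp_size=tp_size,
    use_fp8=True,
    dtype=torch.bfloat16,
    ub_cfgs=ub_cfgs,
)
```
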
- 12 Feb, 2024 (1 commit)

Jaemin Choi authored
* Support GEMM-GELU fusion with split AG overlap
* Fix linter complaints
* Avoid code duplication
* Fix an issue with modifying a tuple
* Disable GEMM-GELU fusion when split AG overlap is not enabled
* Add the ub_split_ag parameter to the LayerNormMLP unit test
* Move the knob into LayerNormMLP and auto-disable the fusion when split AG overlap is not enabled
* Revert changes to test_layernorm_mlp_accuracy
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Jaemin Choi <minitu77@gmail.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

- 08 Feb, 2024 (4 commits)

Tim Moon authored
* Implement a fused kernel for the FP8 scale update
* Add a fused kernel for the amax and scale update, with a unit test
* Replace paddle.fluid imports with paddle.base
* Move the fused kernel to the core library
* Use the FP8 update kernel in Paddle and debug Paddle test failures
* Fix lint errors
* Make the update kernel in-place for PyTorch
* Revert the cudnn-frontend commit
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

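For reference, the arithmetic such a fused amax/scale kernel performs corresponds to the usual delayed-scaling update; a simplified sketch of the math (not the kernel itself):

```python
import torch

def update_fp8_scale(amax_history, scale, fp8_max, margin=0):
    """Simplified delayed-scaling update: reduce the amax history, then set the
    scale to a power of two so that amax * scale stays below fp8_max."""
    amax = amax_history.max(dim=0).values               # reduce over the history window
    exp = torch.floor(torch.log2(fp8_max / amax)) - margin
    new_scale = torch.pow(2.0, exp)
    # Keep the old scale where amax is zero or non-finite.
    return torch.where((amax > 0) & torch.isfinite(amax), new_scale, scale)
```
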
Kirthi Shankar Sivamani authored
Use a cloned scale_inv for the FP8 cast.
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

Oleg Goncharov authored
* Added a new unfused softmax CUDA kernel to support the causal attention mask
* Added a test suite for the unfused causal softmax kernel
* Removed test cases with large matrices from the causal softmax test suite
* Cleaned up the code per lint
* Added a compute buffer to the causal softmax test suite to store intermediate results without casting
* Added more test cases
* Relaxed the absolute tolerance (atol), including for BF16
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>

cyanguwa authored
* Test ALiBi between FlashAttention and FusedAttention
* Move ALiBi slopes and bias to globals to avoid repeated calculation, and fix their generation
* Fix _is_flash_attention_supported to allow the ALiBi type
* Disable the padding mask when ALiBi is used with the fused-attention arbitrary-seqlen backend
* Add support for custom [n_heads] alibi_slopes in flash, fused, and unfused attention
* Remove alibi_type=none tests as they are unnecessary
* Update cudnn-frontend to 1.0.2, then to 1.0.3
* Change the bias/dbias shapes to allow [b,1], [1,h], and [b,h] in the arbitrary-seqlen backend, and tweak tests for post_scale_bias [1,h,s,s] and alibi_slopes [n_heads]
* Disable the max512 backend (and the arbitrary-seqlen backend temporarily) for [b,h,s,s] pending cuDNN backend support, and fix the bias shape for the max512 backend
* Clean up and tweak the backend selection logic
* Replace || with () in the docstring
* Combine slopes/bias generation into one function, get_alibi(), and fix the ALiBi tests
* Fix PR557 bugs and lint
* Encapsulate the global ALiBi tensors in a dict cache
* Reduce the ALiBi slopes test size
* Use the dBias shape to define bias_b/bias_h because JAX materializes dBias rather than Bias in the backward abstract
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

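For context, a sketch of passing custom per-head ALiBi slopes to the attention call (the default slopes follow the usual 2^(-8*i/n_heads) geometric schedule; argument names reflect the PR description and may vary by release):

```python
import torch
import transformer_engine.pytorch as te

heads, head_dim, s, b = 16, 64, 128, 2
attn = te.DotProductAttention(num_attention_heads=heads, kv_channels=head_dim)

# Default-style ALiBi slopes: a geometric sequence with one slope per head.
slopes = torch.tensor(
    [2.0 ** (-8.0 * (i + 1) / heads) for i in range(heads)],
    device="cuda", dtype=torch.float32,
)

q = torch.randn(s, b, heads, head_dim, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

out = attn(q, k, v, core_attention_bias_type="alibi", alibi_slopes=slopes)
```
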
- 06 Feb, 2024 (1 commit)

Tim Moon authored
Do not cache sequence lengths based on layer number.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

- 03 Feb, 2024 (3 commits)

Przemyslaw Tredak authored
* Add the zero_centered_gamma option to RMSNorm
* Improve the tests and tweak the tolerances (including for bfloat16)
* Fix the LayerNormMLP test
* Update transformer_engine/common/rmsnorm/rmsnorm_api.cpp
* Apply docs suggestions
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

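A minimal sketch of the new option; with zero_centered_gamma=True the learnable weight is stored as gamma - 1 (zero-initialized), and the layer computes y = x / sqrt(mean(x^2) + eps) * (1 + gamma):

```python
import torch
import transformer_engine.pytorch as te

norm = te.RMSNorm(1024, eps=1e-5, zero_centered_gamma=True).cuda()

x = torch.randn(8, 1024, device="cuda")
y = norm(x)

# Reference computation: the stored weight plays the role of (gamma - 1).
ref = x / torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + 1e-5) * (1 + norm.weight)
torch.testing.assert_close(y, ref, rtol=1e-3, atol=1e-3)
```
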
JimmyZhang12 authored
* Fixes for recomputation
* Lint
* Fix ONNX export: register the op and related fixes
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>

cyanguwa authored
* Update cudnn-frontend to 1.0.3 to fix cuDNN v9 NaNs
* Make d_out contiguous for the backward pass
* Remove cudnnDestroy and let torch handle it
* Update transformer_engine/pytorch/attention.py
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Signed-off-by: cyanguwa <8636796+cyanguwa@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

- 31 Jan, 2024 (1 commit)

Tim Moon authored
Do not allocate FP8 workspace buffers when params are FP8.
Signed-off-by: Tim Moon <tmoon@nvidia.com>

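For context, parameters are FP8 when modules are built under fp8_model_init, so the module can skip the separate FP8 weight workspace it would otherwise allocate; a minimal sketch (the tie to this specific commit is my reading of the one-line message):

```python
import torch
import transformer_engine.pytorch as te

# Weights are created directly as FP8 tensors; no higher-precision master copy
# (and, per this change, no extra FP8 workspace buffer) needs to be kept.
with te.fp8_model_init(enabled=True):
    layer = te.Linear(1024, 1024, params_dtype=torch.bfloat16)

x = torch.randn(32, 1024, device="cuda", dtype=torch.bfloat16)
with te.fp8_autocast(enabled=True):
    y = layer(x)
```
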