- 10 Apr, 2024 1 commit
-
-
Jinze Xue authored
Signed-off-by: Jinze Xue <jinzex@nvidia.com>
Co-authored-by: Jinze Xue <jinzex@nvidia.com>
-
- 06 Apr, 2024 3 commits
-
-
Reese Wang authored
* value_and_grad requires same shape for input and gradients
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Use high precision layernorm
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Remove local_device_ids as it caused unexpected behaviors
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Revert "Remove local_device_ids as it caused unexpected behaviors"
  This reverts commit c54349b2ce1e96ae696cf0d74f5210e55002cf72.
  Signed-off-by: Reese Wang <rewang@nvidia.com>
---------
Signed-off-by: Reese Wang <rewang@nvidia.com>
-
Sangkug Lym authored
fix the default userbuffer communicator init settings
Signed-off-by: Sangkug Lym <slym@nvidia.com>
-
Jaemin Choi authored
* Enable DGRAD RS overlap
  Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* fix lint; apply suggestions
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Apr, 2024 4 commits
-
-
Sangkug Lym authored
* userbuffer fp8 reduction support for individual overlap
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup dict ub_cfg dict value load
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* Remove unnecessary fence from producer
  From @erhoo82
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Args can be None
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix other arg types
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Pavel Shamis (Pasha) authored
* Fixing potential integer overflow on sequence counter
  The current implementation may potentially cause hangs or data corruption.
  Signed-off-by: Pasha (Pavel) Shamis <pasharesearch@gmail.com>
* Fixing typo in comments
  Addressing reviewers' comments
  Signed-off-by: Pasha (Pavel) Shamis <pasharesearch@gmail.com>
---------
Signed-off-by: Pasha (Pavel) Shamis <pasharesearch@gmail.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Apr, 2024 5 commits
-
-
Santosh Bhavani authored
* Update README.rst
  1. Updated latest news with databricks blog
  2. Fixed formatting issues
  3. Added GTC 2024 video
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  added back overview marker for docs generation
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Added MPT-13B convergence result
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Added Levanter/JAX to integrations section of README
  Signed-off-by: Santosh Bhavani <santosh@semantic.md>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Update README.rst
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* Atomic gemm for TP-AR and TP-RS overlap with P2P exchanges
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* FP8 reduction for atomic TP-RS with p2p exchange
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* Fix
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* Do not store input activations when not computing weight gradients
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* fix userbuffer tp comm overlap case
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
vasunvidia authored
Fix license, and sign off everything
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
Kirthi Shankar Sivamani authored
This reverts commit 965803c9.
-
- 02 Apr, 2024 1 commit
-
-
Tim Moon authored
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
- 31 Mar, 2024 1 commit
-
-
Paweł Gadziński authored
Llama tutorial fixes - all
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 29 Mar, 2024 2 commits
-
-
Kirthi Shankar Sivamani authored
* Fix backward compatibility with checkpoint API
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* review comments and fix lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
Perform FP8 cast on gathered layernorm output in LayerNormLinear
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
- 22 Mar, 2024 2 commits
-
-
Jaemin Choi authored
* Enable TP-AG overlap with return_layernorm_output
  Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
* Use ub_overlap_ag
  Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
---------
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
-
Reese Wang authored
* Remove unused headers
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Unify the fused attn workspace size cpp code
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Reduce the skipped cases
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Rename self/cross attention to qkvpacked/kvpacked
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Update attention mask docs
  Signed-off-by: Reese Wang <rewang@nvidia.com>
* Refine the attn mask implementations
  Signed-off-by: Reese Wang <rewang@nvidia.com>
---------
Signed-off-by: Reese Wang <rewang@nvidia.com>
-
- 21 Mar, 2024 2 commits
-
-
Sangkug Lym authored
* TP-RS overlap with send/recv
  Atomic GEMM based TP-RS overlap with send/recv
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
  Specify userbuffer overlap method of each overlap instance
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
  P2P TP-RS overlap with fp8 GEMM outputs
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
  Fix TP-RS overlap with send/recv
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* cleanup
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* linting
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
* fix typo
  Signed-off-by: Sangkug Lym <slym@nvidia.com>
---------
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Kite0011 authored
[Pytorch] Update context parallel softmax lse correction func.
Signed-off-by: kitefang <kitefang@tencent.com>
Co-authored-by: kitefang <kitefang@tencent.com>
-
- 20 Mar, 2024 2 commits
-
-
Sudhakar Singh authored
* tutorial and doc fixes
  Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* remove extra code
  Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
* fix typos
  Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
---------
Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 18 Mar, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Mar, 2024 1 commit
-
-
Rachit Garg authored
* fix the perf regression because of constant property polling of the device
  Signed-off-by: Rachit Garg <rachitg@nvidia.com>
* Fix lint error
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Rachit Garg <rachitg@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Rachit Garg <rachitg@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
-
- 14 Mar, 2024 1 commit
-
-
Keshav Balasubramanian authored
* disallow sharding of layernorm learnable parameters; force duplication
  Signed-off-by: Keshav <keshavb@nvidia.com>
* fix tests and support tensors for gamma/beta in layernorms
  Signed-off-by: Keshav <keshavb@nvidia.com>
* reverting
  Signed-off-by: Keshav <keshavb@nvidia.com>
* added tests for rank-1 gamma/beta sharding
  Signed-off-by: Keshav <keshavb@nvidia.com>
* fix lint errors
  Signed-off-by: Keshav <keshavb@nvidia.com>
---------
Signed-off-by: Keshav <keshavb@nvidia.com>
-
- 13 Mar, 2024 2 commits
-
-
Santosh Bhavani authored
Update README.rst - Latest News
Added an entry to Latest News section
Signed-off-by: Santosh Bhavani <santosh@semantic.md>
-
Rachit Garg authored
Add envvar for SM margin in GEMM
Signed-off-by: Rachit Garg <rachitg@nvidia.com>
Co-authored-by: Rachit Garg <rachitg@nvidia.com>
-
- 11 Mar, 2024 1 commit
-
-
Tim Moon authored
Remove deprecated cudnn_frontend::throw_if
Deprecated in cudnn-frontend 1.1.0.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
- 07 Mar, 2024 1 commit
-
-
Hongbin Liu authored
* add_dtype_for_userbuf
  Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
* Update transformer_engine/pytorch/module/base.py
  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* Fix syntax
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fix lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Hongbin Liu <hongbinl@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Hongbin Liu <hongbinl@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Mar, 2024 3 commits
-
-
Oleg Goncharov authored
* Modified MHA and DPA logic to use causal softmax and FA for inference
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Adjusted unfused attention and softmax logic for inference
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Cleaned up the code per pylint
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Added test cases to evaluate numerics of incremental decoding
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Apply suggestions from code review
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
* Apply suggestions from code review [sequence start-end]
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
* Apply suggestions from code review [inference_params offset update]
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
* Fixed bug in KV-cache indices and updated test suite
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Added inference_params description and applied suggestions from the code review
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Adjusted absolute tolerances in numerics tests
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
* Cleaned up the files per pylint
  Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
---------
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Signed-off-by: Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptredak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
George Karpenkov authored
Bias and seed can both be None; type checking fails otherwise.
Signed-off-by: George Karpenkov <george@metaworld.me>
-
Chen Cui authored
* first draft of return_layernorm_output_gathered
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* explain use case more thoroughly in docstring
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* add same option in `LayerNormMLP`
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* Update transformer_engine/pytorch/module/layernorm_linear.py
  Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
* Update transformer_engine/pytorch/module/layernorm_linear.py
  Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
  Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
* address comments
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* add same option in LayerNormMLP
  Signed-off-by: Chen Cui <chcui@nvidia.com>
* address linter errors
  Signed-off-by: Chen Cui <chcui@nvidia.com>
---------
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Chen Cui <cxcui@alumni.cmu.edu>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Mar, 2024 2 commits
-
-
Jaemin Choi authored
Signed-off-by: Jaemin Choi <jaeminc@nvidia.com>
Co-authored-by: Jaemin Choi <jaeminc@nvidia.com>
-
Zhenhuan Liu authored
Signed-off-by: Zhenhuan Liu <nkulzh16@gmail.com>
-
- 04 Mar, 2024 2 commits
-
-
Kirthi Shankar Sivamani authored
Update checkpoint API doc
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Jinze Xue authored
* Enable incremental CMake build
  Signed-off-by: Jinze Xue <jinzex@nvidia.com>
* Update setup.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Jinze Xue <155670984+jinzex@users.noreply.github.com>
* Update setup.py
  Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
  Signed-off-by: Jinze Xue <155670984+jinzex@users.noreply.github.com>
* remove tempfile import
  Signed-off-by: Jinze Xue <jinzex@nvidia.com>
---------
Signed-off-by: Jinze Xue <jinzex@nvidia.com>
Signed-off-by: Jinze Xue <155670984+jinzex@users.noreply.github.com>
Co-authored-by: Jinze Xue <jinzex@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 01 Mar, 2024 2 commits
-
-
Kirthi Shankar Sivamani authored
* Avoid updating real during param cast
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Review comments
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sudhakar Singh authored
-
- 29 Feb, 2024 1 commit
-
-
Tim Moon authored
Tweak error message for invalid FP8 GEMM dims
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-