"qa/L0_pytorch_distributed_unittest/test.sh" did not exist on "f8eb799aee047fda46723dc8bf323bc5e552d8fd"
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 22 Sep, 2023 1 commit
zlsh80826 authored
* Eliminate amax_and_scale_update bubbles
  Signed-off-by: rewang <rewang@nvidia.com>
* Add CUDA check
  Signed-off-by: rewang <rewang@nvidia.com>
---------
Signed-off-by: rewang <rewang@nvidia.com>
- 06 Sep, 2023 1 commit
Tian Zheng authored
* Add recompute
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support recompute core attention
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix transformer layer recompute
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add doc
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve recompute test
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve performance of stack backtrace
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve code stype
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix code style
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 26 Aug, 2023 1 commit
Tian Zheng authored
* [Paddle] Add TP, DP, PP, FSDP
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Minor fix
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix CI failure
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Remove set_nccl_overlap_warning_if_tp
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Improve variable naming
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Refactor FP8 Buffer
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Stylic changes
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix FP32 parallel training
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix numel performance issue
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Squashed commit of the following:
  commit 79e2e5fd774e67dcdda9aae01a9f31a6479c5d70
  Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  Date: Sun Aug 20 14:39:16 2023 +0000
  Add TP test
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  commit 1d40ad60540490f97ed82ba877cc6eda8902cbf6
  Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  Date: Sun Aug 20 14:22:25 2023 +0000
  Fix tp_size when disabled
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  commit 6632f735a0c8251862355fc74622af59fae3a509
  Author: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  Date: Sun Aug 20 05:52:18 2023 +0000
  Add TP for attention and transformer layer
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add shape check
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Add FSDP check for stage 1,2,3
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Review changes
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix group_sharding test
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Support NVTE_FUSE_ATTN
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
* Fix CI errors
  Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
---------
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>