- 21 Feb, 2024 1 commit
Shijie authored
* use separate qkv Signed-off-by:
jaywan <jaywan@nvidia.com> * add support for GQA Signed-off-by:
jaywan <jaywan@nvidia.com> * minor changes Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * change rtol Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fix reshape issue Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * add rmsnorm and rotary position embedding Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * update rmsnorm Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * refactor layernorm and rmsnorm Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * support swiglu Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * add fused rope Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * minor changes Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * add rope api to __init__ Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * minor changes Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fix fp8 dtype issue Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * simplify ut cases Signed-off-by:
jaywan <jaywan@nvidia.com> * Update transformer_engine/paddle/layer/attention.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Shijie <505749828@qq.com> * fix name issue Signed-off-by:
Shijie Wang <jaywan@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com> Signed-off-by:
jaywan <jaywan@nvidia.com> Signed-off-by:
Shijie <505749828@qq.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 26 Jan, 2024 1 commit
Shijie authored
* use separate qkv Signed-off-by:
jaywan <jaywan@nvidia.com> * add support for GQA Signed-off-by:
jaywan <jaywan@nvidia.com> * minor changes Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * change rtol Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fix reshape issue Signed-off-by:
Shijie Wang <jaywan@nvidia.com> --------- Signed-off-by:
jaywan <jaywan@nvidia.com> Signed-off-by:
Shijie Wang <jaywan@nvidia.com>
- 12 Jan, 2024 1 commit
Tian Zheng authored
* Actively free tensor in bwd Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add inplace support for fp8 casting; allow skipping weight update in fp8 meta update Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Support weight caching for Linear Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add weight caching for LayernormLinear Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add weight caching for LayerNormMLP Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add weight caching for Transformer layer Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add PP unittests Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix CI Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 21 Nov, 2023 1 commit
Shijie authored
* fix cudnn FA softmax shape Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * set inplace rng_state Signed-off-by:
Shijie Wang <jaywan@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com>
- 08 Nov, 2023 1 commit
zlsh80826 authored
* Deprecate QKV_INTERLEAVED use in JAX Signed-off-by:
Reese Wang <rewang@nvidia.com> * Deprecate QKV_INTERLEAVED use in Paddle Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance qkv enum mappings Signed-off-by:
rewang <rewang@nvidia.com> * Fix LD_LIBRARY_PATH issue Signed-off-by:
rewang <rewang@nvidia.com> * Arbitrary seqlen kernels only support self attention currently Signed-off-by:
rewang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Signed-off-by:
rewang <rewang@nvidia.com>
- 03 Oct, 2023 1 commit
Shijie authored
* fix mask conversion and rng_state Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * refactor fused attn Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * use CUB to do prefix sum Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fuse dropout add Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * minor changes Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * optimize kernel Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * Debug merge errors Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 17 Aug, 2023 1 commit
Shijie authored
* Add nn.layer: softmax, attention, transformer Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * code refactor Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * code refactor Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * update docs and set dropout=0.1 Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * Update transformer_engine/paddle/layer/attention.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 01 Aug, 2023 1 commit
Tian Zheng authored
* Add FP8 support: add FP8 recipe, add FP8 path for nn layers, add MNIST FP8 example Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Update README Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix LayerNormMLP FP8 backward Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix FP8 training in float32 accumulation Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix FP8 checkpointing for non-forward execution cases (same as #323) Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Refactors and improvements for better code style, readability, and organization Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Remove unnecessary pylint override Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 20 Jul, 2023 1 commit
Shijie authored
* add flash attn tests Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * update flash attn Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fix random seed Signed-off-by:
Shijie Wang <jaywan@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jul, 2023 1 commit
Shijie authored
* add more ops Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * add skipif Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * fix bug Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * minor change Signed-off-by:
Shijie Wang <jaywan@nvidia.com> * minor change on coding style Signed-off-by:
Shijie Wang <jaywan@nvidia.com> --------- Signed-off-by:
Shijie Wang <jaywan@nvidia.com>
- 22 Jun, 2023 1 commit
Tian Zheng authored
* Add cast_transpose; add gelu, gelu_fp8; add cast_transpose_bgrad_dgelu; add layernorm_fwd and layernorm_fwd_fp8; add layernorm_bwd Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix missing header Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jun, 2023 1 commit
Tian Zheng authored
* First step of PaddlePaddle integration: add build option for paddle, add basic test framework, add 3 basic operators (cast_from_fp8, cast_to_fp8, gemm) Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Support paddle build Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Add paddle build support for new building framework Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Clean up build process for Paddle stub file Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Minor fixes Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix pylint "wrong-import-order" warning Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Fix review comments Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> * Skip BF16 GEMM tests for unsupported arch Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> --------- Signed-off-by:
Tian Zheng <tizheng@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>