- 19 Aug, 2024 1 commit
-
-
Frédéric Bastien authored
Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 14 Aug, 2024 2 commits
-
-
Reese Wang authored
* Propagate sm_margin to the underly layernorm kernels --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
Phuong Nguyen authored
* implemented custom call with ffi in csrc * moved headers of misc to misc.h, add ffi.h * ActLu and DActLu lowering with ffi_lowering * CastTranspose with ffi_lowering * enabled cudaGraph * added 4d input test case to TestActivationLu * added operand_output_aliases for CastTranspose * added env var NVTE_JAX_WITH_FFI, default value = 1 * replace casting ActivationEnum by taking its value --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 08 Aug, 2024 1 commit
-
-
Reese Wang authored
* Support non-deterministic algo Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the helper function name Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move fixture to conftest.py Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 06 Aug, 2024 2 commits
-
-
Reese Wang authored
* Support actlen = 0 after cuDNN 9.3.0 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add runtime_segment < max_segment tests Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
Charlene Yang authored
* add multi-latent attention for DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax/Paddle API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix typo in test script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix too-many-boolean lint error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "fix lint" This reverts commit 67399a3a6f45bb4ce9e5eaa6bcce40b28e347e5b. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix stride check in get_qkv_layout Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: fix layout_thd tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: debug info Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix merge conflict Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix thd pad_between_seqs=False/True tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 02 Aug, 2024 1 commit
-
-
Przemyslaw Tredak authored
* Link attention docs to the main docs and fix errors reported by Sphinx Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Lower the version of nbsphinx Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * More fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Change the URL of example_attention.py to GitHub Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * More fixes in the attention tutorial Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 25 Jul, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Specify python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add classifiers for python Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add utils to build wheels Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * make wheel scripts Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add aarch Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle wheel Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * PaddlePaddle only builds for x86 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add optional fwk deps Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Python3.8; catch install error Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] cudnn9 compile with paddle support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [wip] dont link cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dlopen cudnn Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * dynamically load nvrtc Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove residual packages; exclude stub from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Exclude builtins from nvrtc .so search Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * properly include files for sdist Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * paddle wheel tie to python version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle build from src [wip] Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix workflow paddle build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix paddle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix lint from pr986 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add sanity import to wheel test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove upper limit on paddlepaddle version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove unused imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove pybind11 dependency Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cpp tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Search .sos in cuda home Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CLeanup, remove residual code Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Jul, 2024 1 commit
-
-
Reese Wang authored
* Add enabled() to BasePrimitive * Add layernorm/rmsnorm fallback * Add cast_fp8 fallback * Add transpose/cast_transpose XLA fall back * Act_lu fallback * Add transpose fallback * Add softmax fallback * Unify the use of _cast_fp8 * Add tests for NVTE_JAX_CUSTOM_CALLS_RE --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 10 Jul, 2024 1 commit
-
-
Charlene Yang authored
* add cuDNN swa Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix SWA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add set_deterministic and minor fixes for swa Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add AttentionParams Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change window_size to int64_t; fix swa/determinism tests; cache _attention_backends Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add window_size to get_backend; fix jax and paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes; add set_deter to bwd_impl Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix unit tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FP8 tests due to determinism Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add support matrix for SWA and bias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes and lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add wording on window_size special cases Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor tweak on wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax assertion error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * call bwd with deterministic=true for jax/paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism words in documentation Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 09 Jul, 2024 1 commit
-
-
Phuong Nguyen authored
* removed singleton wrapper Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 08 Jul, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Build for python < 3.8 Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jul, 2024 1 commit
-
-
Reese Wang authored
* Integrate experimental ragged offset Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use per sequence based offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Format Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove v/o_seq_offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add FP16 sanity tests and remove forward tests from the automatically run tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance input checks Signed-off-by:
Reese Wang <rewang@nvidia.com> * Separate fused attn to 2 differnt APIs and add the docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add experimental to the docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add runtime segments check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove finished TODO Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 18 Jun, 2024 1 commit
-
-
Charlene Yang authored
* simplify offset tensors Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes; tests pass Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix C lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace with_offset with with_padding Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace with_padding with padded Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes after merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix for fused attn fwd/bwd calls Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Jax Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjust spacing in docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix pytorch tests; fix paddle api Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix attn_biases Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix AttnFuncWithCP backward Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix attn with CP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix paddle Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Jun, 2024 1 commit
-
-
Alp Dener authored
replaced plain C asserts with NVTE_CHECK to avoid unused-variable warnings Signed-off-by:Alp Dener <adener@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2024 1 commit
-
-
Phuong Nguyen authored
* Splitted cpp_extensions.py, renamed mlp.py and fused_attn.py Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixed import in tests Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 12 Jun, 2024 1 commit
-
-
Ming-Xu Huang authored
* Reformat FP8 Meta 1. Reformat FP8 meta to be one-set-per-tensor. 2. Remove fp8_max and scale_inv. 3. Remove unused functions in fp8.py Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix unit-tests Signed-off-by:
Ming Huang <mingh@nvidia.com> * Remove ShardingType and MajorShardingType Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix lint errors Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed unittests. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Rename few variables. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Add jit to update_amax_list Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed naming error in LayernormMLP Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed bugs in test_distributed_layernorm_mlp.py Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 11 Jun, 2024 1 commit
-
-
Keshav Balasubramanian authored
Signed-off-by:Keshav <keshavb@nvidia.com>
-
- 10 Jun, 2024 1 commit
-
-
Phuong Nguyen authored
- Made order of gated act consistent in all branches Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 08 Jun, 2024 1 commit
-
-
Phuong Nguyen authored
* categorized `csrc/modules.cpp` Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * adapted the build tool Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 07 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Remove interval arg from recipe Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove usage of interval and use explicit kwarg for testing recipes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Cleanup Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 30 May, 2024 1 commit
-
-
Charlene Yang authored
* add THD support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add seq_offsets_o and use new offset calculation Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * addition to previous commit; fix unit test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add None for offset_o gradient Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: test padding between sequences Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: fix tests for padding between sequences Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix tests for sbhd/bshd layouts; clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update cudnn-frontend and add tests for max_seqlen_q=1 and d=256 for inference Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * test sbhd/bshd layouts for sq1, d256 inference case Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace wording from accumulative to cumulative Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add offset tensors to custom fp8 mha tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add version control for cuDNN Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add sm>=90 constraint for thd support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix cuDNN support for sq=1, d=256 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint and minor tweak for fp8 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * modify cudnn version and restrict MQA/GQA support for THD Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add notes for seq offset tensors Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add dummy tensor to pass jax build Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add dummy tensor to pass paddle build Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix Jax CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
- 22 May, 2024 1 commit
-
-
Ming-Xu Huang authored
* Fixed the shape mismatching issue in MLP. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Add a corresponding test Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 15 May, 2024 1 commit
-
-
Ming-Xu Huang authored
Remove act_enum from the del list ActLuPrimitive.partition Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 14 May, 2024 1 commit
-
-
Phuong Nguyen authored
fixed batcher in dbias_cast_transpose primitive Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 13 May, 2024 1 commit
-
-
Phuong Nguyen authored
* renamed gelu to act * added relu, srelu, qgelu * fixes initialization for layernorm_fp8_mlp tests * moved activation_fp8 prim into testunit file * Moved NVTE_Activation_Enum to common/.../activation.h --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 09 May, 2024 1 commit
-
-
Phuong Nguyen authored
* fixes for ActLuPrimitive in PAXML * changed indices for arg_infos in sharding func in dbias_cast_transpose primitive --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 03 May, 2024 1 commit
-
-
Phuong Nguyen authored
* templated primitives and respective C++ functions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixes for LayerNormMLP, tests in test_custom_compute all passed Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added default arg for pybind get_workspace_size funcs Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixes for TestTransFormer with non-gated act tests Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * renamed gelu to act Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * improved enum implementation, avoid using magic numbers Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Exposed C++ ActivationEnum to python side Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Changed error messages Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * changed conditional check on input shape for dbias_cast_transpose Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * changed dtype (tol) for bias grad tests Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixes so that layer_norm_fp8_mlp can take bias = None Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Set bias = None in flax modules Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 02 May, 2024 1 commit
-
-
Reese Wang authored
* Add layernorm_fp8_dot unit test Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update the softmax primitives support conditions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add tests for the softmax primitives Signed-off-by:
Reese Wang <rewang@nvidia.com> * Round1 refactor of test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Split dropout arguments of ref code and add hidden/intermediate dropout elementwise comparison Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add dropout_braodcast_dim, self_attn_mask tests and clean a few code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Abstract test layer and fix a rope reference code diff Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add bias tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add epsilon and float32 tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add relpos_bias and attention dropout tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Loose the atol Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move common fixtures to conftest.py Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc string for test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc string for test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix conflicts of test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Avoid to left bias parameters in graph when use_bias=False Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 01 May, 2024 1 commit
-
-
Ming-Xu Huang authored
* Support FP8 Meta Dtype (FM32) and Align FP8 Scale Update with PyTorch. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Modify with the feedback of code review Signed-off-by:
Ming Huang <mingh@nvidia.com> * Hiding FlaxFloatMeta32 inside fp8.py Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make functions to be JAX tracable objects. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Rebased with mian. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update jax images for github workflow. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 24 Apr, 2024 2 commits
-
-
Phuong Nguyen authored
* Implemented swiglu and silu Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Renamed nvte-*silu to nvte-*swish + generalized GetDBiasDact functions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* combined layernorm_geglu with layernorm_gelu into fused_layernorm Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixes to pass all unit tests in test_custom_call_compute.py, test_layer.py, and test_praxis_layer.py Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * cleaning and formatting Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * renaming based on reviewers suggestions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * implemented partial fused layernorm Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * geglu + bias passed tests Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added partial fused calculation for dbias_1 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * clean up Co-authored-by:
Alp Dener <adener@nvidia.com> Signed-off-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com> Co-authored-by:
Alp Dener <adener@nvidia.com>
-
- 19 Apr, 2024 1 commit
-
-
Ming-Xu Huang authored
* Allow multi-dims for dgamma and dbeta in LN descriptor. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix the jit error in examples/jax Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-
- 16 Apr, 2024 1 commit
-
-
Ming-Xu Huang authored
-
- 22 Mar, 2024 1 commit
-
-
Reese Wang authored
* Remove unused headers Signed-off-by:
Reese Wang <rewang@nvidia.com> * Unify the fused attn workspace size cpp code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Reduce the skipped cases Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename self/cross attention to qkvpacked/kvpacked Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update attention mask docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the attn mask implementations Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 14 Mar, 2024 1 commit
-
-
Keshav Balasubramanian authored
* disallow sharding of layernorm learnable parameters; force duplication Signed-off-by:
Keshav <keshavb@nvidia.com> * fix tests and support tensors for gamma/beta in layernorms Signed-off-by:
Keshav <keshavb@nvidia.com> * reverting Signed-off-by:
Keshav <keshavb@nvidia.com> * added tests for rank-1 gamma/beta sharding Signed-off-by:
Keshav <keshavb@nvidia.com> * fix lint errors Signed-off-by:
Keshav <keshavb@nvidia.com> --------- Signed-off-by:
Keshav <keshavb@nvidia.com>
-
- 06 Mar, 2024 1 commit
-
-
George Karpenkov authored
Bias and seed can both be None, type checking is failed otherwise. Signed-off-by:George Karpenkov <george@metaworld.me>
-
- 28 Feb, 2024 1 commit
-
-
cyanguwa authored
* added support for arbitrary bias shapes for fused_attn Signed-off-by:
Alp Dener <adener@nvidia.com> * Fix linting Signed-off-by:
Alp Dener <adener@nvidia.com> * Add b1ss/bhss/11ss bias shapes when not requiring dBias Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add bias_b/h to plan cache Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixed compile errors after PR653 merge Signed-off-by:
Alp Dener <adener@nvidia.com> * updated JAX unittests for new bias shapes Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed mismatched mask type checking Signed-off-by:
Alp Dener <adener@nvidia.com> * corrected skip condition Signed-off-by:
Alp Dener <adener@nvidia.com> * fix selection logic for A100s Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * corrected skip checks for bias shapes Signed-off-by:
Alp Dener <adener@nvidia.com> * resolved test issues but neginf with float16 is still problematic with JAX Signed-off-by:
Alp Dener <adener@nvidia.com> * new bias shapes passing TE JAX CI for seqlen <= 512, seq_q == seq_kv and h_q == h_kv conditions Signed-off-by:
Alp Dener <adener@nvidia.com> * TE/JAX fused attn tests for new bias shapes passing with neg_inf=-2**27 for Bfloat16 and -2**15 for Float16 Signed-off-by:
Alp Dener <adener@nvidia.com> * code style fixes and test parameter ID cleanup Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed incorrect skip condition for backward fused attn test Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
Alp Dener <adener@nvidia.com>
-