- 02 May, 2024 1 commit
-
-
Reese Wang authored
* Add layernorm_fp8_dot unit test Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update the softmax primitives support conditions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add tests for the softmax primitives Signed-off-by:
Reese Wang <rewang@nvidia.com> * Round1 refactor of test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Split dropout arguments of ref code and add hidden/intermediate dropout elementwise comparison Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add dropout_braodcast_dim, self_attn_mask tests and clean a few code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Abstract test layer and fix a rope reference code diff Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add bias tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add epsilon and float32 tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add relpos_bias and attention dropout tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Loose the atol Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move common fixtures to conftest.py Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc string for test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc string for test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix conflicts of test_layer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Avoid to left bias parameters in graph when use_bias=False Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 24 Apr, 2024 1 commit
-
-
Phuong Nguyen authored
* Implemented swiglu and silu Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Renamed nvte-*silu to nvte-*swish + generalized GetDBiasDact functions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 16 Apr, 2024 1 commit
-
-
Ming-Xu Huang authored
-
- 22 Mar, 2024 1 commit
-
-
Reese Wang authored
* Remove unused headers Signed-off-by:
Reese Wang <rewang@nvidia.com> * Unify the fused attn workspace size cpp code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Reduce the skipped cases Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename self/cross attention to qkvpacked/kvpacked Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update attention mask docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the attn mask implementations Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 27 Feb, 2024 1 commit
-
-
Ming-Xu Huang authored
Support various implementations of RoPE and fix a coordinate representation bug Signed-off-by:Ming Huang <mingh@nvidia.com>
-
- 22 Feb, 2024 1 commit
-
-
Reese Wang authored
* Refine MHA API Signed-off-by:
Reese Wang <rewang@nvidia.com> * Reuse func from the flax Signed-off-by:
Reese Wang <rewang@nvidia.com> * DPA draft Signed-off-by:
Reese Wang <rewang@nvidia.com> * qkv packed draft Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix test_layer with fused attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add attn_bias_type and enhance a few code flow Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move scale_factor from __call__ to init Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance the docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add DPA public API and tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix conflict Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add qkv separate fused attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Apply BSHD_BSHD_BSHD format Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove debug log Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add fused attention layer tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add NVTE_FUSED_ATTN docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fine-grained fused attn settings Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove the default value of num_attetnion_head and head_dim Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add teardown for fused attn env Signed-off-by:
Reese Wang <rewang@nvidia.com> * Unify the Optional notation Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix Pre/Post scale bias comments Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add no_mask tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add checkpoint_name for fused attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix the fused attn batcher Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 02 Feb, 2024 1 commit
-
-
Ming-Xu Huang authored
* Adding support of sequence parallelism Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding RoPE Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix wrong batch_logical_axes Signed-off-by:
Ming Huang <mingh@nvidia.com> * Rnaming FSDP outer env var Signed-off-by:
Ming Huang <mingh@nvidia.com> * Poring RoPE to Praxis layers. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Porting GeLU + [FP8 Cast]. Signed-off-by:
Ming Huang <mingh@nvidia.com> * WAR to make XLA successfully match FP8 GEMM on FFN1 with GeLU. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Allowing arbitrary dimension of NVShape for the workspace allocation Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding checkpoint_name to fused functions of mlp.py to get better perf with nn.scan. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Modify with review feedback. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix bugs Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix typo. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fixed for lint Signed-off-by:
Ming Huang <mingh@nvidia.com> * Follow review feedback to modify code. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix typo. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Port SP to Praxis Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Fix an issue when enabling both GQA and RoPE. Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> * Update docs Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Ming-Xu Huang <mingh@nvidia.com>
-
- 16 Jan, 2024 1 commit
-
-
zlsh80826 authored
* Support num_gqa_groups arguments Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add GQA support on the JAX bridge code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix the kv stride of the arbitrary backend Signed-off-by:
Reese Wang <rewang@nvidia.com> * Complete rewrite fused attention tests and add GQA coverage Signed-off-by:
Reese Wang <rewang@nvidia.com> * Support unfused GQA Signed-off-by:
Reese Wang <rewang@nvidia.com> * Calculate seqlen before the primitive for the better perf Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add GQA layer tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Apply code style checks for te_jax Signed-off-by:
Reese Wang <rewang@nvidia.com> * Apply code style checks for tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add num_gqa_groups doc Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the qkv_type Signed-off-by:
Reese Wang <rewang@nvidia.com> * Correct the variable naming Signed-off-by:
Reese Wang <rewang@nvidia.com> * Handle Max512 CAUSAL Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add WAR for the latest jax image Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 03 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 13 Nov, 2023 1 commit
-
-
zlsh80826 authored
[C/JAX] Support more mask types for the arbitrary seqlen kernels and minor changes of JAX bias (#469) * Move bias to float32 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enable varlen Signed-off-by:
Reese Wang <rewang@nvidia.com> * Increase neg infinity abs values Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enable varlen tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove unnecessary code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Support variable sequence length after cuDNN 8.9.6 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use unique_ptr instead of shared_ptr Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add a new mask type: PADDING_CAUSAL_MASK Signed-off-by:
Reese Wang <rewang@nvidia.com> * Support flash padding mask after 8.9.6 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance the Max512 handling for causal masking and add the related tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update the fused attn support lists Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove padding_aware from the caching Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix libtransformer.so issue Signed-off-by:
Reese Wang <rewang@nvidia.com> * Reduce the pad ratio tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix a bug with cuDNN 8.9.5 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Release backend resource after the module level unit test Signed-off-by:
Reese Wang <rewang@nvidia.com> * Clean the jax live arrays before running the unit tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix too-few-public-methods lint Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 06 Oct, 2023 1 commit
-
-
Ming-Xu Huang authored
* [JAX] Enhance Dropout in TransformerLayer. 1. Fixed missing setup of dropout RNG key in TransformerLayer and LayerNormMLP. 2. Allowing seperated dropout rate for FC1's output and other hiddens. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix wrong fp8 scale in _update_fp8_metas_impl Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix typo Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Aug, 2023 1 commit
-
-
Ming-Xu Huang authored
* Cast Flax collections to FrozenDict as WAR to adapt Flax 0.7.1. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding min version of flax to requirements.txt in examples Signed-off-by:
Ming Huang <mingh@nvidia.com> * Fix praxis tests and rename compare_frozen_dict to compare_dict Signed-off-by:
Ming Huang <mingh@nvidia.com> * [Paddle] Refactor FP8 state (#350) Refactor fp8 state Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Store FP8 checkpointing data in CPU (#351) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make test_layer be able to run on both Flax >=0.7.1 and <=0.7.0 Signed-off-by:
Ming Huang <mingh@nvidia.com> * Update Flax version Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com> Signed-off-by:
Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tian Zheng <tizheng@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 18 Jul, 2023 1 commit
-
-
zlsh80826 authored
* Fully remove attn_type and set self_attn_mask_type default to 'causal' Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix tests with new arguments Signed-off-by:
Reese Wang <rewang@nvidia.com> * Explicit self_attn_mask_type for examples Signed-off-by:
Reese Wang <rewang@nvidia.com> * Update transformer_engine/jax/flax/transformer.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
zlsh80826 <rewang@nvidia.com> * Update transformer_engine/jax/flax/transformer.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
zlsh80826 <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Signed-off-by:
zlsh80826 <rewang@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Jun, 2023 1 commit
-
-
zlsh80826 authored
* Add self_attn_mask_type and replace attn_type Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the keyword style for the better readability Signed-off-by:
Reese Wang <rewang@nvidia.com> * Replace attn_type with attn_mask_type in praxis transformer Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix typos Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 16 May, 2023 1 commit
-
-
Ming-Xu Huang authored
* Adding JAX/Praxis modules and dependencies. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding UTs to JAX/Praxis modules. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Remove praxis as a dependency due to not strictly needed Signed-off-by:
Ming Huang <mingh@nvidia.com> * Repalce is_fp8_supported to is_fp8_available Signed-off-by:
Ming Huang <mingh@nvidia.com> * Make Praxis as an optional dependency. 1. Removed 'from . import praxis' in __init__.py. 1.1 Noted, keep 'from . import flax' for deprecated warning. 2. Changed te.flax to te_flax in examples and README.rst. Signed-off-by:
Ming Huang <mingh@nvidia.com> * Adding a workaround to FP8 training on Praxis. Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Ming Huang <mingh@nvidia.com>
-