- 29 May, 2025 1 commit
-
-
Hua Huang authored
* Support SWA in CP Ring Attn THD striped sharding Signed-off-by:
Hua Huang <huah@nvidia.com> * Add some comments; move check to _FusedAttnCPWithP2PHelper.check_supported() Signed-off-by:
Hua Huang <huah@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove unused check Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com>
-
- 22 May, 2025 1 commit
-
-
jberchtold-nvidia authored
Fix incorrectly skipped test_quantize_dbias tests Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
- 16 May, 2025 1 commit
-
-
jberchtold-nvidia authored
* [JAX] Update flax module param initialization to support logical partitioning axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix ffn1 intermediate result being replicated Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add documentation and assert when logical_axes=None Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix bias in LayerNormMLP flax module Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix layer tests to not use nn_partitioning and instead use nn.with_logical_axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 06 May, 2025 1 commit
-
-
jberchtold-nvidia authored
* Fix test_custom_call_compute.py L2 tests Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix test_helper.py Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Address comments Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 01 May, 2025 1 commit
-
-
Phuong Nguyen authored
* exclude GroupedGemm APIs Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 30 Apr, 2025 1 commit
-
-
jberchtold-nvidia authored
Fix distributed layernorm test failure Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
- 29 Apr, 2025 1 commit
-
-
jberchtold-nvidia authored
* Update test_helper.py and add QuantizeConfig class for CurrentScaling Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * WIP distributed current scaling Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Distributed Current Scaling (debugging). Distributed implementation with replicated scale_inv works for layernorm_mlp but feels like a hack. It has different per-device scale_inv values, but jax.debug.print only shows one of them. Since we're telling JAX/XLA that this scale is replicated, I think it assumes all the values are equal. However, it doesn't actually check this, so it seems we are able to get away with per-device scales for current scaling, but I am not sure how stable this will be; it may randomly fail if we or the user change partitioning at all, or if XLA decides to actually act on the assumption that all these scale_invs are the same. Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Implement distributed current scaling by computing a global amax and scale before quantization Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add encoder and mnist tests for current scaling Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add primitive prefix to shardy unique_vars to prevent factor conflicts when performing unfused primitives for current scaling Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Remove scale_shape primitive arg that is no longer used Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Format Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix expected result on multiprocessing encoder test Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint fix Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Update multiprocessing current scaling tolerances Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Uncomment test case that was disabled for testing Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Remove commented out debug line Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 22 Apr, 2025 1 commit
-
-
jberchtold-nvidia authored
* [JAX-Q] Single GPU current scaling for JAX Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix scale check dtype for MXFP8 scales affecting tests using assert_bitwise_scaled_tensors Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Address comments Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Remove cast to fp32 for norm primitives now that zero-centered gamma dtype issue is fixed Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix lint issue Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Remove unnecessary cast to fp32 Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 21 Apr, 2025 1 commit
-
-
jberchtold-nvidia authored
Check CuDNN version and apply unfused norm if below a version with the fix Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
- 16 Apr, 2025 1 commit
-
-
Kshitij Lakhani authored
* Add test cases for full coverage in jax/test_layer.py - causal and window size None - causal and window size default (-1,1) - no_mask and window size default (-1,1) - no_mask and window size default (2,2) - padding and window size None - padding_causal and window_size (2,2) Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Correct the condition where padding_causal_mask was being mapped to scaled upper triangle Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Fix Issue #1524 Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Add a runner and test cases for jax.flax.module.Softmax class for fwd pass only Segregate runner classes for Softmax module and softmax primitives Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Simplify logic when picking softmax primitives and softmax jax framework calls Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Simplify the logic for performing jax based softmax Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Code clean up Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add support table for mask, SWA and Softmax type. Code linting Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Explicit SWA conditions in comments. Fix typo Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve typo to remove None in SWA comments section Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 14 Apr, 2025 1 commit
-
-
Johannes Reifferscheid authored
* Add experimental Shardy support. Production use is not yet recommended. --------- Signed-off-by:Johannes Reifferscheid <jreiffers@nvidia.com>
-
- 09 Apr, 2025 1 commit
-
-
Phuong Nguyen authored
* scaling enum abstract * rm NVTE_ from ScalingMode names * rework scaling mode enum in grouped gemm * fix norm sharding --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 04 Apr, 2025 2 commits
-
-
Phuong Nguyen authored
* rename QuantizeAxis to QuantizeLayout, get_layout to get_data_layout, q_axis to q_layout * add flatten_axis option * added gated act to test encoder * sharding constraint fixes * fix padding when flattening first dim needs to be padded * update test sizes so that padding is tested * rm output sharding as it can be done in the flax module * sharding scale_inv for mxfp8 --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
jberchtold-nvidia authored
MXFP8 flax layer tests Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
- 01 Apr, 2025 1 commit
-
-
Phuong Nguyen authored
* refactor + mxfp8 * added grouped gemm * rename linear to dense * added cublas init phase for groupedGemm * relax the tol of test encoder multiprocessing mxfp8 by 0.001 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 12 Mar, 2025 1 commit
-
-
Reese Wang authored
Remove xla_ignore_channel_id check and ignore Scan loop warning in unit test Signed-off-by:Reese Wang <rewang@nvidia.com>
-
- 05 Mar, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix wheel install after src install Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix JAX imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * switch order of dirs for finding so Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use existing dir src build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Mar, 2025 1 commit
-
-
Reese Wang authored
* Support THD + ring attention for self attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Consolidate reorder strategy Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix dataclass frozen issue Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove redundant code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use AttnBiasType, AttnMaskType, QKVLayout in cpp_extension Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine P2P helper check_supported Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add segment_ids/pos check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fixup Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add dual chunk swap example Signed-off-by:
Reese Wang <rewang@nvidia.com> * Align different reorder code structure Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 18 Feb, 2025 1 commit
-
-
Phuong Nguyen authored
flax module with compute dtype inferred from the inputs Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 14 Feb, 2025 2 commits
-
-
Reese Wang authored
Fix issues when mask/sequence_descriptor is None Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* fixes L1 test * fix test_multigpu_encoder * fixes for other multi-encoder tests * jax.extend.ffi to jax.ffi * initialization with float32 * add init_dtype as an optional arg to all modules * update use_scan query from xla flags * relax threshold for test_encoder fp8 * relax the tols --------- Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 07 Feb, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 24 Jan, 2025 1 commit
-
-
Reese Wang authored
* POC for segment_ids/segment_pos Signed-off-by:
Reese Wang <rewang@nvidia.com> * Change segment_pos position Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use RemainingArgs to solve number of parameters mismatches Signed-off-by:
Reese Wang <rewang@nvidia.com> * Test mask_descriptor for accommodating different mask representations Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix bugs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use descriptor in bwd Signed-off-by:
Reese Wang <rewang@nvidia.com> * Primitives only accept pure jnp arrays Signed-off-by:
Reese Wang <rewang@nvidia.com> * segment_ids/pos support POC Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move seqlens/offsets generation to mask descriptor Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename MaskDescriptor to SequenceDescriptor Signed-off-by:
Reese Wang <rewang@nvidia.com> * Generalize get_seqlens_and_offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Utilize sequence desc on FA bwd Signed-off-by:
Reese Wang <rewang@nvidia.com> * Migrate to new API Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add docstrings Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove small inputs and test different input format Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix seed shardings Signed-off-by:
Reese Wang <rewang@nvidia.com> * Optimize sequence converting overhead Signed-off-by:
Reese Wang <rewang@nvidia.com> * Optimize seq_offsets calculation Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix up Signed-off-by:
Reese Wang <rewang@nvidia.com> * fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix conflicts Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove redundant line Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 17 Jan, 2025 1 commit
-
-
Michael Goldfarb authored
Consolidate the distributed fused attention tests to shared input generation and execution logic. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 08 Jan, 2025 2 commits
-
-
Michael Goldfarb authored
Correct fused attention output after each step to reduce intermediate memory use. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
Reese Wang authored
* Fix SWA mask for THD and force seqlen_kv >= seqlen_q for SWA Signed-off-by:
Reese Wang <rewang@nvidia.com> * Generalize sliding window mask Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix pylint Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Dec, 2024 1 commit
-
-
Charlene Yang authored
* add swa (left,0) + padding + brcm support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * final fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upgrade to FE 1.9-rc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * skip thd + CP + fused attn tests for cuDNN 9.6+ due to different stats shapes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Dec, 2024 1 commit
-
-
Reese Wang authored
* Add util functions to attn_mask_type Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add util functions to qkv_layout Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix THD cross reference code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove explicit segment_pad, encoding it to segment_ids Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add jax.jit, replace _token with segment_ids, rename bias shape enum Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add comment for make_mask Signed-off-by:
Reese Wang <rewang@nvidia.com> * Clean code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc strings for the added functions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove cache for fa deterministic which causes UT failures Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename fixture to avoid conflict Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 04 Dec, 2024 1 commit
-
-
Michael Goldfarb authored
Scale sequence length in CP tests to avoid tiny sizes. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 11 Nov, 2024 1 commit
-
-
Ming-Xu Huang authored
* Implement ring attention primitive for JAX. Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 06 Nov, 2024 1 commit
-
-
Hua Huang authored
* FFI for some transpose & activation functions Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove comments in transformer_engine/jax/csrc/extensions/activation.cpp Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com> Signed-off-by:
Hua Huang <huangh1994@outlook.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Hua Huang <huangh1994@outlook.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 04 Nov, 2024 1 commit
-
-
Md Fahim Faysal Khan authored
Exposed context parallel params to DPA api Signed-off-by:
Md Fahim Faysal Khan <mdfahimfaysa@nvidia.com> Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> --------- Signed-off-by:
Md Fahim Faysal Khan <mdfahimfaysa@nvidia.com> Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 24 Oct, 2024 2 commits
-
-
Hua Huang authored
[JAX] XLA Custom Calls with FFI for FusedAttnFwd, Quantize, Transpose, ActLuFP8, LayerNormForwardFP8FFI, and LayerNormBackwardFFI (#1263) * Add TransposeFFI, test passed Signed-off-by:
Hua Huang <huah@nvidia.com> * Add ActLuFP8FFI; fix TransposeFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * Add QuantizeFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * Add FusedAttnForwardFFI and some unit tests Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor fix Signed-off-by:
Hua Huang <huah@nvidia.com> * Add LayerNormForwardFP8FFI & LayerNormBackwardFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revise FusedAttnForwardFFI() Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FFI_CudaGraph_Traits All tests passed, ready for merge Signed-off-by:
Hua Huang <huah@nvidia.com> * Bug fix for FFI data type mismatch Also add a safeguard on the entrance to FFI function Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Michael Goldfarb authored
[JAX] Fix correctness of JAX fused attention with CP and improve numerics check in unit tests (#1282) Fix correctness of JAX fused attention with CP. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 22 Oct, 2024 1 commit
-
-
Reese Wang authored
Add THD + GQA support for cuDNN >= 9.6 Signed-off-by:Reese Wang <rewang@nvidia.com>
-
- 15 Oct, 2024 1 commit
-
-
Michael Goldfarb authored
Update test to check support for context parallel attention. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 10 Oct, 2024 1 commit
-
-
Hua Huang authored
* Expose JAX sliding window attn API Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No SWA in context parallel; fix RNG seed in test Signed-off-by:
Hua Huang <huah@nvidia.com> * Handle SWA API discrepancy in cuDNN and Python Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add SWA API for flax, all tests passed. Will update tests/jax/test_praxis_layers.py next Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_praxis_layers.py for SWA, test passed Signed-off-by:
Hua Huang <huah@nvidia.com> * Use tuple window_size; update for PR #1212 Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add and adjust some pytest.skip Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revised following Reese Wang's comments. Still need further debugging: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: These errors do not exist in the previous commit Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix no-SWA test case errors in previous commit Signed-off-by:
Hua Huang <huah@nvidia.com> * Add Padding mask w/ sliding windows sanity tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use float32 for the reference code softmax calculation Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Reese Wang <rewang@nvidia.com>
-
- 17 Sep, 2024 1 commit
-
-
Michael Goldfarb authored
Implementation of context parallel fused attention using all-gather. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 16 Sep, 2024 1 commit
-
-
Michael Goldfarb authored
Modify unit tests to work around cuDNN 9.4 regression. Signed-off-by:Michael Goldfarb <mgoldfarb@nvidia.com>
-