"...git@developer.sourcefind.cn:kecinstone/2024-pra-vllm.git" did not exist on "318e2b5ad673ec06d9f6519c6d08ba851d888b1b"
- 29 May, 2025 1 commit
-
-
Hua Huang authored
* Support SWA in CP Ring Attn THD striped sharding Signed-off-by:
Hua Huang <huah@nvidia.com> * Add some comments; move check to _FusedAttnCPWithP2PHelper.check_supported() Signed-off-by:
Hua Huang <huah@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Remove unused check Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com>
-
- 14 Apr, 2025 1 commit
-
-
Johannes Reifferscheid authored
* Add experimental Shardy support. Production use is not yet recommended. --------- Signed-off-by: Johannes Reifferscheid <jreiffers@nvidia.com>
-
- 01 Apr, 2025 1 commit
-
-
Phuong Nguyen authored
* refactor + mxfp8 * added grouped gemm * rename linear to dense * added cublas init phase for groupedGemm * relax the tol of test encoder multiprocessing mxfp8 by 0.001 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 18 Mar, 2025 1 commit
-
-
Michael Goldfarb authored
* Fix softmax shape for THD format. Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 13 Mar, 2025 1 commit
-
-
Reese Wang authored
Make ffi compatible with jax 0.4 Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 12 Mar, 2025 1 commit
-
-
Reese Wang authored
Remove xla_ignore_channel_id check and ignore Scan loop warning in unit test Signed-off-by: Reese Wang <rewang@nvidia.com>
-
- 05 Mar, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Fix wheel install after src install Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix JAX imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * switch order of dirs for finding so Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use existing dir src build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Mar, 2025 1 commit
-
-
Reese Wang authored
* Support THD + ring attention for self attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Consolidate reorder strategy Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix dataclass frozen issue Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove redundant code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use AttnBiasType, AttnMaskType, QKVLayout in cpp_extension Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine P2P helper check_supported Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add segment_ids/pos check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fixup Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add dual chunk swap example Signed-off-by:
Reese Wang <rewang@nvidia.com> * Align different reorder code structure Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 14 Feb, 2025 2 commits
-
-
Reese Wang authored
* Expose THD to flex MHA module Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance docs Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* fixes L1 test * fix test_multigpu_encoder * fixes for other multi-encoder tests * jax.extend.ffi to jax.ffi * initialization with float32 * add init_dtype as an optional arg to all modules * update use_scan query from xla flags * relax threshold for test_encoder fp8 * relax the tols --------- Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
-
- 11 Feb, 2025 1 commit
-
-
Phuong Nguyen authored
* flax module to init params with given dtype Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * all tests passed Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * remove unnecessary reshape for kernel Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * remove casting output of dot Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * clean up Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 24 Jan, 2025 1 commit
-
-
Reese Wang authored
* POC for segment_ids/segment_pos Signed-off-by:
Reese Wang <rewang@nvidia.com> * Change segment_pos position Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use RemainingArgs to solve number of parameters mismatches Signed-off-by:
Reese Wang <rewang@nvidia.com> * Test mask_descriptor for accommodating different mask representations Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix bugs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use descriptor in bwd Signed-off-by:
Reese Wang <rewang@nvidia.com> * Primitives only accepts pure jnp array Signed-off-by:
Reese Wang <rewang@nvidia.com> * segment_ids/pos support POC Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move seqlens/offsets generation to mask descriptor Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename MaskDescriptor to SequenceDescriptor Signed-off-by:
Reese Wang <rewang@nvidia.com> * Generalize get_seqlens_and_offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Utilize sequence desc on FA bwd Signed-off-by:
Reese Wang <rewang@nvidia.com> * Migrate to new API Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add docstrings Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove small inputs and test different input format Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix seed shardings Signed-off-by:
Reese Wang <rewang@nvidia.com> * Optimize sequence converting overhead Signed-off-by:
Reese Wang <rewang@nvidia.com> * Optimize seq_offsets calculation Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix up Signed-off-by:
Reese Wang <rewang@nvidia.com> * fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix conflicts Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove redundant line Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 08 Jan, 2025 1 commit
-
-
Michael Goldfarb authored
Correct fused attention output after each step to reduce intermediate memory use. Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Dec, 2024 1 commit
-
-
Reese Wang authored
* Add util functions to attn_mask_type Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add util functions to qkv_layout Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix THD cross reference code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove explicit segment_pad, encoding it to segment_ids Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add jax.jit, replace _token with segment_ids, rename bias shape enum Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add comment for make_mask Signed-off-by:
Reese Wang <rewang@nvidia.com> * Clean code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add doc strings for the added functions Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove cache for fa deterministic which causes UT failed Signed-off-by:
Reese Wang <rewang@nvidia.com> * Rename fixture to avoid conflict Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 12 Nov, 2024 1 commit
-
-
Hua Huang authored
* FFI for all softmax functions Signed-off-by:
Hua Huang <huah@nvidia.com> * FFI for FusedAttnBackward and Dequantize. FusedAttnBackward passed all tests in test_fused_attn.py. Dequantize is not used currently; finish it for completeness. Signed-off-by:
Hua Huang <huah@nvidia.com> * Fix FusedAttnBackward FFI pybind & simplify Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert changes to tests/jax/test_fused_attn.py Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 11 Nov, 2024 1 commit
-
-
Ming-Xu Huang authored
* Implement ring attention primitive for JAX. Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> --------- Signed-off-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Signed-off-by:
Ming Huang <mingh@nvidia.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Nov, 2024 1 commit
-
-
Phuong Nguyen authored
* added prepare phase for the FusedAttnForwardFFI * enabled FusedAttnForwardFFI by default * moved prepare phase into pybind --------- Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
-
- 01 Nov, 2024 1 commit
-
-
Phuong Nguyen authored
rm default value for NVTE_JAX_FUSED_ATTN_WITH_FFI Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
-
- 31 Oct, 2024 2 commits
-
-
Phuong Nguyen authored
* disable fused attn with ffi --------- Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* lowering a dict of attrs * improve err message with line and func info * implement a product() for ffi dimensions --------- Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
-
- 25 Oct, 2024 1 commit
-
-
Charlene Yang authored
* WIP: add max_t support for THD Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: save tensors for debug and point to new FE Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix stats in bwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix stats in fwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add docstring for DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: first try on adding max_b and max_t Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit c3d522e9f5aef3c8ddfec5bf6ff24c3db97bb059. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "WIP: first try on adding max_b and max_t" This reverts commit 3bc01ebaf2aa846fd16634e2d33b0d0f5803a076. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update docstring and fix max_seqlen logic for thd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert two lines of change in docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: add get_max_b/t Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix max_seqlen code and docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * success: add max_b/max_t Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove debug code Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change max_b/max_t buckets Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix b vs orig_b Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix b vs orig_b with 0 fill Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FE for T3HD/TH3D Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add max_b to conversion kernels Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix changes after last merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add Jax support for max_t Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FE to 1.8.0-rc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FE to 1.8.0 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * code review/formatting fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Stats shape for <9.6 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * return nullptr for offset_stats when cudnn < 9.6 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more version control Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 24 Oct, 2024 2 commits
-
-
Hua Huang authored
[JAX] XLA Custom Calls with FFI for FusedAttnFwd, Quantize, Transpose, ActLuFP8, LayerNormForwardFP8FFI, and LayerNormBackwardFFI (#1263) * Add TransposeFFI, test passed Signed-off-by:
Hua Huang <huah@nvidia.com> * Add ActLuFP8FFI; fix TransposeFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * Add QuantizeFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * Add FusedAttnForwardFFI and some unit tests Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor fix Signed-off-by:
Hua Huang <huah@nvidia.com> * Add LayerNormForwardFP8FFI & LayerNormBackwardFFI Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revise FusedAttnForwardFFI() Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add FFI_CudaGraph_Traits. All tests passed, ready for merge. Signed-off-by:
Hua Huang <huah@nvidia.com> * Bug fix for FFI data type mismatch. Also add a safeguard on the entrance to FFI function. Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Michael Goldfarb authored
[JAX] Fix correctness of JAX fused attention with CP and improve numerics check in unit tests (#1282) Fix correctness of JAX fused attention with CP. Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 16 Oct, 2024 1 commit
-
-
Md Fahim Faysal Khan authored
fixed assertion bug for SWA Signed-off-by:
Md Fahim Faysal Khan <mdfahimfaysa@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 15 Oct, 2024 1 commit
-
-
Michael Goldfarb authored
Update test to check support for context parallel attention. Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 10 Oct, 2024 1 commit
-
-
Hua Huang authored
* Expose JAX sliding window attn API Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * No SWA in context parallel; fix RNG seed in test Signed-off-by:
Hua Huang <huah@nvidia.com> * Handle SWA API discrepancy in cuDNN and Python Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add SWA API for flax, all tests passed. Will update tests/jax/test_praxis_layers.py next. Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_praxis_layers.py for SWA, test passed Signed-off-by:
Hua Huang <huah@nvidia.com> * Use tuple window_size; update for PR #1212 Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add and adjust some pytest.skip Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revised following Reese Wang's comments. Still need further debugging: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-KV_PACKED-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-NO_BIAS] - AssertionError: FAILED test_fused_attn.py::TestFusedAttn::test_backward[NO_SWA-DROP_0.0-4-128-256-16-16-64-BF16-CROSS-SEPARATE-NO_MASK-POST_SCALE_BIAS-1HSS] - AssertionError: These errors do not exist in the previous commit. Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix no-SWA test case errors in previous commit Signed-off-by:
Hua Huang <huah@nvidia.com> * Add Padding mask w/ sliding windows sanity tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use float32 for the reference code softmax calculation Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com> Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Reese Wang <rewang@nvidia.com>
-
- 17 Sep, 2024 1 commit
-
-
Michael Goldfarb authored
Implementation of context parallel fused attention using all-gather. Signed-off-by: Michael Goldfarb <mgoldfarb@nvidia.com>
-
- 19 Aug, 2024 1 commit
-
-
Frédéric Bastien authored
Signed-off-by:
Frederic Bastien <fbastien@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 08 Aug, 2024 1 commit
-
-
Reese Wang authored
* Support non-deterministic algo Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine the helper function name Signed-off-by:
Reese Wang <rewang@nvidia.com> * Move fixture to conftest.py Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
-
- 06 Aug, 2024 1 commit
-
-
Reese Wang authored
* Support actlen = 0 after cuDNN 9.3.0 Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add runtime_segment < max_segment tests Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 03 Jul, 2024 1 commit
-
-
Reese Wang authored
* Integrate experimental ragged offset Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use per sequence based offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Format Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove v/o_seq_offsets Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add FP16 sanity tests and remove forward tests from the automatically run tests Signed-off-by:
Reese Wang <rewang@nvidia.com> * Enhance input checks Signed-off-by:
Reese Wang <rewang@nvidia.com> * Separate fused attn into 2 different APIs and add the docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add experimental to the docs Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add runtime segments check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove finished TODO Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Jun, 2024 1 commit
-
-
Phuong Nguyen authored
* Split cpp_extensions.py, renamed mlp.py and fused_attn.py Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixed import in tests Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-