- 04 Dec, 2025 1 commit
-
-
Tim Moon authored
* Initialize empty tensors with shape=[0] instead of shape=[]. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix runtime crash in LayerNorm Still seeing correctness issues. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure norm workspace sizes are not zero Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove assumption in swizzle kernel that data is available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove assumption in multi-swizzle kernel that data is available. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unnecessary explicit call to default constructor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid accessing tensor data pointer if tensor has no entries Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply suggestions from code review Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/common/swizzle/swizzle.cu Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @ptrendx and @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Prefer using row-wise/col-wise shape based on which has data Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflict, expand docs, fix inconsistency in dim function Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Change Tensor::has_data to check whether tensor is initialized, not whether pointer is valid. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestion from @greptile-apps Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Debug incorrect tensor initialization in tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Clarify comments that has_data does not guarantee safe pointer accesses Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test failure when computing amaxes Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 02 Dec, 2025 3 commits
-
-
Kunlun Li authored
* Add primary weighs fp8 support for mxfp8 Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix unit test and add better error log to unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Move post all-gather processing out of for loop Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add descriptions and ASCII diagrams for partial cast and partial amax functions Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Minor fix based on greptile bot Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix compilation errors due to arch-specific PTX instructions Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unused noop flag from C API Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Expose test_partial_cast Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Skip mxfp8 partial cast test if mxfp8 is not available Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix pytest error Signed-off-by:
kunlunl <kunlunl@nvidia.com> * pylint ignore unused manual_post_all_gather_processing Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix error when using is_mxfp8_available Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Phuong Nguyen authored
* add grouped_tensor classes and helpers Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * rm non-contiguous option and dptrs Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * address comments + rework CheckIn/OutputGroupedTensor Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix for compilation Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * make first_dims/last_dims optional + data.shape 2d Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added assertion Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rs conflicts Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * add data.shape info Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * added logical shape field Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * compilation fix Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fixed issues raised by greptile Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * return default dtype when grouped_tensor is empty Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use has_data() for dim queries Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update comments Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * fix index bound Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Update transformer_engine/common/transformer_engine.cpp Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * Update transformer_engine/common/transformer_engine.cpp Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * restore Tensor.has_data() + add experimental marks Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restore Tensor::has_columnwise_data Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * cleanup Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
Phuong Nguyen authored
* init triton binding with test case/example * added Triton as TE-JAX test dependency * grid with blocksize from autotune Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 26 Nov, 2025 2 commits
-
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * lines lenght Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * subtitle --- fix in many files: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cross entropy _input -> input rename Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a lot of small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * torch_version() change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing module and fix warnings Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * removed training whitespace: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/api/pytorch.rst Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Fix import Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix more imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix NumPy docstring parameter spacing and indentation - Standardize parameter documentation to use 'param : type' format (space before and after colon) per NumPy style guide - Fix inconsistent indentation in cpu_offload.py docstring - Modified 51 Python files across transformer_engine/pytorch Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
Do not initialize recipe state in base op class Op attrs may not be set. Move recipe state initialization to linear op constructor. Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 25 Nov, 2025 7 commits
-
-
Paweł Gadziński authored
* main Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * docs Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * test fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Pingtian Li authored
* fix backward_dw cuda graph order Signed-off-by:
Pingtian Li <pingtianl@nvidia.com> * add validation for num_layers_per_chunk Signed-off-by:
Pingtian Li <pingtianl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pingtian Li <pingtianl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
vthumbe1503 authored
* fix ci issue Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert back testing changes Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * remove quantizer copy + fused adam working Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix test Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix mxfp8 bug, god knows who created it Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/optimizers/fused_adam.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> * Update comment Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Varun Thumbe <vthumbe@nvidia.com> Signed-off-by:
vthumbe1503 <vthumbe@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Zhongbo Zhu authored
* minor fix of torch view dtype Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * multi-tensor RHT amax, compiles Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * setup multi_tensor_quantize_nvfp4_impl Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * wire things up and run without crash Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * numerical test Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * unit test passing Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * finish unit test of split quantize api Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * bump up padding to 64 for nvfp4 grouped quantize Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix stochastic rounding Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * lint Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * change error message Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * clean up Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * enable multi-amax without RHT Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix col-only quantize mode Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * improve benchmark script Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add NCU example script Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add larger test case Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * add contiguous_data_and_scale check to bulk allocator Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * unified naming and differentiate between group_ and multi_ Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * move regular amax into multi_tensor.h Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Disentangle logic for split-quantize and general multi-tensor quantize Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use size_t for split sections Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Suggestions from @greptile-apps Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
vthumbe1503 authored
remove linear redundant check Signed-off-by:Varun Thumbe <vthumbe@nvidia.com>
-
Phuong Nguyen authored
* allow dp + fsdp and fixed sr_rng_state partitioning Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup for lint test Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Teddy Do authored
* Change order of arguments to make jax works Signed-off-by:
tdophung <tdophung@nvidia.com> * make num_experts a tl.constepxr again Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com>
-
- 21 Nov, 2025 8 commits
-
-
Sudhakar Singh authored
* Add support for THD+CP+SWA through A2A comms Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * unblock the `padding`+`THD`+`CP(A2A)` with SWA case in A2A forward Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add proper support for thd Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * enable thd+cp tests as essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add cp+thd+a2a test to essential Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix comments from greptile Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add proper skip for flash attention Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the test to create separate tensors for flash and fused attention backend scenarios Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove redundant compare Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * simplify code Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add note for cu_seqlens_kv and cu_seqlens_kv_padded Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update tests/pytorch/attention/test_attention_with_cp.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * Update transformer_engine/pytorch/attention/dot_product_attention/context_parallel.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fixo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix docs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the argument name Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
Kshitij Lakhani authored
* Remove unnecessary SWA calculation from _segment_ids_pos_to_seqlens_offsets Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kshitij Lakhani authored
* Make BSHD default for Unfused DPA, DPA and MHA in TE JAX Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Remove explicit transpose_batch set for BSHD for DPA in JAX quickstart Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Add warnings in DPA and MHA to warn users of change defaults to BSHD instead of SBHD Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Minimize the scope of when to trigger warnings for changed defaults for transpose_batch_sequence Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Jan Bernlöhr authored
Signed-off-by:
janbernloehr <jan@bernloehrs.de> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
oliver könig authored
* ci: Build and attach bdist wheels to release page Signed-off-by:
oliver könig <okoenig@nvidia.com> * free up space Signed-off-by:
oliver könig <okoenig@nvidia.com> * cleanup Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * test Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * fix Signed-off-by:
oliver könig <okoenig@nvidia.com> * c28619d8999a147d5e09c1199f84ff6af6ad5794 Signed-off-by:
oliver könig <okoenig@nvidia.com> * c28619d8999a147d5e09c1199f84ff6af6ad5794 Signed-off-by:
oliver könig <okoenig@nvidia.com> * Reduce months to check from 7 to 5 Signed-off-by:
oliver könig <okoenig@nvidia.com> * Update .github/scripts/check_for_ngc_images.sh Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update .github/actions/build-pytorch-wheel/build.sh Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
oliver könig <okoenig@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Jack authored
Signed-off-by:Jack <lityangweiguang@163.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 19 Nov, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
* Minor CPU overhead changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Cache per device Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Jianbing authored
* refactor mxfp8_cast_only kernel Signed-off-by:
Jianbing Dong <jianbingd@nvidia.com> * fix ptx.cuh after format Signed-off-by:
Jianbing Dong <jianbingd@nvidia.com> --------- Signed-off-by:
Jianbing Dong <jianbingd@nvidia.com> Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
-
- 18 Nov, 2025 4 commits
-
-
Jaime authored
[PyTorch] Implement Selective Activation Checkpointing for LayerNormMLP with checkpoint flag (#2311) * custom tests for selective activation checkpointing for layernorm mlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add selective layernorm mlp to te.pytorch Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * update test and fix SLNMLP bug Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * implement slnmlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix tests pointed out by greptile app bot, still pass Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * minor formatting change in tests/pytorch/selective_layernorm_mlp/distributed/run_numerics.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove duplicate import in test/pytorch/selective_layernorm_mlp/test_recipe.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * clean up tests, remove unused imports Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove unused paths in test_deffered_init Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix issue with zero_centered_gamma in test_numerics reference implementation Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * clean up tests Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make comparison.py more extensive, cleaner output Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix small typo in tests/pytorch/selective_layernorm_mlp/compare.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix typo by grepbot in compare.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make selectiuve activation checkpointing optional in slnmlp via checkpoint flag Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add comments to clarify logic Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * add checkpoint param to pytests, change compare.py to compare checkppoint=False vs checkpoint=True, skip cuda graph tests for checkpoint=True Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * refactor tests to call modified LayerNormMLP Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * refactor to implement selective activation checkpointing directly into LayerNormMLP, also fix bug to reach cleanup logic in fwd Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix skip explanation for cuda_graphs.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * make _recompute deal with lists instead of tuples Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix MOST cuda graph failures by initializing identical quantizers during fwd. Float8CurrentScaling with bf16 and fp16 still fail with checkpointing Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix cuda graphs issue, all tests pass now Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix small logic bugs, clean up Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * integrate tests into main testing scripts Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * incorporate rng state tracking in checkpointing Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean up tests Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix return type mismatches Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * remove checkpoint test from test_recipe, add sperate test in test_numerics Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor typo fix Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clear up assertions in tests/pytorch/layernorm_mlp/test_selective_activation_checkpoint.py Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add license and copyright info Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix lint issues in layernorm_mlp Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * fix cpu_offload_v1 error Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * possibly fix recomputation in cuda graph bug Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * skip cuda graphs test for SLNMLP with SM>=10.0 and using delayed scaling Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo for setting IS_FIRST_FP8_MODULE Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> --------- Signed-off-by:
Jaime Cardenas <jaime@evolutionaryscale.ai> Signed-off-by:
Jaime <102792198+jaimec00@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Cache device tensors properly Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix annotation and add test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * skip nvfp4 test if not supported Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed packed versions Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * jax Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * sofmtax_fusion -> softmax_fusion_type Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Nov, 2025 3 commits
-
-
Charlene Yang authored
* [Common] Deleted unused header (#2324) Deleted unused header Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] L1_jax_distributed_test suit with individual executions (#2321) * L1 rework Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * comment out test_multi_process_grouped_gemm for now Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * rm e5m2 from test norm + MXFP8 Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * for branch Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * clean up and tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * change tests Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch debug] Fixes to debug tests failures (#2268) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix: Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [PyTorch Debug] Add max_blockwise_dynamic_range stats (#2137) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fix bug with pre scale bias (#2300) * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Try to use pre-downloaded dataset artifacts first (#2345) * Try to use pre-downloaded dataset artifacts first Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Set HF_HUB_OFFLINE to disable any network calls to HF when the pre-downloaded dataset is available Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix out of bounds access in the FP4 dequantize kernel (#2346) Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Make FP8 weights compatible with older MCore version (#2342) * Make cast_master_weights_to_fp8 compatible with older MCore version Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Rename keep_columnwise to manual_post_all_gather_processing & Optimize unit test Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove redundant _test_mini_optimizer() Signed-off-by:
kunlunl <kunlunl@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Add test to check jaxpr that amax is reused for nvfp4 recipe (#2348) * Add test to check jaxpr that amax is reused for nvfp4 recipe Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Move test to test_helper.py and rename file Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Fix sharding of segment position to match id in ring attention. (#2349) Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Disable cuDNN attention for known IMA and NaNs (#2344) * Fix cuDNN backend selection for more case. Add CG as a option as well Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuDNN checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add more checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuddn version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix error message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add check for window size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Default to fused attention in JAX DPA (#2363) * Default to fused attention in JAX DPA Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> * Consolidate documentation for DPA in JAX Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> * Correctly update the documentation for defaults in JAX DPA Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> --------- Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Update cudnn frontend to v1.16.0 (#2362) Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [common] Remove kvpacked and qkvpacked attention functions for every kernel type. (#2287) * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * depracted compile time warning + \warning -> \deprecated Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * Move Triton to common (#2359) * move triton to common and change paths Signed-off-by:
tdophung <tdophung@nvidia.com> * Formatting Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [JAX] Fused layers argument default values changed (#2347) * Changing default activations in MLP, TransformerLayer, dropout rate after FC1 to 0, and return_layernorm_output to False Signed-off-by:
tdophung <tdophung@nvidia.com> * Fixing the failing tests by hard coding arguments to the previous values instead of relying on newer default values Signed-off-by:
tdophung <tdophung@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * remove comment from gpt Signed-off-by:
Peter Dykas <wdykas@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor changes for num_splits logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace None with 1 as default Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix dtype in pack/unpack when FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add fused_attn_supported constraint for some tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update FA3 installation commands in DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * separate fused fp8 and f16 flags in tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * initialize fused_attn_supported_f16 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FA installation in L3 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Peter Dykas <wdykas@nvidia.com> Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kshitij Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
tdophung <tdophung@nvidia.com> Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
root <root@gpu-h100-0496.cm.cluster> Co-authored-by:
Peter Dykas <wdykas@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kunlun Li <94586211+kunlunl@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Michael Goldfarb <mgoldfarb@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
Teddy Do <tdophung@nvidia.com> Co-authored-by:
wdykas <73254672+wdykas@users.noreply.github.com>
-
Evgeny Tsykunov authored
* Enable reference current scaling recipe Signed-off-by:
Evgeny <etsykunov@nvidia.com> * minor Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * linter Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Test ref vs native Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Evgeny <etsykunov@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
Initial changes to remove pytorch overheads Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 Nov, 2025 5 commits
-
-
jberchtold-nvidia authored
* Use TE quant if TE fused act is disabled Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Keep existing precision Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
Paweł Gadziński authored
* init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * offloading Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * all types Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * typo Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * init Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * refactor Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * cpu offload + debug warning Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change empty_like implementation to use make_like Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * main_grad fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * manual synchornization Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * old path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * remove example Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * api changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * reverted grouped linear Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * make odl code path work for modules Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * attention old code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * legacy tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * updated code path Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/pytorch/tensor/quantized_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * nvfp4 support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update tests/pytorch/test_cpu_offloading.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * small fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
root <root@ptyche0312.ptyche.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
jberchtold-nvidia authored
* Refactor to avoid storing a global quantization config so direct recipe passing works as intended Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix use_split_accumulator for current scaling recipe Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix tests that pass direct recipe and were missing quantize meta set Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Revert "fix use_split_accumulator for current scaling recipe" This reverts commit a74ab7df812ec0a069b1bdd208debb93ec25a900. Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci failures Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix amax_history post_init Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * Update transformer_engine/jax/quantize/quantizer.py Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ci failures Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci issue Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * address comments Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * make recipe assertion classes in test_recipe_characteristics not inherit from unittest.TestCase Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com> Co-authored-by:
greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add notes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Robin Zhang authored
* reset cudagraph Signed-off-by:
Robin Zhang <robinz@nvidia.com> * use closure instead of mutable default values Signed-off-by:
Robin Zhang <robinz@nvidia.com> * add test Signed-off-by:
Robin Zhang <robinz@nvidia.com> * fix test Signed-off-by:
Robin Zhang <robinz@nvidia.com> --------- Signed-off-by:
Robin Zhang <robinz@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Nov, 2025 5 commits
-
-
jberchtold-nvidia authored
* Support for checkpointing quantizations Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add jaxpr test for quant checkpoint name Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Revert "Support for checkpointing quantizations" This reverts commit f7b784940369d0da2a77c57fa6ea744e883c5832. Signed-off-by:
JAX Toolbox <jax@nvidia.com> * Checkpoint quantizations Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * revert other files Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * move checkpointing to VJPs Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * fix ci failure Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> Signed-off-by:
JAX Toolbox <jax@nvidia.com> Co-authored-by:
JAX Toolbox <jax@nvidia.com>
-
Phuong Nguyen authored
* shardy + quantize_layout rework Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * add assertion for NVFP4 in fused act and fused norm primitive Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> * add assertions Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Phuong Nguyen authored
* swizzle via nvte Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Evgeny Tsykunov authored
Fix amax computation using output_t data in normalization Signed-off-by:Evgeny <etsykunov@nvidia.com>
-
Lifu Zhang authored
Signed-off-by:
Lifu Zhang <lifuz@login-lyris01.lyris.clusters.nvidia.com> Co-authored-by:
Lifu Zhang <lifuz@login-lyris01.lyris.clusters.nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-