- 10 Jun, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
* Initial basic setup Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rm setup reqs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * build-isolation support Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rm not needed funcs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix workflows Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix wheel Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix invalid wheel Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix JAX build in baremetal env Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update install inst in readme Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update build.yml Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * docstring fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
yuzhongw-nvidia authored
* Support MLA (qk_dim != v_dim) for AttnFuncWithCPAndKVP2P Signed-off-by:
Yuzhong Wang <yuzhongw@nvidia.com> * add UT for MLA CP Signed-off-by:
Yuzhong Wang <yuzhongw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refine the code Signed-off-by:
Yuzhong Wang <yuzhongw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refine the code Signed-off-by:
Yuzhong Wang <yuzhongw@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Yuzhong Wang <yuzhongw@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Xiaowei Ren <103958965+xrennvidia@users.noreply.github.com>
-
- 09 Jun, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
* Manage deps and add einops Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update build.yml Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Selvaraj Anandaraj authored
* Lora spike Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> * Added FP8 param support Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@login-preos02.a51.clusters.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 07 Jun, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
Remove extra include Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Xin Yao authored
only get skip_fp8_weight_update in fp8_graph_capturing Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2025 3 commits
-
-
Alp Dener authored
* added missing deallocs in Userbuffers destroyer Signed-off-by:
Alp Dener <adener@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Alp Dener <adener@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Phuong Nguyen authored
* refactor the multi_stream utils + implement nvte_multi_tensor_quantize in TE/Common * implement GroupedQuantizer and grouped_quantize in jax * fix logical_axes_names for transpose tensor in ScaledTensor Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> Co-authored-by:
Hua Huang <huah@nvidia.com> Co-authored-by:
Ming Huang <mingh@nvidia.com>
-
Zhongbo Zhu authored
[PyTorch] FP8 Subchannel Recipe With FP8 Gather And Configurable Scaling Factor Tensor Swizzling (#1707) * functional kernel for columnwise + no-transpose option, still hacky Signed-off-by:
zhongboz <zhongboz@nvidia.com> * pass all quantizer unit tests Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor, add gemm ready api Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make format options private members, simplify api Signed-off-by:
zhongboz <zhongboz@nvidia.com> * swizzle scales right before gemm Signed-off-by:
zhongboz <zhongboz@nvidia.com> * bug fix of single layer test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * attempt to fix lint issue Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fp8 gather pass, need minor refine Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix return_layernorm_output_gathered case Signed-off-by:
zhongboz <zhongboz@nvidia.com> * remove special cases, add sanity check before gemm Signed-off-by:
zhongboz <zhongboz@nvidia.com> * fix lint Signed-off-by:
zhongboz <zhongboz@nvidia.com> * lint ungrouped imports Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Implement dequantize for compact 1D blocks. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * add more unit test with dequantize compact supported Signed-off-by:
zhongboz <zhongboz@nvidia.com> * lint again Signed-off-by:
zhongboz <zhongboz@nvidia.com> * make ag for subchannel respect async Signed-off-by:
zhongboz <zhongboz@nvidia.com> * zero tolerance in distributed test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * fix zero tolerance test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * resolve rebase issues Signed-off-by:
zhongboz <zhongboz@nvidia.com> * lint & format Signed-off-by:
zhongboz <zhongboz@nvidia.com> * fix lint Signed-off-by:
zhongboz <zhongboz@nvidia.com> * clean up Signed-off-by:
zhongboz <zhongboz@nvidia.com> * bug fix Signed-off-by:
zhongboz <zhongboz@nvidia.com> * relax rtol for fp32 distributed test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * fix some ci issue Signed-off-by:
zhongboz <zhongboz@nvidia.com> * fix ci test failure in debug mode Signed-off-by:
zhongboz <zhongboz@nvidia.com> * Force row-wise and column-wise data to have same data format Prototype "all-gather usage" in quantizer. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove dead logic for high-precision AGs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug FP8 block-wise tensor tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug distributed test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Handle case where LayerNormLinear returns gathered norm output Signed-off-by:
Tim Moon <tmoon@nvidia.com> * fix debug mode Signed-off-by:
zhongboz <zhongboz@nvidia.com> --------- Signed-off-by:
zhongboz <zhongboz@nvidia.com> Signed-off-by:
Keith Wyss <kwyss@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Keith Wyss <kwyss@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 05 Jun, 2025 4 commits
-
-
Phuong Nguyen authored
* fix otype for fp8 gemm Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
Przemyslaw Tredak authored
* Use versioned flavor of get driver entrypoint function Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Update the check to call the versioned API starting with CUDA 12.5 where it was added Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Dynamically find entrypoint functions Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Error checking Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Lint fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
jberchtold-nvidia authored
* Fix 1x quantize kernel availability check on Hopper --------- Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix NVTE_FRAMEWORK=all Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Workflow tests and fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix jax install Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update dep Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add numpy Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add dep Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Jun, 2025 2 commits
-
-
Charlene Yang authored
* add missing args in cross-attn Signed-off-by:
Charlene Yang <charleney@nvidia.com> * allow thd for TELayer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add CP note for reordering Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix wording about CP Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add modulo cpx2 requirement Signed-off-by:
Charlene Yang <charleney@nvidia.com> * add example of token reordering Signed-off-by:
Charlene Yang <charleney@nvidia.com> * improve the CP docstring Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tweak CP wording Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test thd TELayer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add enc_dec_kv for decoder Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix cross attn in decoder Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix unfused + bshd/sbhd + telayer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove debugging Signed-off-by:
Charlene Yang <charleney@nvidia.com> --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
jberchtold-nvidia authored
Fix NVTETensor leak in attention.cpp Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
- 02 Jun, 2025 2 commits
-
-
Kirthi Shankar Sivamani authored
minor build improvements Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
* Use 1x quantization + jax transpose on BW for performance Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Use 1x quantization on Hopper as well as it is also faster Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Undo architecture check helper function Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 31 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add warning for multi framework case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Alp Dener <adener@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Alp Dener <adener@nvidia.com>
-
- 30 May, 2025 2 commits
-
-
Evgeny Tsykunov authored
* Quantizer update Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> * Update import Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Introduce _update_weight_quantizers and _get_weight_tensors/_get_weight_quantizers Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Add test Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Move _quantizer to the QuantizedTensorBase Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix import Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Signed-off-by:
Evgeny <etsykunov@nvidia.com> Co-authored-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com>
-
Tim Moon authored
Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 29 May, 2025 4 commits
-
-
Hua Huang authored
* Support SWA in CP Ring Attn THD striped sharding Signed-off-by:
Hua Huang <huah@nvidia.com> * Add some comments; move check to _FusedAttnCPWithP2PHelper.check_supported() Signed-off-by:
Hua Huang <huah@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused check Signed-off-by:
Hua Huang <huah@nvidia.com> --------- Signed-off-by:
Hua Huang <huah@nvidia.com>
-
Przemyslaw Tredak authored
* Changed the Tensor allocation strategy Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixes Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Disable debug flag Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix the double free error Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed pyTorch recipe extension Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Hide TensorAllocator and fix the usage in LayerNorm Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleaning Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix permutation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Tim Moon authored
* Linear op avoids saving input tensor if weight grad is not needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Linear op forward avoids producing quantized tensors with unnecessary usages Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter warnings Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid unnecessary usages in fused linear ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
Phuong Nguyen authored
* removes unnecessary reshapes for FP8 GEMM * use jax.nn.scaled_matmul Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 28 May, 2025 3 commits
-
-
Phuong Nguyen authored
* fp8 gemm with direct quant Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com> --------- Signed-off-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
jberchtold-nvidia authored
Update jax_scaled_masked_softmax to match TE kernel implementation Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix single FW build with multi FW available Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Some fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * sug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 23 May, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Modify the test cases Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Make the tests reproducible on different machines Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fixed the cache of the gamma_in_weight_dtype setting Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Reinstate the tests Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * More verbose code and comments Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 22 May, 2025 5 commits
-
-
Tim Moon authored
Activation ops support fusing backward pass with quantize Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
Document all recipes Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
Make primitive names more granular for better disabling granularity Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Build support for cuda 13 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix build for cudnn 8.9*; cuda 12.1 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * readd include Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
fix saved_tensors Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 21 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add missing docs for C API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Grammar, typos, copy-paste errors Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * remove contiguous word Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better wording Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 May, 2025 4 commits
-
-
Charlene Yang authored
add docstring for CP Signed-off-by: Charlene Yang <charleney@nvidia.com>
-
Paweł Gadziński authored
* docs drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * a Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update docs/debug/1_getting_started.rst Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix imgs Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
Peter St. John authored
* Use an empty torch tensor to indicate no fp8 information in extra_state Signed-off-by:
Peter St. John <pstjohn@nvidia.com> * Add huggingface from_pretrained / save_pretrained tests Adds integration tests to ensure models containing TransformerLayer objects can be saved and loaded using the from_pretrained and save_pretrained methods. Signed-off-by:
Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
guyueh1 authored
* Fix split_overlap_rs aggregate=True chunk offset calculation Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit test for aggregate=True Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix unit test Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Guyue Huang <guyueh@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 May, 2025 2 commits
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Paweł Gadziński authored
* tests drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move dir Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * tests fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-