- 07 Mar, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Don't set data to null Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Mar, 2025 6 commits
-
-
Jaemin Choi authored
Add NVTX ranges Signed-off-by:
Jaemin Choi <jaeminc@nvidia.com> Co-authored-by:
Jaemin Choi <jaeminc@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
vasunvidia authored
* Remove cudaStreamSync. call Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Use cudaMemsetAsync instead of cudaMemcpyAsync Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> * Update transformer_engine/common/transformer_engine.cpp Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Vasudevan Rengasamy <vrengasamy@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more sensitive tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * typo fix and skip test on blackwell fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Xiaowei Ren authored
Signed-off-by:Xiaowei Ren <xren@nvidia.com>
-
Nicolas Castet authored
Signed-off-by:Nicolas Castet <ncastet@nvidia.com>
-
Tim Moon authored
* Enable MXFP8 LayerNorm and RMSNorm Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix compilation Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix envvar Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Mar, 2025 4 commits
-
-
Tim Moon authored
Move Lightning-Thunder integration test to L1 Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix wheel install after src install Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix JAX imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * switch order of dirs for finding so Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Use existing dir src build Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sérgio Agostinho authored
--------- Signed-off-by:Sérgio Agostinho <sagostinho@nvidia.com>
-
Nicolas Castet authored
* Add support for UB MNNVL Signed-off-by:
Nicolas Castet <ncastet@nvidia.com> * Address review comments Signed-off-by:
Nicolas Castet <ncastet@nvidia.com> * Fix lint Signed-off-by:
Nicolas Castet <ncastet@nvidia.com> * Dlopen nvml lib since it comes with the cuda driver Signed-off-by:
Nicolas Castet <ncastet@nvidia.com> * Add initial copyright date Signed-off-by:
Nicolas Castet <ncastet@nvidia.com> --------- Signed-off-by:
Nicolas Castet <ncastet@nvidia.com>
-
- 04 Mar, 2025 3 commits
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
Kshitij Lakhani authored
* Expose only required symbols from libtransformer_engine.so during linking for pytorch Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Augment libtransformer_engine.version for jax compatibility Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Augment the libtransformer_engine.version to ensure compatibility with CPP tests Remove getenv from the .version file Combine system.cpp and system.h Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Nit: Remove commented code for not including common.h Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Replace explicit getenv instantiations with a helper template Use filesystem calls in file_exists() Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert comment to falsy instead of false Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> --------- Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Signed-off-by:
Kshitij Lakhani <33047503+KshitijLakhani@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Mar, 2025 3 commits
-
-
Oleg Goncharov authored
Added constexpr checks of tensor boundaries Signed-off-by:Oleg Goncharov <ogoncharov@nvidia.com>
-
vasunvidia authored
Signed-off-by:Vasudevan Rengasamy <vrengasamy@nvidia.com>
-
Reese Wang authored
* Support THD + ring attention for self attn Signed-off-by:
Reese Wang <rewang@nvidia.com> * Consolidate reorder strategy Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix dataclass frozen issue Signed-off-by:
Reese Wang <rewang@nvidia.com> * Remove redundant code Signed-off-by:
Reese Wang <rewang@nvidia.com> * Use AttnBiasType, AttnMaskType, QKVLayout in cpp_extension Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fix lint Signed-off-by:
Reese Wang <rewang@nvidia.com> * Refine P2P helper check_supported Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add segment_ids/pos check Signed-off-by:
Reese Wang <rewang@nvidia.com> * Fixup Signed-off-by:
Reese Wang <rewang@nvidia.com> * Add dual chunk swap example Signed-off-by:
Reese Wang <rewang@nvidia.com> * Align different reorder code structure Signed-off-by:
Reese Wang <rewang@nvidia.com> --------- Signed-off-by:
Reese Wang <rewang@nvidia.com> Co-authored-by:
Phuong Nguyen <phuonguyen@nvidia.com>
-
- 01 Mar, 2025 1 commit
-
-
Tim Moon authored
Set flag in norm modules for Mcore sequence-parallel support Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 28 Feb, 2025 5 commits
-
-
Sudhakar Singh authored
* delete extra tensor objects after restoring float8 tensors Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * nit fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix the leak in float8tensor and mxfloat8tensor classes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * uncomment the fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Enforce torch 2.0 and run attn tests with torch.compile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * replace torch.compile with jit_fuser Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix quantized tensor shape Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * add shape to make_like; add test for chunk Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix typo from suggestion Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* TP-RS local reduction: fix lint err Signed-off-by:
Sangkug Lym <slym@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Sangkug Lym authored
* Support vectorized local reduction for p2p-based ReduceScatter overlap Signed-off-by:
Sangkug Lym <slym@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup Signed-off-by:
Sangkug Lym <slym@nvidia.com> --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 27 Feb, 2025 1 commit
-
-
Tim Moon authored
Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 26 Feb, 2025 3 commits
-
-
Tim Moon authored
* Skip context parallelism tests if not enough GPUs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Oleg Goncharov authored
* Added TMA alignment check to cast_fp8_1D Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use tensor const-ref instead of tensor const-ptr Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Selvaraj Anandaraj authored
* Added parallel cross entropy loss implementation using online softmax Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added reshape of loss output Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added to test list Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Added Triton dependency Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Added copyright Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Fixed lint errors Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update setup.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> * Fixed lint and triton failure Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Removed flattening for scalars Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Skip tests on Blackwell due to TE CI caveat Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added reason arg Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Do not register Triton dependency with setuptools Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 25 Feb, 2025 3 commits
-
-
Youngeun Kwon authored
* add remove_caches api Signed-off-by:
Youngeun Kwon <youngeunk@nvidia.com> * Update transformer_engine/pytorch/tensor/float8_tensor.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Youngeun Kwon <youngeunk@nvidia.com> * explicit delete Signed-off-by:
Youngeun Kwon <youngeunk@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Youngeun Kwon <youngeunk@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
guyueh1 authored
* Fix a crash with module._apply(lambda t: t.cpu()) Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * Add comments Signed-off-by:
Guyue Huang <guyueh@nvidia.com> * Make sure tensor is moved to dst device before quantizer quantizes Signed-off-by:
Guyue Huang <guyueh@nvidia.com> --------- Signed-off-by:
Guyue Huang <guyueh@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Charlene Yang authored
* minor fixes for attention Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 24 Feb, 2025 2 commits
-
-
Paweł Gadziński authored
* non-exit tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * reshape inp Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 22 Feb, 2025 2 commits
-
-
Kshitij Lakhani authored
* Remove dependency on transformer_engine::Tensor in attention.cu Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Templatize thd_partition_indices_kernel and thd_read_half_tensor_kernel kernels ONLY for invoking recompilation and not directly using the pre-compiled symbols in libtransformer.so Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Modify attention.cu for thd templatized kernels. Remove dependency on common.h Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Move thd structs from libtransformer.so to framework extensions include header Code cleanup Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Consolidate and move thd_utils from common to framework extensions Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * Remove template decorators around thd_partition_indices_kernel and thd_read_half_tensor_kernel Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Code clean up Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Kshitij Janardan Lakhani <klakhani@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Tim Moon authored
Use same API in optimizer zero_grad as PyT optimizers Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 20 Feb, 2025 2 commits
-
-
Xiaowei Ren authored
* commit some debug code Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add more debug info Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * debug code commit and typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * a typo fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * remove debug info Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * do not return lse Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * add amax_per_step for quantizers of CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * fix FP8 + CP Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * dtype fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> * bug fix Signed-off-by:
Xiaowei Ren <xren@nvidia.com> --------- Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xiaowei Ren <xren@login-preos01.a51.clusters.nvidia.com>
-
Kirthi Shankar Sivamani authored
* Fix te sequential for older pytorch versions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * FIxes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 19 Feb, 2025 3 commits
-
-
Xin Yao authored
* fix fuse_wgrad_accumulation for GroupedLinear Signed-off-by:
Xin Yao <xiny@nvidia.com> * fix fuse_wgrad_accumulation for GroupedLinear Signed-off-by:
Xin Yao <xiny@nvidia.com> * update tests Signed-off-by:
Xin Yao <xiny@nvidia.com> --------- Signed-off-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Tim Moon authored
Fix typo Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Zhenhuan Liu authored
* Fix issues for MCore DDP. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Remove force data release for CPU offloading. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Add preserved attributeds. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add main_grad to prevserved attributes. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Change prepare_for_saving to original tensor and add .data to CPU hook. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update. Signed-off-by:
Dennis Liu <denliu@nvidia.com> * Fix for LayernormLinear in FP8. Signed-off-by:
Dennis Liu <denliu@nvidia.com> --------- Signed-off-by:
Dennis Liu <denliu@nvidia.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 18 Feb, 2025 1 commit
-
-
Phuong Nguyen authored
flax module with compute dtype inferred from the inputs Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-