- 23 Jan, 2026 2 commits
-
-
maxiao3 authored
1,Resolve out-of-bounds issues for types struct 2,Fix TestFusedCastFloat8Vectorwise test case failure Signed-off-by:maxiao3 <maxiao3@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!73
-
zc20020701 authored
Signed-off-by:
zhaochao <zhaochao1@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!72 Co-authored-by:
zhaochao <zhaochao1@sugon.com>
-
- 12 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 09 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com> See merge request dcutoolkit/deeplearing/TransformerEngine!67
-
- 07 Jan, 2026 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 31 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 15 Dec, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 10 Dec, 2025 2 commits
-
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
- 03 Dec, 2025 3 commits
-
-
wenjh authored
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
-
- 14 Nov, 2025 1 commit
-
-
Paweł Gadziński authored
* fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * add notes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * small fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 13 Nov, 2025 1 commit
-
-
Evgeny Tsykunov authored
Fix amax computation using output_t data in normalization Signed-off-by:Evgeny <etsykunov@nvidia.com>
-
- 12 Nov, 2025 3 commits
-
-
Sudhakar Singh authored
* enable applying rope offsets in backwared Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add tests for rope offsets for thd/bshd/sbhd formats Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fixes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
wenjh authored
-
wenjh authored
-
- 10 Nov, 2025 1 commit
-
-
Teddy Do authored
* move triton to common and change paths Signed-off-by:
tdophung <tdophung@nvidia.com> * Formatting Signed-off-by:
tdophung <tdophung@nvidia.com> --------- Signed-off-by:
tdophung <tdophung@nvidia.com>
-
- 08 Nov, 2025 1 commit
-
-
wenjh authored
-
- 07 Nov, 2025 2 commits
-
-
Paweł Gadziński authored
* code drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * depracted compile time warning + \warning -> \deprecated Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Fix cuDNN backend selection for more case. Add CG as a option as well Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuDNN checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add more checks Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cuddn version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix error message Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add check for window size Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Nov, 2025 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 31 Oct, 2025 1 commit
-
-
Oleg Goncharov authored
Deleted unused header Signed-off-by:Oleg Goncharov <ogoncharov@nvidia.com>
-
- 30 Oct, 2025 2 commits
-
-
Oleg Goncharov authored
* Separated gated and dequantize kernels Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Separated quantize, dequantize and gated functions Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed lint issues Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed persistent lint issues Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added missing compute capability 10.0 check for Quantize FP8 TMA kernels Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed the issue which was added again by autofix Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed files description. Completely removed non-identity activations from the NVFP4 transpose test suite Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Removed unsupported template arguments in NVFP4 quantize Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed undefined symbol error Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Fixed condition Signed-off-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> * Fixed CUDA version check Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed arch conditions order Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Clean up Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Small fix Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Small fix Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Fixes per the PR review Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Fix Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Split quantize helper into two (FWD and BWD) functions Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Moved activation functions from cast.cu. Removed cast.cu from the fast-math compilation list Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Enabled fast math for activations by default Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> * Disabled fast math for activations by default Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> --------- Signed-off-by:
Oleg Goncharov <ogoncharov@nvidia.com> Signed-off-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Phuong Nguyen authored
fix max jobs Signed-off-by:Phuong Nguyen <phuonguyen@nvidia.com>
-
- 27 Oct, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Remove nvidia-mathdx dep Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix SR Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add comment Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 25 Oct, 2025 1 commit
-
-
Charlene Yang authored
* add max_score for fused/unfused F16 non-CP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * calculate max per head instead of max over all heads Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fused attn max_score shape Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert FE to github Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FE to 1.15.0-rc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * reduce ew kernels; fix causal masks; add more tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix to tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove logic for flash-attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: add CP support for p2p/a2a/all_gather Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor improvements of implementation/tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: add thd support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add thd to UnfusedDPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * more fixes for lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update to FE 1.15 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove unneeded changes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable unfused for thd + pad_between_seqs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable thd for unfused until bug is fixed Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix all_gather Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix all gather Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * rename max_score to max_logit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix all_gather Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix all_gather Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable fused attn + thd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 24 Oct, 2025 1 commit
-
-
jberchtold-nvidia authored
fix checks in unoptimized non-rht fp4 quantize kernel Signed-off-by:Jeremy Berchtold <jberchtold@nvidia.com>
-
- 23 Oct, 2025 1 commit
-
-
Przemyslaw Tredak authored
* Added sm_120f to the build Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Change the arch specific handling Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Support for CUDA<12.9 Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Moved through the rest of the files Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Common cases Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Remove pure 100 from the list Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * CMake changes, (not yet working) Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Do not pass the arch-specific thing from build_tools Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Moved some of the files to arch-specific compilation Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix and also changing the order of compilation to hopefully get the compilation time lower Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Fix for the files overwriting custom compile properties Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Actually make this whole thing work Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add space to the error message Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> * Apply suggestions from code review Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> * Fixes from review Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Changing the naming to be more intuitive Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add missing cassert include for device-side asserts Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Signed-off-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by:
Oleg Goncharov <64355998+Oleg-Goncharov@users.noreply.github.com>
-
- 21 Oct, 2025 1 commit
-
-
Zhongbo Zhu authored
* pipeclean, fix nvfp4 padding of 32 alignment Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * numerical test passed Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix CI failure with test_cast_master_weights_to_fp8 (in a hacky way) Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * found CUDA mis-aligned address error in training in multi-swizzle, hack the vec_load_size to 1 to unblock Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * leave comments about alignment issue Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fused bulk alloc nvfp4 Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix RHT sign mask CPU overhead Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * fix Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * resolve comments Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> * Remove incorrect logic that treats 0-D tensor as uninitialized Tensor shape logic still requires treating 0-D tensor as uninitialized. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix invalid conversion from tensor to int Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Zhongbo Zhu <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 18 Oct, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Support wheel build for cuda 13 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes for cu13 runtime, format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add documentation Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better error handling Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix jax sdist Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Modify function names Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 17 Oct, 2025 2 commits
-
-
Alp Dener authored
Make `CanonicalizeGemmInput()` support non-TN layout FP8 GEMM on Blackwell with column-wise/transposed data (#2233) Modified CanonicalizeGemmInput() logic to pull from column-wise data for FP8 GEMM on Blackwell when row-wise is not available. Signed-off-by:Alp Dener <adener@nvidia.com>
-
Tim Geypens authored
Signed-off-by:Tim Geypens <tim.geypens@gmail.com>
-
- 16 Oct, 2025 2 commits
-
-
yuguo authored
-
tabuchixiangcai3 authored
Signed-off-by:Tangao <2205747538@qq.com>
-
- 15 Oct, 2025 3 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
yuguo authored
-
yuguo authored
-
- 14 Oct, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Initial API change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change all imports and api Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix typo Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix recipe tets Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix more tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix docs, tests, and make Jax change as well Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change internal uses of fp8_autocast Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address nits Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * rename file Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CG function, and small test fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Change instances of make_graphed_callables internally Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix distributed tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix test and add more docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Cleanup test imports and minimize internal file imports Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Make is_bf16_available public Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better docs and better api Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * fix nvfp4 test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-