- 03 Mar, 2026 2 commits
- 24 Feb, 2026 1 commit
-
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 18 Feb, 2026 1 commit
-
-
Przemek Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
-
- 17 Feb, 2026 2 commits
-
-
jberchtold-nvidia authored
* initial debug of inspect ffi
* writing binary dumps of tensors works
* loading works
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* refactor
* Add tensor statistics
* lint
* Add cuda error check and tests
* Add __init__.py to debug folder
* Fix lint
* Address greptile comments
* Lint
* Gate tests behind fp8 support
---------
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Hemil Desai authored
Include `build_tools/` in the source distribution via `MANIFEST.in` so that cached builds from `uv` (and `pip`) can resolve `setup.py`'s top-level imports.

`setup.py` imports from `build_tools` at the top level:

```python
from build_tools.build_ext import CMakeExtension, get_build_ext
from build_tools.te_version import te_version
from build_tools.utils import cuda_archs, cuda_version, ...
```

The `__legacy__` build backend in `pyproject.toml` adds the source root to `sys.path`, so these imports work when building directly from the source tree. However, `build_tools/` was not included in the sdist because:

1. `MANIFEST.in` did not list it
2. `build_tools/` is not discovered by `find_packages()` (it is a standalone directory at the repo root, not under `transformer_engine/`)

When `uv` caches the sdist and later builds a wheel from it, the sdist is extracted to a temporary directory where `build_tools/` is absent, causing a `ModuleNotFoundError`. Passing `--no-cache` to `uv` works around this by forcing a fresh build from the full source tree.

Added `build_tools` to `MANIFEST.in`:

```diff
 recursive-include transformer_engine/common/include *.*
+recursive-include build_tools *.py *.txt
```

- [x] `python setup.py sdist` produces a tarball that contains `build_tools/`

```
$ tar tzf dist/transformer_engine-*.tar.gz | grep build_tools
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/VERSION.txt
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/__init__.py
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/build_ext.py
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/jax.py
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/pytorch.py
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/te_version.py
transformer_engine-2.13.0.dev0+82f7ebeb/build_tools/utils.py
```

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
-
- 13 Feb, 2026 2 commits
-
-
Teddy Do authored
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* add more of what was missing from the cherry-picked Jeremy PR for inspecting
* fix some tracing issues when integrating with maxtext
* Have sort_chunks_by_index handle situations where the input buffer is larger than the number of tokens
* remove unnecessary asserts and comments
* remove Jeremy's PR for inspect ffi
* leave the amax file untouched; also change the comment on te
---------
Signed-off-by: tdophung <tdophung@nvidia.com>
Signed-off-by: JAX Toolbox <jax@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: JAX Toolbox <jax@nvidia.com>
-
Kirthi Shankar Sivamani authored
* Make grouped weights opt-in
* Change varname
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Feb, 2026 6 commits
-
-
vcherepanov-nv authored
* Remove nvshmem usage
* Renamings
* NCCL dependency
* Check for not-yet-allocated workspace
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Address greptile comments
* Add a comment per greptile
* Fix a typo
* Display human-readable cuBLASMp error message
---------
Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Harikrishna KP authored
fix: correct copy-paste error messages in FusedSGD
Signed-off-by: Mr-Neutr0n <64578610+Mr-Neutr0n@users.noreply.github.com>
-
Kim, Jin (Jay@SKT) authored
* Add sigmoid GLU
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Add test for GLU op
* Fix incorrect reshape
* Apply suggestion from @timmoon10
* Add omitted tests for GLU op
* Add GLU activation type support in JAX extension
* [PyTorch] Add Sigmoid activation for GLU support in numerics test (#2656)
---------
Signed-off-by: Kim, Jin <jinn.kim@sk.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Tim Moon authored
* Add ops for MoE grouped MLP
* Move testing utility functions to util submodule
* Tweak docs
* Change order of tensor compatibility checks in noop_cat (review suggestion from @ptrendx)
* Add support for GLU interleaving in clamped SwiGLU
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
-
Oleg Goncharov authored
* Added GEMM-ready preswizzling option
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
---------
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Santosh Bhavani authored
fix: handle nvidia namespace packages where __file__ is None
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
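The `__file__ is None` case above arises because PEP 420 namespace packages (such as the `nvidia` package shared by the nvidia-* wheels) have no single source file; search paths must come from `__path__` instead. A minimal sketch of that defensive pattern, assuming a hypothetical helper name (`find_package_dirs` is not the actual fix):

```python
import importlib
import os


def find_package_dirs(pkg_name: str) -> list[str]:
    """Return filesystem directories for a package, tolerating
    PEP 420 namespace packages whose __file__ is None."""
    try:
        pkg = importlib.import_module(pkg_name)
    except ImportError:
        return []
    # Namespace packages have no __file__, but __path__ still lists
    # every directory that contributes to the namespace.
    if getattr(pkg, "__file__", None) is None:
        return list(getattr(pkg, "__path__", []))
    return [os.path.dirname(pkg.__file__)]
```

For a regular package this returns the one directory holding its `__init__.py`; for `nvidia` it returns each site-packages directory contributed by an installed nvidia-* wheel, and an empty list when nothing is installed.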
-
- 11 Feb, 2026 4 commits
-
-
Kirthi Shankar Sivamani authored
* NVFP4 GroupedQuantize
* fix fp4
* Remove unnecessary file
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Signed-off-by: Zhongbo Zhu <zhongboz@nvidia.com>
Co-authored-by: Zhongbo Zhu <zhongboz@nvidia.com>
-
Kirthi Shankar Sivamani authored
* PyTorch-Python GroupedTensor
* Update transformer_engine/pytorch/tensor/storage/grouped_tensor.py
* Remove mxfp8 gq test
* Fix recipe tests and FP8 weights
* Fix device test
* Disable grouped weights for unsupported recipes
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
Lifu Zhang authored
* Fix on TE to support Mcore Vision Encoder CUDA Graph
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* refactor code
---------
Signed-off-by: Lifu Zhang <lifuz@login-lyris02.lyris.clusters.nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Faradawn Yang authored
* fix broken link to the quickstart guide
* Update README.rst
* move the getting-started guide first and move JAX out of the PyTorch section
* Update README.rst
---------
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Co-authored-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 10 Feb, 2026 2 commits
-
-
Jacket authored
Signed-off-by: Kaining Zhong <kainingz@nvidia.com>
-
Przemyslaw Tredak authored
* Fix the compilation warnings for the PyTorch extension
* Apply suggestion from @greptile-apps[bot]
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: Przemyslaw Tredak <ptrendx@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 09 Feb, 2026 3 commits
-
-
Pingtian Li authored
* add grad reduce api for cuda graph hook
* fix code consistency
---------
Signed-off-by: Pingtian Li <pingtianl@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Paweł Gadziński authored
fix
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
jberchtold-nvidia authored
* expand troubleshooting docs
* Update README.rst
* Update README.rst
* Update README.rst
---------
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
-
- 07 Feb, 2026 1 commit
-
-
Charlene Yang authored
bucket max_b with more granularity when >512
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
- 06 Feb, 2026 2 commits
-
-
Oleg Goncharov authored
* Rebased to main
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Fixed the year to 2026
* Added compilation guards
* Added BWD pass
* Added dbias and dact tests; refactoring
* Added grouped MXFP8 DACT and ACT API and tests
* Fixed a typo
* Fixes per the review
* More fixes from the review
* Relaxed requirement for the last dim from mod128 to mod32
* Fix
* Added alignment checks when tensor descriptors are modified
---------
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>
-
Jacket authored
Signed-off-by: Kaining Zhong <kainingz@nvidia.com>
-
- 04 Feb, 2026 3 commits
-
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
jberchtold-nvidia authored
* Update README.rst
* Update README.rst
* Update README.rst
---------
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
-
- 03 Feb, 2026 3 commits
-
-
Paweł Gadziński authored
* init
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* fixes
* year update in license
* further fixes
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Oleg Goncharov authored
* Fixed scaling-factor computation for FP32 to match the reference implementation
* Uncommented the tuned kernel path
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
---------
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Vadim Markovtsev authored
* Support building with headers from nvidia wheels

There are two changes:
1. `import nvidia` returns a namespace package with `__file__` equal to `None`
2. Add a way to force headers from nvidia wheels; without that envvar, it is practically impossible with CUDA installed system-wide.

I successfully built the package with torch using the following `uv` configuration:

```
[tool.uv.extra-build-dependencies]
"transformer-engine-torch" = [
    "ninja",
    "nvidia-cuda-crt==13.0.88",
    "nvidia-cuda-cccl==13.0.85",
    { requirement = "torch", match-runtime = true },
    { requirement = "pytorch-triton", match-runtime = true },
    { requirement = "nvidia-cusolver", match-runtime = true },
    { requirement = "nvidia-curand", match-runtime = true },
    { requirement = "nvidia-cublas", match-runtime = true },
    { requirement = "nvidia-cusparse", match-runtime = true },
    { requirement = "nvidia-cudnn-cu13", match-runtime = true },
    { requirement = "nvidia-nvtx", match-runtime = true },
    { requirement = "nvidia-cuda-nvrtc", match-runtime = true },
    { requirement = "nvidia-cuda-runtime", match-runtime = true },
]
```

* Apply suggestion from @ksivaman
---------
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
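To make the entry above concrete: the nvidia-* wheels ship their headers under `nvidia/<component>/include` inside site-packages, and the `nvidia` parent is a namespace package (no `__file__`), so discovery has to walk `__path__`. A hedged sketch of that discovery, not the actual build_tools code (the function name is illustrative, and it simply returns an empty list when no nvidia wheels are installed):

```python
import importlib
import os


def nvidia_wheel_include_dirs() -> list[str]:
    """Collect include/ directories shipped inside installed
    nvidia-* wheels (e.g. nvidia-cuda-runtime, nvidia-cudnn)."""
    try:
        nvidia = importlib.import_module("nvidia")
    except ImportError:
        return []
    includes = []
    # 'nvidia' is a namespace package: __path__ holds one entry per
    # site-packages directory contributing to it; each wheel places
    # its headers under nvidia/<component>/include.
    for root in getattr(nvidia, "__path__", []):
        for component in sorted(os.listdir(root)):
            inc = os.path.join(root, component, "include")
            if os.path.isdir(inc):
                includes.append(inc)
    return includes
```

The resulting directories would then be appended to the compiler's `-I` search path, ahead of (or instead of) a system-wide CUDA install when the override environment variable mentioned above is set.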
-
- 02 Feb, 2026 1 commit
-
-
Paweł Gadziński authored
* Code drop: update recipes documentation and remove custom recipes from low-precision training
* Fix SVG css import path for diagrams
* Refactor low_precision_training docs: remove optimizers, fix imports, add GPU checks
  - Remove optimizer code from all recipe examples (keep only forward/backward)
  - Fix Format imports (use Format.E4M3 instead of string 'E4M3')
  - Fix params_dtype for PyTorch examples (add params_dtype=torch.bfloat16)
  - Add GPU capability assertions before START blocks for blockwise/mxfp8/nvfp4
  - Fix JAX imports (Float8CurrentScaling from common.recipe, NVFP4BlockScaling)
  - Add global_shard_guard for TransformerLayer examples in JAX
  - Fix fused_layers_jax.py return tuple unpacking
  - Update memory_usage JAX examples with dynamic GPU measurement
  - Remove memory_usage_3_jax (JAX doesn't support FP8 weight storage)
  - Update performance_considerations.rst for JAX differences
  - Delete unused .out files and fp8_autocast_jax.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Fix JAX memory usage .out files with correct output
* respond to review comments
* apply suggestions from greptile
* year change
* jax compute capability fix
* assorted fixes
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 30 Jan, 2026 2 commits
-
-
Paweł Gadziński authored
* version change
* fixes
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 29 Jan, 2026 1 commit
-
-
Paweł Gadziński authored
fix wheel
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
-
- 28 Jan, 2026 2 commits
-
-
Paweł Gadziński authored
* code drop
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Add FP8 scale support and fix alignment for grouped GEMM
  - Add FP8 scale_inv pointer handling in nvte_grouped_gemm for proper FP8 GEMM
  - Fix random padding in tests to ensure 16-byte alignment for all dtypes
  - Reorder GroupedGemmSetupWorkspace members for natural alignment
  - Remove debug prints
* Grouped GEMM: code cleanup and NULL C support
  - Remove unused alignment parameter from GroupedGemmSetupWorkspace::from_buffers
  - Simplify select_grouped_operand by removing dead code branches
  - Add GroupedOperandSelection.tensor field to avoid passing tensor separately
  - Extract set_fp8_scale_pointers and init_matrix_layouts helpers
  - Add safety check for FP8 on Hopper column-wise fallback
  - Support NULL C tensor when beta=0 (uses D as placeholder)
  - Remove unused get_scale_inv() from test
  - Add use_null_c test parameter and test case
  - Fix documentation: alpha/beta are single-element tensors only
* Grouped GEMM: per-matrix alpha/beta support
  - Change alpha/beta from single values to per-matrix arrays
  - Validate alpha/beta have exactly num_tensors elements
  - Update kernel to index alpha_ptr[idx] and beta_ptr[idx]
  - Move alpha/beta validation to validate_grouped_gemm_inputs
  - Update tests to use per-matrix alpha/beta arrays
  - Update documentation
* Fix alpha/beta numel: use SimpleTensor::numel()
* Refactor: move grouped GEMM to separate file and clean up API
* Require Blackwell (SM100) and cuBLAS 13.1+ for grouped GEMM
* Update transformer_engine/common/gemm/config.h
* refactored hopper tensor selection
* assorted fixes
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Piotr Gadzinski <pgadzinski@nvidia.com>
Signed-off-by: Paweł Gadziński <62263673+pggPL@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemyslaw Tredak <ptrendx@gmail.com>
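The per-matrix alpha/beta semantics described in the grouped-GEMM entry above (each matrix i in the group computes D[i] = alpha[i] * A[i] @ B[i] + beta[i] * C[i], with alpha/beta validated to have exactly num_tensors elements) can be sketched as a NumPy reference. This is an illustrative model of the semantics, not the cuBLAS-backed implementation:

```python
import numpy as np


def grouped_gemm_ref(As, Bs, Cs, alphas, betas):
    """Reference semantics for grouped GEMM with per-matrix scaling:
    D[i] = alphas[i] * As[i] @ Bs[i] + betas[i] * Cs[i]."""
    # Mirror the validation from the changelog: alpha/beta must have
    # exactly num_tensors elements.
    assert len(alphas) == len(As) and len(betas) == len(As), \
        "alpha/beta must have exactly num_tensors elements"
    return [a * A @ B + b * C
            for A, B, C, a, b in zip(As, Bs, Cs, alphas, betas)]
```

Note that with `betas[i] == 0` the corresponding `Cs[i]` never contributes, which is why the real kernel can accept a NULL C tensor in that case and use D as a placeholder.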
-
Paweł Gadziński authored
* jit bug fix
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* lint fixes
* assorted fixes
---------
Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 27 Jan, 2026 1 commit
-
-
jberchtold-nvidia authored
* Use "nyu-mll/glue" instead of "glue" for encoder datasets to fix 404 error
* rename mnist dataset path
* add dataset manifest
---------
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
-
- 26 Jan, 2026 1 commit
-
-
Santosh Bhavani authored
* fix(examples): te_llama compatibility with HuggingFace transformers >= 4.57

  The te_llama.py example was failing with HuggingFace transformers 4.57+ due to API changes in how decoder layer outputs are handled.

  Changes:
  - Handle case where hidden_states is passed as a tuple (older HF versions)
  - Return tensor directly instead of wrapped in tuple (HF 4.57+ expects this)
  - Fix regex pattern to use raw string (fixes SyntaxWarning)

  Error fixed: AttributeError: 'tuple' object has no attribute 'contiguous'

  Tested with:
  - transformer_engine 2.5.0
  - transformers 4.57.3
  - PyTorch container nvcr.io/nvidia/pytorch:25.08-py3
* docs(te_llama): add requirements.txt
* fix(docs): add missing notebook output names
---------
Signed-off-by: Santosh Bhavani <santosh.bhavani@live.com>
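The tuple-vs-tensor handling described in the entry above boils down to normalizing the layer input/output before calling tensor methods like `.contiguous()`. A minimal sketch of that pattern (the helper name is illustrative, not the example's actual code):

```python
def unwrap_hidden_states(outputs):
    """Normalize a decoder-layer value across transformers versions:
    older versions wrap hidden_states in a tuple, while newer
    versions (>= 4.57) pass the tensor directly."""
    if isinstance(outputs, tuple):
        # Older HF versions: the hidden states are the first element.
        return outputs[0]
    # HF 4.57+: already a bare tensor, safe to use directly.
    return outputs
```

With this normalization in place, the layer can call `.contiguous()` on the result regardless of the installed transformers version, which is exactly the `AttributeError` the commit fixes.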
-