- 08 Jul, 2025 1 commit
Jan Bielak authored
* Change pre_forward to pre_first_forward
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
* Fix passing invalid recipe with fp8 disabled
  Signed-off-by: Jan Bielak <jbielak@nvidia.com>
---------
Signed-off-by: Jan Bielak <jbielak@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 02 Jan, 2025 1 commit
Kirthi Shankar Sivamani authored
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 16 Oct, 2024 1 commit
Kirthi Shankar Sivamani authored
* Upgrade pylint and first round formatting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* round 2
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* round 3
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Format and fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Paddle lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Reviews
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* More linting
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Run formatter
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Paddle lint
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
* Fixes
  Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
---------
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 10 Aug, 2024 1 commit
Tim Moon authored
* Add op for in-place add
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add op for in-place add
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add op that adds extra output to fuser
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add fused op for GEMM+bias+add
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add fused op for dgrad+add
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add documentation
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix linter warnings
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Review suggestions from @ptrendx
  Output tensor dtype and device take precedence over weight tensor in linear functional API. Move some index calculation to fuser constructor. Avoid some unnecessary dereferences.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug test failures
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update transformer_engine/pytorch/ops/fuser.py
  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- 02 Aug, 2024 1 commit
Przemyslaw Tredak authored
* Link attention docs to the main docs and fix errors reported by Sphinx
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Lower the version of nbsphinx
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* More fixes
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Change the URL of example_attention.py to GitHub
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* More fixes in the attention tutorial
  Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 21 Jul, 2024 1 commit
Tim Moon authored
* Update sequential container constructor to handle modules in plain dicts
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Avoid initializing Sequential with dicts
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 09 Jul, 2024 1 commit
Tim Moon authored
* Add basic infrastructure for Sequential module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add linear op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add FP8 support in linear op
  Runs, but need to validate. Runtime errors with non-FP8 params and FP8 compute, or FP8 params and non-FP8 compute.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add reshape op and unit test
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add bias op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add unfused linear op
  Test does not pass with FP8.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug unfused linear op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add test for linear+bias op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add separate abstract classes for unfused and fused ops
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Consolidate unfused ops in submodule
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add linear-bias fused op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use fused cast-transpose in linear ops
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Disable GEMM+bias fusion with FP32 activations
  Not supported by cuBLAS.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add parallel unit test for unfused linear op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Refactor parallel tests to reduce job launches
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add all-reduce, all-gather, and reduce-scatter ops
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove unused file
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug multi-GPU FP8 test
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add support for FP8 scale updates
  Still need to implement amax reductions.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add license boilerplate
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fuse GEMM+bias in row TP
  Add documentation for unfused ops
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Rename pipeline to fuser
  Expand documentation
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Tweak documentation
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Preserve cached FP8 transpose between ops
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add option for fused wgrad accumulation
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Directly output FP8 from linear if needed
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix cuDNN front-end commit
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use updated FP8 tensor API for transpose caching
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use updated API for FP8 scale updates
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add tests for non-default FP8 recipes
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Rename UnfusedOperation to BasicOperation
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add unit test to check amax reduction with fusable op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Operator autograd state no longer needs to be initialized
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Initial functional implementation of linear op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Debug fused linear+bias op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove autograd context from functional linear impl
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Use functional linear impl in fused linear+bias op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Rename subdirectory from "fuser" to "ops"
  Avoid confusion with kernel fusers and graph compilers.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update with Float8Tensor changes in #820
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Remove unnecessary CPU overheads
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Correctly pass FP8 metadata from next op
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix linter errors
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Add convenience functions to manipulate Sequential class
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update name of PyTorch extensions module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Clear saved tensor data in linear op after bprop
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix Pylint error
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update name of PyTorch extensions module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Fix test name in QA script
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update name of PyTorch extensions module
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Run distributed tests even when only 1 GPU is available
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Only run distributed tests with 2 GPUs if there are >=2 GPUs
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Review suggestions from @sudhakarsingh27 and @ksivaman
  Fix spelling of "fusible". Avoid "input" name in internal APIs.
  Signed-off-by: Tim Moon <tmoon@nvidia.com>
* Update transformer_engine/pytorch/ops/__init__.py
  Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
---------
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>