- 30 May, 2025 1 commit
-
-
Evgeny Tsykunov authored
* Quantizer update Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> * Update import Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Introduce _update_weight_quantizers and _get_weight_tensors/_get_weight_quantizers Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Add test Signed-off-by:
Evgeny <etsykunov@nvidia.com> * Move _quantizer to the QuantizedTensorBase Signed-off-by:
Evgeny <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix import Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Signed-off-by:
Evgeny <etsykunov@nvidia.com> Co-authored-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com>
-
- 19 May, 2025 1 commit
-
-
Evgeny Tsykunov authored
* Check tensor-recipe compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Tensor class in recipe, checking for *Base Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Extend recipe __repr__ with recipe_type Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Warn about recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Enable dynamic recipe change: clear fp8 workspace Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * TE 1.x checkpoint compatibility Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Disable warning for recipe wrappers Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Test recipe change Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use QuantizedTensorBase Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix circular import Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Revert previous circular import fix Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * Fix pytorch imports in common Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Let quantizer know about the recipe Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix imports Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 25 Mar, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 08 Mar, 2025 1 commit
-
-
Zhongbo Zhu authored
* check in per-tensor current scaling full recipe Signed-off-by:
zhongboz <zhongboz@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
zhongboz <zhongboz@nvidia.com> setup basics of current scaling quantizer in python level Signed-off-by:
zhongboz <zhongboz@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
zhongboz <zhongboz@nvidia.com> add test case for current scaling dequantize Signed-off-by:
zhongboz <zhongboz@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
zhongboz <zhongboz@nvidia.com> finish linear layer fwd bwd test, determined error with bf16 Signed-off-by:
zhongboz <zhongboz@nvidia.com> [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by:
zhongboz <zhongboz@nvidia.com> achieved zero tolerance for Linear by specify gemm use_split_accumulator config Signed-off-by:
zhongboz <zhongboz@nvidia.com> enable layernormlinear with current scaling, pass bitwise test Signed-off-by:
zhongboz <zhongboz@nvidia.com> refactor test case code Signed-off-by:
zhongboz <zhongboz@nvidia.com> make current scaling quantizers distrbuted, pass distributed linear&layernormlinear tests Signed-off-by:
zhongboz <zhongboz@nvidia.com> bug fix: use cached fp8 recipe in backward Signed-off-by:
zhongboz <zhongboz@nvidia.com> fix layernorm_mlp with current scaling, fix activation_helper with current scaling Signed-off-by:
zhongboz <zhongboz@nvidia.com> support detailed numerical settings from recipe to quantization kernel Signed-off-by:
zhongboz <zhongboz@nvidia.com> resolving MR comments Signed-off-by:
zhongboz <zhongboz@nvidia.com> recipe naming Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolve mr comments, remove IS_CURRENT_SCALING template from kernels Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolve mr comments, make current scaling c++ test cases Signed-off-by:
zhongboz <zhongboz@nvidia.com> * add current scaling to test_numerics.py, skip act recomp and grouped linear Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add benchmark for quantizer Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add benchmarks for linear layer Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bug fix, typo Signed-off-by:
zhongboz <zhongboz@nvidia.com> * resolve more mr comments Signed-off-by:
zhongboz <zhongboz@nvidia.com> * avoid potential race condition by not using from_blob to construct amax tensor in C++ Signed-off-by:
zhongboz <zhongboz@nvidia.com> * resolve more comments Signed-off-by:
zhongboz <zhongboz@nvidia.com> * Debug linter warnings and license check Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug import error in FP8 tensor test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug compilation error with CUDA 12.1 for Turing Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * resolve mr comments, fix activation cast fusion Signed-off-by:
zhongboz <zhongboz@nvidia.com> * resolve comments, add NVTEQuantizationParams for compute scale Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove is_current_scaling check totally from common folder Signed-off-by:
zhongboz <zhongboz@nvidia.com> * remove benchmarks, will contribute in another repo Signed-off-by:
zhongboz <zhongboz@nvidia.com> * adjust cs default recipe config Signed-off-by:
zhongboz <zhongboz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adjust comments in test Signed-off-by:
zhongboz <zhongboz@nvidia.com> * Remove current scaling mode from core lib Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor current-scaling-specific logic in core C++ lib Move amax and scale update functions out of casting functions, and put into dedicated current-scaling source file. Add general API for accessing quantization config object. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add missing header in C++ tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable test config with FP8 transpose on Blackwell Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix compilation error in C++ test Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
zhongboz <zhongboz@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
zhongboz <zhongboz@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 07 Feb, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 09 Jul, 2024 1 commit
-
-
Tim Moon authored
* Add basic infrastructure for Sequential module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add FP8 support in linear op Runs, but need to validate. Runtime errors with non-FP8 params and FP8 compute, or FP8 params and non-FP8 compute. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add reshape op and unit test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unfused linear op Test does not pass with FP8. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug unfused linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add test for linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add separate abstract classes for unfused and fused ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Consolidate unfused ops in submodule Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add linear-bias fused op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use fused cast-transpose in linear ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable GEMM+bias fusion with FP32 activations Not supported by cuBLAS. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add parallel unit test for unfused linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor parallel tests to reduce job launches Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add all-reduce, all-gather, and reduce-scatter ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unused file Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug multi-GPU FP8 test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add support for FP8 scale updates Still need to implement amax reductions. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add license boilerplate Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fuse GEMM+bias in row TP Add documentation for unfused ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename pipeline to fuser Expand documentation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak documentation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Preserve cached FP8 transpose between ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add option for fused wgrad accumulation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Directly output FP8 from linear if needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix cuDNN front-end commit Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use updated FP8 tensor API for transpose caching Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use updated API for FP8 scale updates Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tests for non-default FP8 recipes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename UnfusedOperation to BasicOperation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unit test to check amax reduction with fusable op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Operator autograd state no longer needs to be initialized Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial functional implementation of linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug fused linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove autograd context from functional linear impl Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use functional linear impl in fused linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename subdirectory from "fuser" to "ops" Avoid confusion with kernel fusers and graph compilers. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update with Float8Tensor changes in #820 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unnecessary CPU overheads Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Correctly pass FP8 metadata from next op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter errors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add convenience functions to manipulate Sequential class Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Clear saved tensor data in linear op after bprop Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix Pylint error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix test name in QA script Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Run distributed tests even when only 1 GPU is available Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Only run distributed tests with 2 GPUs if there are >=2 GPUs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @sudhakarsingh27 and @ksivaman Fix spelling of "fusible". Avoid "input" name in internal APIs. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update transformer_engine/pytorch/ops/__init__.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 14 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Apply formatting Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Remove interval arg from recipe Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove usage of interval and use explicit kwarg for testing recipes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 06 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Cleanup Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 07 May, 2024 1 commit
-
-
Tim Moon authored
Update FP8 recipe test to handle recipe changes Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 01 May, 2024 1 commit
-
-
Jinze Xue authored
* Handle the scaling factor when amax is too tiny that leads to an infinite scale Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * revert formatting changes Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * fix comments Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * Apply review suggestion Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * Apply review suggestion Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * Apply review suggestion Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> * apply review suggestion Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * add test_recipe.py to qa/L0_pytorch_unittest/test.sh; fix unittest for is_first_microbatch=False Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * revert changes to update_weight_scale_inv Signed-off-by:
Jinze Xue <jinzex@nvidia.com> * Debug test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Jinze Xue <jinzex@nvidia.com> Signed-off-by:
Jinze Xue <155670984+jinzex@users.noreply.github.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jinze Xue <jinzex@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 08 Feb, 2024 1 commit
-
-
Tim Moon authored
* Implement fused kernel for FP8 scale update Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add fused kernel for amax and scale update Add unit test. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Replace paddle.fluid imports with paddle.base Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move fused kernel to core library Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use FP8 update kernel in Paddle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug FP8 scale update in Paddle Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix lint errors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Paddle test failures Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make update kernel in-place for PyTorch Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Revert cudnn-frontend commit Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-