"...git@developer.sourcefind.cn:kecinstone/2024-pra-vllm.git" did not exist on "66785cc05c05c7f19f319533c23d1998b9d80bf9"
- 25 Aug, 2025 1 commit
-
-
wenjh authored
-
- 15 Aug, 2025 1 commit
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 08 Aug, 2025 1 commit
-
-
yuguo authored
-
- 29 Jul, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Add verbosity only for failing tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Prune some tests and preinit recipe Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Prune further tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix multitensor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Minor fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix a100 Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 21 Jul, 2025 1 commit
-
-
Charlene Yang authored
* exclude 9.10.0/.1 for certain configs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix kv_channels Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add get_backend to tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add init files Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix numerics and cuda graph tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove prints Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor changes after renaming Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix import structure and rename get_attention_backends Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix docs and benchmarks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix get backend calls Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "fix get backend calls" This reverts commit 653cbb51c697bc2f975416bb3aac1d85f76c36dc. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "fix docs and benchmarks" This reverts commit 98cd52e04ff7c53e26b412195f5744e39f7ed0e9. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix docs, benchmarks and pre-commit ci Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix dpa/mha flash attn selection Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix rng states Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ModelConfig Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix backend selection on Ampere Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix issues from last merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update tests/pytorch/utils.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove initialization of rng_states to None Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * redefine ModelConfig Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typo Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ModelConfig Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix seed for CP tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update tests/pytorch/test_sanity.py Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move fixture from utils to individual tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 16 Jul, 2025 1 commit
-
-
Paweł Gadziński authored
* some initial code Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * onnx support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * mxfp8 support Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixed returning layernorm etc Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * formatting Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * license fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * tests passing Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refactor Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * lint Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * added pip install to test.sh Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update transformer_engine/pytorch/export.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * float8currentscaling quantizer exception Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added to wheels Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * onnx versions Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * installations in tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
root <root@prenyx0221.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * lint fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
root <pgadzinski@nvidia.com> * fixes Signed-off-by:
root <pgadzinski@nvidia.com> * fixes Signed-off-by:
root <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update setup.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * onnxscript version chnage Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CI Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@gmail.com> * Update build.yml Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update pytorch.py Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Signed-off-by:
root <root@prenyx0221.a51.clusters.nvidia.com> Signed-off-by:
root <pgadzinski@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Pawel Gadzinski <pgadzinski@gmail.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
root <root@prenyx0221.a51.clusters.nvidia.com> Co-authored-by:
Pawel Gadzinski <pgadzinski@gmail.com>
-
- 15 Jul, 2025 1 commit
-
-
yuguo authored
-
- 10 Jul, 2025 1 commit
-
-
Autumn1998 authored
* add router fusion Signed-off-by:
tongliu <tongliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix ci Signed-off-by:
tongliu <tongliu@nvidia.com> * fix ci with cuda 12.3 Signed-off-by:
tongliu <tongliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix CI sm89/80 Signed-off-by:
tongliu <tongliu@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
tongliu <tongliu@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
tongliu <tongliu@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Xin Yao <xiny@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
- 09 Jul, 2025 1 commit
-
-
Tim Moon authored
* Add tests for loading previously-generated checkpoints Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use `NVTE_` prefix for envvar Review suggestion from @ksivaman Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 04 Jun, 2025 2 commits
- 20 May, 2025 1 commit
-
-
Peter St. John authored
* Use an empty torch tensor to indicate no fp8 information in extra_state Signed-off-by:
Peter St. John <pstjohn@nvidia.com> * Add huggingface from_pretrained / save_pretrained tests Adds integration tests to ensure models containing TransformerLayer objects can be saved and loaded using the from_pretrained and save_pretrained methods. Signed-off-by:
Peter St. John <pstjohn@nvidia.com> --------- Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 1 commit
-
-
Tim Moon authored
* Disable verbose debug logs in CI Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable log_cli option Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 08 May, 2025 1 commit
-
-
yuguo authored
-
- 17 Apr, 2025 1 commit
-
-
linxiddd authored
* [QA] Add error handling - Standardize test failure handling using the unified 'test_fail' function and 'error_exit' function Signed-off-by:
Linxi Ding <linxid@nvidia.com> * Add XML log generation for pytest results - Add `--junitxml` option to pytest command to generate JUnit XML format logs Signed-off-by:
Linxi Ding <linxid@nvidia.com> * Add $XML_LOG_DIR Signed-off-by:
Linxi Ding <linxid@nvidia.com> * mkdir Signed-off-by:
Linxi Ding <linxid@nvidia.com> * Update qa/L0_pytorch_unittest/test.sh Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Linxi Ding <linxid@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Apr, 2025 1 commit
-
-
Paweł Gadziński authored
* test change Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * test fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * small changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * small changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clear Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * base Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Apr, 2025 1 commit
-
-
kwyss-nvidia authored
* Add GEMM logic for blockwise quantized tensors. GEMM test cases included in pytorch integration. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update NVTE_BLOCK_SCALING for GEMM. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Gate feature on CUDA 12.9 Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Gemm typo. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove unecessary type converter change. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reflect epilogue availability and test supported epilogues. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * GEMM simplifications from recipe branch. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Format py code. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update GEMM DGelu tests to match support depending on output dtype. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Force pow2Scales in GEMM Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add GEMM test to pytorch test suite. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add copyright to GEMM test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update import for GEMM test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add license. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update test gemm supported predicate. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use sgemm like interfaces and naming. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Rewrite GEMM comment. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR Feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Refactor GEMM param canonicalization Configure A and B matrices separately. Have separate code path for each scaling mode. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Prune number of tests. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> --------- Signed-off-by:
Keith Wyss <kwyss@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 04 Apr, 2025 1 commit
-
-
kwyss-nvidia authored
* Blockwise float8 quantizer and quantized tensor class. The classes are configurable for 128x128 blocksize and 1x128 blocksize via setting block_scaling_dim == 2,1 respectively. Scale tensors are stored in a format emenable for matrix multiplication, however the integration of matmul is deferred as a separate story. Fusions of quantization and DBIAS or activation functions are not yet implemented, and the dequantization is currently implemented in torch. Tests for quantization are included in C++ and pytorch layers, with exact comparison to reference quantizer behavior as well as an attempt to hit interesting branches through the API such as tensor creation in pytorch and CPP and dequantization of row and columnwise usage. Two CUDA kernels for quantization are included, and are direct ports of equivalents in the kitchen repository, where a subchannel recipe has been used for end to end training. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Apply linting changes. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Alignment for 1D scaling for GEMM edge case. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Change API name. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix merge conflict with name change. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use common tensor map API. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Change API to use two scaling mode enums. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix typo. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update some call sites. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Tests for torch tensor API surface. Since the quantized tensor is a tensor subclass, these tests exercise torch hooks. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reuse scale calculation between quantizer refs. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Save memory by dropping reference to saved tensors. Issues previously observed are solved. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove constexpr parameters from kernel. Code size is reduced with fewer constexpr params. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Merge conflict from rebase. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add shape implementations for block scaling. nvte_shape was added upstream. Logic added for block scaled fp8. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Move benchmark to te_playground Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove amax_epsilon and pow_2_scales from tensor. Hardcodes the default values. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Lint changes. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fixup MR changes that broke. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Safer ifdef in kernel. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Documentation prose. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Reuse compute_scale function from Current Scaling. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Bugfix on inf_value scale refactor. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Remove qopt calls from test. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update pytest list. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Add copyright to reference scale calc. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use ptx.cuh functions instead of cde. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update shape logic with allocation and reuse shape. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Usage defaults MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Copyright and header guard. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Updating torch dispatch code. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Fix exception type. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Use TypeInfo Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * MR feedback. Signed-off-by:
Keith Wyss <kwyss@nvidia.com> * Update CS scale update test to use updated ref impl Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update JAX scaling mode enum Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Skip tests on Lovelace Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Keith Wyss <kwyss@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 25 Mar, 2025 1 commit
-
-
Charlene Yang authored
* skip cuDNN 9.8 for KV caching Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert from max_seqlen_kv to max_sequence_length for InferenceParams Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * rename test_paged_attn to test_kv_cache Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove redundant None returns in bwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add debug flags when no backend is found Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * skip kv_cache_accuracy tests for cuDNN 9.8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * truncate length of cu_seqlens for consistency with q/k/v shape Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add back padding_brcm for fused attn tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * re-enable kv_cache_accuracy test for 9.8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix cuDNN search dir Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fixes based on review Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove extra empty line Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 22 Mar, 2025 2 commits
-
-
Kunlun Li authored
* Enable fp8_primary_weights for current scaling Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use different cast_master_weights_to_fp8 functions depending on the type of quantizer Signed-off-by:
kunlunl <kunlunl@nvidia.com> * All amaxes of model_weights should participate in reduce-max Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Clear _high_precision_init_val automatically in cast_master_weights_to_fp8 function Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Merge all all-reduce on amaxes into one NCCL kernel Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit tests for multi_tensor_compute_scale_and_scale_inv and preserve_high_precision_init_val Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Fix conflicts Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add unit test for cast_master_weights_to_fp8 Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * use mock group to initialize fp8_autocast to avoid reduction of amax_history by fp8_autocast_exit Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Remove with_computing_amax and with_computing_scale Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Move replace_raw_data from QuantizedTensor to utils.py Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Remove allow_empty_output argument from nvte_compute_amax and set it always be true Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Rename import guard of recipe_common.cuh to be align with other import guards Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add unit test for replace_raw_data Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add test_replace_raw_data into qa/L0_pytorch_unittest/test.sh Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Minor changes in comments Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Add randomness to the unit test of replace_raw_data Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * (Maybe need revert) Add tex.quantize_to_fragment Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * (Maybe needsto rrevert) Use nvte_quantize_noop in quantize_to_fragment Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix lint error Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Move high_precision_init_val test and replace_raw_data test to test_sanity.py Signed-off-by:
kunlunl <kunlunl@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove test_fp8_model_init.py and test_replace_raw_data.py Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Remove cast_master_weights_to_fp8 and replace_raw_data from __all__ of tensor.__init__.py Signed-off-by:
kunlunl <kunlunl@nvidia.com> * Move FP8 casting logic back from C++ tex funcs to Python Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unimplemented function from header Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
kunlunl <kunlunl@nvidia.com> Signed-off-by:
Kunlun Li <94586211+kunlunl@users.noreply.github.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Tim Moon authored
* Do not suppress MXFP8 norm in Python wrapper func Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Support FP8 current scaling in tex norm functions Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use single envvar to enable cuDNN MXFP8 norm kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Debug compilation error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix compilation error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix full-tile requirement for MXFP8 norm kernels Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unused imports Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add missing imports Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 18 Mar, 2025 1 commit
-
-
Charlene Yang authored
* add paged attention; test_kv_cache_accuray and test_paged_attn pass Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove unnecessary change from last commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * test_fused_attn pass Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unnecessary import in test_numerics Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add license for test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add to L0 test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update license for test_paged_attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update kv_cache_manager license Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix build issue from previous merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: minor fix/preparation for inference/cuda graph Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: non-paged Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: non-paged, bshd/sbhd Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: non-paged, thd, no CG Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: non-paged, thd, CG Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: non-paged, CG Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: non-paged, using paged kernel Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: restructure kernels Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: paged, CG Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: padding + BRCM Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: restructure IP, clean up Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix non-CG, fused Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix last commit Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: unfused, non-CG Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: flash-attn, non-CG Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: flash_attn_with_kvcache Signed-off-by:
Charlene Yang <charleney@nvidia.com> * commit two files missed by bcef6b34 Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: thd_bshd_bshd Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix last commit Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix 1c31b68d Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: add bshd_2sbhd, sbhd_2bshd Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: some cleanup Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: all qkv_format combinations and merge CM files Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: some lint fixes Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: add docstring for IP Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix sequences_pre Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: minor fixes for multi-layer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: initial multi-layer test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: minor clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: switch to flash_attn_varlen_func Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix unfused for separate q/kv format Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix fused for separate q/kv formats Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: flash attn + TELayer + 2 layers Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: unfused + TL + 2layers Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: all modules/backend Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: minor cleanup Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: FlashAttention on Hopper with 2.7.3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: FlashAttention + v3 from 39e7179 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: FlashAttention + v3 + FP8 + WIP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: add backend support table Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: clean up Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: separate use_flash_attention_2 and _3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: tweaks to paged attn script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * WIP: enable/disable certain cases for fused attn Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: small fixes for lint and cg Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: minor fixes for attn/infer Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: fix CP Signed-off-by:
Charlene Yang <charleney@nvidia.com> * WIP: readd page info to FADescriptor_v1 Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor tweak to test_numerics.py Signed-off-by:
Charlene Yang <charleney@nvidia.com> * fix 9.5/9.7 sq/skv + mask logic Signed-off-by:
Charlene Yang <charleney@nvidia.com> * clean up Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * more minor fixes for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * test page_size=1 for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix t3hd/th3d strides Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix ckpt recompute and fa3 k_scale Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * raise dynamo recompile limit for test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove thunder test from L0 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix FA selection logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FA3 q_descale shape Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove page_table from IP.step() returns Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FP8 FlashAttn DPA fp8_dpa tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix CP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor tweaks Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FA3 note and L3 test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove redundant import in test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * adopt new FA3 APIs from FA2.7.3+/hopper for CP and non-CP Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * relax tols for TransformerLayers Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix merge 2 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FA import comments Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * relax tols for Ampere Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fa3 version and reduce messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update FA3 to its latest commit on main Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add default values to IP and assertion to graph.py Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add more comments in attention Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * use custom_cache_manager instead of cache_manager Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 17 Mar, 2025 1 commit
-
-
linxiddd authored
* [QA] Add error handling -Standardize test failure handling using the unified 'test_fail' function and 'error_exit' function. Signed-off-by:
Linxi Ding <linxid@nvidia.com> * Update script to use explicit python3, pip3, and python3 -m pytest calls - Change pip to pip3. - Change python to python3. - Change pytest to python3 -m pytest. Signed-off-by:
Linxi Ding <linxid@nvidia.com> --------- Signed-off-by:
Linxi Ding <linxid@nvidia.com>
-
- 13 Mar, 2025 1 commit
-
-
Tim Moon authored
* Explicitly use python3 and pip3 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Run pre-commit as Python module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Replace some missed references to "python" or "pip" Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 09 Mar, 2025 1 commit
-
-
Selvaraj Anandaraj authored
* Verified TE2.0 with offloading Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Skipping tests for Ampere and removed child class preparing Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * offloading support for MXFP8 dtype Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Changed quantized tensor detection mechanism Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * Fix mxfp8 offload, lint errors, and var name Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Supported disabling offloading for quantized tensors Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * bug fix Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fixed bugs Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-preos01.a51.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added support for None in list of Quantized data tensors Signed-off-by:
root <root@prenyx0095.a51.clusters.nvidia.com> * Hopper backward compatibility cleanup Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Coding style nit Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added guards Signed-off-by:
Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Mar, 2025 1 commit
-
-
Tim Moon authored
Move Lightning-Thunder integration test to L1 Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
- 04 Mar, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 28 Feb, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Enforce torch 2.0 and run attn tests with torch.compile Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * replace torch.compile with jit_fuser Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 26 Feb, 2025 1 commit
-
-
Selvaraj Anandaraj authored
* Added parallel cross entropy loss implementation using online softmax Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added reshape of loss output Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added to test list Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Added Triton dependency Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Added copyright Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Fixed lint errors Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update setup.py Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> * Fixed lint and triton failure Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Removed flattening for scalars Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * Skip tests on Blackwell due to TE CI caveat Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added reason arg Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Do not register Triton dependency with setuptools Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Signed-off-by:
Selvaraj Anandaraj <anandaraj@wisc.edu> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 24 Feb, 2025 1 commit
-
-
Paweł Gadziński authored
* non-exit tests Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Feb, 2025 1 commit
-
-
Przemek Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 02 Jan, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Dec, 2024 1 commit
-
-
Charlene Yang authored
* add swa (left,0) + padding + brcm support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * final fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * upgrade to FE 1.9-rc Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * skip thd + CP + fused attn tests for cuDNN 9.6+ due to different stats shapes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 18 Oct, 2024 1 commit
-
-
Tim Moon authored
* Reorganize PyTorch L1 tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move ONNX tests to L1 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Move FA version test to L3 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Limit parallel build jobs in FA version test Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 16 Oct, 2024 1 commit
-
-
Tim Moon authored
* Build custom ORT ops before running ONNX tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove ONNX from context parallelism tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Export ONNX ops that do compute in FP32 Matches internal impl of TE kernels. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add build script for custom ORT ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 08 Oct, 2024 1 commit
-
-
Charlene Yang authored
* add qkv descales to FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix sbhd shapes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * force the same dtype when comparing FA3 and cuDNN FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Revert "force the same dtype when comparing FA3 and cuDNN FP8" This reverts commit 19e7f877026a19a32d2f02c6c9de20df4ae2e064. Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force the same dtype when comparing FA3 and cuDNN FP8 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add try/except for FA3 when custom qkv descales are not supported Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * replace FA3 installation warning with a debug logging message Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * avoid varlen_func for FP8 and improve messaging Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add SWA support for FA3 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change preference reason for FP8 logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * minor fix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 29 Aug, 2024 1 commit
-
-
Xin Yao authored
* remove dtype from args * update docs with permutation ops --------- Signed-off-by:Xin Yao <xiny@nvidia.com>
-
- 12 Aug, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 09 Jul, 2024 1 commit
-
-
Tim Moon authored
* Add basic infrastructure for Sequential module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add FP8 support in linear op Runs, but need to validate. Runtime errors with non-FP8 params and FP8 compute, or FP8 params and non-FP8 compute. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add reshape op and unit test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unfused linear op Test does not pass with FP8. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug unfused linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add test for linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add separate abstract classes for unfused and fused ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Consolidate unfused ops in submodule Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add linear-bias fused op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use fused cast-transpose in linear ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable GEMM+bias fusion with FP32 activations Not supported by cuBLAS. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add parallel unit test for unfused linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Refactor parallel tests to reduce job launches Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add all-reduce, all-gather, and reduce-scatter ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unused file Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug multi-GPU FP8 test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add support for FP8 scale updates Still need to implement amax reductions. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add license boilerplate Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fuse GEMM+bias in row TP Add documentation for unfused ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename pipeline to fuser Expand documentation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Tweak documentation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Preserve cached FP8 transpose between ops Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add option for fused wgrad accumulation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Directly output FP8 from linear if needed Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix cuDNN front-end commit Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use updated FP8 tensor API for transpose caching Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use updated API for FP8 scale updates Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add tests for non-default FP8 recipes Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename UnfusedOperation to BasicOperation Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unit test to check amax reduction with fusable op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Operator autograd state no longer needs to be initialized Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Initial functional implementation of linear op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug fused linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove autograd context from functional linear impl Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use functional linear impl in fused linear+bias op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename subdirectory from "fuser" to "ops" Avoid confusion with kernel fusers and graph compilers. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update with Float8Tensor changes in #820 Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Remove unnecessary CPU overheads Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Correctly pass FP8 metadata from next op Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix linter errors Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add convenience functions to manipulate Sequential class Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Clear saved tensor data in linear op after bprop Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix Pylint error Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix test name in QA script Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update name of PyTorch extensions module Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Run distributed tests even when only 1 GPU is available Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Only run distributed tests with 2 GPUs if there are >=2 GPUs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Review suggestions from @sudhakarsingh27 and @ksivaman Fix spelling of "fusible". Avoid "input" name in internal APIs. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update transformer_engine/pytorch/ops/__init__.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
- 07 Jun, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-