- 16 Jun, 2025 1 commit
-
-
yuguo authored
-
- 13 Jun, 2025 1 commit
-
-
yuguo authored
-
- 12 Jun, 2025 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Same intention of commit 3e38a2ea . This commit is to improve acc. Signed-off-by:
wenjh <wenjh@sugon.com>
-
- 11 Jun, 2025 1 commit
-
-
yuguo authored
-
- 10 Jun, 2025 1 commit
-
-
yuguo authored
-
- 09 Jun, 2025 1 commit
-
-
yuguo authored
-
- 06 Jun, 2025 1 commit
-
-
wenjh authored
quantize_transpose_vector_blockwise function use lds exceeding 64kb when input type is fp32. But max size of lds in dcu is 64kb, thus we use lds as bfp16 for workaround. Signed-off-by:wenjh <wenjh@sugon.com>
-
- 05 Jun, 2025 1 commit
-
-
yuguo authored
-
- 04 Jun, 2025 1 commit
-
-
yuguo authored
-
- 28 May, 2025 1 commit
-
-
wenjh authored
-
- 27 May, 2025 3 commits
- 26 May, 2025 2 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Use ocp fp8. Workaround: test_cast_float8blockwise.cu link wrong std::max Signed-off-by:wenjh <wenjh@sugon.com>
-
- 23 May, 2025 1 commit
-
-
yuguo authored
-
- 22 May, 2025 3 commits
-
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by:wenjh <wenjh@sugon.com>
-
- 21 May, 2025 2 commits
- 20 May, 2025 3 commits
- 16 May, 2025 2 commits
-
-
Selvaraj Anandaraj authored
* Added token ignoring for CE loss Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added tests Signed-off-by:
root <root@cw-dfw-h100-004-210-013.cm.cluster> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Selvaraj Anandaraj <selvaraja@cw-dfw-cs-001-login-01.cm.cluster> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
jberchtold-nvidia authored
* [JAX] Update flax module param initialization to support logical partitioning axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix ffn1 intermediate result being replicated Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Lint Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Add documentation and assert when logical_axes=None Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix bias in LayerNormMLP flax module Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> * Fix layer tests to not use nn_partitioning and instead use nn.with_logical_axes Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com> --------- Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 15 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* Cleanup runtime library loading Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comments and logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix catching stray builds Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix missing fw case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * minor grammar Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix duplicate SO for editable installs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Better comment for build ext Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Improve error msg Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 14 May, 2025 3 commits
-
-
Peter St. John authored
Signed-off-by:
Peter St. John <pstjohn@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
wenjh authored
Add rules of cuda_runtime.h, cuda_driver.h and cuda_nvml.h to hip. Signed-off-by:wenjh <wenjh@sugon.com>
-
Kirthi Shankar Sivamani authored
* rm unused swizzle extensions Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix swizzle Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Consistent namespaces and first refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * format and lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * transformer_engine Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * revert accidental perm change Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 May, 2025 4 commits
-
-
Evgeny Tsykunov authored
* Set sequence_parallel before super().__init__() in norm modules Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> * getattr(self, sequence_parallel, None) -> self.sequence_parallel Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> --------- Signed-off-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com> Co-authored-by:
Evgeny Tsykunov <etsykunov@etsykunov-mlt.client.nvidia.com>
-
Charlene Yang authored
* disable sm89 and cuDNN < 9.11 for KV caching Signed-off-by:
Charlene Yang <charleney@nvidia.com> * disable some numerics tests Signed-off-by:
Charlene Yang <charleney@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Charlene Yang <charleney@nvidia.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Disallow kwargs for pybind extensions and release GIL if possible Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Wrap nvte_* calls Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
yuguo authored
-
- 12 May, 2025 1 commit
-
-
jberchtold-nvidia authored
This reverts commit 5bee81e2 . Signed-off-by:
Jeremy Berchtold <jberchtold@nvidia.com>
-
- 11 May, 2025 1 commit
-
-
Kirthi Shankar Sivamani authored
* First pass refactor Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * first pass Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * core compiles Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Include cuda dirs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Compiles Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move grad outside autocast Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix kv cache Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change src file name in cmake Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * move the kernels too Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move comment Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Move comments around Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * more movement Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * move Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 09 May, 2025 1 commit
-
-
Tim Moon authored
* Avoid spurious warning with non-FP8 GroupedLinear Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use `QuantizedTensorBase` Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 08 May, 2025 2 commits
-
-
Paweł Gadziński authored
* features drop Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * Update transformer_engine/debug/features/utils/stats_computation.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/disable_fp8_layer.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/log_fp8_tensor_stats.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/utils/stats_buffer.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/disable_fp8_gemm.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * Update transformer_engine/debug/features/per_tensor_scaling.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * changes Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * temporarily removed saturations Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update transformer_engine/debug/features/_test_dummy_feature.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * docs fix Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> --------- Signed-off-by:
Pawel Gadzinski <pgadzinski@nvidia.com> Signed-off-by:
Paweł Gadziński <62263673+pggPL@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
-
yuguo authored
-