- 06 Jan, 2024 1 commit
-
-
Kirthi Shankar Sivamani authored
* Deterministic FA, bump minimum supported version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * fixes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix MQA/GQA Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Address review comments Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 05 Jan, 2024 1 commit
-
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 03 Jan, 2024 3 commits
-
-
Przemyslaw Tredak authored
Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Sangkug Lym authored
* Provide pre-computed max sequence to remove unnecessary kernels and D2H copies Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Tweak comments Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Przemyslaw Tredak authored
Signed-off-by:Przemek Tredak <ptredak@nvidia.com>
-
- 02 Jan, 2024 1 commit
-
-
Hongbin Liu authored
avoid redundant computation for cu_seqlens Signed-off-by:
Hongbin Liu <hongbinl@nvidia.com> Co-authored-by:
Hongbin Liu <hongbinl@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 18 Dec, 2023 1 commit
-
-
Alp Dener authored
* Linear and LayerNormLinear weight and bias buffer cleanup at the end of init when there is no parameter split Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed typo in tensor name Signed-off-by:
Alp Dener <adener@nvidia.com> * fixed typo in tensor name Signed-off-by:
Alp Dener <adener@nvidia.com> --------- Signed-off-by:
Alp Dener <adener@nvidia.com>
-
- 16 Dec, 2023 2 commits
-
-
cyanguwa authored
* add sliding window to FA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix forward logic Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change bert test to causal as unfused does not support padding Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix FlashAttention for v2-2.3 versions Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * verify FA swa works Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix mask related restrictions and duplicate code after merge Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix swa test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add docstring for get_swa func Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move repeated code into a function Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert mask change Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add determinism filter and fix FA warning message Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add message for determinism filter Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * simplify check_set_window_size() Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix check_set_window_size in transformer layers Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix indent Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
Tim Moon authored
* Update fp8_meta amax when copying into Float8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Avoid amax when copying between Float8Tensors with fp8_metas Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 15 Dec, 2023 3 commits
-
-
Przemyslaw Tredak authored
* Disable dynamo for Fused Attention Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Added test Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
Kunlun Li authored
Avoid redeclaration error of "nv_bfloat16" when Compiling in CUDA12.1. Error log: /usr/local/cuda/include/cuda_fp16.hpp(2724): error: invalid redeclaration of type name "nv_bfloat16" (declared at line 2837 of /usr/local/cuda/include/cuda_bf16.hpp) Signed-off-by:Kunlun Li <94586211+kunlunl@users.noreply.github.com>
-
Fabian Joswig authored
[fix] fixed micro batched inference with RoPE Signed-off-by:
Fabian Joswig <fabian.joswig@deepl.com> Co-authored-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
- 13 Dec, 2023 1 commit
-
-
Marks101 authored
Signed-off-by:Markus Schnoes <markus.schnoes@gmx.de>
-
- 12 Dec, 2023 2 commits
-
-
cyanguwa authored
disable pylint bare except Signed-off-by:Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
cyanguwa authored
* fix onnx/dynamo error Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move changes to pytorch/__init__ using try/except Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Update transformer_engine/pytorch/__init__.py Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
- 07 Dec, 2023 2 commits
-
-
cyanguwa authored
* Integrate cuDNN frontend v1 to fused attention and miscellaneous fixes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax/paddle for unit tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax/pytorch lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * simplify stride generation Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix and/or logic in get_backend Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix flag_max512 and test_numerics Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove v.contiguous() since get_qkv_layout covers it Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * skip fp8 tests for sm89 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further fix jax CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix jax CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * revert mask type to comma-separated list Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix last two commits Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * integrate v1/pre-release-5 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * cleanup prerelease5 integration and fix FA2.1 commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force dropout to 0 if not training Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix Jax CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * testing bias/alibi and padding+causal; add alibi to unfused DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * set flag_arb to false when non determinism is not allowed Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * followup on prev commit; remove redundant python env var setting Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * WIP: minor tweaks for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * prepare for tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix determinism logic for fused attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add bias to bwd Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix gpt_checkpointing/dpa_accuracy problem Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix some seg fault issues Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add failure notes Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove use of non-deter var for backend selection Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fix for lint and CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix workspace size in bwd and uncomment bias test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix get_alibi and remove check_support Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update tests status Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove workspace_opt from FADescriptor_v1 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable arbitrary backend + post scale bias in Jax; waiting on PR 525 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * clean up bhsd order Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * swap bias/rng_state order in aux_ctx_tensor and add bias to aux_ctx_tensor in _qkvpacked/_kvpacked API Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove support for padding_causal + cross for max512 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change alibi bias to float32 for bias_1_4/5 tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further clean up tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix thd fwd output shape for FlashAttention and add backend info for DPA Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix definition of workspace limit when dbias is present Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * further tweak DP_WORKSPACE_LIMIT definition Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disallow alibi+no_mask for sdpa flash and update alibi tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update jax/paddle after PR525 and fix DP_WORKSPACE_LIMIT for dbias Jax tests Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * disable dbias for non-hopper archs Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix layernorm lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remode unused arg for lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove build dir in setup.py Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * change selection logic to prefer fused attn on sm90 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix distributed jax test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix h and s order in header Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update to cudnn fe v1 branch Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove manual setting of workopt path due to dbias after v1 update Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix paddle CI Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add post_scale_bias and alibi to sdpa flash support matrix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix support matrix in header files Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * move headers back to .cu and change seed/offset to int64 Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * update Megatron commit in L1 test and remove all prints in fused attn test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix L1 Megatron test Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix fp8 arg in L1 Megatron script Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * print only when debug flag is on Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove checkpointing loading to avoid loading other tests results Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
cyanguwa <8636796+cyanguwa@users.noreply.github.com>
-
Tim Moon authored
* Float8Tensor uses cached transpose if available Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix bug with non-2D transpose Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Custom pickling for Float8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug test for pickling Float8Tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflict Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Review suggestions from @sudhakarsingh27 Avoid FP8 casts when copying between Float8Tensors. Make make_like a class function. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Add unit test for checkpointing model with FP8 params Debugged pickling and copy functions. Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 04 Dec, 2023 1 commit
-
-
Marks101 authored
* [PyTorch] TransformerLayer: add parallel_attention_mlp to support Falcon models Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * [PyTorch] add test for parallel_attention_mlp to test_numerics Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * [PyTorch] TorchGPT: fix dropout for parallel_attention_mlp Now uses nn.functional.dropout because depending on the path there are one or two dropouts. Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * [PyTorch] test_gpt_accuracy: fix spelling in construction of TorchGPT Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> --------- Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com>
-
- 01 Dec, 2023 2 commits
-
-
Tim Moon authored
Fix incorrect variable name in LayerNormMLP backward Signed-off-by:Tim Moon <tmoon@nvidia.com>
-
LadyRick authored
[PyTorch] fix amax calculate during fp8 calibration Signed-off-by:ladyrick <ladyrick@qq.com>
-
- 30 Nov, 2023 1 commit
-
-
Deepak Narayanan authored
wgrad should be zero'ed out if a weight parameter is shared among multiple layers Signed-off-by:Deepak Narayanan <dnarayanan@nvidia.com>
-
- 28 Nov, 2023 2 commits
-
-
Marks101 authored
* [PyTorch] Linear: fix computation for wgrad if sequence_parallel=True Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * Remove buggy gather_along_last_dim Signed-off-by:
Tim Moon <tmoon@nvidia.com> * [PyTorch] Linear: fix line length Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> * Simplify logic for saving input tensor for Linear backward Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Markus Schnoes <markus.schnoes@gmx.de> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Tim Moon <tmoon@nvidia.com>
-
Deepak Narayanan authored
Getting warnings of the following form using ToT TE: ``` /usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/attention.py:852: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() data_ptr = grad_outputs[0].storage().data_ptr() ``` Signed-off-by:Deepak Narayanan <2724038+deepakn94@users.noreply.github.com>
-
- 17 Nov, 2023 2 commits
-
-
cyanguwa authored
* disable FAv2.1 if causal+cross attn Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * remove comment and add warning Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * include both causal and padding+causal Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * add a space Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-
Kirthi Shankar Sivamani authored
* Delay caching of transposes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Review comment Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 15 Nov, 2023 1 commit
-
-
cyanguwa authored
* fix condition checks related to FA head_dim Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * force q,k,v contiguous when RoPE is in use Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * fix lint Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * Expand FA version Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 13 Nov, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Improve PyTorch memory usage Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 09 Nov, 2023 1 commit
-
-
Sangkug Lym authored
* Make user buffer name configurable Signed-off-by:
Sangkug Lym <slym@nvidia.com> * Apply suggestions from code review Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Fix duplicate argument Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix autograd Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Sangkug Lym <slym@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 08 Nov, 2023 1 commit
-
-
Selvaraj Anandaraj authored
* Returning an empty tensor of param dtype for wgrad Signed-off-by:
Selvaraj Anandaraj <selvaraja@computelab-frontend-4-ub22.nvidia.com> * lint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Selvaraj Anandaraj <selvaraja@computelab-frontend-4-ub22.nvidia.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Selvaraj Anandaraj <selvaraja@computelab-frontend-4-ub22.nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 03 Nov, 2023 1 commit
-
-
Xiaowei Ren authored
fix bwd error with FA v2 Signed-off-by:
Xiaowei Ren <xren@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 31 Oct, 2023 1 commit
-
-
Tim Moon authored
* Experimental FP8 tensor Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com> Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Add fp8 tensor to ci test Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * review comments and tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Minor changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Default to FP8 usage Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Naming changes Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * minor fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix transpose caching Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Debug transpose caching Handle case where transpose cache is updated externally. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Rename FP8GlobalStateManager.with_fp8_parameters Signed-off-by:
Tim Moon <tmoon@nvidia.com> * remove Float8Tensor from import API Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Avoid caching FP8 transposes if not required Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix import error in FP8 tensor tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix tranpose caching and checkpointing bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Improve caching and fix distopt case Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Update transformer_engine/pytorch/float8_tensor.py Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> * Remove recursive logic Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix cache reset bug Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Store FP8 attributes in dict Easier for multiple tensors to share, e.g. detached tensors. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure scale_inv is 1D tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Make sure scale_inv is 1D tensor Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fixes and detach recipe Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Set default fp8 data type Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> Signed-off-by:
Tim Moon <4406448+timmoon10@users.noreply.github.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> Co-authored-by:
Sudhakar Singh <sudhakars@nvidia.com> Co-authored-by:
Przemyslaw Tredak <ptrendx@gmail.com>
-
- 24 Oct, 2023 1 commit
-
-
Tim Moon authored
* Do not include logging macros in installed C headers Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug logging macros Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug C++ tests Use Google style for header includes. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update CUDA driver macros Incorporating changes from #389. Co-authored-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com> Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Use core error checking macros in PyTorch extensions Hack to get around macro redefinition warning. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix missing arg when getting CUDA driver error string Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reuse logging header in frameworks Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Jan Bielak <jbielak@nvidia.com>
-
- 23 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
* initial test fix Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Drop eval for selective checkpointing tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Remove redundant recompute for FA Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * CI fix; Decouple fused attention and numerics tests Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 20 Oct, 2023 2 commits
-
-
Tim Moon authored
Signed-off-by:
Tim Moon <tmoon@nvidia.com> Co-authored-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Przemyslaw Tredak authored
* Ability to check cuDNN version from Python Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> * Modify the fused attention test to not use the CUDNN_VERSION env variable which is specific to NGC containers Signed-off-by:
Przemek Tredak <ptredak@nvidia.com> --------- Signed-off-by:
Przemek Tredak <ptredak@nvidia.com>
-
- 17 Oct, 2023 1 commit
-
-
Kirthi Shankar Sivamani authored
Improve docs Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
- 12 Oct, 2023 2 commits
-
-
Kirthi Shankar Sivamani authored
* Add class for RNG state tracker. Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * Fix docs for checkpoint Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> * docs Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com> --------- Signed-off-by:
Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-
Tim Moon authored
* Debug PyTorch and Paddle tests on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Only run Paddle layer tests with cuDNN fMHA on supported archs Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug PyTorch fMHA tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Reduce JAX FP8 GEMM sizes Avoid split-k kernels on Ada. Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Disable JAX fused self-attention test on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Run supported fused attention tests on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Run supported fused attention JAX tests on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Enable Paddle fused attention on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Update reference scale calculation in TensorFlow test Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Restore backend support to reference FP8 attention impl in PyT test Review suggestion from @cyanguwa Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Fix merge conflicts Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Debug Paddle tests on Ada Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Loosen tolerances for Paddle attention tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> * Assume causal mask implies equal seqlens in Paddle attention tests Signed-off-by:
Tim Moon <tmoon@nvidia.com> --------- Signed-off-by:
Tim Moon <tmoon@nvidia.com>
-
- 11 Oct, 2023 2 commits
-
-
Xiaowei Ren authored
* rename set_context_parallel_running to set_context_parallel_group Signed-off-by:
xren <xren@nvidia.com> * bug fix Signed-off-by:
xren <xren@nvidia.com> --------- Signed-off-by:
xren <xren@nvidia.com>
-
Kirthi Shankar Sivamani authored
Inference params support Signed-off-by:Kirthi Shankar Sivamani <ksivamani@nvidia.com>
-