- 14 Jun, 2024 2 commits
Kirthi Shankar Sivamani authored
* Apply formatting
* Apply formatting
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Kirthi Shankar Sivamani authored
* Initial config test
* remove linters, fix clang-format
* fix clang-format
* fix clang-format
* fix
* fix
* Remove lint
* Adjust config
* use config file
* adjust pylintrc
* pre-format fixes
* Python only
* Add FA module
* fixes
* Update CI configs
* CRLF -> LF
* format
* revert accidental formatting changes
* try with sudo
* cpp formatting
* fix pylint error properly
* some review comments
* lint fixes
* add fp8 attn include in the correct file
* autofix PRs
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 07 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
* Remove interval arg from recipe
* Remove usage of interval and use explicit kwarg for testing recipes
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jun, 2024 1 commit
Kirthi Shankar Sivamani authored
Cleanup
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 30 May, 2024 1 commit
Charlene Yang authored
* add THD support
* add seq_offsets_o and use new offset calculation
* addition to previous commit; fix unit test
* add None for offset_o gradient
* fix lint
* WIP: test padding between sequences
* WIP: fix tests for padding between sequences
* fix tests for sbhd/bshd layouts; clean up
* update cudnn-frontend and add tests for max_seqlen_q=1 and d=256 for inference
* test sbhd/bshd layouts for sq1, d256 inference case
* fix lint
* replace wording from accumulative to cumulative
* add offset tensors to custom fp8 mha tests
* add version control for cuDNN
* add sm>=90 constraint for THD support
* fix cuDNN support for sq=1, d=256
* fix lint and minor tweak for fp8 tests
* modify cuDNN version and restrict MQA/GQA support for THD
* add notes for seq offset tensors
* add dummy tensor to pass jax build
* add dummy tensor to pass paddle build
* fix Jax CI
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
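The THD-support commit above revolves around cumulative sequence offsets for variable-length sequences packed along one token dimension. As a rough illustration (the function name and layout assumptions here are mine, not the library's), such an offset tensor is a prefix sum of per-sequence lengths:

```python
# Sketch, not the library implementation: building cumulative sequence
# offsets for a THD-packed batch, as consumed by variable-length fused
# attention. Entry i is the start offset of sequence i; the final entry
# is the total token count.

def cu_seqlens(seq_lens):
    offsets = [0]
    for n in seq_lens:
        offsets.append(offsets[-1] + n)
    return offsets

# Three sequences of lengths 3, 5, 2 packed into one token dimension:
print(cu_seqlens([3, 5, 2]))  # [0, 3, 8, 10]
```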
- 17 May, 2024 1 commit
Shijie authored
* support main_grad
* update main_grad and fuse_wgrad_accumulation
* fix ci errors
* minor change
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
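The `main_grad` mechanics referenced in this commit can be sketched as a persistent FP32 gradient buffer on each parameter that the weight-gradient computation accumulates into across microbatches, instead of materializing a fresh gradient each step. Illustrative only; the class and method names are mine, not the library API:

```python
import numpy as np

class Param:
    """Toy parameter with a persistent FP32 main_grad buffer."""
    def __init__(self, shape):
        self.main_grad = np.zeros(shape, dtype=np.float32)

    def accumulate_wgrad(self, wgrad):
        # Fused path idea: the wgrad GEMM writes into main_grad in place
        # (beta=1 accumulation) rather than allocating a new tensor.
        self.main_grad += wgrad.astype(np.float32)

p = Param((2, 2))
p.accumulate_wgrad(np.ones((2, 2), dtype=np.float16))  # microbatch 1
p.accumulate_wgrad(np.ones((2, 2), dtype=np.float16))  # microbatch 2
print(p.main_grad[0, 0])  # 2.0
```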
- 12 Apr, 2024 1 commit
Sangkug Lym authored
* Add LN margin to inference
* cleanup
* Fix symbolic func registration
* Fix grads
Signed-off-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 21 Feb, 2024 1 commit
Shijie authored
* use separate qkv
* add support for GQA
* minor changes
* change rtol
* fix reshape issue
* add rmsnorm and rotary position embedding
* update rmsnorm
* refactor layernorm and rmsnorm
* support swiglu
* add fused rope
* add rope api to __init__
* fix fp8 dtype issue
* simplify ut cases
* Update transformer_engine/paddle/layer/attention.py
* fix name issue
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: jaywan <jaywan@nvidia.com>
Signed-off-by: Shijie <505749828@qq.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
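Among the layers this commit adds, RMSNorm is easy to state precisely: normalize by the root-mean-square of the features, with no mean subtraction (unlike LayerNorm), then scale by a learned weight. A NumPy reference sketch of the operation, written for clarity rather than as the fused kernel:

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-5):
    """y = x / sqrt(mean(x^2) + eps) * weight, over the last axis."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 3.0]])
w = np.ones(3)
print(rmsnorm(x, w))
```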
- 08 Feb, 2024 1 commit
Tim Moon authored
* Implement fused kernel for FP8 scale update
* Add fused kernel for amax and scale update; add unit test
* Replace paddle.fluid imports with paddle.base
* Move fused kernel to core library
* Debug test
* Use FP8 update kernel in Paddle
* Debug FP8 scale update in Paddle
* Fix lint errors
* Debug Paddle test failures
* Make update kernel in-place for PyTorch
* Revert cudnn-frontend commit
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
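The amax-and-scale update that this fused kernel performs can be summarized by the delayed-scaling formula: pick the largest power-of-two scale such that the observed amax, once scaled, still fits in the FP8 format. A hedged Python sketch (simplified: the real kernel also reduces the rolling amax history and runs in-place on device; `fp8_max = 448.0` assumes the E4M3 format):

```python
import math

def update_scale(amax, fp8_max=448.0, margin=0):
    """Power-of-two scale so that amax * scale <= fp8_max / 2**margin."""
    if amax <= 0 or not math.isfinite(amax):
        return 1.0  # guard against empty/invalid amax
    exp = math.floor(math.log2(fp8_max / amax)) - margin
    return 2.0 ** exp

print(update_scale(2.0))  # 128.0
```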
- 30 Jan, 2024 1 commit
Tim Moon authored
* Replace paddle.fluid imports with paddle.base
* Remove paddle.fluid usage from tests
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 26 Jan, 2024 1 commit
Shijie authored
* use separate qkv
* add support for GQA
* minor changes
* change rtol
* fix reshape issue
Signed-off-by: jaywan <jaywan@nvidia.com>
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
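The core idea behind the GQA support added here is that several query heads share one key/value head. A minimal NumPy sketch, assuming a simple repeat-based expansion of the KV head axis (illustrative only, not the library kernel):

```python
import numpy as np

def expand_kv_for_gqa(kv, num_heads):
    """Expand [batch, kv_heads, seq, dim] to [batch, num_heads, seq, dim]
    by repeating each KV head for its group of query heads."""
    _, kv_heads, _, _ = kv.shape
    assert num_heads % kv_heads == 0, "query heads must divide evenly"
    group = num_heads // kv_heads
    return np.repeat(kv, group, axis=1)  # consecutive repeat = grouping

k = np.zeros((2, 2, 16, 64))          # 2 KV heads
print(expand_kv_for_gqa(k, 8).shape)  # (2, 8, 16, 64)
```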
- 12 Jan, 2024 1 commit
Tian Zheng authored
* Actively free tensor in bwd
* Add inplace support for fp8 casting; allow skipping weight update in fp8 meta update
* Support weight caching for Linear
* Add weight caching for LayernormLinear
* Add weight caching for LayerNormMLP
* Add weight caching for Transformer layer
* Add PP unittests
* Fix CI
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 11 Jan, 2024 1 commit
Tian Zheng authored
* Add SP for Linear
* Add SP for LayerNormLinear
* Add SP for LayerNormMLP
* Add SP API for transformer layer
* Add sequence_parallel attr
* Add SP unittests for Transformer and Attention
* Fix compatibility with PaddleNLP
* Copyright
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 03 Jan, 2024 1 commit
Przemyslaw Tredak authored
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
- 07 Dec, 2023 1 commit
cyanguwa authored
* Integrate cuDNN frontend v1 into fused attention and miscellaneous fixes
* fix lint
* fix jax/paddle unit tests
* fix jax/pytorch lint
* simplify stride generation
* fix and/or logic in get_backend
* fix flag_max512 and test_numerics
* remove v.contiguous() since get_qkv_layout covers it
* skip fp8 tests for sm89
* further fix jax CI
* fix jax CI
* revert mask type to comma-separated list
* fix lint
* fix last two commits
* integrate v1/pre-release-5
* clean up pre-release-5 integration and fix FA 2.1 commit
* force dropout to 0 if not training
* fix Jax CI
* test bias/alibi and padding+causal; add alibi to unfused DPA
* set flag_arb to false when non-determinism is not allowed
* follow up on previous commit; remove redundant Python env var setting
* WIP: minor tweaks for tests
* prepare for tests
* fix determinism logic for fused attn
* add bias to bwd
* fix gpt_checkpointing/dpa_accuracy problem
* fix some seg-fault issues
* add failure notes
* remove use of non-determinism var for backend selection
* minor fix for lint and CI
* fix workspace size in bwd and uncomment bias test
* fix get_alibi and remove check_support
* update tests status
* remove workspace_opt from FADescriptor_v1
* disable arbitrary backend + post-scale bias in Jax; waiting on PR 525
* clean up bhsd order
* swap bias/rng_state order in aux_ctx_tensor and add bias to aux_ctx_tensor in _qkvpacked/_kvpacked API
* remove support for padding_causal + cross for max512
* change alibi bias to float32 for bias_1_4/5 tests
* further clean up tests
* fix thd fwd output shape for FlashAttention and add backend info for DPA
* fix definition of workspace limit when dbias is present
* further tweak DP_WORKSPACE_LIMIT definition
* disallow alibi+no_mask for sdpa flash and update alibi tests
* update jax/paddle after PR 525 and fix DP_WORKSPACE_LIMIT for dbias Jax tests
* disable dbias for non-Hopper archs
* fix layernorm lint
* remove unused arg for lint
* remove build dir in setup.py
* change selection logic to prefer fused attn on sm90
* fix distributed jax test
* fix h and s order in header
* update to cuDNN fe v1 branch
* remove manual setting of workopt path due to dbias after v1 update
* fix paddle CI
* add post_scale_bias and alibi to sdpa flash support matrix
* fix support matrix in header files
* move headers back to .cu and change seed/offset to int64
* update Megatron commit in L1 test and remove all prints in fused attn test
* fix L1 Megatron test
* fix fp8 arg in L1 Megatron script
* print only when debug flag is on
* remove checkpoint loading to avoid loading other tests' results
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
- 23 Nov, 2023 1 commit
Tian Zheng authored
* Add async TP comm
* Add "set CUDA_DEVICE_MAX_CONNECTIONS" warning for TP overlap
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 21 Nov, 2023 1 commit
Shijie authored
* fix cudnn FA softmax shape
* set inplace rng_state
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
- 08 Nov, 2023 1 commit
zlsh80826 authored
* Deprecate QKV_INTERLEAVED use in JAX
* Deprecate QKV_INTERLEAVED use in Paddle
* Enhance qkv enum mappings
* Fix LD_LIBRARY_PATH issue
* Arbitrary seqlen kernels only support self attention currently
Signed-off-by: Reese Wang <rewang@nvidia.com>
Signed-off-by: rewang <rewang@nvidia.com>
- 24 Oct, 2023 2 commits
Kirthi Shankar Sivamani authored
* paddle documentation
* minor fix
* review comments
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Tim Moon authored
* Do not include logging macros in installed C headers
* Debug logging macros
* Debug C++ tests; use Google style for header includes
* Update CUDA driver macros, incorporating changes from #389
* Use core error-checking macros in PyTorch extensions; hack to get around macro redefinition warning
* Fix missing arg when getting CUDA driver error string
* Reuse logging header in frameworks
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Jan Bielak <jbielak@nvidia.com>
- 12 Oct, 2023 1 commit
Tim Moon authored
* Debug PyTorch and Paddle tests on Ada
* Only run Paddle layer tests with cuDNN fMHA on supported archs
* Debug PyTorch fMHA tests
* Reduce JAX FP8 GEMM sizes; avoid split-k kernels on Ada
* Disable JAX fused self-attention test on Ada
* Run supported fused attention tests on Ada
* Run supported fused attention JAX tests on Ada
* Enable Paddle fused attention on Ada
* Update reference scale calculation in TensorFlow test
* Restore backend support to reference FP8 attention impl in PyT test (review suggestion from @cyanguwa)
* Fix merge conflicts
* Debug Paddle tests on Ada
* Loosen tolerances for Paddle attention tests
* Assume causal mask implies equal seqlens in Paddle attention tests
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- 03 Oct, 2023 1 commit
Shijie authored
* fix mask conversion and rng_state
* refactor fused attn
* use CUB to do prefix sum
* fuse dropout add
* minor changes
* optimize kernel
* Debug merge errors
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
- 23 Sep, 2023 1 commit
Kirthi Shankar Sivamani authored
* Change scaling factor from E8M0 to E8M23
* fix formula
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
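The E8M0-to-E8M23 change widens the scaling factor from a power-of-two-only value (8 exponent bits, no mantissa) to an ordinary FP32 value (8 exponent bits, 23 mantissa bits), so the scale can sit exactly at `fp8_max / amax` instead of at the nearest power of two below it. A small comparison sketch with illustrative function names (not the library's; `fp8_max = 448.0` assumes E4M3):

```python
import math

def scale_e8m0(amax, fp8_max=448.0):
    """E8M0-style scale: restricted to powers of two."""
    return 2.0 ** math.floor(math.log2(fp8_max / amax))

def scale_e8m23(amax, fp8_max=448.0):
    """E8M23-style scale: a full FP32 value, no rounding to powers of two."""
    return fp8_max / amax

print(scale_e8m0(3.0))   # 128.0
print(scale_e8m23(3.0))  # 149.333..., uses the FP8 range more fully
```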
- 22 Sep, 2023 1 commit
zlsh80826 authored
* Eliminate amax_and_scale_update bubbles
* Add CUDA check
Signed-off-by: rewang <rewang@nvidia.com>
- 06 Sep, 2023 1 commit
Tian Zheng authored
* Add recompute
* Support recompute core attention
* Fix transformer layer recompute
* Add doc
* Improve recompute test
* Improve performance of stack backtrace
* Improve code style
* Fix code style
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
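Activation recompute, the feature this commit adds, trades memory for compute: the forward pass keeps only layer inputs, and the backward pass reruns the forward to regenerate intermediate activations before differentiating. A toy illustration of the idea (not the Paddle API):

```python
class CheckpointedSquare:
    """Toy layer computing y = x^2 with recompute-style checkpointing."""

    def forward(self, x):
        self.saved_input = x  # keep only the input, drop intermediates
        return x * x

    def backward(self, grad_out):
        x = self.saved_input
        _ = x * x                # recompute the activation on demand
        return grad_out * 2 * x  # d(x^2)/dx = 2x

layer = CheckpointedSquare()
layer.forward(3.0)
print(layer.backward(1.0))  # 6.0
```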
- 01 Sep, 2023 1 commit
Tian Zheng authored
* Add control of attention dropout and hidden dropout RNG state
* Fix CI error
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 26 Aug, 2023 1 commit
Tian Zheng authored
* [Paddle] Add TP, DP, PP, FSDP
* Minor fix
* Fix CI failure
* Remove set_nccl_overlap_warning_if_tp
* Improve variable naming
* Refactor FP8 buffer
* Stylistic changes
* Fix FP32 parallel training
* Fix numel performance issue
* Squashed commit: add TP test; fix tp_size when disabled; add TP for attention and transformer layer
* Add shape check
* Add FSDP check for stages 1, 2, 3
* Review changes
* Fix group_sharding test
* Support NVTE_FUSE_ATTN
* Fix CI errors
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 17 Aug, 2023 1 commit
Shijie authored
* Add nn.layer: softmax, attention, transformer
* code refactor
* code refactor
* update docs and set dropout=0.1
* Update transformer_engine/paddle/layer/attention.py
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 02 Aug, 2023 1 commit
Tian Zheng authored
Refactor fp8 state
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 01 Aug, 2023 1 commit
Tian Zheng authored
* Add FP8 support: add FP8 recipe, FP8 path for nn layers, and an MNIST FP8 example
* Update README
* Fix LayerNormMLP FP8 backward
* Fix FP8 training in float32 accumulation
* Fix FP8 checkpointing for non-forward execution cases (same as #323)
* Refactors and improvements for better code style, readability, and organization
* Remove unnecessary pylint override
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 20 Jul, 2023 1 commit
Shijie authored
* add flash attn tests
* update flash attn
* fix random seed
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 19 Jul, 2023 1 commit
Tian Zheng authored
* Add Linear layer (FP16); add BF16 training example; add fp8_autocast (only supports non-FP8 for now)
* Remove FP8 stuff
* Simplify Linear layer forward
* Add LayerNorm layer (BF16)
* Add LayerNormLinear layer
* Store weights in BF16
* Add LayerNormMLP layer
* Add BF16 MNIST example
* Remove in-place cast for compatibility with Paddle AMP mechanism
* README correction
* Add Paddle op as a backend option
* Fix code format
* Fix dtype change between iterations
* Minor fixes
* Move forward function out of base layer
* Use Paddle nvtx bindings
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
- 06 Jul, 2023 1 commit
Shijie authored
* add more ops
* add skipif
* fix bug
* minor change
* minor change on coding style
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
- 22 Jun, 2023 1 commit
Tian Zheng authored
* Add cast_transpose; add gelu, gelu_fp8; add cast_transpose_bgrad_dgelu; add layernorm_fwd and layernorm_fwd_fp8; add layernorm_bwd
* Fix missing header
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
- 06 Jun, 2023 1 commit
Tian Zheng authored
* First step of PaddlePaddle integration: add build option for Paddle, add basic test framework, add 3 basic operators (cast_from_fp8, cast_to_fp8, gemm)
* Fix review comments
* Support Paddle build
* Add Paddle build support for new building framework
* Fix review comments
* Clean up build process for Paddle stub file
* Minor fixes
* Fix pylint "wrong-import-order" warning
* Fix review comments
* Skip BF16 GEMM tests for unsupported arch
Signed-off-by: Tian Zheng (Engrg-Hardware 1) <tizheng@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
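The `cast_to_fp8`/`cast_from_fp8` operator pair named in this first integration can be approximated as scale, clamp, and rescale. A simplified NumPy emulation (real FP8 casting also rounds values onto the E4M3 grid; this sketch only scales and clips to the format's range):

```python
import numpy as np

E4M3_MAX = 448.0  # max representable magnitude in E4M3

def cast_to_fp8(x, scale):
    """Quantize: scale up into FP8 range, then saturate."""
    return np.clip(x * scale, -E4M3_MAX, E4M3_MAX)

def cast_from_fp8(q, scale_inv):
    """Dequantize: multiply by the inverse scale."""
    return q * scale_inv

scale = 16.0
x = np.array([0.5, -100.0])
q = cast_to_fp8(x, scale)
# 0.5 round-trips exactly; -100.0 exceeds the scaled range and
# saturates, coming back as -28.0.
print(cast_from_fp8(q, 1.0 / scale))
```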