change softmax_lse correction of CP to FP32 (#1546)
* fix recompilation of out and lse correction in p2p+bshd/sbhd

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix recompilation of get_seq_chunk_ids_for_reordering

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix recompilation of reorder_seq_chunks_for_a2a

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* recover a change

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* typo fix

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor change to softmax_lse correction

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cache cu_seqlens for BSHD/SBHD format

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* do not need to allocate out buffer for BSHD/SBHD

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code refactoring

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* refactor init out correction

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix a docstring

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* typo fix

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code refactoring

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix init out correct dtype

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to DPA API

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to the API of MHA and transformer layer

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to the API of MHA and transformer layer

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* do not cast partial lse to FP64 for correction

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* do lse correction in FP32 with THD format

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* use log1pf and expf

Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------

Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
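
For readers unfamiliar with the softmax_lse correction this commit touches, the following is a minimal sketch of how two partial log-sum-exp values (and their partial attention outputs) can be merged using only `log1p` and `exp`, which is presumably why those functions suffice without casting to FP64. It is an illustration only, not the code changed by this commit: the function names `merge_lse` and `merge_out` are hypothetical, and it uses Python floats rather than the FP32 device functions `log1pf`/`expf` named above.

```python
import math


def merge_lse(lse_a: float, lse_b: float) -> float:
    """Combine two partial log-sum-exp values into one (hypothetical helper)."""
    hi, lo = max(lse_a, lse_b), min(lse_a, lse_b)
    if hi == -math.inf:
        # Both partial sums are empty, so the merged sum is empty too.
        return -math.inf
    # log(exp(hi) + exp(lo)) = hi + log1p(exp(lo - hi)).
    # The exponent lo - hi is <= 0, so exp() cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))


def merge_out(out_a, lse_a, out_b, lse_b):
    """Rescale two partial attention outputs to the merged normalizer."""
    lse = merge_lse(lse_a, lse_b)
    if lse == -math.inf:
        return [0.0 for _ in out_a], lse
    w_a = math.exp(lse_a - lse)  # share of the softmax mass held by chunk a
    w_b = math.exp(lse_b - lse)  # share of the softmax mass held by chunk b
    return [w_a * x + w_b * y for x, y in zip(out_a, out_b)], lse
```

Because the larger value is factored out before exponentiating, the merge stays finite even when both inputs are far too large for `exp` to handle directly.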