Unverified commit 2a95efd3, authored by Xiaowei Ren and committed by GitHub

CP implementation refinement for BSHD/SBHD format (#1523)

* fix recompilation of out and lse correction in p2p+bshd/sbhd
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix recompilation of get_seq_chunk_ids_for_reordering
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix recompilation of reorder_seq_chunks_for_a2a
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* recover a change
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* typo fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* minor change to softmax_lse correction
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cache cu_seqlens for BSHD/SBHD format
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* no need to allocate out buffer for BSHD/SBHD
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code refactoring
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* refactor init out correction
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix a docstring
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* typo fix
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* code refactoring
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* fix init out correction dtype
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to the DPA API
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to the API of MHA and transformer layer
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

* add pad_between_seqs to the API of MHA and transformer layer
Signed-off-by: Xiaowei Ren <xren@nvidia.com>

---------
Signed-off-by: Xiaowei Ren <xren@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
parent 2ad5da95
@@ -546,6 +546,7 @@ class TransformerLayer(torch.nn.Module):
         max_seqlen_q: Optional[int] = None,
         max_seqlen_kv: Optional[int] = None,
         fast_zero_fill: bool = True,
+        pad_between_seqs: Optional[bool] = None,
     ) -> torch.Tensor:
         """
         Transformer Layer: attention block and a feedforward network (MLP)
@@ -637,6 +638,9 @@ class TransformerLayer(torch.nn.Module):
         inference_params: InferenceParams, default = None
                          Inference parameters that are passed to the main model in order
                          to efficiently calculate and store the context during inference.
+        pad_between_seqs: Optional[bool], default = `None`
+                         If None, inferred from qkv_format, cu_seqlens and cu_seqlens_padded.
+                         If true, there are padding tokens between individual sequences in a packed batch.
         """
         if self_attn_mask_type is None:
@@ -697,6 +701,7 @@ class TransformerLayer(torch.nn.Module):
             max_seqlen_q=max_seqlen_q,
             max_seqlen_kv=max_seqlen_kv,
             fast_zero_fill=fast_zero_fill,
+            pad_between_seqs=pad_between_seqs,
         )
         if self.apply_residual_connection_post_layernorm and not self.output_layernorm:
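For reference, a minimal usage sketch of the new flag (not part of this diff): it assumes TransformerEngine's `DotProductAttention`, to which the commit log above says `pad_between_seqs` was also added, used with the packed THD format. The shapes, sequence lengths, mask type, and CUDA device below are illustrative assumptions, not values from the commit.

```python
# Hypothetical sketch of calling the refined API -- not code from this commit.
import torch
import transformer_engine.pytorch as te

num_heads, head_dim = 16, 64

# Two packed sequences of 5 and 7 tokens, each padded to 8 -> 16 total tokens.
# cu_seqlens counts real tokens; cu_seqlens_padded counts tokens incl. padding.
cu_seqlens = torch.tensor([0, 5, 12], dtype=torch.int32, device="cuda")
cu_seqlens_padded = torch.tensor([0, 8, 16], dtype=torch.int32, device="cuda")

q = torch.randn(16, num_heads, head_dim, dtype=torch.bfloat16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

dpa = te.DotProductAttention(
    num_heads, head_dim, qkv_format="thd", attn_mask_type="padding_causal"
)
out = dpa(
    q, k, v,
    cu_seqlens_q=cu_seqlens,
    cu_seqlens_kv=cu_seqlens,
    cu_seqlens_q_padded=cu_seqlens_padded,
    cu_seqlens_kv_padded=cu_seqlens_padded,
    max_seqlen_q=8,
    max_seqlen_kv=8,
    # Padding tokens sit between the packed sequences, so state it explicitly.
    pad_between_seqs=True,
)
```

Leaving `pad_between_seqs` as `None` defers to the inference described in the new docstring, which derives the value from `qkv_format`, `cu_seqlens` and `cu_seqlens_padded`; passing it explicitly simply bypasses that check.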