[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753)

Signed-off-by: shen-shanshan <467638484@qq.com>
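
For readers unfamiliar with the decorator-based registration this commit applies to `RelPosAttention`, the following is a minimal, self-contained sketch of the general pattern. It is not vLLM's `PluggableLayer` implementation: `PluggableLayerSketch`, `RelPosAttentionSketch`, `resolve`, and the constructor arguments are hypothetical names chosen for illustration, and the real base class presumably builds on `torch.nn.Module` and may resolve implementations differently.

```python
from typing import Callable, Dict, Type


class PluggableLayerSketch:
    """Hypothetical stand-in for a pluggable-layer base class (not vLLM's)."""

    # Maps a registered name to the layer class stored under that name.
    _registry: Dict[str, Type["PluggableLayerSketch"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the class under `name`."""

        def decorator(layer_cls: type) -> type:
            if name in cls._registry:
                raise ValueError(f"layer name already registered: {name}")
            cls._registry[name] = layer_cls
            return layer_cls  # the decorated class itself is left unchanged

        return decorator

    @classmethod
    def resolve(cls, name: str) -> type:
        """Look up the class registered under `name` (hypothetical helper)."""
        return cls._registry[name]


@PluggableLayerSketch.register("rel_pos_attention")
class RelPosAttentionSketch(PluggableLayerSketch):
    """Toy layer; only stores its hyperparameters."""

    def __init__(self, dim: int, num_heads: int = 8) -> None:
        self.dim = dim
        self.num_heads = num_heads


layer_cls = PluggableLayerSketch.resolve("rel_pos_attention")
layer = layer_cls(dim=64)
```

In this sketch, a name-keyed registry is what lets a different class be substituted for the same logical layer without editing the model code that constructs it.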

Commit 77e6dcbb, authored Mar 04, 2026 by Shanshan Shen; committed by GitHub Mar 03, 2026

Showing with 8 additions and 1 deletion

docs/design/custom_op.md +2 -0

vllm/model_executor/models/deepencoder.py +6 -1

--- a/docs/design/custom_op.md
+++ b/docs/design/custom_op.md
@@ -54,6 +54,8 @@ For example:
 --8<-- "vllm/model_executor/layers/attention/mm_encoder_attention.py:mm_encoder_attn"

 --8<-- "vllm/model_executor/layers/mla.py:multi_head_latent_attention"
+
+--8<-- "vllm/model_executor/models/deepencoder.py:rel_pos_attention"
 ```

 **2. Activation:**

--- a/vllm/model_executor/models/deepencoder.py
+++ b/vllm/model_executor/models/deepencoder.py
@@ -18,6 +18,7 @@ import torch.nn as nn
 import torch.nn.functional as F
 from transformers import CLIPVisionConfig

+from vllm.model_executor.custom_op import PluggableLayer
 from vllm.model_executor.layers.attention import MMEncoderAttention
 from vllm.model_executor.layers.conv import Conv2dLayer
 from vllm.model_executor.layers.quantization import QuantizationConfig
@@ -263,9 +264,13 @@ class Block(nn.Module):
        return x


-class RelPosAttention(nn.Module):
+# --8<-- [start:rel_pos_attention]
+@PluggableLayer.register("rel_pos_attention")
+class RelPosAttention(PluggableLayer):
    """Multi-head Attention block with relative position embeddings."""

+    # --8<-- [end:rel_pos_attention]
+
    def __init__(
        self,
        dim: int,