Commit 0674d1fe (unverified), authored by Wangbei25, committed by GitHub

[PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293)


Signed-off-by: Wangbei25 <wangbei41@huawie.com>
Signed-off-by: Wangbei25 <wangbei41@huawei.com>
Co-authored-by: Wangbei25 <wangbei41@huawie.com>
parent 30108fc8
@@ -51,11 +51,8 @@ For example:
 **1. Attention:**
 ```python
---8<-- "vllm/model_executor/layers/attention/mm_encoder_attention.py:mm_encoder_attn"
 --8<-- "vllm/model_executor/layers/mla.py:multi_head_latent_attention"
---8<-- "vllm/model_executor/models/deepencoder.py:rel_pos_attention"
 ```
 **2. Activation:**
@@ -170,6 +167,16 @@ For example:
 --8<-- "vllm/model_executor/layers/rotary_embedding/common.py:apply_rotary_emb"
 ```
+**12. Encoder:**
+```python
+--8<-- "vllm/model_executor/models/deepencoder2.py:qwen2_decoder"
+--8<-- "vllm/model_executor/layers/attention/mm_encoder_attention.py:mm_encoder_attn"
+--8<-- "vllm/model_executor/models/deepencoder.py:rel_pos_attention"
+```
 ## Guidelines for Implementing a New CustomOp
 ### Implement a New CustomOp in vLLM
......
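Reviewer note: the `--8<-- "path:section"` lines in the docs hunks are snippet includes. Each one pulls in the region of the named file that sits between `[start:section]` and `[end:section]` comment markers. The sketch below illustrates that extraction on an inline sample string. `extract_section` and `SAMPLE` are hypothetical names introduced here for illustration, and the code is not the documentation tool's actual implementation.

```python
import re


def extract_section(source: str, name: str) -> str:
    """Return the lines between `[start:name]` and `[end:name]` markers."""
    start = re.compile(r"--8<--\s*\[start:" + re.escape(name) + r"\]")
    end = re.compile(r"--8<--\s*\[end:" + re.escape(name) + r"\]")
    collected = []
    inside = False
    for line in source.splitlines():
        if start.search(line):
            inside = True   # begin collecting after the start marker
            continue
        if end.search(line):
            break           # stop at the end marker, excluding it
        if inside:
            collected.append(line)
    return "\n".join(collected)


# Inline stand-in for a source file; it is only parsed as text, never executed.
SAMPLE = '''import torch.nn as nn

# --8<-- [start:qwen2_decoder]
@PluggableLayer.register("qwen2_decoder")
class CustomQwen2Decoder(PluggableLayer):
    """Qwen2 visual encoder"""

    # --8<-- [end:qwen2_decoder]
    def __init__(self):
        ...
'''

snippet = extract_section(SAMPLE, "qwen2_decoder")
print(snippet)
```

Under this reading, the start marker goes above the decorator and the end marker goes just after the docstring, so the rendered docs would show only the registration line, the class statement, and the docstring.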
@@ -14,14 +14,20 @@ import torch
 import torch.nn as nn
 import transformers
+from vllm.model_executor.custom_op import PluggableLayer
-class CustomQwen2Decoder(nn.Module):
+# --8<-- [start:qwen2_decoder]
+@PluggableLayer.register("qwen2_decoder")
+class CustomQwen2Decoder(PluggableLayer):
     """
     Qwen2 visual encoder
     non-causal attention + causal attention
     token_type_ids :0=non-causal, 1=causal
     """
+    # --8<-- [end:qwen2_decoder]
     def __init__(
         self,
         decoder_layer: int = 24,
......
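Reviewer note: this hunk changes the base class from `nn.Module` to `PluggableLayer` and registers the class under the name `qwen2_decoder`. The diff does not show what registration does internally, so the sketch below only illustrates the general decorator-based registry pattern. `ToyPluggableLayer`, `ToyQwen2Decoder`, `lookup`, and `registered_name` are hypothetical names and are not vLLM's API. Only the registered name and the `decoder_layer: int = 24` default come from the diff.

```python
from typing import Callable, Dict, Type


class ToyPluggableLayer:
    """Hypothetical stand-in for a pluggable base class (not vLLM's code)."""

    # Maps a registered name to the class that implements it.
    _registry: Dict[str, Type["ToyPluggableLayer"]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the class under `name`."""
        def decorator(layer_cls: type) -> type:
            if name in cls._registry:
                raise ValueError(f"duplicate layer name: {name}")
            cls._registry[name] = layer_cls
            layer_cls.registered_name = name
            return layer_cls
        return decorator

    @classmethod
    def lookup(cls, name: str) -> Type["ToyPluggableLayer"]:
        """Fetch a previously registered class by name."""
        return cls._registry[name]


@ToyPluggableLayer.register("qwen2_decoder")
class ToyQwen2Decoder(ToyPluggableLayer):
    def __init__(self, decoder_layer: int = 24) -> None:
        self.decoder_layer = decoder_layer


# Resolve the implementation by name instead of importing the class directly.
layer_cls = ToyPluggableLayer.lookup("qwen2_decoder")
layer = layer_cls()
print(layer_cls.__name__, layer.decoder_layer)
```

In a registry of this kind, a caller can look up a layer by name, so an alternative implementation could be substituted without editing the model code. Whether `PluggableLayer` works exactly this way is an assumption.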