Unverified Commit ced68066 authored by Mick's avatar Mick Committed by GitHub
Browse files

doc: Support a new vLM (#3405)


Co-authored-by: default avatarzhaochenyang20 <zhaochen20@outlook.com>
parent b8318aec
......@@ -48,12 +48,23 @@
- InternLM2ForRewardModel
- `python -m sglang.launch_server --model-path internlm/internlm2-7b-reward --is-embedding --trust-remote-code`
## How to Support a New Model
## How to Support a New Language Model
To support a new model in SGLang, you only need to add a single file under [SGLang Models Directory](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/models).
You can learn from existing model implementations and create new files for the new models.
For most models, you should be able to find a similar model to start with (e.g., starting from Llama).
## How to Support a New vision LLM
To support a new vision-language model (vLM) in SGLang, there are several key components in addition to the standard LLM.
1. **Register your new model as multimodal**: Extend `is_multimodal_model` in [`model_config.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/configs/model_config.py) to return True for your model.
2. **Process Images**: Create a new `ImageProcessor` class that inherits from `BaseImageProcessor` and register this processor as your model's dedicated processor. See [`image_processor.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/image_processor.py) for more details.
3. **Handle Image Tokens**: Implement a `pad_input_ids` function for your new model, in which image tokens in the prompt should be expanded and replaced with image-hashes, so that SGLang can recognize different images for `RadixAttention`.
4. Replace Multi-headed `Attention` of ViT with SGLang's `VisionAttention`.
You can refer [Qwen2VL](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/qwen2_vl.py) or other vLMs. These models demonstrate how to properly handle both visual and textual inputs.
### Test the correctness
#### Interactive debugging
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment