"examples/profiling/nx_bench.sh" did not exist on "0d6504434befdf609d34709891eecf85f27e0934"
Commit ced68066 authored by Mick, committed by GitHub

doc: Support a new vLM (#3405)


Co-authored-by: zhaochenyang20 <zhaochen20@outlook.com>
parent b8318aec
@@ -48,12 +48,23 @@
- InternLM2ForRewardModel
- `python -m sglang.launch_server --model-path internlm/internlm2-7b-reward --is-embedding --trust-remote-code`
## How to Support a New Language Model
To support a new model in SGLang, you only need to add a single file under the [SGLang Models Directory](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/models).
You can learn from existing model implementations and create a new file for your model.
For most models, you should be able to find a similar implementation to start from (e.g., Llama).
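
As a rough sketch, a model file defines the model class and exposes it through a module-level `EntryClass` attribute, which SGLang uses to discover the implementation. Everything else below (`MyModelForCausalLM`, the constructor and method signatures) is a hypothetical skeleton; copy the real layers, forward signature, and weight loading from the closest existing model, such as `llama.py`.

```python
# Hypothetical skeleton for python/sglang/srt/models/my_model.py.
# MyModelForCausalLM is a placeholder name; mirror a similar existing
# model (e.g., llama.py) for the actual layers and weight loading.
import torch
from torch import nn


class MyModelForCausalLM(nn.Module):
    def __init__(self, config, quant_config=None):
        super().__init__()
        self.config = config
        # Build the embedding, decoder layers, and LM head here,
        # reusing SGLang's parallel layers as the reference model does.

    def forward(self, input_ids: torch.Tensor, positions: torch.Tensor, forward_batch):
        # Run the decoder stack and return the output used for logits.
        raise NotImplementedError

    def load_weights(self, weights):
        # Map checkpoint tensor names onto this module's parameters.
        raise NotImplementedError


# SGLang discovers the model class through this module-level attribute.
EntryClass = MyModelForCausalLM
```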
## How to Support a New Vision-Language Model
To support a new vision-language model (vLM) in SGLang, you need several key components in addition to the standard LLM support; illustrative sketches for each step follow at the end of this section.
1. **Register your new model as multimodal**: Extend `is_multimodal_model` in [`model_config.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/configs/model_config.py) so that it returns `True` for your model.
2. **Process Images**: Create a new `ImageProcessor` class that inherits from `BaseImageProcessor`, and register it as your model's dedicated processor. See [`image_processor.py`](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/managers/image_processor.py) for more details.
3. **Handle Image Tokens**: Implement a `pad_input_ids` function for your new model. Image tokens in the prompt should be expanded and replaced with image hashes, so that SGLang can distinguish different images for `RadixAttention`.
4. **Replace the ViT Attention**: Replace the multi-headed `Attention` of the ViT with SGLang's `VisionAttention`.
You can refer to [Qwen2VL](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/qwen2_vl.py) or other vLMs as references. These models demonstrate how to properly handle both visual and textual inputs.
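
For step 1, `is_multimodal_model` dispatches on the architecture names from the model's Hugging Face config. A minimal sketch, where `MyVisionModelForCausalLM` is a hypothetical architecture name and the surrounding entries are illustrative rather than the file's actual list:

```python
# Sketch of extending is_multimodal_model in
# python/sglang/srt/configs/model_config.py; the set of existing
# architectures shown here is illustrative, not exhaustive.
def is_multimodal_model(model_architectures: list[str]) -> bool:
    multimodal_archs = {
        "LlavaLlamaForCausalLM",
        "Qwen2VLForConditionalGeneration",
        "MyVisionModelForCausalLM",  # hypothetical new vLM entry
    }
    return any(arch in multimodal_archs for arch in model_architectures)
```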
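
For step 2, the sketch below outlines a dedicated processor. The constructor arguments and the async processing method are assumptions modeled on the processors already in `image_processor.py`; follow that file for the real interface and for how processors are matched to model architectures:

```python
# Hedged sketch of a dedicated image processor; names and signatures are
# modeled on (not copied from) the existing processors in
# python/sglang/srt/managers/image_processor.py.
from sglang.srt.managers.image_processor import BaseImageProcessor


class MyVisionImageProcessor(BaseImageProcessor):
    def __init__(self, hf_config, server_args, _processor):
        super().__init__(hf_config, server_args, _processor)

    async def process_images_async(self, image_data, input_text, request_obj):
        # Decode and preprocess the images with the HF processor, then
        # return pixel values plus a hash per image; pad_input_ids later
        # uses these hashes to tag the expanded image tokens.
        raise NotImplementedError
```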
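
For step 3, the key idea is that each image placeholder in the prompt expands into a run of token ids derived from that image's hash: identical images then yield identical token sequences, which lets `RadixAttention` match and reuse their KV-cache prefixes. A sketch with placeholder names (`IMAGE_TOKEN_ID`, `num_patches_per_image`, and the `image_inputs` fields are assumptions):

```python
# Hedged sketch of pad_input_ids, written as a method on the new model
# class. IMAGE_TOKEN_ID and num_patches_per_image are placeholders for
# model-specific values; image_inputs.image_hashes is assumed to hold
# one hash per input image, as produced by the image processor.
IMAGE_TOKEN_ID = 151655  # placeholder for the model's image token id


def pad_input_ids(self, input_ids: list[int], image_inputs) -> list[int]:
    padded: list[int] = []
    image_idx = 0
    for token_id in input_ids:
        if token_id == IMAGE_TOKEN_ID and image_idx < len(image_inputs.image_hashes):
            # Derive a stable pad token id from the image hash so that
            # the same image always produces the same token sequence.
            pad_id = image_inputs.image_hashes[image_idx] % self.config.vocab_size
            padded.extend([pad_id] * self.num_patches_per_image)
            image_idx += 1
        else:
            padded.append(token_id)
    return padded
```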
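
For step 4, the change is mechanical: wherever the ViT block constructs its own multi-head attention, build SGLang's `VisionAttention` instead. The import path below matches the one used in `qwen2_vl.py`, but the constructor arguments shown are assumptions, so mirror the actual call in that file:

```python
# Hedged sketch of swapping a ViT block's attention for SGLang's
# VisionAttention; the argument list is an assumption, so copy the
# construction used in qwen2_vl.py.
from torch import nn

from sglang.srt.layers.attention.vision import VisionAttention


class MyVisionBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, quant_config=None):
        super().__init__()
        # Replaces the original nn.MultiheadAttention(dim, num_heads).
        self.attn = VisionAttention(
            embed_dim=dim,
            num_heads=num_heads,
            projection_size=dim,
            use_qkv_parallel=True,
            quant_config=quant_config,
        )
```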
### Test the correctness
#### Interactive debugging