Unverified commit fcc11e5e, authored by Yichao Cheng and committed by GitHub

update support new models doc (#9096)


Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Co-authored-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
parent 5190ba7f
standard LLM support:
in [model_config.py](https://github.com/sgl-project/sglang/blob/0ab3f437aba729b348a683ab32b35b214456efc7/python/sglang/srt/configs/model_config.py#L561)
to return `True` for your model.
2. **Register a new chat-template**:
Only if your model's default chat template cannot accept images as input, register a new chat template in [conversation.py](https://github.com/sgl-project/sglang/tree/main/python/sglang/srt/conversation.py) along with a corresponding matching function.
3. **Multimodal Data Processor**:
Define a new `Processor` class that inherits from `BaseMultimodalProcessor` and register this processor as your
expanded (if necessary) and padded with multimodal-data-hashes so that SGLang can recognize different multimodal data
with `RadixAttention`.
5. **Handle Image Feature Extraction**:
Implement a `get_image_feature` function for your new model, which extracts image features from raw image data and converts them into the embeddings used by the language model.
6. **Adapt to Vision Attention**:
Adapt the multi-headed `Attention` of ViT with SGLang’s `VisionAttention`.
You can refer to [Qwen2VL](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/qwen2_vl.py) or
other mllm implementations. These models demonstrate how to correctly handle both multimodal and textual inputs.
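The multimodal steps above (custom processor, `get_image_feature`, and the vision-attention swap) can be sketched roughly as follows. This is a minimal, self-contained illustration: `BaseMultimodalProcessor` and `VisionAttention` here are local stand-in stubs, not SGLang's real classes, and every name, shape, and constructor argument is an assumption for illustration only.

```python
# Hedged sketch of steps 3, 5, and 6 using local stand-in classes.
# The stubs below mimic the *shape* of the integration, not SGLang's API.

class BaseMultimodalProcessor:
    """Stand-in for SGLang's processor base class."""
    def __init__(self, hf_config=None):
        self.hf_config = hf_config

class VisionAttention:
    """Stand-in for SGLang's VisionAttention; identity op here."""
    def __init__(self, embed_dim, num_heads):
        self.embed_dim, self.num_heads = embed_dim, num_heads

    def __call__(self, hidden_states):
        return hidden_states  # the real class runs fused multi-head attention

class MyVLMProcessor(BaseMultimodalProcessor):
    """Step 3: convert raw multimodal data into model inputs."""
    models = ["MyVLMForConditionalGeneration"]  # architectures served

    def process_images(self, images):
        # Real code would resize/normalize images and expand placeholder
        # tokens; here each image just becomes a dummy 2-d "pixel tensor".
        return {"pixel_values": [[0.0, 0.0] for _ in images]}

class MyVLMForConditionalGeneration:
    """Steps 5 and 6: a vision path built on VisionAttention, plus a
    get_image_feature hook producing LM-sized embeddings."""
    hidden_size = 4

    def __init__(self):
        self.vision_attn = VisionAttention(embed_dim=2, num_heads=1)

    def get_image_feature(self, pixel_values):
        # Real code: vision tower + projector -> language-model embeddings.
        feats = [self.vision_attn(pv) for pv in pixel_values]
        return [f + [0.0] * (self.hidden_size - len(f)) for f in feats]

processor = MyVLMProcessor()
batch = processor.process_images(["img_a", "img_b"])
model = MyVLMForConditionalGeneration()
feats = model.get_image_feature(batch["pixel_values"])
print(len(feats), len(feats[0]))  # 2 images -> 2 embeddings of hidden_size dims
```

In the real implementation, consult the Qwen2VL files linked above for the actual base-class names, registration hooks, and tensor shapes.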
You should test the new MLLM locally against Hugging Face models. See the [`mmmu`](https://github.com/sgl-project/sglang/tree/main/benchmark/mmmu) benchmark for an example.
## Test the Correctness
Please note all your testing and benchmarking results in the PR description.
### Interactive Debugging
should give the same text output and very similar prefill logits:
To ensure the new model is well maintained, add it to the test suite by including it in the `ALL_OTHER_MODELS` list in
the [test_generation_models.py](https://github.com/sgl-project/sglang/blob/main/test/srt/models/test_generation_models.py)
file, test the new model on your local machine and report the results on demonstrative benchmarks (GSM8K, MMLU, MMMU,
MMMU-Pro, etc.) in your PR. \\
For VLMs, also include a test in `test_vision_openai_server_{x}.py` (e.g. [test_vision_openai_server_a.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server_a.py), [test_vision_openai_server_b.py](https://github.com/sgl-project/sglang/blob/main/test/srt/test_vision_openai_server_b.py)).
Here is an example command for testing a new model on your local machine:
```bash
ONLY_RUN=Qwen/Qwen2-1.5B python3 -m unittest test_generation_models.TestGenerationModels.test_others
```
### Benchmark
- **(Required) MMMU**: follow the MMMU benchmark [README.md](https://github.com/sgl-project/sglang/blob/main/benchmark/mmmu/README.md) to compare SGLang accuracy against HF Transformers. The accuracy score from the SGLang run should not be much lower than that from the HF Transformers run. Similarly, follow https://docs.sglang.ai/developer_guide/benchmark_and_profiling.html to compare performance: TTFT and throughput must meet or exceed the baselines (e.g., HF Transformers).
- **(Optional) Other evals**: if you ran other evaluations, please note the results in the PR description.
## Port a Model from vLLM to SGLang
The [vLLM Models Directory](https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models) is a valuable
ModelRegistry.models.update(import_new_model_classes())
launch_server(server_args)
```
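The `ModelRegistry.models.update(...)` pattern above can be mimicked with a plain dictionary. The sketch below is self-contained: `ModelRegistry` and `import_new_model_classes` are local stand-ins that mirror the snippet's names, not SGLang's actual registry objects.

```python
# Self-contained mimic of the out-of-tree registration pattern above.
# ModelRegistry here is a local stand-in (architecture name -> class),
# not SGLang's real registry.

class ModelRegistry:
    models = {}  # maps HF architecture names to implementation classes

class MyNewModelForCausalLM:
    """Hypothetical out-of-tree model class (illustration only)."""

def import_new_model_classes():
    # Real code would import your model modules and collect their entry
    # classes; here we yield a single (name, class) pair.
    yield "MyNewModelForCausalLM", MyNewModelForCausalLM

# dict.update accepts an iterable of key/value pairs, so a generator works.
ModelRegistry.models.update(import_new_model_classes())
print("MyNewModelForCausalLM" in ModelRegistry.models)  # True
```

Because the registry is updated before `launch_server` runs, the server can resolve the new architecture name without any change to SGLang's own source tree.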
## Documentation
Add the model to the table of supported models in [generative_models.md](https://github.com/sgl-project/sglang/blob/main/docs/supported_models/generative_models.md) or [multimodal_language_models.md](https://github.com/sgl-project/sglang/blob/main/docs/supported_models/multimodal_language_models.md).
---
By following these guidelines, you can add support for new language models and multimodal large language models in SGLang.