README.md

# <div align="center"><strong>vllm-omni</strong></div>
## 简介
vLLM 最初是为支持文本生成任务的大型语言模型而设计的。vLLM-Omni 是一个框架，它将 vLLM 的支持扩展到全模态模型推理和服务的领域。
## 项目特色
vLLM-Omni 速度很快，具备以下特点：

利用 vLLM 的高效 KV 缓存管理，实现最先进的 AR 支持
流水线式阶段执行重叠以实现高吞吐量性能
基于 OmniConnector 的完全解耦和跨阶段的动态资源分配
vLLM-Omni 灵活易用，可与以下产品配合使用：

异构管道抽象用于管理复杂的模型工作流程
与流行的 Hugging Face 模型无缝集成
支持分布式推理的张量、管道、数据和专家并行性
流媒体输出
兼容 OpenAI 的 API 服务器
vLLM-Omni 可无缝支持 HuggingFace 上大多数流行的开源模型，包括：

全模态模型（例如 Qwen2.5-Omni、Qwen3-Omni）
多模态生成模型（例如 Qwen-Image）
## 支持模型结构列表

| 模型名                                                             | 参数量                            | Template            |
| ----------------------------------------------------------------- | -------------------------------- | ------------------- |
| [Qwen2.5-Omni](https://huggingface.co/collections/Qwen/qwen25-omni)                       | 3B/7B                            | qwen2_5_omni        |
| [Qwen3-Omni](https://huggingface.co/collections/Qwen/qwen3-omni)                         | 30B-A3B                          | qwen3_omni          |
| [Qwen3-TTS](https://huggingface.co/collections/Qwen/qwen3-tts)                          | 0.6B/1.7B                        | qwen3_tts           |
| [Qwen-Image](https://huggingface.co/collections/Qwen/qwen-image)                         | -                                | qwen_image          |
| [GLM-Image](https://huggingface.co/zai-org/GLM-Image)                         | -                                | glm_image           |
| [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image)                      | -                                | z_image             |
| [Wan2.2](https://huggingface.co/Video-Reason/VBVR-Wan2.2)                           | 5B/A14B                          | wan2_2              |
| [Ovis-Image](https://huggingface.co/collections/AIDC-AI/ovis-image)                       | -                                | ovis_image          |
| [LongCat-Image](https://huggingface.co/meituan-longcat/LongCat-Image)           | -                                | longcat_image       |
| [Stable Diffusion 3](https://huggingface.co/collections/stabilityai/stable-diffusion-3)          | 3.5-medium                       | sd3                 |
| [Stable Audio Open](https://huggingface.co/stabilityai/stable-audio-open-1.0)           | 1.0                              | stable_audio        |
| [FLUX.1-dev](https://huggingface.co/collections/black-forest-labs/flux1)            | -                                | flux                |
| [FLUX.2-klein](https://huggingface.co/collections/black-forest-labs/flux2)          | 4B/9B                            | flux2_klein         |
持续更新中...

> **[!NOTE]**
vllm-omni是对vllm框架的拓展，严格依赖具体的vllm版本，如果版本没有对齐，可能遇到一些错误，可以考虑更换版本，或者查看vllm-omni项目的后续PR是否有解决方案
安装vllm-omni包以后只是拓展了vllm对多模态的支持程度，在DCU上vllm-omni支持的模型能否推理，具体还是要看vllm本身是否能够支持
> **已知问题及解决方案**


## 使用源码编译方式安装
### 环境准备

`-v 路径`、`docker_name`和`imageID`根据实际情况修改

####  Docker

基于光源基础镜像环境：镜像下载地址：[https://sourcefind.cn/#/image/dcu/pytorch](https://sourcefind.cn/#/image/dcu/pytorch)

```bash
docker pull harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220
docker run -it --shm-size 200g --network=host --name {docker_name} --privileged --device=/dev/kfd --device=/dev/dri --device=/dev/mkfd --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal/:/opt/hyhal/:ro {imageID} bash

cd /your_code_path/vllm-omni
pip install -e . --no-build-isolation
```

## 参考资料

- [README](README_origin.md)
- [LLaMA-Factory](https://github.com/vllm-project/vllm-omni)