- [2026/02] We released [0.14.0](https://github.com/vllm-project/vllm-omni/releases/tag/v0.14.0) - the first **stable release** of vLLM-Omni. It expands the diffusion / image-video generation and audio / TTS stack, improves distributed execution and memory efficiency, and broadens platform/backend coverage (GPU/ROCm/NPU/XPU). It also brings meaningful upgrades to serving APIs, profiling & benchmarking, and overall stability. Please check our latest [paper](https://arxiv.org/abs/2602.02204) for the architecture design and performance results.
- [2026/01] We released [0.12.0rc1](https://github.com/vllm-project/vllm-omni/releases/tag/v0.12.0rc1) - a major RC milestone focused on maturing the diffusion stack, strengthening OpenAI-compatible serving, expanding omni-model coverage, and improving stability across platforms (GPU/NPU/ROCm). Please check our latest [design](https://docs.google.com/presentation/d/1qv4qMW1rKAqDREMXiUDLIgqqHQe7TDPj/edit?usp=sharing&ouid=110473603432222024453&rtpof=true&sd=true) for details.
- [2025/11] The vLLM community officially released [vllm-project/vllm-omni](https://github.com/vllm-project/vllm-omni) to support omni-modality model serving.
---
## About
[vLLM](https://github.com/vllm-project/vllm) was originally designed to support large language models for text-based autoregressive generation tasks. vLLM-Omni is a framework that extends its support for omni-modality model inference and serving:
- **Omni-modality**: Text, image, video, and audio data processing for omni-modality models (e.g., Qwen2.5-Omni, Qwen3-Omni)
- **Non-autoregressive architectures**: Extends vLLM's autoregressive (AR) support to Diffusion Transformers (DiT) and other parallel generation models
- **Heterogeneous outputs**: From traditional text generation to multimodal outputs such as images, video, and audio, covering multimodal generation models (e.g., Qwen-Image)
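
Since vLLM-Omni exposes OpenAI-compatible serving (see the 0.12.0rc1 notes above), a client request can be issued with the standard `openai` Python client. The sketch below is illustrative only: the launch command, port, model name, and image URL are assumptions for this example, so check the project documentation for the exact invocation and supported request fields.

```python
# Minimal client-side sketch against a locally running OpenAI-compatible server.
# Assumed (not confirmed by this README) launch command, for illustration only:
#   vllm serve Qwen/Qwen2.5-Omni-7B --port 8000

from openai import OpenAI

# Point the client at the local OpenAI-compatible endpoint (assumed port).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",  # assumed model name for illustration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```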
Feel free to ask questions, provide feedback, and discuss with fellow vLLM-Omni users in the `#sig-omni` Slack channel at [slack.vllm.ai](https://slack.vllm.ai) or on the vLLM user forum at [discuss.vllm.ai](https://discuss.vllm.ai).
## Star History
[Star History Chart](https://www.star-history.com/#vllm-project/vllm-omni&type=date&legend=top-left)
## License
Apache License 2.0, as found in the [LICENSE](./LICENSE) file.