Commit ce14b9ba authored by weishb

Support the qwen3-tts model on vllm 0.15.1

parent 3a84b95b
@@ -18,14 +18,14 @@ Qwen3-TTS covers 10 major languages (Chinese, English, Japanese, Korean, German
## Environment Dependencies

| Software | Version |
| :------: | :------: |
| DTK | 26.04 |
| python | 3.10.12 |
| transformers | 4.57.6 |
| vllm | 0.15.1+das.opt1.alpha.dtk2604 |
| torchaudio | 2.9.0+das.opt1.dtk2604.20260206.g275d08c2 |
| vllm-omni | 0.15.1+fix1 |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220
```bash
docker run -it \
...
    -u root \
    -v /opt/hyhal/:/opt/hyhal/:ro \
    -v /path/your_code_data/:/path/your_code_data/ \
    harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220 bash
```
More images are available for download from [光源](https://sourcefind.cn/#/service-list).
@@ -53,12 +53,11 @@ pip install -r requirements.txt
Additional environment configuration inside the image:
```bash
pip uninstall vllm
pip install vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl --no-deps
pip install vllm_omni-0.15.1+fix1-py3-none-any.whl
pip install torchaudio-2.9.0+das.opt1.dtk2604.20260206.g275d08c2-cp310-cp310-linux_x86_64.whl --no-deps
pip install pycountry
```
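Because the vllm and torchaudio wheels are installed with `--no-deps`, it is easy to end up with a partially installed environment. A minimal sketch for checking what is actually active (the package names are assumptions read off the wheel filenames; a missing package is reported rather than treated as fatal):

```shell
# Sketch: report which of the packages installed above are visible to Python.
# Package names are assumptions based on the wheel filenames.
VERSIONS=$(python3 - <<'PY'
import importlib.metadata as md
for pkg in ("vllm", "vllm-omni", "torchaudio", "pycountry"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
PY
)
printf '%s\n' "$VERSIONS"
```

Run this after the pip steps above; any line ending in "not installed" means the corresponding wheel still needs to be installed.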
@@ -87,103 +86,83 @@ python test_model_12hz_base.py
VoiceDesign
```bash
## Start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --enforce-eager \
    --served-model-name qwen3-tts

## Client request
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-tts",
        "input": "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
        "voice": "vivian",
        "instructions": "用特别开心的语气说",
        "task_type": "VoiceDesign",
        "x_vector_only_mode": null,
        "ref_text": null,
        "ref_audio": null}' \
    --output output_design.wav
```
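Shell quoting around a multi-line `-d '{…}'` body is error-prone, so it can help to keep the payload in a file and lint it before sending. A sketch (the field names mirror the example above; `/tmp/voice_design.json` is an arbitrary scratch path, and `python3 -m json.tool` is used only as a JSON validator):

```shell
# Write the request body to a file, validate it, then point curl at it with -d @file.
cat > /tmp/voice_design.json <<'JSON'
{
    "model": "qwen3-tts",
    "input": "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
    "voice": "vivian",
    "instructions": "用特别开心的语气说",
    "task_type": "VoiceDesign",
    "x_vector_only_mode": null,
    "ref_text": null,
    "ref_audio": null
}
JSON
python3 -m json.tool /tmp/voice_design.json > /dev/null && echo "payload OK"
# Then send it (assumes the server above is running):
# curl -X POST http://localhost:8001/v1/audio/speech -H "Content-Type: application/json" \
#     -d @/tmp/voice_design.json --output output_design.wav
```

The quoted `<<'JSON'` heredoc keeps the body literal, so no shell escaping of the JSON quotes is needed.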
CustomVoice
```bash
## Start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-CustomVoice \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --enforce-eager \
    --served-model-name qwen3-tts

## Client request
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-tts",
        "input": "你好,我是通义千问",
        "voice": "vivian",
        "language": "Chinese",
        "task_type": "CustomVoice",
        "x_vector_only_mode": null,
        "ref_text": null,
        "ref_audio": null}' \
    --output output_custom.wav
```
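When a request fails, curl writes the server's error body into the `--output` file too, so `output_custom.wav` may exist without being audio. A quick sanity check is to look at the first four bytes, which are `RIFF` for a WAV file. A sketch (it uses a placeholder file, since a real `output_custom.wav` assumes a running server):

```shell
# /tmp/output_custom.wav is a placeholder standing in for a real server response;
# after a real run, point this check at output_custom.wav instead.
printf 'RIFF....WAVE' > /tmp/output_custom.wav
if [ "$(head -c 4 /tmp/output_custom.wav)" = "RIFF" ]; then
    echo "looks like a WAV"
else
    echo "not audio - inspect the file for a JSON error body"
fi
```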
VoiceClone
```bash
## Start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-Base \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --enforce-eager \
    --served-model-name qwen3-tts

## Client request
REF_AUDIO_B64=$(base64 -w 0 /output_audio/output.wav)
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    --output output_base.wav \
    -d @- <<JSON
{
    "model": "qwen3-tts",
    "input": "哥哥,欢迎回家,要抱抱",
    "language": "Chinese",
    "task_type": "Base",
    "x_vector_only_mode": false,
    "ref_text": "哥哥,你回来啦,人家等了你好久好久了,要抱抱!",
    "ref_audio": "data:audio/wav;base64,${REF_AUDIO_B64}",
    "max_new_tokens": 256
}
JSON
```
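The `ref_audio` field here is a `data:` URI, so the reference WAV must be base64-encoded without line wrapping (`-w 0` on GNU `base64`; wrapped output would break the JSON). A minimal sketch of building that URI, using a placeholder file (`/tmp/ref.wav` stands in for your actual reference audio):

```shell
# Encode a reference WAV as a single-line base64 data URI for the "ref_audio" field.
printf 'RIFF' > /tmp/ref.wav    # placeholder bytes; use a real WAV in practice
REF_AUDIO_B64=$(base64 -w 0 /tmp/ref.wav)
DATA_URI="data:audio/wav;base64,${REF_AUDIO_B64}"
echo "$DATA_URI"    # → data:audio/wav;base64,UklGRg==
```

Note that the unquoted `<<JSON` heredoc in the request above is what allows `${REF_AUDIO_B64}` to be expanded inside the body.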
@@ -199,9 +178,9 @@ curl -sS http://127.0.0.1:8000/v1/audio/speech \

## Pretrained Weights

| Model | Weight Size | DCU Model | Min. Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign) |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice) |
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-Base) |
## Source Repository and Issue Feedback
- https://developer.sourcefind.cn/codes/weishb/qwen3-tts_pytorch
@@ -11,4 +11,4 @@ appCategory=语音合成
# Framework type
frameType=vllm,pytorch
# Accelerator card type
accelerateType=K100AI,K500SM_AI
File suppressed by a .gitattributes entry or the file's encoding is unsupported.