Commit ce14b9ba authored by weishb's avatar weishb

Add support for the qwen3-tts model on vllm-0.15.1

parent 3a84b95b
## Environment Dependencies
| Software | Version |
| :------: | :------: |
| DTK | 26.04 |
| python | 3.10.12 |
| transformers | 4.57.6 |
| vllm | 0.15.1+das.opt1.alpha.dtk2604 |
| torchaudio | 2.9.0+das.opt1.dtk2604.20260206.g275d08c2 |
| vllm-omni | 0.15.1+fix1 |

Recommended image: harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220
```bash
docker run -it \
-u root \
-v /opt/hyhal/:/opt/hyhal/:ro \
-v /path/your_code_data/:/path/your_code_data/ \
harbor.sourcefind.cn:5443/dcu/admin/base/custom:vllm0.15.1-ubuntu22.04-dtk26.04-0130-py3.10-20260220 bash
```
More images are available for download at [SourceFind](https://sourcefind.cn/#/service-list).
Additional environment setup inside the image
```
# Reinstall vllm and torchaudio with the DCU builds, then add vllm-omni
pip uninstall -y vllm
pip install vllm-0.15.1+das.opt1.alpha.dtk2604-cp310-cp310-linux_x86_64.whl --no-deps
pip install vllm_omni-0.15.1+fix1-py3-none-any.whl
pip install torchaudio-2.9.0+das.opt1.dtk2604.20260206.g275d08c2-cp310-cp310-linux_x86_64.whl --no-deps
pip install pycountry
```
VoiceDesign
```bash
## start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-VoiceDesign \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --dtype bfloat16 \
    --enforce-eager \
    --served-model-name qwen3-tts
## client request
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-tts",
        "input": "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
        "voice": "vivian",
        "instructions": "用特别开心的语气说",
        "task_type": "VoiceDesign",
        "x_vector_only_mode": null,
        "ref_text": null,
        "ref_audio": null}' \
    --output output_design.wav
```
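Quoting mistakes in the inline `-d '{…}'` body are an easy way to get an opaque HTTP 400 back. One way to guard against that (a sketch; `payload.json` is a hypothetical scratch file, and the field subset shown is just an example) is to write the payload to a file, validate it with Python's stdlib JSON parser, and only then POST it with `-d @payload.json`:

```shell
# Write the request body to a scratch file (hypothetical name: payload.json)
cat > payload.json <<'JSON'
{
    "model": "qwen3-tts",
    "input": "其实我真的有发现,我是一个特别善于观察别人情绪的人。",
    "voice": "vivian",
    "task_type": "VoiceDesign"
}
JSON
# Validate with the stdlib JSON parser; exits non-zero on malformed JSON
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
# then: curl -X POST http://localhost:8001/v1/audio/speech \
#           -H "Content-Type: application/json" \
#           -d @payload.json --output output_design.wav
```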
CustomVoice
```bash
## start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-CustomVoice \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --dtype bfloat16 \
    --enforce-eager \
    --served-model-name qwen3-tts
## client request
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen3-tts",
        "input": "你好,我是通义千问",
        "voice": "vivian",
        "language": "Chinese",
        "task_type": "CustomVoice",
        "x_vector_only_mode": null,
        "ref_text": null,
        "ref_audio": null}' \
    --output output_custom.wav
```
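If the server rejects a request, `--output` still writes the JSON error body to the `.wav` path. A quick signature check distinguishes a real WAV, which starts with the ASCII magic `RIFF`, from an error response. This is a sketch demonstrated on dummy files (since it only inspects the first four bytes); in practice you would run `check_wav output_custom.wav` on the file the server wrote:

```shell
# Succeeds only if the file starts with the RIFF magic of a WAV container
check_wav() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "RIFF" ]
}

printf 'RIFFxxxxWAVEfmt ' > demo.wav         # stand-in for a real WAV file
printf '{"error": "..."}' > demo_error.json  # what a failed request can leave behind

check_wav demo.wav && echo "demo.wav: WAV signature found"
check_wav demo_error.json || echo "demo_error.json: not a WAV"
```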
VoiceClone
```bash
## start the server
vllm-omni serve Qwen3-TTS/Qwen3-TTS-12Hz-1.7B-Base \
    --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \
    --omni \
    --port 8001 \
    --trust-remote-code \
    --dtype bfloat16 \
    --enforce-eager \
    --served-model-name qwen3-tts
## client request
REF_AUDIO_B64=$(base64 -w 0 /output_audio/output.wav)
curl -X POST http://localhost:8001/v1/audio/speech \
    -H "Content-Type: application/json" \
    --output output_base.wav \
    -d @- <<JSON
{
    "model": "qwen3-tts",
    "input": "哥哥,欢迎回家,要抱抱",
    "language": "Chinese",
    "task_type": "Base",
    "x_vector_only_mode": false,
    "ref_text": "哥哥,你回来啦,人家等了你好久好久了,要抱抱!",
    "ref_audio": "data:audio/wav;base64,${REF_AUDIO_B64}",
    "max_new_tokens": 256
}
JSON
```
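The `ref_audio` field embeds the reference recording as a base64 data URI, so any encoding mistake silently corrupts the voice prompt. A minimal sketch of building that field and verifying the round-trip, using a four-byte dummy file in place of a real recording:

```shell
printf 'RIFF' > ref_demo.wav               # dummy stand-in for a reference WAV
REF_AUDIO_B64=$(base64 -w 0 ref_demo.wav)  # -w 0: no line wrapping (GNU coreutils)
REF_AUDIO_URI="data:audio/wav;base64,${REF_AUDIO_B64}"

# Round-trip: decoding the field must reproduce the original bytes exactly
printf '%s' "$REF_AUDIO_B64" | base64 -d > ref_demo_roundtrip.wav
cmp -s ref_demo.wav ref_demo_roundtrip.wav && echo "round-trip OK"
```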
## Pretrained Weights
| Model | Size | DCU Model | Min. Cards | Download |
|:-----:|:----------:|:----------:|:---------------------:|:----------:|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign)|
| Qwen3-TTS-12Hz-1.7B-CustomVoice | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice)|
| Qwen3-TTS-12Hz-1.7B-Base | 1.7B | K100AI, K500SM_AI | 1 | [Modelscope](https://www.modelscope.cn/models/Qwen/Qwen3-TTS-12Hz-1.7B-Base)|
## Source Repository and Issue Reporting
- https://developer.sourcefind.cn/codes/weishb/qwen3-tts_pytorch
appCategory=语音合成
# framework type
frameType=vllm,pytorch
# accelerator type
accelerateType=K100AI,K500SM_AI