# IndexTTS-vLLM
## Introduction

This project reimplements the inference of the GPT model in [index-tts](https://github.com/index-tts/index-tts) with the vllm library, accelerating the inference of index-tts.

Inference speed improvements on a single RTX 4090 (Index-TTS-v1/v1.5):

- RTF (Real-Time Factor) for a single request: ≈0.3 -> ≈0.1
- GPT model decode speed for a single request: ≈90 tokens/s -> ≈280 tokens/s
- Concurrency: with `gpu_memory_utilization` set to 0.25 (about 5 GB of VRAM), the service comfortably handles around 16 concurrent requests (see `simple_test.py` for the benchmark script)

## Update Log

- **[2025-09-22]** Added support for vllm v1; IndexTTS2 compatibility is in progress.
- **[2025-09-28]** Added web UI inference for IndexTTS2 and reorganized the weight files for easier deployment! \0.0/ However, this version did not yet accelerate the GPT model of IndexTTS2; the cause was under investigation.
- **[2025-09-29]** Fixed the ineffective GPT model inference acceleration for IndexTTS2.
- **[2025-10-09]** Added IndexTTS2 support to the API; see [API](#api). The v1/1.5 APIs and the OpenAI-compatible interface may still have bugs, to be fixed later.
- **[2025-10-19]** Added vllm inference support for qwen0.6bemo4-merge.

## TODO List

- Concurrency optimization for the v2 API: currently only the gpt2 model inference runs in parallel, while the other modules run serially. The s2mel inference has a large overhead (it requires 25 DiT iterations), which significantly hurts concurrency performance.
- Acceleration of s2mel inference.

## Usage Steps

### 1. Clone this project

```bash
git clone https://github.com/Ksuriuri/index-tts-vllm.git
cd index-tts-vllm
```

### 2. Create and activate a conda environment

```bash
conda create -n index-tts-vllm python=3.12
conda activate index-tts-vllm
```

### 3. Install PyTorch

PyTorch 2.8.0 is required (matching vllm 0.10.2). For installation instructions, see the [PyTorch official website](https://pytorch.org/get-started/locally/).

### 4. Install dependencies

```bash
pip install -r requirements.txt
```

### 5. Download model weights

#### Automatic download (recommended)

Download the weights for the version you want into the `checkpoints/` directory:

```bash
# Index-TTS
modelscope download --model kusuriuri/Index-TTS-vLLM --local_dir ./checkpoints/Index-TTS-vLLM

# IndexTTS-1.5
modelscope download --model kusuriuri/Index-TTS-1.5-vLLM --local_dir ./checkpoints/Index-TTS-1.5-vLLM

# IndexTTS-2
modelscope download --model kusuriuri/IndexTTS-2-vLLM --local_dir ./checkpoints/IndexTTS-2-vLLM
```

#### Manual download

- ModelScope: [Index-TTS](https://www.modelscope.cn/models/kusuriuri/Index-TTS-vLLM) | [IndexTTS-1.5](https://www.modelscope.cn/models/kusuriuri/Index-TTS-1.5-vLLM) | [IndexTTS-2](https://www.modelscope.cn/models/kusuriuri/IndexTTS-2-vLLM)

#### Convert the original weights yourself (optional, not recommended)

You can convert the official weight files with `convert_hf_format.sh`:

```bash
bash convert_hf_format.sh /path/to/your/model_dir
```

### 6. Launch the web UI!

Run the script for your version (the first launch may take longer because the CUDA kernels for bigvgan need to compile):

```bash
# Index-TTS 1.0
python webui.py

# IndexTTS-1.5
python webui.py --version 1.5

# IndexTTS-2
python webui_v2.py
```

## API

An API service is provided via FastAPI. Start it as follows:

```bash
# Index-TTS-1.0/1.5
python api_server.py

# IndexTTS-2
python api_server_v2.py
```

### Startup Parameters

- `--model_dir`: required; path to the model weights.
- `--host`: server IP address, defaults to `0.0.0.0`.
- `--port`: server port, defaults to `6006`.
- `--gpu_memory_utilization`: vllm GPU memory utilization, defaults to `0.25`.

### API Request Examples

- For v1/1.5, see `api_example.py`.
- For v2, see `api_example_v2.py`.

### OpenAI API

- Added the `/audio/speech` API path for compatibility with the OpenAI interface.
- Added the `/audio/voices` API path to list the available voices/characters.

For details, see [createSpeech](https://platform.openai.com/docs/api-reference/audio/createSpeech).
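For a quick smoke test of the OpenAI-compatible endpoint, a minimal client sketch is shown below. It assumes the request body mirrors OpenAI's createSpeech schema (`model`, `input`, `voice`) and that `voice` names a character returned by `/audio/voices`; the exact field names and the returned audio format are assumptions, so treat `api_example.py` as the authoritative reference.

```python
# Minimal sketch of calling the OpenAI-compatible endpoint.
# Assumes the server from api_server.py is running on localhost:6006 and
# that the payload mirrors OpenAI's createSpeech schema; these field names
# are assumptions, not confirmed by this repo's docs.
import requests

BASE_URL = "http://127.0.0.1:6006"

# List the available voices/characters first.
voices = requests.get(f"{BASE_URL}/audio/voices").json()
print(voices)

# Request speech synthesis and save the returned audio bytes.
resp = requests.post(
    f"{BASE_URL}/audio/speech",
    json={
        "model": "tts-1",            # placeholder model name (assumption)
        "input": "Hello from IndexTTS-vLLM!",
        "voice": "your_character",   # pick an entry returned by /audio/voices
    },
)
resp.raise_for_status()

# The file extension may vary with the server's default response format.
with open("output.wav", "wb") as f:
    f.write(resp.content)
```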
## New Features

- **v1/v1.5:** Multi-reference voice mixing: you can supply multiple reference audio clips, and the synthesized voice will be a blend of them. (Multiple references may make the output voice unstable; retry until you get a voice you like, then use that output as the reference audio.)

## Performance

Word Error Rate (WER) on the [**seed-test**](https://github.com/BytedanceSpeech/seed-tts-eval) for IndexTTS and baseline models:

| model | zh | en |
| ----------------------- | ----- | ----- |
| Human | 1.254 | 2.143 |
| index-tts (num_beams=3) | 1.005 | 1.943 |
| index-tts (num_beams=1) | 1.107 | 2.032 |
| index-tts-vllm | 1.12 | 1.987 |

The vllm port maintains the accuracy of the original project.

## Concurrency Test

See [`simple_test.py`](simple_test.py); the API service must be started first.
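If you only want a rough load check rather than the full benchmark, the sketch below fires a batch of concurrent requests at the OpenAI-compatible endpoint with asyncio and httpx and reports wall-clock timings. It is a simplified stand-in for `simple_test.py`, and it reuses the assumed request schema from the API example above.

```python
# Rough concurrency smoke test. This is NOT the project's benchmark
# (that is simple_test.py); it is an illustrative sketch only, and the
# /audio/speech payload fields are assumptions, as in the API example.
import asyncio
import time

import httpx

BASE_URL = "http://127.0.0.1:6006"
CONCURRENCY = 16  # the README's reference level at gpu_memory_utilization=0.25

async def one_request(client: httpx.AsyncClient, i: int) -> float:
    """Send one synthesis request and return its latency in seconds."""
    start = time.perf_counter()
    resp = await client.post(
        f"{BASE_URL}/audio/speech",
        json={
            "model": "tts-1",  # placeholder model name (assumption)
            "input": f"Concurrency test sentence number {i}.",
            "voice": "your_character",  # replace with a real voice name
        },
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

async def main() -> None:
    async with httpx.AsyncClient() as client:
        t0 = time.perf_counter()
        latencies = await asyncio.gather(
            *(one_request(client, i) for i in range(CONCURRENCY))
        )
        total = time.perf_counter() - t0
    print(f"{CONCURRENCY} requests in {total:.2f}s, "
          f"mean latency {sum(latencies) / len(latencies):.2f}s")

asyncio.run(main())
```

Numbers from a quick script like this are only indicative; use `simple_test.py` if you want results comparable to those quoted in the introduction.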