Commit 3013e781 authored by wangsen's avatar wangsen

change readme.md

parent ededcd02
Pipeline #3090 failed with stages in 0 seconds
<div align="center">
<img src="https://github.com/user-attachments/assets/a4ccbc60-5248-4dca-8cec-09a6385c6d0f" width="768" height="192">
</div>
<strong>ClearerVoice-Studio</strong> is an open-source, AI-powered speech processing toolkit designed for researchers, developers, and end-users. It provides capabilities of speech enhancement, speech separation, speech super-resolution, target speaker extraction, and more. The toolkit provides state-of-the-art pre-trained models, along with training and inference scripts, all accessible from this repository.
#### 👉🏻[HuggingFace Demo](https://huggingface.co/spaces/alibabasglab/ClearVoice)👈🏻 | 👉🏻[ModelScope Demo](https://modelscope.cn/studios/iic/ClearerVoice-Studio)👈🏻 | 👉🏻[SpeechScore Demo](https://huggingface.co/spaces/alibabasglab/SpeechScore)👈🏻 | 👉🏻[Paper](https://arxiv.org/abs/2506.19398)👈🏻

---

![GitHub Repo stars](https://img.shields.io/github/stars/modelscope/ClearerVoice-Studio) Please leave your ⭐ on our GitHub to support this community project! Your support is our biggest motivation to keep the models updated!

## News :fire:
- Upcoming: More tasks will be added to ClearVoice.
- [2025.6] Added an interface for [ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice) that allows passing a NumPy array into a model and receiving its output as a NumPy array, enabling more flexible use of the models inside a training or inference pipeline. Please check out [`demo_Numpy2Numpy.py`](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/demo_Numpy2Numpy.py).
- [2025.5] Updated SpeechScore with more non-intrusive metrics: NISQA and DISTILL_MOS.
- [2025.4] Updated the pip installation for [ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice). Now you can simply type `pip install clearvoice` to use all the pretrained models in ClearVoice; see the project description on PyPI ([link](https://pypi.org/project/clearvoice/)).
- [2025.4] Added a training script for speech super-resolution, supporting both retraining and fine-tuning of models. For details, refer to the documentation [here](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/speech_super_resolution).
- [2025.4] Added data generation scripts for training/fine-tuning speech enhancement models. The scripts generate either noisy speech or noisy-reverberant speech. Please check [here](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/data_generation/speech_enhancement).
- [2025.1] The ClearVoice demo is ready to try on both [HuggingFace](https://huggingface.co/spaces/alibabasglab/ClearVoice) and [ModelScope](https://modelscope.cn/studios/iic/ClearerVoice-Studio). Note that HuggingFace has a limited GPU quota, while ModelScope offers a larger one.
- [2025.1] ClearVoice now offers **speech super-resolution**, also known as bandwidth extension. This feature improves the perceptual quality of speech by converting low-resolution audio (with an effective sampling rate of at least 16,000 Hz) into high-resolution audio with a sampling rate of 48,000 Hz. A fully upscaled **LJSpeech-1.1-48kHz dataset** can be downloaded from [HuggingFace](https://huggingface.co/datasets/alibabasglab/LJSpeech-1.1-48kHz) and [ModelScope](https://modelscope.cn/datasets/iic/LJSpeech-1.1-48kHz).
- [2025.1] ClearVoice now supports more audio formats, including **"wav", "aac", "ac3", "aiff", "flac", "m4a", "mp3", "ogg", "opus", "wma", "webm"**, etc. It also supports both mono and stereo channels with 16-bit or 32-bit precision. A recent version of [ffmpeg](https://github.com/FFmpeg/FFmpeg) is required for the audio codecs.
- [2024.12] Uploaded pre-trained models to ModelScope. Users can now download the models from either [ModelScope](https://www.modelscope.cn/models/iic/ClearerVoice-Studio/summary) or [HuggingFace](https://huggingface.co/alibabasglab).
- [2024.11] Our FRCRN speech denoiser has been used over **3.0 million** times on [ModelScope](https://modelscope.cn/models/iic/speech_frcrn_ans_cirm_16k).
- [2024.11] Our MossFormer speech separator has been used over **2.5 million** times on [ModelScope](https://modelscope.cn/models/iic/speech_mossformer_separation_temporal_8k).
- [2024.11] Release of this repository.

### 🌟 Why Choose ClearerVoice-Studio?

- **Pre-Trained Models:** Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!
- **Ease of Use:** Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.
- **Comprehensive Features:** Combines advanced algorithms for multiple speech processing tasks in one platform.
- **Community-Driven:** Built for researchers, developers, and enthusiasts to collaborate and innovate together.

## ClearerVoice-Studio Quick Deployment and Run Guide (Domestic DCU Environment)

### 1. Start the Docker container (DCU-specific)

```bash
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10

# Start the container (all of the options below are required)
docker run -it \
  --network=host \
  --ipc=host \
  --shm-size=64G \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  -v /opt/hyhal:/opt/hyhal \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10 \
  /bin/bash
```
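Before moving on to the installation, it can help to confirm that the container actually sees the DCU. The check below is a minimal sketch that assumes the DTK build of PyTorch exposes the device through the standard `torch.cuda` interface, as ROCm-derived stacks usually do; if your build differs, use the vendor's own query tool instead.

```python
# Minimal accelerator sanity check inside the container.
# Assumption: the DTK PyTorch build reports the DCU via torch.cuda.
import torch

if torch.cuda.is_available():
    print(f"Found {torch.cuda.device_count()} accelerator(s): {torch.cuda.get_device_name(0)}")
    # Run a small matmul on the device to make sure compute actually works.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
else:
    print("No accelerator visible; re-check the --device mappings in docker run.")
```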
### 2. Download and install ClearerVoice-Studio

```bash
git clone https://github.com/modelscope/ClearerVoice-Studio.git
cd ClearerVoice-Studio
pip install -r requirements.txt

cd clearvoice
pip install --editable .
cd ..
```

### 3. Quick test

```bash
# Set the HuggingFace mirror for mainland China (strongly recommended)
export HF_ENDPOINT=https://hf-mirror.com

# Run the official test script
python test.py
```

Seeing `Inference completed successfully` or similar output means the run succeeded.

### References

- Official repository: https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice
- Chinese tutorial: https://stable-learn.com/zh/clearvoice-studio-tutorial

### Tips

- If model downloads are slow, be sure to set `HF_ENDPOINT=https://hf-mirror.com`.

## Contents of this repository

This repository is organized into three main components: **[ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice)**, **[Train](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train)**, and **[SpeechScore](https://github.com/modelscope/ClearerVoice-Studio/tree/main/speechscore)**.

### 1. **ClearVoice [[Readme](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/README.md)] [[Docs](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/README.md)]**

ClearVoice offers a user-friendly solution for speech processing tasks such as speech denoising, separation, super-resolution, audio-visual target speaker extraction, and more. It is designed as a unified inference platform leveraging pre-trained models (e.g., [FRCRN](https://arxiv.org/abs/2206.07293), [MossFormer](https://arxiv.org/abs/2302.11824)), all trained on extensive datasets. If you're looking for a tool to improve speech quality, ClearVoice is the perfect choice. Simply click on [`ClearVoice`](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice) and follow our detailed instructions to get started.

### 2. **Train**

For advanced researchers and developers, we provide model fine-tuning and training scripts for all the tasks offered in ClearVoice and more:

- **Task 1: [Speech enhancement](train/speech_enhancement)** (16kHz & 48kHz)
- **Task 2: [Speech separation](train/speech_separation)** (8kHz & 16kHz)
- **Task 3: [Speech super-resolution](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/speech_super_resolution)** (48kHz)
- **Task 4: [Target speaker extraction](train/target_speaker_extraction)**
  - **Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech** (8kHz)
  - **Sub-Task 2: Audio-visual Speaker Extraction Conditioned on Face (Lip) Recording** (16kHz)
  - **Sub-Task 3: Audio-visual Speaker Extraction Conditioned on Body Gestures** (16kHz)
  - **Sub-Task 4: Neuro-steered Speaker Extraction Conditioned on EEG Signals** (16kHz)

Contributors are welcome to add more model architectures and tasks!

### 3. **SpeechScore [[Readme](https://github.com/modelscope/ClearerVoice-Studio/blob/main/speechscore/README.md)] [[Docs](https://github.com/modelscope/ClearerVoice-Studio/blob/main/speechscore/README.md)]**

<a href="https://github.com/modelscope/ClearerVoice-Studio/tree/main/speechscore">`SpeechScore`</a> is a speech quality assessment toolkit. We include it here to evaluate the performance of the different models (a usage sketch follows the metric list below). SpeechScore includes many popular speech metrics:

- Signal-to-Noise Ratio (SNR)
- Perceptual Evaluation of Speech Quality (PESQ)
- Short-Time Objective Intelligibility (STOI)
- Deep Noise Suppression Mean Opinion Score (DNSMOS)
- Scale-Invariant Signal-to-Distortion Ratio (SI-SDR)
- and many more quality benchmarks
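To make the relationship between the two components concrete, here is a minimal sketch that enhances a noisy file with ClearVoice and then scores the result with SpeechScore. It is only a sketch: the class names, task and model identifiers, and call signatures follow the upstream `clearvoice` and `speechscore` READMEs and may change between versions, and the file paths are placeholders you would replace with your own data.

```python
# Sketch: enhance one file with ClearVoice, then score it with SpeechScore.
# Class names, model names, and argument names follow the upstream READMEs
# and are illustrative; check the component docs for your installed version.
from clearvoice import ClearVoice
from speechscore import SpeechScore

# 1) Speech enhancement with a pre-trained 48 kHz model (model name is an example).
cv = ClearVoice(task='speech_enhancement', model_names=['MossFormer2_SE_48K'])
enhanced = cv(input_path='samples/noisy.wav', online_write=False)
cv.write(enhanced, output_path='samples/enhanced.wav')

# 2) Quality assessment of the enhanced file against a clean reference.
scorer = SpeechScore(['PESQ', 'STOI', 'SISDR'])
scores = scorer(test_path='samples/enhanced.wav',
                reference_path='samples/clean.wav',
                window=None, score_rate=16000, return_mean=False)
print(scores)
```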
## Contact
If you have any comments or questions about ClearerVoice-Studio, feel free to raise an issue in this repository or contact us directly at:
- email: {shengkui.zhao, zexu.pan}@alibaba-inc.com
Alternatively, you are welcome to join our DingTalk group to share and discuss algorithms, technology, and user-experience feedback. You may scan the following QR code to join our official chat group.
<p align="center">
<table>
<tr>
<td style="text-align:center;">
<a href="./asset/QR.jpg"><img alt="ClearVoice in DingTalk" src="https://img.shields.io/badge/ClearVoice-DingTalk-d9d9d9"></a>
</td>
</tr>
<tr>
<td style="text-align:center;">
<img alt="Light" src="./asset/dingtalk.png" width="68%" />
</td>
</tr>
</table>
</p>
## Friend Links
Check out some awesome GitHub repositories from the Speech Lab of the Institute for Intelligent Computing, Alibaba Group.
<p align="center">
<a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-InspireMusic?labelColor=&label=InspireMusic&color=green"></a>
<a href="https://github.com/modelscope/FunASR" target="_blank">
<img alt="Github" src="https://img.shields.io/badge/Repo | Space-FunASR?labelColor=&label=FunASR&color=green"></a>
<a href="https://github.com/FunAudioLLM" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-FunAudioLLM?labelColor=&label=FunAudioLLM&color=green"></a>
<a href="https://github.com/modelscope/3D-Speaker" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-3DSpeaker?labelColor=&label=3D-Speaker&color=green"></a>
</p>
## Acknowledgements
ClearerVoice-Studio contains third-party components and code modified from several open-source repositories, including: <br>
[SpeechBrain](https://github.com/speechbrain/speechbrain), [ESPnet](https://github.com/espnet), and [TalkNet-ASD](https://github.com/TaoRuijie/TalkNet-ASD)