Commit 3013e781 authored by wangsen's avatar wangsen

change readme.md

parent ededcd02
Pipeline #3090 failed with stages in 0 seconds
<div align="center">
<img src="https://github.com/user-attachments/assets/a4ccbc60-5248-4dca-8cec-09a6385c6d0f" width="768" height="192">
</div>
<strong>ClearerVoice-Studio</strong> is an open-source, AI-powered speech processing toolkit designed for researchers, developers, and end-users. It provides capabilities of speech enhancement, speech separation, speech super-resolution, target speaker extraction, and more. The toolkit provides state-of-the-art pre-trained models, along with training and inference scripts, all accessible from this repository.
#### 👉🏻[HuggingFace Demo](https://huggingface.co/spaces/alibabasglab/ClearVoice)👈🏻 | 👉🏻[ModelScope Demo](https://modelscope.cn/studios/iic/ClearerVoice-Studio)👈🏻 | 👉🏻[SpeechScore Demo](https://huggingface.co/spaces/alibabasglab/SpeechScore)👈🏻 | 👉🏻[Paper](https://arxiv.org/abs/2506.19398)👈🏻

---

![GitHub Repo stars](https://img.shields.io/github/stars/modelscope/ClearerVoice-Studio) Please leave your ⭐ on our GitHub to support this community project! Your support is our biggest motivation to keep the models updated!

## News :fire:
- Upcoming: More tasks will be added to ClearVoice.
- [2025.6] Added an interface for [ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice) that allows passing a NumPy array into a model and receiving its output as a NumPy array, enabling more flexible use of the models inside a training or inference pipeline. Please check out [`demo_Numpy2Numpy.py`](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/demo_Numpy2Numpy.py).
- [2025.5] Updated SpeechScore with more non-intrusive metrics: NISQA and DISTILL_MOS.
- [2025.4] Updated the pip installation for [ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice). Now you can simply type `pip install clearvoice` to use all the pretrained models in ClearVoice; see the project description on PyPI ([link](https://pypi.org/project/clearvoice/)).
- [2025.4] Added a training script for speech super-resolution, supporting both retraining and fine-tuning of models. For details, refer to the documentation [here](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/speech_super_resolution).
- [2025.4] Added data generation scripts for training/fine-tuning speech enhancement models. The scripts generate either noisy speech or noisy-reverberant speech. Please check [here](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/data_generation/speech_enhancement).
- [2025.1] The ClearVoice demo is ready to try on both [HuggingFace](https://huggingface.co/spaces/alibabasglab/ClearVoice) and [ModelScope](https://modelscope.cn/studios/iic/ClearerVoice-Studio). Note that HuggingFace has a limited GPU quota, while ModelScope offers a larger one.
- [2025.1] ClearVoice now offers **speech super-resolution**, also known as bandwidth extension. This feature improves the perceptual quality of speech by converting low-resolution audio (with an effective sampling rate of at least 16,000 Hz) into high-resolution audio with a sampling rate of 48,000 Hz. A fully upscaled **LJSpeech-1.1-48kHz dataset** can be downloaded from [HuggingFace](https://huggingface.co/datasets/alibabasglab/LJSpeech-1.1-48kHz) and [ModelScope](https://modelscope.cn/datasets/iic/LJSpeech-1.1-48kHz).
- [2025.1] ClearVoice now supports more audio formats, including **"wav", "aac", "ac3", "aiff", "flac", "m4a", "mp3", "ogg", "opus", "wma", "webm"**, etc. It also supports both mono and stereo channels with 16-bit or 32-bit precision. A recent version of [ffmpeg](https://github.com/FFmpeg/FFmpeg) is required for the audio codecs.
- [2024.12] Uploaded pre-trained models to ModelScope. Users can now download the models from either [ModelScope](https://www.modelscope.cn/models/iic/ClearerVoice-Studio/summary) or [HuggingFace](https://huggingface.co/alibabasglab).
- [2024.11] Our FRCRN speech denoiser has been used over **3.0 million** times on [ModelScope](https://modelscope.cn/models/iic/speech_frcrn_ans_cirm_16k).
- [2024.11] Our MossFormer speech separator has been used over **2.5 million** times on [ModelScope](https://modelscope.cn/models/iic/speech_mossformer_separation_temporal_8k).
- [2024.11] Release of this repository.

### 🌟 Why Choose ClearerVoice-Studio?

- **Pre-Trained Models:** Includes cutting-edge pre-trained models, fine-tuned on extensive, high-quality datasets. No need to start from scratch!
- **Ease of Use:** Designed for seamless integration with your projects, offering a simple yet flexible interface for inference and training.
- **Comprehensive Features:** Combines advanced algorithms for multiple speech processing tasks in one platform.
- **Community-Driven:** Built for researchers, developers, and enthusiasts to collaborate and innovate together.

## ClearerVoice-Studio Quick Deployment and Run Guide (Domestic DCU Environment)

### 1. Start the Docker container (DCU-specific)

```bash
# Pull the image
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10

# Start the container (all of the options below are required)
docker run -it \
  --network=host \
  --ipc=host \
  --shm-size=64G \
  --device=/dev/kfd \
  --device=/dev/mkfd \
  --device=/dev/dri \
  -v /opt/hyhal:/opt/hyhal \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.5.1-ubuntu22.04-dtk25.04.2-py3.10 \
  /bin/bash
```
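Before moving on to the installation, it can help to confirm that the container actually sees the DCU. The check below is a minimal sketch that assumes the DTK build of PyTorch exposes the device through the standard `torch.cuda` interface, as ROCm-derived stacks usually do; if your build differs, use the vendor's own query tool instead.

```python
# Minimal accelerator sanity check inside the container.
# Assumption: the DTK PyTorch build reports the DCU via torch.cuda.
import torch

if torch.cuda.is_available():
    print(f"Found {torch.cuda.device_count()} accelerator(s): {torch.cuda.get_device_name(0)}")
    # Run a small matmul on the device to make sure compute actually works.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).shape)
else:
    print("No accelerator visible; re-check the --device mappings in docker run.")
```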
### 2. Download and install ClearerVoice-Studio

```bash
git clone https://github.com/modelscope/ClearerVoice-Studio.git
cd ClearerVoice-Studio
pip install -r requirements.txt

cd clearvoice
pip install --editable .
cd ..
```

### 3. Quick test

```bash
# Set the HuggingFace mirror for mainland China (strongly recommended)
export HF_ENDPOINT=https://hf-mirror.com

# Run the official test script
python test.py
```

Seeing `Inference completed successfully` or similar output means the run succeeded.

### References

- Official repository: https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice
- Chinese tutorial: https://stable-learn.com/zh/clearvoice-studio-tutorial

### Tips

- If model downloads are slow, be sure to set `HF_ENDPOINT=https://hf-mirror.com`.

## Contents of this repository

This repository is organized into three main components: **[ClearVoice](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice)**, **[Train](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train)**, and **[SpeechScore](https://github.com/modelscope/ClearerVoice-Studio/tree/main/speechscore)**.

### 1. **ClearVoice [[Readme](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/README.md)] [[Docs](https://github.com/modelscope/ClearerVoice-Studio/blob/main/clearvoice/README.md)]**

ClearVoice offers a user-friendly solution for speech processing tasks such as speech denoising, separation, super-resolution, audio-visual target speaker extraction, and more. It is designed as a unified inference platform leveraging pre-trained models (e.g., [FRCRN](https://arxiv.org/abs/2206.07293), [MossFormer](https://arxiv.org/abs/2302.11824)), all trained on extensive datasets. If you're looking for a tool to improve speech quality, ClearVoice is the perfect choice. Simply click on [`ClearVoice`](https://github.com/modelscope/ClearerVoice-Studio/tree/main/clearvoice) and follow our detailed instructions to get started.

### 2. **Train**

For advanced researchers and developers, we provide model fine-tuning and training scripts for all the tasks offered in ClearVoice and more:

- **Task 1: [Speech enhancement](train/speech_enhancement)** (16kHz & 48kHz)
- **Task 2: [Speech separation](train/speech_separation)** (8kHz & 16kHz)
- **Task 3: [Speech super-resolution](https://github.com/modelscope/ClearerVoice-Studio/tree/main/train/speech_super_resolution)** (48kHz)
- **Task 4: [Target speaker extraction](train/target_speaker_extraction)**
  - **Sub-Task 1: Audio-only Speaker Extraction Conditioned on a Reference Speech** (8kHz)
  - **Sub-Task 2: Audio-visual Speaker Extraction Conditioned on Face (Lip) Recording** (16kHz)
  - **Sub-Task 3: Audio-visual Speaker Extraction Conditioned on Body Gestures** (16kHz)
  - **Sub-Task 4: Neuro-steered Speaker Extraction Conditioned on EEG Signals** (16kHz)

Contributors are welcome to add more model architectures and tasks!

### 3. **SpeechScore [[Readme](https://github.com/modelscope/ClearerVoice-Studio/blob/main/speechscore/README.md)] [[Docs](https://github.com/modelscope/ClearerVoice-Studio/blob/main/speechscore/README.md)]**

<a href="https://github.com/modelscope/ClearerVoice-Studio/tree/main/speechscore">`SpeechScore`</a> is a speech quality assessment toolkit. We include it here to evaluate the performance of the different models (a usage sketch follows the metric list below). SpeechScore includes many popular speech metrics:

- Signal-to-Noise Ratio (SNR)
- Perceptual Evaluation of Speech Quality (PESQ)
- Short-Time Objective Intelligibility (STOI)
- Deep Noise Suppression Mean Opinion Score (DNSMOS)
- Scale-Invariant Signal-to-Distortion Ratio (SI-SDR)
- and many more quality benchmarks
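To make the relationship between the two components concrete, here is a minimal sketch that enhances a noisy file with ClearVoice and then scores the result with SpeechScore. It is only a sketch: the class names, task and model identifiers, and call signatures follow the upstream `clearvoice` and `speechscore` READMEs and may change between versions, and the file paths are placeholders you would replace with your own data.

```python
# Sketch: enhance one file with ClearVoice, then score it with SpeechScore.
# Class names, model names, and argument names follow the upstream READMEs
# and are illustrative; check the component docs for your installed version.
from clearvoice import ClearVoice
from speechscore import SpeechScore

# 1) Speech enhancement with a pre-trained 48 kHz model (model name is an example).
cv = ClearVoice(task='speech_enhancement', model_names=['MossFormer2_SE_48K'])
enhanced = cv(input_path='samples/noisy.wav', online_write=False)
cv.write(enhanced, output_path='samples/enhanced.wav')

# 2) Quality assessment of the enhanced file against a clean reference.
scorer = SpeechScore(['PESQ', 'STOI', 'SISDR'])
scores = scorer(test_path='samples/enhanced.wav',
                reference_path='samples/clean.wav',
                window=None, score_rate=16000, return_mean=False)
print(scores)
```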
## Contact
If you have any comments or questions about ClearerVoice-Studio, feel free to raise an issue in this repository or contact us directly at:
- email: {shengkui.zhao, zexu.pan}@alibaba-inc.com
Alternatively, you are welcome to join our DingTalk group to share and discuss algorithms, technology, and user-experience feedback. You may scan the following QR code to join our official chat group.
<p align="center">
<table>
<tr>
<td style="text-align:center;">
<a href="./asset/QR.jpg"><img alt="ClearVoice in DingTalk" src="https://img.shields.io/badge/ClearVoice-DingTalk-d9d9d9"></a>
</td>
</tr>
<tr>
<td style="text-align:center;">
<img alt="Light" src="./asset/dingtalk.png" width="68%" />
</td>
</tr>
</table>
</p>
## Friend Links
Check out some awesome GitHub repositories from the Speech Lab of the Institute for Intelligent Computing, Alibaba Group.
<p align="center">
<a href="https://github.com/FunAudioLLM/InspireMusic" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-InspireMusic?labelColor=&label=InspireMusic&color=green"></a>
<a href="https://github.com/modelscope/FunASR" target="_blank">
<img alt="Github" src="https://img.shields.io/badge/Repo | Space-FunASR?labelColor=&label=FunASR&color=green"></a>
<a href="https://github.com/FunAudioLLM" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-FunAudioLLM?labelColor=&label=FunAudioLLM&color=green"></a>
<a href="https://github.com/modelscope/3D-Speaker" target="_blank">
<img alt="Demo" src="https://img.shields.io/badge/Repo | Space-3DSpeaker?labelColor=&label=3D-Speaker&color=green"></a>
</p>
## Acknowledgements
ClearerVoice-Studio contains third-party components and code modified from several open-source repositories, including: <br>
[SpeechBrain](https://github.com/speechbrain/speechbrain), [ESPnet](https://github.com/espnet), and [TalkNet-ASD](https://github.com/TaoRuijie/TalkNet-ASD)