# EfficientConformer_pytorch
Official implementation of the Efficient Conformer, a progressively downsampled Conformer with grouped attention for Automatic Speech Recognition.
## Paper
Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition
- [Paper](https://arxiv.org/abs/2109.01163) | [Demo Notebook](https://colab.research.google.com/github/burchim/EfficientConformer/blob/master/EfficientConformer.ipynb)
## Model Architecture
The Efficient Conformer builds on the Conformer, aiming to reduce the complexity of the Conformer architecture under a limited compute budget and thereby obtain a more efficient architecture. The model structure is shown below:<br>
![Model architecture](./images/model_architecture.png)
## Efficient Conformer Encoder
Inspired by previous work in Automatic Speech Recognition and Computer Vision, the Efficient Conformer encoder is composed of three encoder stages, where each stage comprises a number of Conformer blocks using grouped attention. The encoded sequence is progressively downsampled and projected to wider feature dimensions, lowering the amount of computation while achieving better performance. Grouped multi-head attention reduces attention complexity by grouping neighbouring time elements along the feature dimension before applying scaled dot-product attention.
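The sketch below is a rough illustration of this progressive downsampling idea: each toy stage halves the sequence length with a strided convolution and projects to a wider feature dimension. The stage dimensions and strides here are assumptions chosen for the example, not the repository's exact configuration.
```python
# Illustrative sketch of progressive downsampling (assumed dims/strides,
# not the repository's implementation). Each stage halves the time axis
# and widens the feature dimension, so later layers see shorter, wider sequences.
import torch
import torch.nn as nn

class ToyStage(nn.Module):
    def __init__(self, dim_in, dim_out, stride=2):
        super().__init__()
        # Strided 1D convolution: downsamples time, projects features wider.
        self.conv = nn.Conv1d(dim_in, dim_out, kernel_size=3, stride=stride, padding=1)

    def forward(self, x):                 # x: (batch, time, dim_in)
        x = self.conv(x.transpose(1, 2))  # (batch, dim_out, time // stride)
        return x.transpose(1, 2)          # back to (batch, time // stride, dim_out)

encoder = nn.Sequential(ToyStage(120, 168), ToyStage(168, 240))
x = torch.randn(1, 400, 120)              # (batch, frames, features)
print(encoder(x).shape)                   # torch.Size([1, 100, 240])
```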
## Algorithm
Building on the Conformer, the model introduces progressive downsampling into the Conformer encoder and proposes a novel attention mechanism called grouped attention, which reduces attention complexity from $O(n^2d)$ to $O(n^2d/g)$ for sequence length $n$, feature dimension $d$, and group size $g$. The grouped attention computation is illustrated below:
![Grouped attention](./images/algorithm.png)
<img src="media/EfficientConformer.jpg" width="50%"/>
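To make the complexity argument concrete, here is a minimal, hedged sketch of grouped attention (illustrative only, not this repository's implementation): folding $g$ neighbouring frames into the feature dimension shrinks the attention map from $n \times n$ to $(n/g) \times (n/g)$, giving the $O(n^2d/g)$ cost above.
```python
# Minimal sketch of grouped attention (illustrative; the repository's actual
# implementation lives in its model code). Grouping g neighbouring time steps
# along the feature dimension reduces attention cost from O(n^2 d) to O(n^2 d / g).
import torch
import torch.nn.functional as F

def grouped_attention(q, k, v, g):
    """q, k, v: (batch, n, d) with n divisible by the group size g."""
    b, n, d = q.shape
    # Fold g neighbouring frames into the feature dimension: (b, n/g, d*g).
    q, k, v = (x.reshape(b, n // g, d * g) for x in (q, k, v))
    scores = q @ k.transpose(-2, -1) / (d * g) ** 0.5   # (b, n/g, n/g) attention map
    out = F.softmax(scores, dim=-1) @ v                 # (b, n/g, d*g)
    return out.reshape(b, n, d)                         # unfold back to (b, n, d)

x = torch.randn(2, 16, 64)
print(grouped_attention(x, x, x, g=4).shape)            # torch.Size([2, 16, 64])
```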
## Environment Setup
### Docker (Method 1)
The image can be pulled from [SourceFind](https://sourcefind.cn/#main-page); the pull address and usage steps are as follows:
```sh
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.8
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
pip install ctcdecode
pip install warp-rnnt==0.5.0
```
### Dockerfile (Method 2)
Usage of the Dockerfile:
```shell
cd ./docker
docker build --no-cache -t efficientconformer:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
### Anaconda (Method 3)
The special deep learning libraries required by this project for DCU GPUs can be downloaded and installed from the hpccube developer community: https://developer.hpccube.com/tool/
```
DTK software stack: dtk24.04.1
Python: 3.8
torch: 2.1.0
```
Tip: the versions of the DTK software stack, Python, torch, and the other DCU-related tools above must correspond to one another exactly.
## Installation
Clone the GitHub repository and set up the environment:
```
git clone https://github.com/burchim/EfficientConformer.git
cd EfficientConformer
pip install -r requirements.txt
```
Then install [ctcdecode](https://github.com/parlance/ctcdecode).
## Dataset
The official code uses the LibriSpeech dataset for training and testing. [LibriSpeech](https://www.openslr.org/12) is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project and has been carefully segmented and aligned. The audio files are stored in flac format, and the corresponding transcriptions are stored in txt format.
- SCNet quick download link:
  - [LibriSpeech_asr dataset](http://113.200.138.88:18080/aidatasets/librispeech_asr_dummy)
- Official download link:
  - [LibriSpeech_asr dataset (official)](https://www.openslr.org/12)

The dataset can also be downloaded with the provided script:
```
cd datasets
./download_LibriSpeech.sh
```
The directory structure of the dataset is as follows:
```
LibriSpeech
├── train-clean-100
│   ├── 19
│   │   ├── 19-198
│   │   │   ├── 19-198-0000.flac
│   │   │   ├── 19-198-0001.flac
│   │   │   ├── 19-198-0002.flac
│   │   │   ├── 19-198-0003.flac
│   │   │   ├── ...
│   │   │   └── 19-198.trans.txt
│   │   └── ...
│   └── ...
├── train-clean-360
├── train-other-500
├── dev-clean
├── dev-other
├── test-clean
└── test-other
```
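After downloading, one way to sanity-check the data (an assumption for illustration, not part of this repository's pipeline) is torchaudio's built-in LibriSpeech loader:
```python
# Hedged example: verify the download with torchaudio's LibriSpeech class.
# Assumes the dataset was extracted under ./datasets/LibriSpeech.
import torchaudio

dataset = torchaudio.datasets.LIBRISPEECH("./datasets", url="test-clean", download=False)
waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id = dataset[0]
print(waveform.shape, sample_rate)  # 16 kHz mono audio, e.g. torch.Size([1, ...]) 16000
print(transcript)                   # the reference text for the utterance
```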
## Training
You can run an experiment by providing a configuration file via the `--config_file` flag of main.py. Training checkpoints and logs will be saved in the callback folder specified in the config file. Note that the `--prepare_dataset` and `--create_tokenizer` flags may be needed for your first experiment.
```shell
python main.py --config_file configs/config_file.json
```
### Monitoring Training
```shell
tensorboard --logdir callback_path
```
<img src="media/logs.jpg"/>
## Evaluation
Models can be evaluated by selecting a validation/test mode and providing the epoch or name of the checkpoint to load via the `--initial_epoch` flag. The `--gready` flag selects greedy search decoding instead of beam search.
```shell
python main.py --config_file configs/config_file.json --initial_epoch epoch/name --mode validation/test --gready
```
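For reference, greedy CTC decoding amounts to a frame-wise argmax followed by collapsing repeats and removing blanks; the sketch below is a generic illustration (an assumption, not the repository's decoder):
```python
# Generic CTC greedy decoding sketch (illustrative; the repository's decoder
# may differ). Take the argmax per frame, collapse consecutive repeats,
# then drop the blank token.
import torch

def ctc_greedy_decode(logits, blank=0):
    """logits: (time, vocab) scores for one utterance; returns token ids."""
    best = logits.argmax(dim=-1).tolist()   # frame-wise argmax path
    out, prev = [], blank
    for tok in best:
        if tok != prev and tok != blank:    # collapse repeats, remove blanks
            out.append(tok)
        prev = tok
    return out

logits = torch.randn(50, 32)                # 50 frames, vocab size 32
print(ctc_greedy_decode(logits))
```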
## Options
All options of main.py:
```
-c / --config_file type=str default="configs/EfficientConformerCTCSmall.json" help="Json configuration file containing model hyperparameters"
-m / --mode type=str default="training" help="Mode : training, validation-clean, test-clean, eval_time-dev-clean, ..."
...
--profiler action="store_true" help="Enable eval time profiler"
```
## LibriSpeech Performance
| Model | Size | Type | Params (M) | test-clean/test-other greedy WER (%) | test-clean/test-other n-gram WER (%) | GPUs |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [Efficient Conformer](https://drive.google.com/drive/folders/1Dqu1RTHQ8jxGxEPar2-WMjR0hRhmkpoU?usp=sharing) | Small | CTC | 13.2 | 3.6 / 9.0 | 2.7 / 6.7 | 4 x RTX 2080 Ti |
| [Efficient Conformer](https://drive.google.com/drive/folders/1uaDQWdZZEfq8sbq0u8w6hnFpUqEyjH9S?usp=sharing) | Medium | CTC | 31.5 | 3.0 / 7.6 | 2.4 / 5.8 | 4 x RTX 2080 Ti |
| [Efficient Conformer](https://drive.google.com/drive/folders/1NyxiVNsR7qyLGeIYMOchu9JPiTLFlsoj?usp=sharing) | Large | CTC | 125.6 | 2.5 / 5.8 | 2.1 / 4.7 | 4 x RTX 3090 |
## Reference
[Maxime Burchi, Valentin Vielzeuf. Efficient Conformer: Progressive Downsampling and Grouped Attention for Automatic Speech Recognition.](https://arxiv.org/abs/2109.01163)
<br><br>
## Application Scenarios
### Algorithm Category
Speech recognition
### Key Application Industries
Speech recognition, education, healthcare
## Author
* Maxime Burchi [@burchim](https://github.com/burchim)
* Contact: [maxime.burchi@gmail.com](mailto:maxime.burchi@gmail.com)
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/efficientconformer_pytorch
## References
https://github.com/burchim/EfficientConformer
FROM image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.1.0-ubuntu20.04-dtk24.04.1-py3.8
RUN source /opt/dtk/env.sh
# Model code
modelCode=1021
# Model name
modelName=EfficientConformer_pytorch
# Model description
modelDescription=The Efficient Conformer builds on the Conformer, aiming to reduce the complexity of the Conformer architecture under a limited compute budget and thereby obtain a more efficient architecture.
# Application scenarios (multiple tags separated by commas)
appScenario=Speech recognition,Education,Healthcare
# Framework type (multiple tags separated by commas)
frameType=PyTorch