# Squeezeformer_tensorflow
![teaser](https://user-images.githubusercontent.com/50283958/172300924-157b8458-0e95-4b2e-b992-fc7927738146.png)
## Paper
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
- https://arxiv.org/pdf/2206.00888

This repository provides testing code for Squeezeformer, along with pre-trained checkpoints.
## Model Architecture
Squeezeformer revisits the macro- and micro-architecture of the Conformer and, by redesigning the multi-head attention and feed-forward modules among others, achieves a lower WER. The architecture is shown below: the Conformer structure is on the left and the improved Squeezeformer structure is on the right.<br>
![Model architecture](./images/model_architecture.png)
Check out the [paper](https://arxiv.org/pdf/2206.00888.pdf) for more details.
## Algorithm
At the macro level, Squeezeformer adopts:
- A Temporal U-Net structure, which reduces the cost of the multi-head attention modules on long sequences.
- A simpler block structure of a multi-head attention or convolution module followed by a feed-forward module, instead of the Macaron structure proposed in Conformer.

At the micro level, Squeezeformer makes the following adjustments:
- It simplifies the activations in the convolutional blocks.
- It removes redundant layer-normalization operations.
- It incorporates an efficient depthwise down-sampling layer to sub-sample the input signal.

As a result, the model achieves a lower word error rate (WER) than a Conformer with the same FLOPs.

Squeezeformer is now supported at NVIDIA's [NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/intro.html#:~:text=NVIDIA%20NeMo%2C%20part%20of%20the,%2DSpeech%20(TTS)%20models.) library as well, along with the training recipes and scripts. Please check out this [link](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/squeezeformer).
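To make the down-sampling idea concrete, below is a minimal TensorFlow sketch of a stride-2 depthwise-separable down-sampling layer in the spirit of the paper; it is an illustration under assumed shapes and hyperparameters, not the repository's actual implementation.
```python
import tensorflow as tf

class DepthwiseDownsample(tf.keras.layers.Layer):
    """Halves the temporal resolution with a depthwise-separable conv (sketch)."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Stride 2 halves the sequence length, which cuts the quadratic
        # cost of the subsequent multi-head attention blocks.
        self.depthwise = tf.keras.layers.DepthwiseConv1D(
            kernel_size, strides=2, padding="same"
        )
        self.pointwise = tf.keras.layers.Conv1D(dim, 1)

    def call(self, x):  # x: (batch, time, dim)
        return self.pointwise(self.depthwise(x))

# Example: a 100-frame sequence is reduced to 50 frames.
y = DepthwiseDownsample(dim=144)(tf.zeros([1, 100, 144]))
print(y.shape)  # (1, 50, 144)
```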
## Environment Setup
### Docker (Method 1)
Images can be pulled from [SourceFind](https://sourcefind.cn/#main-page) as follows; Python 3.8 is recommended:
```sh
docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.13.1-ubuntu20.04-dtk24.04.2-py3.8
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
# Install dependencies
pip install librosa
pip install PyYAML
```
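As an optional sanity check once inside the container, the short sketch below confirms that TensorFlow is importable and that accelerator devices are visible (how DCU devices are named depends on the DTK stack; the expected values in the comments are assumptions).
```python
import tensorflow as tf

print(tf.__version__)                     # expected: 2.13.1 in this image
print(tf.config.list_physical_devices())  # accelerator devices should be listed
```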
### Dockerfile (Method 2)
To build and run the image from the provided Dockerfile:
```shell
cd ./docker
docker build --no-cache -t squeezeformer:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
TensorFlow 2.5 is supported. Run one of the following commands depending on your target device type:
* Running on CPUs: `pip install -e '.[tf2.5]'`
* Running on GPUs: `pip install -e '.[tf2.5-gpu]'`

Then install the CTC decoder:
```bash
cd scripts
bash install_ctc_decoders.sh
```
### Anaconda (Method 3)
The DCU-specific deep-learning libraries required by this project can be downloaded from the HPC developer community: https://developer.hpccube.com/tool/
```
DTK stack: dtk24.04.2
Python: 3.8
TensorFlow: 2.13.1
```
Tips: the DTK stack, Python, TensorFlow, and other DCU-related tool versions above must match each other exactly.
## Dataset
### 1. Download LibriSpeech
The official code uses the LibriSpeech dataset for model training and testing. [LibriSpeech](https://ieeexplore.ieee.org/document/7178964) is a widely-used ASR benchmark that consists of a 960-hour speech corpus with text transcriptions. The dataset consists of 3 training sets (`train-clean-100`, `train-clean-360`, `train-other-500`), 2 development sets (`dev-clean`, `dev-other`), and 2 test sets (`test-clean`, `test-other`).

Download links:
- SCNet mirror: [LibriSpeech_asr dataset download](http://113.200.138.88:18080/aidatasets/librispeech_asr_dummy)
- Official: [LibriSpeech_asr dataset download](https://www.openslr.org/12)

Download the datasets from the [official link](http://www.openslr.org/12) and untar them. If this is for testing purposes only, you can skip the training datasets to save disk space. You should have flac files under `{dataset_path}/LibriSpeech`.
LibriSpeech is an approximately 1000-hour corpus of 16 kHz read English speech. The data is derived from audiobooks from the LibriVox project and has been carefully segmented and organized; the audio files are stored in flac format and the corresponding transcriptions in txt format.<br>
The directory structure of the dataset is as follows:
```
LibriSpeech
├── train-clean-100
│ ├── 19
│ │ ├── 19-198
│ │ │ ├── 19-198-0000.flac
│ │ │ ├── 19-198-0001.flac
│ │ │ ├── 19-198-0002.flac
│ │ │ ├── 19-198-0003.flac
│ │ │ ├── ...
│ │ │ ├── 19-198.trans.txt
│ │ └── ...
│ └── ...
├── train-clean-360
├── train-other-500
├── dev-clean
├── dev-other
├── test-clean
└── test-other
```
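As a quick sanity check, the sketch below loads one flac file with `librosa` (installed above) and prints its sampling rate, which should be the 16 kHz mentioned above; the file path is just an example taken from the directory tree.
```python
import librosa

# sr=None keeps the file's native sampling rate instead of resampling.
audio, sr = librosa.load(
    "LibriSpeech/train-clean-100/19/19-198/19-198-0000.flac", sr=None
)
print(sr, audio.shape)  # expected: 16000 and a 1-D waveform
```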
### 2. Create Manifest Files
Before training or testing, create the manifest files for the dataset with the commands below. A manifest file links each audio file path to its transcription. We use a script from [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR).
```sh
cd scripts
python create_librispeech_trans_all.py --data {dataset_path}/LibriSpeech --output {tsv_dir}
```
* `dataset_path` is the directory where you untarred the datasets in the previous step.
* This script creates tsv files under `tsv_dir` that list the audio file path, duration, and transcription.
* To skip processing the training datasets, pass the additional argument `--mode test-only`.

If you have followed the instructions correctly, you should have the following files under `tsv_dir`:
* `dev_clean.tsv`, `dev_other.tsv`, `test_clean.tsv`, `test_other.tsv`
* `train_clean_100.tsv`, `train_clean_360.tsv`, `train_other_500.tsv` (if not `--mode test-only`)
* `train_other.tsv`, which merges all training tsv files into one (if not `--mode test-only`)
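For reference, here is a minimal sketch of inspecting a generated manifest; the tab-separated column layout (path, duration, transcription) is assumed from the description above.
```python
import csv

# Print the first three rows of a manifest file.
with open("test_clean.tsv") as f:
    reader = csv.reader(f, delimiter="\t")
    for i, row in enumerate(reader):
        print(row)
        if i == 2:
            break
```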
## Testing
### 1. Download Pre-trained Checkpoints
Pre-trained checkpoints are provided for all Squeezeformer variants, together with their WERs on the test sets:
| **Model** | **Checkpoint** | **test-clean** | **test-other** |
| :-----------------: | :---------------------------------------------------------------------------------------: | :------------: | :------------: |
| ... | ... | ... | ... |
| Squeezeformer-L | [link](https://drive.google.com/file/d/1LJua7A4ZMoZFi2cirf9AnYEl51pmC-m5/view?usp=sharing) | 2.47 | 5.97 |
### 2. Run Inference
Run the following commands:
```bash
cd examples/squeezeformer
python test.py --bs {batch_size} --config configs/squeezeformer-S.yml --saved squeezeformer-S.h5 \
    --dataset_path {tsv_dir} --dataset {dev_clean|dev_other|test_clean|test_other}
```
* `tsv_dir` is the directory path to the tsv manifest files that you created in the previous step.
* You can test other Squeezeformer models by changing `--config` and `--saved`, e.g., Squeezeformer-L or Squeezeformer-M.
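The test-clean/test-other numbers in the table above are word error rates. As a reference, here is a minimal, self-contained WER sketch based on word-level edit distance; it illustrates the metric and is not the repository's evaluation code.
```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
```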
## External implementations
We are thankful to all the researchers who have extended Squeezeformer for different purposes.
| **Description** | **Link** |
| :-----------------------: | :----------------------------------------------: |
| PyTorch implementation | [link](https://github.com/upskyy/Squeezeformer) |
| NeMo | [link](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/squeezeformer) |
| WeNet | [link](https://github.com/wenet-e2e/wenet) |
## Citation
Squeezeformer was developed as part of the following paper. Please cite it if you find this library useful for your work:
```text
@article{kim2022squeezeformer,
title={Squeezeformer: An Efficient Transformer for Automatic Speech Recognition},
author={Kim, Sehoon and Gholami, Amir and Shaw, Albert and Lee, Nicholas and Mangalam, Karttikeya and Malik, Jitendra and Mahoney, Michael W and Keutzer, Kurt},
journal={arxiv:2206.00888},
year={2022}
}
```
## Application Scenarios
### Algorithm Category
Speech recognition
### Target Industries
Speech recognition, education, healthcare
## Copyright
THIS SOFTWARE AND/OR DATA WAS DEPOSITED IN THE BAIR OPEN RESEARCH COMMONS REPOSITORY ON 02/07/23.
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/squeezeformer_tensorflow
## References
https://github.com/kssteven418/Squeezeformer
The Dockerfile referenced in Method 2:
```dockerfile
FROM image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.13.1-ubuntu20.04-dtk24.04.2-py3.8
RUN source /opt/dtk/env.sh
```
The model metadata file:
```properties
# Model code
modelCode=1022
# Model name
modelName=Squeezeformer_tensorflow
# Model description
modelDescription=Squeezeformer is an ASR model that revisits the macro- and micro-architecture of the Conformer and achieves a lower WER by redesigning the multi-head attention and feed-forward modules, among others.
# Application scenarios (multiple tags separated by commas)
appScenario=speech recognition,education,healthcare
# Framework type (multiple tags separated by commas)
frameType=Tensorflow
```
The packaging script, `setup.py`:
```python
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

with open("requirements.txt", "r") as fr:
    requirements = fr.read().splitlines()

setuptools.setup(
    name="squeezeformer",
    packages=setuptools.find_packages(include=["src*"]),
    install_requires=requirements,
    extras_require={
        # "tf2.3": ["tensorflow>=2.3.0,<2.4", "tensorflow-text>2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
        # "tf2.3-gpu": ["tensorflow-gpu>=2.3.0,<2.4", "tensorflow-text>=2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
        # "tf2.4": ["tensorflow>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
        # "tf2.4-gpu": ["tensorflow-gpu>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
        "tf2.5": ["tensorflow>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"],
        "tf2.5-gpu": ["tensorflow-gpu>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"],
    },
    classifiers=[
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Intended Audience :: Science/Research",
        "Operating System :: POSIX :: Linux",
        "License :: OSI Approved :: Apache Software License",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    python_requires=">=3.6",
)
```