# Squeezeformer_tensorflow
![teaser](https://user-images.githubusercontent.com/50283958/172300924-157b8458-0e95-4b2e-b992-fc7927738146.png)
## Paper
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
- https://arxiv.org/pdf/2206.00888

This repository provides testing code for Squeezeformer, along with pre-trained checkpoints.
## Model Architecture
Squeezeformer revisits the macro- and micro-architecture of the Conformer and, by redesigning the multi-head attention and feed-forward modules among others, achieves a lower WER. The architecture is shown below: the Conformer structure is on the left and the improved Squeezeformer structure is on the right.<br>
![Model architecture](./images/model_architecture.png)
Check out the [paper](https://arxiv.org/pdf/2206.00888.pdf) for more details.
## Algorithm
At the macro level, Squeezeformer adopts:
- A Temporal U-Net structure, which reduces the cost of the multi-head attention modules on long sequences.
- A simpler block structure of a multi-head attention or convolution module followed by a feed-forward module, instead of the Macaron structure proposed in Conformer.

At the micro level, Squeezeformer makes the following adjustments:
- It simplifies the activations in the convolutional blocks.
- It removes redundant layer-normalization operations.
- It incorporates an efficient depthwise down-sampling layer to sub-sample the input signal.

As a result, the model achieves a lower word error rate (WER) than a Conformer with the same FLOPs.

Squeezeformer is now supported at NVIDIA's [NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/intro.html#:~:text=NVIDIA%20NeMo%2C%20part%20of%20the,%2DSpeech%20(TTS)%20models.) library as well, along with the training recipes and scripts. Please check out this [link](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/squeezeformer).
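To make the down-sampling idea concrete, below is a minimal TensorFlow sketch of a stride-2 depthwise-separable down-sampling layer in the spirit of the paper; it is an illustration under assumed shapes and hyperparameters, not the repository's actual implementation.
```python
import tensorflow as tf

class DepthwiseDownsample(tf.keras.layers.Layer):
    """Halves the temporal resolution with a depthwise-separable conv (sketch)."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # Stride 2 halves the sequence length, which cuts the quadratic
        # cost of the subsequent multi-head attention blocks.
        self.depthwise = tf.keras.layers.DepthwiseConv1D(
            kernel_size, strides=2, padding="same"
        )
        self.pointwise = tf.keras.layers.Conv1D(dim, 1)

    def call(self, x):  # x: (batch, time, dim)
        return self.pointwise(self.depthwise(x))

# Example: a 100-frame sequence is reduced to 50 frames.
y = DepthwiseDownsample(dim=144)(tf.zeros([1, 100, 144]))
print(y.shape)  # (1, 50, 144)
```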
## Environment Setup
### Docker (Method 1)
Images can be pulled from [SourceFind](https://sourcefind.cn/#main-page) as follows; Python 3.8 is recommended:
```sh
docker pull image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.13.1-ubuntu20.04-dtk24.04.2-py3.8
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
# Install dependencies
pip install librosa
pip install PyYAML
```
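As an optional sanity check once inside the container, the short sketch below confirms that TensorFlow is importable and that accelerator devices are visible (how DCU devices are named depends on the DTK stack; the expected values in the comments are assumptions).
```python
import tensorflow as tf

print(tf.__version__)                     # expected: 2.13.1 in this image
print(tf.config.list_physical_devices())  # accelerator devices should be listed
```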
### Dockerfile (Method 2)
To build and run the image from the provided Dockerfile:
```shell
cd ./docker
docker build --no-cache -t squeezeformer:latest .
docker run -it -v /path/your_code_data/:/path/your_code_data/ -v /opt/hyhal:/opt/hyhal:ro --shm-size=32G --privileged=true --device=/dev/kfd --device=/dev/dri/ --group-add video --name docker_name imageID bash
```
TensorFlow 2.5 is supported. Run one of the following commands depending on your target device type:
* Running on CPUs: `pip install -e '.[tf2.5]'`
* Running on GPUs: `pip install -e '.[tf2.5-gpu]'`

Then install the CTC decoder:
```bash
cd scripts
bash install_ctc_decoders.sh
```
### Anaconda (Method 3)
The DCU-specific deep-learning libraries required by this project can be downloaded from the HPC developer community: https://developer.hpccube.com/tool/
```
DTK stack: dtk24.04.2
Python: 3.8
TensorFlow: 2.13.1
```
Tips: the DTK stack, Python, TensorFlow, and other DCU-related tool versions above must match each other exactly.
## Dataset
### 1. Download LibriSpeech
The official code uses the LibriSpeech dataset for model training and testing. [LibriSpeech](https://ieeexplore.ieee.org/document/7178964) is a widely-used ASR benchmark that consists of a 960-hour speech corpus with text transcriptions. The dataset consists of 3 training sets (`train-clean-100`, `train-clean-360`, `train-other-500`), 2 development sets (`dev-clean`, `dev-other`), and 2 test sets (`test-clean`, `test-other`).

Download links:
- SCNet mirror: [LibriSpeech_asr dataset download](http://113.200.138.88:18080/aidatasets/librispeech_asr_dummy)
- Official: [LibriSpeech_asr dataset download](https://www.openslr.org/12)

Download the datasets from the [official link](http://www.openslr.org/12) and untar them. If this is for testing purposes only, you can skip the training datasets to save disk space. You should have flac files under `{dataset_path}/LibriSpeech`.
LibriSpeech is an approximately 1000-hour corpus of 16 kHz read English speech. The data is derived from audiobooks from the LibriVox project and has been carefully segmented and organized; the audio files are stored in flac format and the corresponding transcriptions in txt format.<br>
The directory structure of the dataset is as follows:
```
LibriSpeech
├── train-clean-100
│ ├── 19
│ │ ├── 19-198
│ │ │ ├── 19-198-0000.flac
│ │ │ ├── 19-198-0001.flac
│ │ │ ├── 19-198-0002.flac
│ │ │ ├── 19-198-0003.flac
│ │ │ ├── ...
│ │ │ ├── 19-198.trans.txt
│ │ └── ...
│ └── ...
├── train-clean-360
├── train-other-500
├── dev-clean
├── dev-other
├── test-clean
└── test-other
```
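As a quick sanity check, the sketch below loads one flac file with `librosa` (installed above) and prints its sampling rate, which should be the 16 kHz mentioned above; the file path is just an example taken from the directory tree.
```python
import librosa

# sr=None keeps the file's native sampling rate instead of resampling.
audio, sr = librosa.load(
    "LibriSpeech/train-clean-100/19/19-198/19-198-0000.flac", sr=None
)
print(sr, audio.shape)  # expected: 16000 and a 1-D waveform
```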
### 2. Create Manifest Files
Before training or testing, create the manifest files for the dataset with the commands below. A manifest file links each audio file path to its transcription. We use a script from [TensorFlowASR](https://github.com/TensorSpeech/TensorFlowASR).
```sh
cd scripts
python create_librispeech_trans_all.py --data {dataset_path}/LibriSpeech --output {tsv_dir}
```
* `dataset_path` is the directory where you untarred the datasets in the previous step.
* This script creates tsv files under `tsv_dir` that list the audio file path, duration, and transcription.
* To skip processing the training datasets, pass the additional argument `--mode test-only`.

If you have followed the instructions correctly, you should have the following files under `tsv_dir`:
* `dev_clean.tsv`, `dev_other.tsv`, `test_clean.tsv`, `test_other.tsv`
* `train_clean_100.tsv`, `train_clean_360.tsv`, `train_other_500.tsv` (if not `--mode test-only`)
* `train_other.tsv`, which merges all training tsv files into one (if not `--mode test-only`)
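For reference, here is a minimal sketch of inspecting a generated manifest; the tab-separated column layout (path, duration, transcription) is assumed from the description above.
```python
import csv

# Print the first three rows of a manifest file.
with open("test_clean.tsv") as f:
    reader = csv.reader(f, delimiter="\t")
    for i, row in enumerate(reader):
        print(row)
        if i == 2:
            break
```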
## Testing
### 1. Download Pre-trained Checkpoints
Pre-trained checkpoints are provided for all Squeezeformer variants, together with their WERs on the test sets:
| **Model** | **Checkpoint** | **test-clean** | **test-other** |
| :-----------------: | :---------------------------------------------------------------------------------------: | :------------: | :------------: |
| ... | ... | ... | ... |
| Squeezeformer-L | [link](https://drive.google.com/file/d/1LJua7A4ZMoZFi2cirf9AnYEl51pmC-m5/view?usp=sharing) | 2.47 | 5.97 |
### 2. Run Inference
Run the following commands:
```bash
cd examples/squeezeformer
python test.py --bs {batch_size} --config configs/squeezeformer-S.yml --saved squeezeformer-S.h5 \
    --dataset_path {tsv_dir} --dataset {dev_clean|dev_other|test_clean|test_other}
```
* `tsv_dir` is the directory path to the tsv manifest files that you created in the previous step.
* You can test other Squeezeformer models by changing `--config` and `--saved`, e.g., Squeezeformer-L or Squeezeformer-M.
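The test-clean/test-other numbers in the table above are word error rates. As a reference, here is a minimal, self-contained WER sketch based on word-level edit distance; it illustrates the metric and is not the repository's evaluation code.
```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / max(len(r), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
```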
## External implementations
We are thankful to all the researchers who have extended Squeezeformer for different purposes.
| **Description** | **Link** |
| :-----------------------: | :----------------------------------------------: |
| PyTorch implementation | [link](https://github.com/upskyy/Squeezeformer) |
| NeMo | [link](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf/squeezeformer) |
| WeNet | [link](https://github.com/wenet-e2e/wenet) |
## Citation
Squeezeformer was developed as part of the following paper. Please cite it if you find this library useful for your work:
```text
@article{kim2022squeezeformer,
title={Squeezeformer: An Efficient Transformer for Automatic Speech Recognition},
author={Kim, Sehoon and Gholami, Amir and Shaw, Albert and Lee, Nicholas and Mangalam, Karttikeya and Malik, Jitendra and Mahoney, Michael W and Keutzer, Kurt},
journal={arxiv:2206.00888},
year={2022}
}
```
## Application Scenarios
### Algorithm Category
Speech recognition
### Target Industries
Speech recognition, education, healthcare
## Copyright
THIS SOFTWARE AND/OR DATA WAS DEPOSITED IN THE BAIR OPEN RESEARCH COMMONS REPOSITORY ON 02/07/23.
## Source Repository and Issue Feedback
https://developer.hpccube.com/codes/modelzoo/squeezeformer_tensorflow
## References
https://github.com/kssteven418/Squeezeformer
The Dockerfile referenced in Method 2:
```dockerfile
FROM image.sourcefind.cn:5000/dcu/admin/base/tensorflow:2.13.1-ubuntu20.04-dtk24.04.2-py3.8
RUN source /opt/dtk/env.sh
```
The model metadata file:
```properties
# Model code
modelCode=1022
# Model name
modelName=Squeezeformer_tensorflow
# Model description
modelDescription=Squeezeformer is an ASR model that revisits the macro- and micro-architecture of the Conformer and achieves a lower WER by redesigning the multi-head attention and feed-forward modules, among others.
# Application scenarios (multiple tags separated by commas)
appScenario=speech recognition,education,healthcare
# Framework type (multiple tags separated by commas)
frameType=Tensorflow
```
The packaging script, `setup.py`:
```python
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

with open("requirements.txt", "r") as fr:
    requirements = fr.read().splitlines()

setuptools.setup(
    name="squeezeformer",
    packages=setuptools.find_packages(include=["src*"]),
    install_requires=requirements,
    extras_require={
        # "tf2.3": ["tensorflow>=2.3.0,<2.4", "tensorflow-text>2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
        # "tf2.3-gpu": ["tensorflow-gpu>=2.3.0,<2.4", "tensorflow-text>=2.3.0,<2.4", "tensorflow-io>=0.16.0,<0.17"],
        # "tf2.4": ["tensorflow>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
        # "tf2.4-gpu": ["tensorflow-gpu>=2.4.0,<2.5", "tensorflow-text>=2.4.0,<2.5", "tensorflow-io>=0.17.0,<0.18"],
        "tf2.5": ["tensorflow>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"],
        "tf2.5-gpu": ["tensorflow-gpu>=2.5.0,<2.6", "tensorflow-text>=2.5.0,<2.6", "tensorflow-io>=0.18.0,<0.19"],
    },
    classifiers=[
        "Programming Language :: Python :: 3.6",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Intended Audience :: Science/Research",
        "Operating System :: POSIX :: Linux",
        "License :: OSI Approved :: Apache Software License",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    python_requires=">=3.6",
)
```