Commit cc2d6a67 authored by Sugon_ldc

add readme

parent 764b3a75
# Conformer_PyTorch

## Model Introduction

The Conformer model is a sequence-modeling approach based on self-attention and is widely used in speech recognition, language modeling, machine translation, semantic analysis, and other natural language processing tasks. This project uses the WeNet toolkit to run the Conformer model.

WeNet is an open-source end-to-end automatic speech recognition (ASR) toolkit built on PyTorch. It aims to provide a simple, efficient, and flexible ASR framework that helps researchers and developers quickly build their own speech recognition systems.

WeNet's core architecture is based on end-to-end deep neural networks (DNN) and self-attention. It includes a variety of model structures and techniques, such as Conformer, Transformer, and LSTM-TDNN, which can be chosen according to the task at hand. In addition, WeNet provides a set of tools and interfaces for model training, inference, and evaluation, and supports multiple hardware platforms and operating systems, including CPU, GPU, and FPGA.

## Model Structure
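As a reference for this section, the sketch below shows the internal layout of a single Conformer block: a half-step feed-forward module, multi-head self-attention, a depthwise convolution module, and a second half-step feed-forward module, each wrapped in a residual connection. This is a simplified illustration in plain PyTorch, not WeNet's actual implementation; the class names, layer sizes, and kernel size are assumptions chosen for readability.

```
# Schematic sketch of one Conformer block (illustration only, not WeNet's code).
import torch
import torch.nn as nn


class FeedForward(nn.Module):
    def __init__(self, d_model: int, expansion: int = 4, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model * expansion),
            nn.SiLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model * expansion, d_model),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        return self.net(x)


class ConvModule(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 15, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pointwise1 = nn.Conv1d(d_model, 2 * d_model, kernel_size=1)
        self.glu = nn.GLU(dim=1)
        self.depthwise = nn.Conv1d(
            d_model, d_model, kernel_size, padding=kernel_size // 2, groups=d_model
        )
        self.bn = nn.BatchNorm1d(d_model)
        self.act = nn.SiLU()
        self.pointwise2 = nn.Conv1d(d_model, d_model, kernel_size=1)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):                      # x: (batch, time, d_model)
        y = self.norm(x).transpose(1, 2)       # (batch, d_model, time) for Conv1d
        y = self.glu(self.pointwise1(y))
        y = self.act(self.bn(self.depthwise(y)))
        y = self.dropout(self.pointwise2(y))
        return y.transpose(1, 2)


class ConformerBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.ff1 = FeedForward(d_model, dropout=dropout)
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.conv = ConvModule(d_model, dropout=dropout)
        self.ff2 = FeedForward(d_model, dropout=dropout)
        self.final_norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, time, d_model)
        x = x + 0.5 * self.ff1(x)              # half-step feed-forward
        y = self.attn_norm(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        x = x + self.conv(x)                   # depthwise convolution module
        x = x + 0.5 * self.ff2(x)              # second half-step feed-forward
        return self.final_norm(x)


if __name__ == "__main__":
    block = ConformerBlock()
    frames = torch.randn(2, 100, 256)          # (batch, time, feature dim)
    print(block(frames).shape)                  # torch.Size([2, 100, 256])
```

In WeNet, a stack of such blocks sits on top of a convolutional subsampling front end to form the Conformer encoder.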
## Dataset

The dataset used is Aishell, an open-source Mandarin Chinese speech corpus released by Beijing Shell Shell Technology Co., Ltd. 400 speakers from different accent regions of China were invited to take part in the recording, which was conducted in a quiet indoor environment with high-fidelity microphones and downsampled to 16 kHz. Manual transcription accuracy is above 95%, backed by professional speech annotation and strict quality inspection. The data is free for academic use and is intended to provide a moderate amount of data for new researchers in the field of speech recognition.

Dataset download: http://openslr.org/33/
## Training and Inference

### Environment Setup

A Docker image for training can be pulled from the SourceFind (光源) registry. The image recommended for this project is:

```
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:1.10.0-centos7.6-dtk-22.10-py38-latest
```

After entering the image, install the required third-party dependencies:

```
pip3 install typeguard==2.13.3
```
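As a quick sanity check before launching training, you can confirm from Python that the container sees both the dependencies and the accelerator. The snippet below is a suggestion rather than part of the official setup; it assumes the DTK build of PyTorch reports DCU devices through the standard `torch.cuda` interface, as ROCm-based builds typically do.

```
# Optional environment check (assumption: the DTK/ROCm PyTorch build exposes
# DCU devices through the torch.cuda interface).
from importlib.metadata import version

import torch

print("torch:", torch.__version__)
print("typeguard:", version("typeguard"))
print("accelerator available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
```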
### Data Preprocessing

```
cd ./examples/aishell/s0
# Setting stage to 0 downloads the dataset automatically. If the dataset has already
# been downloaded, set the data path in run.sh manually to skip the download step.
bash run.sh --stage 0 --stop_stage 0
bash run.sh --stage 1 --stop_stage 1
bash run.sh --stage 2 --stop_stage 2
bash run.sh --stage 3 --stop_stage 3
```
### Training

```
bash train.sh
```
### Inference

After training finishes, the model is saved at exp/conformer/final.pt. Run the following command to view the inference results (modify the script manually if you want to use a different pretrained model):

```
bash validate.sh
```
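If you prefer to inspect the trained checkpoint from Python instead of going through validate.sh, a sketch like the one below can be used. It assumes exp/conformer/final.pt stores a plain state_dict, which is how WeNet's checkpoint saving normally works; verify this against your own training output.

```
# Inspect the saved checkpoint (assumption: final.pt holds a plain state_dict).
import torch

state = torch.load("exp/conformer/final.pt", map_location="cpu")
print(f"{len(state)} tensors in checkpoint")
# Print the first few parameter names and shapes as a quick sanity check.
for name, tensor in list(state.items())[:5]:
    print(name, tuple(tensor.shape))
```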
## Model Accuracy

| Cards | Accuracy |
| :--: | :-----: |
| 4 | 93.1294 |
## Source Repository and Issue Feedback

http://developer.hpccube.com/codes/modelzoo/conformer_pytorch.git
# WeNet
[![License](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python-Version](https://img.shields.io/badge/Python-3.7%7C3.8-brightgreen)](https://github.com/wenet-e2e/wenet)
[**Roadmap**](https://github.com/wenet-e2e/wenet/issues/1683)
| [**Docs**](https://wenet-e2e.github.io/wenet)
| [**Papers**](https://wenet-e2e.github.io/wenet/papers.html)
| [**Runtime (x86)**](https://github.com/wenet-e2e/wenet/tree/main/runtime/libtorch)
| [**Runtime (android)**](https://github.com/wenet-e2e/wenet/tree/main/runtime/android)
| [**Pretrained Models**](docs/pretrained_models.md)
| [**HuggingFace**](https://huggingface.co/spaces/wenet/wenet_demo)
**We** share neural **Net** together.
The main motivation of WeNet is to close the gap between research and production end-to-end (E2E) speech recognition models,
to reduce the effort of productionizing E2E models, and to explore better E2E models for production.
## :fire: News
* 2022.12: Horizon X3 pi BPU, see https://github.com/wenet-e2e/wenet/pull/1597, Kunlun Core XPU, see https://github.com/wenet-e2e/wenet/pull/1455, Raspberry Pi, see https://github.com/wenet-e2e/wenet/pull/1477, iOS, see https://github.com/wenet-e2e/wenet/pull/1549.
* 2022.11: TrimTail paper released, see https://arxiv.org/pdf/2211.00522.pdf
* 2022.10: Squeezeformer is supported, see https://github.com/wenet-e2e/wenet/pull/1447.
* 2022.07: RNN-T is supported now, see [rnnt](https://github.com/wenet-e2e/wenet/tree/main/examples/aishell/rnnt) for benchmark.
## Highlights
* **Production first and production ready**: The core design principle of WeNet. WeNet provides full stack solutions for speech recognition.
* *Unified solution for streaming and non-streaming ASR*: [U2++ framework](https://arxiv.org/pdf/2203.15455.pdf)--develop, train, and deploy only once.
* *Runtime solution*: built-in server [x86](https://github.com/wenet-e2e/wenet/tree/main/runtime/libtorch) and on-device [android](https://github.com/wenet-e2e/wenet/tree/main/runtime/android) runtime solution.
* *Model exporting solution*: built-in solution to export model to LibTorch/ONNX for inference.
* *LM solution*: built-in production-level [LM solution](docs/lm.md).
* *Other production solutions*: built-in contextual biasing, time stamp, endpoint, and n-best solutions.
* **Accurate**: WeNet achieves SOTA results on many public speech datasets.
* **Light weight**: WeNet is easy to install, easy to use, well designed, and well documented.
## Performance Benchmark
Please see `examples/$dataset/s0/README.md` for benchmarks on different speech datasets.
## Installation (Python Only)
If you just want to use WeNet as a Python package for speech recognition applications,
just install it via `pip`; note that Python 3.6+ is required.
``` sh
pip3 install wenetruntime
```
And please see [doc](runtime/binding/python/README.md) for usage.
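For orientation, a minimal decoding script with the `wenetruntime` package looks roughly like the sketch below. The `Decoder` class and `decode_wav` method follow the Python binding's documented usage, but treat the linked doc as authoritative.

``` python
# Minimal wenetruntime sketch: decode a single 16 kHz WAV file with the built-in
# pretrained Chinese model. Check the linked doc for the authoritative interface.
import sys

import wenetruntime as wenet

wav_file = sys.argv[1]                 # path to a 16 kHz mono WAV file
decoder = wenet.Decoder(lang='chs')    # loads the pretrained Chinese model
result = decoder.decode_wav(wav_file)  # returns the recognition result as JSON text
print(result)
```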
## Installation (Training and Developing)
- Clone the repo
``` sh
git clone https://github.com/wenet-e2e/wenet.git
```
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:
``` sh
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```
- Optionally, if you want to use the x86 runtime or the language model (LM),
you have to build the runtime as follows. Otherwise, you can just ignore this step.
``` sh
# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake -DGRAPH_TOOLS=ON .. && cmake --build .
```
## Discussion & Communication
Please visit [Discussions](https://github.com/wenet-e2e/wenet/discussions) for further discussion.
For Chinese users, you can also scan the QR code on the left to follow our official WeNet account.
We created a WeChat group for better discussion and quicker response.
Please scan the personal QR code on the right; the person behind it will invite you to the chat group.
If you cannot access the QR images, please find them on [gitee](https://gitee.com/robin1001/qr/tree/master).
| <img src="https://github.com/robin1001/qr/blob/master/wenet.jpeg" width="250px"> | <img src="https://github.com/robin1001/qr/blob/master/binbin.jpeg" width="250px"> |
| ---- | ---- |
Or you can directly discuss on [GitHub Issues](https://github.com/wenet-e2e/wenet/issues).
## Contributors
| <a href="https://www.chumenwenwen.com" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/chumenwenwen.png" width="250px"></a> | <a href="http://lxie.npu-aslp.org" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/colleges/nwpu.png" width="250px"></a> | <a href="http://www.aishelltech.com" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/aishelltech.png" width="250px"></a> | <a href="http://www.ximalaya.com" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/ximalaya.png" width="250px"></a> | <a href="https://www.jd.com" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/jd.jpeg" width="250px"></a> |
| ---- | ---- | ---- | ---- | ---- |
| <a href="https://horizon.ai" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/hobot.png" width="250px"></a> | <a href="https://thuhcsi.github.io" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/colleges/thu.png" width="250px"></a> | <a href="https://www.nvidia.com/en-us" target="_blank"><img src="https://raw.githubusercontent.com/wenet-e2e/wenet-contributors/main/companies/nvidia.png" width="250px"></a> | | | |
## Acknowledge
1. We borrowed a lot of code from [ESPnet](https://github.com/espnet/espnet) for transformer-based modeling.
2. We borrowed a lot of code from [Kaldi](http://kaldi-asr.org/) for WFST-based decoding for LM integration.
3. We referred to [EESEN](https://github.com/srvk/eesen) for building the TLG-based graph for LM integration.
4. We referred to [OpenTransformer](https://github.com/ZhengkunTian/OpenTransformer/) for Python batch inference of E2E models.
## Citations
``` bibtex
@inproceedings{yao2021wenet,
title={WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit},
author={Yao, Zhuoyuan and Wu, Di and Wang, Xiong and Zhang, Binbin and Yu, Fan and Yang, Chao and Peng, Zhendong and Chen, Xiaoyu and Xie, Lei and Lei, Xin},
booktitle={Proc. Interspeech},
year={2021},
address={Brno, Czech Republic },
organization={IEEE}
}
@article{zhang2022wenet,
title={WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit},
author={Zhang, Binbin and Wu, Di and Peng, Zhendong and Song, Xingchen and Yao, Zhuoyuan and Lv, Hang and Xie, Lei and Yang, Chao and Pan, Fuping and Niu, Jianwei},
journal={arXiv preprint arXiv:2203.15455},
year={2022}
}
```
# Model name
modelName=LPR
# Model description
modelDescription=LPR is a deep-learning-based license plate recognition model; its main recognition targets are license plate images in natural scenes
# Application scenarios (separate multiple tags with commas)
appScenario=OCR,license plate recognition,object detection,training,inference,pretrain,train,inference
# Framework type (separate multiple tags with commas)
frameType=PyTorch,Migraphx,ONNXRuntime