Commit 0c7e3676 authored by zhanggzh

add audio src code

parent e8cbe177
# <div align="center"><strong>TorchAudio</strong></div>
[![Documentation](https://img.shields.io/badge/dynamic/json.svg?label=docs&url=https%3A%2F%2Fpypi.org%2Fpypi%2Ftorchaudio%2Fjson&query=%24.info.version&colorB=brightgreen&prefix=v)](https://pytorch.org/audio/main/)
[![Anaconda Badge](https://anaconda.org/pytorch/torchaudio/badges/downloads.svg)](https://anaconda.org/pytorch/torchaudio)
[![Anaconda-Server Badge](https://anaconda.org/pytorch/torchaudio/badges/platforms.svg)](https://anaconda.org/pytorch/torchaudio)
## Introduction

![TorchAudio Logo](docs/source/_static/img/logo.png)

The aim of torchaudio is to apply [PyTorch](https://github.com/pytorch/pytorch) to
the audio domain. By supporting PyTorch, torchaudio follows the same philosophy
of providing strong DCU acceleration, having a focus on trainable features through
the autograd system, and having consistent style (tensor names and dimension names).
Therefore, it is primarily a machine learning library and not a general signal
processing library. The benefits of PyTorch can be seen in torchaudio through
having all the computations be through PyTorch operations, which makes it easy
to use and feel like a natural extension. The upstream project lives at
[GitHub - pytorch/audio: Data manipulation and transformation for audio signal processing, powered by PyTorch](https://github.com/pytorch/audio).
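
As a small illustration of the autograd point above, gradients propagate through torchaudio transforms just as they do through any other PyTorch module. The sketch below uses a synthetic waveform rather than real audio:

```python
import torch
import torchaudio.transforms as T

# A dummy one-second, 16 kHz waveform treated as a trainable parameter.
waveform = torch.randn(1, 16000, requires_grad=True)

# MelSpectrogram is built from ordinary PyTorch operations, so it is
# differentiable end to end.
mel_transform = T.MelSpectrogram(sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)

loss = mel_transform(waveform).mean()
loss.backward()

print(waveform.grad.shape)  # torch.Size([1, 16000])
```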
- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/main/)
  - Load a variety of audio formats, such as `wav`, `mp3`, `ogg`, `flac`, `opus`, `sphere`, into a torch Tensor using SoX
  - [Kaldi (ark/scp)](http://pytorch.org/audio/main/kaldi_io.html)
- [Dataloaders for common audio datasets](http://pytorch.org/audio/main/datasets.html)
- Audio and speech processing functions
  - [forced_align](https://pytorch.org/audio/main/generated/torchaudio.functional.forced_align.html)
- Common audio transforms (see the example below)
  - [Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample](http://pytorch.org/audio/main/transforms.html)
- Compliance interfaces: Run code using PyTorch that aligns with other libraries
  - [Kaldi: spectrogram, fbank, mfcc](https://pytorch.org/audio/main/compliance.kaldi.html)
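
The sketch below exercises the I/O and transform entries from the list above. `speech.wav` is a placeholder input file, and the exact set of loadable formats depends on the audio backend available in your environment:

```python
import torchaudio
import torchaudio.transforms as T

# Load a file into a (channels, frames) float tensor plus its sample rate.
# "speech.wav" is a placeholder path used only for illustration.
waveform, sample_rate = torchaudio.load("speech.wav")

# Resample to 16 kHz and compute log-mel features, entirely with PyTorch ops.
resample = T.Resample(orig_freq=sample_rate, new_freq=16000)
melspec = T.MelSpectrogram(sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)
to_db = T.AmplitudeToDB()

waveform_16k = resample(waveform)
features = to_db(melspec(waveform_16k))
print(features.shape)  # (channels, n_mels, time_frames)

# Write the resampled audio back to disk.
torchaudio.save("speech_16k.wav", waveform_16k, 16000)
```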
## Installation

For the upstream installation and build process of TorchAudio, please refer to https://pytorch.org/audio/main/installation.html. The DCU-specific steps are described below.

### Supported environment

- Ubuntu 20.04 or Rocky Linux 8.6
- Python==3.10
- PyTorch==2.4.1, DTK=25.04
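
A quick sanity check for this environment is sketched below. It assumes the DTK build of PyTorch exposes the DCU through torch's ROCm/HIP device interface (so `torch.version.hip` is populated and the device shows up via the `torch.cuda` API); adjust as needed for your setup:

```python
import torch

print(torch.__version__)           # expected to report 2.4.1 for the DTK build
print(torch.version.hip)           # ROCm/HIP version the wheel was built against
print(torch.cuda.is_available())   # True when a DCU is visible to PyTorch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```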
### Installing with pip
```shell
pip install torchaudio*  # install the torchaudio wheel package that matches your operating system
```
### Building from source

#### Preparing the build environment

- Clone the torchaudio source code:
```shell
git clone -b v2.4.1-fastpt http://developer.hpccube.com/codes/OpenDAS/torchaudio.git
```
- Set the required environment variables and install the necessary dependencies.

Install fastpt-2.0.1; the required CMake version is 3.19.0.
```shell
source /usr/local/bin/fastpt -c
```

When using audio, also run:

```shell
source /usr/local/bin/fastpt -e
```

#### Compiling and installing

- Run the build command and install the resulting wheel:

```shell
python3 setup.py bdist_wheel
pip3 install dist/torchaudio*
```

## Version query

```shell
python -c "import torchaudio; print(torchaudio.__version__)"
```

- The version number is kept in sync with the upstream release; the command above prints it, e.g. 2.4.1.

API Reference
-------------

API Reference is located here: http://pytorch.org/audio/main/

Contributing Guidelines
-----------------------

Please refer to [CONTRIBUTING.md](./CONTRIBUTING.md)

Citation
--------

If you find this package useful, please cite as:

```bibtex
@article{yang2021torchaudio,
  title={TorchAudio: Building Blocks for Audio and Speech Processing},
  author={Yao-Yuan Yang and Moto Hira and Zhaoheng Ni and Anjali Chourdia and Artyom Astafurov and Caroline Chen and Ching-Feng Yeh and Christian Puhrsch and David Pollack and Dmitriy Genzel and Donny Greenberg and Edward Z. Yang and Jason Lian and Jay Mahadeokar and Jeff Hwang and Ji Chen and Peter Goldsborough and Prabhat Roy and Sean Narenthiran and Shinji Watanabe and Soumith Chintala and Vincent Quenneville-Bélair and Yangyang Shi},
  journal={arXiv preprint arXiv:2110.15018},
  year={2021}
}
```

```bibtex
@misc{hwang2023torchaudio,
  title={TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch},
  author={Jeff Hwang and Moto Hira and Caroline Chen and Xiaohui Zhang and Zhaoheng Ni and Guangzhi Sun and Pingchuan Ma and Ruizhe Huang and Vineel Pratap and Yuekai Zhang and Anurag Kumar and Chin-Yun Yu and Chuang Zhu and Chunxi Liu and Jacob Kahn and Mirco Ravanelli and Peng Sun and Shinji Watanabe and Yangyang Shi and Yumeng Tao and Robin Scheibler and Samuele Cornell and Sean Kim and Stavros Petridis},
  year={2023},
  eprint={2310.17864},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```

Disclaimer on Datasets
----------------------

This is a utility library that downloads and prepares public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have license to use the dataset. It is your responsibility to determine whether you have permission to use the dataset under the dataset's license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thanks for your contribution to the ML community!

Pre-trained Model License
-------------------------

The pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. It is your responsibility to determine whether you have permission to use the models for your use case.

For instance, the SquimSubjective model is released under the Creative Commons Attribution Non Commercial 4.0 International (CC-BY-NC 4.0) license. See [the link](https://zenodo.org/record/4660670#.ZBtWPOxuerN) for additional details.

Other pre-trained models that have a different license are noted in the documentation. Please check out the [documentation page](https://pytorch.org/audio/main/).

## Known Issue

-

## Other references

- [README_ORIGIN](README_ORIGIN.md)
- [GitHub - pytorch/audio](https://github.com/pytorch/audio)
Beyond the README, the commit also patches the C++/CUDA sources: the warp lane-ID helper `laneId()` drops the NVIDIA-specific inline PTX assembly in favour of the HIP intrinsic `__lane_id()`, which the ROCm/HIP (DTK) toolchain used for the DCU provides.

```diff
@@ -17,8 +17,9 @@ constexpr inline __host__ __device__ bool isPo2(IntType num) {
 inline __device__ int laneId() {
   int id;
-  asm("mov.s32 %0, %%laneid;" : "=r"(id));
-  return id;
+  //asm("mov.s32 %0, %%laneid;" : "=r"(id));
+  //return id;
+  return __lane_id();
 }

 /**
  * @brief Shuffle the data inside a warp
```