Commits · 2d99fee29efac7fcd09f679ed6f7f3379eaac512 · OpenDAS / Torchaudio

08 Nov, 2022 1 commit

Add convolution transforms (#2811) · 2d99fee2

hwangjeff authored Nov 07, 2022

Summary:
Adds `torch.nn.Module`-based implementations for convolution and FFT convolution.

Pull Request resolved: https://github.com/pytorch/audio/pull/2811

Reviewed By: carolineechen

Differential Revision: D40881937

Pulled By: hwangjeff

fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de

2d99fee2

02 Nov, 2022 1 commit

Add links to training recipes (#2812) · ce2ae984

moto authored Nov 01, 2022

Summary:
<img width="756" alt="Screen Shot 2022-11-01 at 3 32 58 PM" src="https://user-images.githubusercontent.com/855818/199173348-f463ae71-438c-4dad-a481-b65522a8e52f.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2812

Reviewed By: carolineechen

Differential Revision: D40919942

Pulled By: mthrok

fbshipit-source-id: 18e5a709c262fb0b15ada0d303f1d0dee033beb1

ce2ae984

28 Oct, 2022 1 commit

Refactor tutorial index (#2767) · e6bd346e

moto authored Oct 28, 2022

Summary:
This commit re-organizes the tutorials.

1. Put all the tutorials in the left bar and make the section **folded by default**.
2. Add pytorch/tutorials-like cards in index
3. Move feature classifications to a dedicated page.

https://output.circle-artifacts.com/output/job/1f1a04a5-137e-428d-9da4-c46f59eeffa4/artifacts/0/docs/index.html

<img width="1073" alt="Screen Shot 2022-10-28 at 7 34 29 AM" src="https://user-images.githubusercontent.com/855818/198410686-3ef40ad2-c9c9-443c-800e-6e51e1b6a491.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2767

Reviewed By: carolineechen

Differential Revision: D40627547

Pulled By: mthrok

fbshipit-source-id: 098b825f242e91919126014abdab27852304ae64

e6bd346e

23 Sep, 2022 1 commit

Introduce IO section to getting started tutorials (#2703) · faf8f1cc

moto authored Sep 23, 2022

Summary:
Now that new tutorials for StreamWriter are being added, there are more tutorials for media IO than for the rest.
So this commit introduces a sub-index for IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

faf8f1cc

15 Sep, 2022 1 commit

Consolidate bibliography / reference (#2676) · 476ab9ab

moto authored Sep 14, 2022

Summary:
Preparation for the adoption of `autosummary`.

Replace `:footcite:` with `:cite:` and introduce a dedicated reference page, as `:footcite:` does not work well with `autosummary`.
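
A bulk role replacement like the one described above could be sketched as follows. This is only an illustration, not the script used in the PR: the helper name `footcite_to_cite` and the citation key in the usage note are hypothetical, and only the plain `:footcite:` form of the role is handled.

```python
import re

# Matches the plain role form :footcite:`key` (or `key1,key2`).
# Variant forms of the role are deliberately left untouched in this sketch.
_FOOTCITE = re.compile(r":footcite:(`[^`]+`)")


def footcite_to_cite(text):
    """Return `text` with every :footcite:`...` role rewritten to :cite:`...`."""
    return _FOOTCITE.sub(r":cite:\1", text)
```

For instance, `footcite_to_cite("See :footcite:`SomeKey`.")` yields the same sentence with the `:cite:` role, leaving the key unchanged.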

Example:

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/datasets.html#cmuarctic

https://output.circle-artifacts.com/output/job/4da47ba6-d9c7-418e-b5b0-e9f8a146a6c3/artifacts/0/docs/references.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2676

Reviewed By: carolineechen

Differential Revision: D39509431

Pulled By: mthrok

fbshipit-source-id: e6003dd01ec3eff3d598054690f61de8ee31ac9a

476ab9ab

15 Aug, 2022 1 commit

Remove outdated doc (#2617) · aa591c0d

Zhaoheng Ni authored Aug 15, 2022

Summary:
`ctc_decoder` has become beta, so remove it from the prototype documents.

Pull Request resolved: https://github.com/pytorch/audio/pull/2617

Reviewed By: hwangjeff

Differential Revision: D38706869

Pulled By: nateanl

fbshipit-source-id: 41679f4e65a584b6b882af4551a50123f1dcef02

aa591c0d

05 Aug, 2022 1 commit

Add convolution operator (#2602) · b396157d

hwangjeff authored Aug 05, 2022

Summary:
Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
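
The difference between the two approaches can be illustrated with a pure-Python sketch on 1-D lists. This is not the torchaudio implementation: the names `convolve_direct` and `convolve_fft` are made up for this example, and a simple recursive radix-2 FFT stands in for a real FFT library.

```python
import cmath


def convolve_direct(x, y):
    """Full 1-D convolution by direct summation, O(len(x) * len(y))."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def _fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], inverse)
    odd = _fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def convolve_fft(x, y):
    """Same result as convolve_direct, computed via the convolution theorem."""
    n_out = len(x) + len(y) - 1
    size = 1
    while size < n_out:  # zero-pad to a power of two that fits the full output
        size *= 2
    fx = _fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = _fft([complex(v) for v in y] + [0j] * (size - len(y)))
    inv = _fft([a * b for a, b in zip(fx, fy)], inverse=True)
    # The inverse transform above is unnormalized, so divide by size here.
    return [v.real / size for v in inv[:n_out]]
```

Both functions return the full convolution of length `len(x) + len(y) - 1`. The FFT version agrees with the direct one up to floating-point rounding and tends to pay off when both inputs are long.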

Pull Request resolved: https://github.com/pytorch/audio/pull/2602

Reviewed By: nateanl, mthrok

Differential Revision: D38450771

Pulled By: hwangjeff

fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b

b396157d

28 Jul, 2022 1 commit

Create tutorial for HDemucs (#2572) · 919fd0c4

Sean Kim authored Jul 28, 2022

Summary:
Add the tutorial Python file. This is a draft PR and will be modified according to feedback.

Future plan: modify the spectrogram and bottom audio design, and work on finding the best audio track and segments.

Pull Request resolved: https://github.com/pytorch/audio/pull/2572

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D38234001

Pulled By: skim0514

fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5

919fd0c4

08 Jun, 2022 1 commit

Split Streaming API tutorials into two (#2446) · 2d846263

moto authored Jun 07, 2022

Summary:
The Streaming API tutorial has gotten long, so this commit splits it into two.

Pull Request resolved: https://github.com/pytorch/audio/pull/2446

Reviewed By: hwangjeff

Differential Revision: D36987513

Pulled By: mthrok

fbshipit-source-id: 13e3aad74c0d0e654c39c0eeceffca1a00b0dac4

2d846263

01 Jun, 2022 1 commit

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

20 May, 2022 1 commit

Add tutorial to use NVDEC with Stream API (#2393) · 07ace387

moto authored May 20, 2022

Summary:
This commit adds a tutorial to enable/use NVDEC with Stream API.

https://output.circle-artifacts.com/output/job/19e66a4b-1819-4804-8834-d38e6c80c4fd/artifacts/0/docs/hw_acceleration_tutorial.html

Because the use of NVDEC requires building and installing FFmpeg from source,
this tutorial was authored on Google Colab, tailored to its environment.

The tutorial here is the result of the notebook execution, with
the link to the publicly accessible Google Colab notebook.

Pull Request resolved: https://github.com/pytorch/audio/pull/2393

Reviewed By: hwangjeff

Differential Revision: D36404408

Pulled By: mthrok

fbshipit-source-id: 9c820d3db4d06c5b343ecad0708489125ca06948

07ace387

13 May, 2022 1 commit

Move Streamer API out of prototype (#2378) · 72b712a1

moto authored May 13, 2022

Summary:
This commit moves the Streaming API out of prototype module.

* The related classes are renamed as follows

  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is a preemptive measure for the possible addition of a
`StreamWriter` API.

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg, so USE_FFMPEG is more appropriate

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328

72b712a1

12 Apr, 2022 1 commit

Add Conformer RNN-T model prototype (#2322) · b0c8e239

hwangjeff authored Apr 11, 2022

Summary:
Adds Conformer RNN-T model as prototype feature, by way of factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, which instantiates a baseline version of the model. Also includes the following:
- Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
- Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
- Introduces tests for `conformer_rnnt_model`.
- Adds docs.

Pull Request resolved: https://github.com/pytorch/audio/pull/2322

Reviewed By: xiaohui-zhang

Differential Revision: D35565987

Pulled By: hwangjeff

fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789

b0c8e239

08 Apr, 2022 1 commit

Add devices/properties badges (#2321) · 72ae755a

moto authored Apr 07, 2022

Summary:
Add badges of supported properties and devices to functionals and transforms.

This commit adds `.. devices::` and `.. properties::` directives to sphinx.

APIs with these directives will have badges (based off of shields.io) which link to the
page with description of these features.

Continuation of https://github.com/pytorch/audio/issues/2316
Excluded dtypes for further improvement, and actually added badges to most of functional/transforms.

Pull Request resolved: https://github.com/pytorch/audio/pull/2321

Reviewed By: hwangjeff

Differential Revision: D35489063

Pulled By: mthrok

fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762

72ae755a

26 Feb, 2022 1 commit

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR and updates the API for device streaming.

The changes for the interface are
1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
2. Move `fill_buffer` method to private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
`process_packet` method raised an exception in the Python layer, but for device ASR
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
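
The blocking retry described above can be sketched in Python. This illustrates the general pattern only: the actual retry lives in the C++ layer, the helper name `call_with_retry` is hypothetical, and taking both `timeout` and `backoff` in seconds is an assumption of this sketch rather than a statement about the library's units.

```python
import errno
import time


def call_with_retry(operation, timeout=None, backoff=0.01):
    """Call `operation` until it succeeds, retrying while it reports EAGAIN.

    timeout: maximum number of seconds to keep retrying; None retries forever.
    backoff: number of seconds to sleep between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return operation()
        except OSError as err:
            if err.errno != errno.EAGAIN:
                raise  # a real error, not "resource temporarily unavailable"
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("resource not ready before timeout") from err
            time.sleep(backoff)
```

Retrying inside the call keeps the caller's loop free of exception handling, which is the efficiency concern raised above. With `timeout=None` the call blocks until the operation succeeds.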

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59

365313ed

04 Feb, 2022 1 commit

Add RNNTBundle with weights pre-trained on tedlium3 dataset (#2177) · a1dc9e0a

Zhaoheng Ni authored Feb 04, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2177

Reviewed By: hwangjeff

Differential Revision: D33893052

Pulled By: nateanl

fbshipit-source-id: 00ff011eb96662b162c0327196a9564721e9c8f7

a1dc9e0a

03 Feb, 2022 1 commit

Add tutorials with streaming API (#2193) · c00f65da

moto authored Feb 03, 2022

Summary:
* tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html
* tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2193

Reviewed By: hwangjeff

Differential Revision: D33971312

Pulled By: mthrok

fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f

c00f65da

02 Feb, 2022 1 commit

Add Streaming API (#2164) · 7a3e262d

moto authored Feb 01, 2022

Summary:
This PR adds the prototype streaming API.
The implementation is based on ffmpeg libraries.

For the detailed usage, please refer to [the resulting tutorial](https://534376-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html).

Pull Request resolved: https://github.com/pytorch/audio/pull/2164

Reviewed By: hwangjeff

Differential Revision: D33934457

Pulled By: mthrok

fbshipit-source-id: 92ade4aff2d25baf02c0054682d4fbdc9ba8f3fe

7a3e262d

01 Feb, 2022 1 commit

Update stale prototype references (#2189) · 1a0935c6

hwangjeff authored Feb 01, 2022

Summary:
Missed a couple of spots in https://github.com/pytorch/audio/issues/2187.

Pull Request resolved: https://github.com/pytorch/audio/pull/2189

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D33926342

Pulled By: hwangjeff

fbshipit-source-id: e1324c0fe8f9be90ad3143d19cd61c3d53f02b06

1a0935c6

29 Dec, 2021 2 commits

Reorganize RNN-T components in prototype module (#2110) · 67cdf882

hwangjeff authored Dec 29, 2021

Summary:
Regroup RNN-T components under `torchaudio.prototype.models` and `torchaudio.prototype.pipelines`.

Updated docs: https://492321-90321822-gh.circle-artifacts.com/0/docs/prototype.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2110

Reviewed By: carolineechen, mthrok

Differential Revision: D33354116

Pulled By: hwangjeff

fbshipit-source-id: 9cf4afed548cb173d56211c16d31bcfa25a8e4cb

67cdf882

Update prototype documentations (#2108) · 10cce198

moto authored Dec 28, 2021

Summary:
### Change list

* Split the documentation of prototypes
* Add a new API reference section dedicated for prototypes.
* Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder
* Hide the signature of RNNT constructor. (cc hwangjeff )
  * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT
* Tweak CTC tutorial
  * Replace hyperlinks to API reference with backlinks
  * Add `progress=False` to download

### Follow-up

The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the `Hypothesis` of the CTC decoder to the documentation, the build process complained that the name is ambiguous.
I think the `Hypothesis` classes could be put inside each decoder (if TorchScript supports it), or given different names, but in that case the interface of each `Hypothesis` has to be generic enough.

### Before

https://pytorch.org/audio/main/prototype.html

<img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png">

### After

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html

<img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html

<img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png">

https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html

<img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2108

Reviewed By: hwangjeff, carolineechen, nateanl

Differential Revision: D33340816

Pulled By: mthrok

fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187

10cce198

28 Dec, 2021 1 commit

Add ASR CTC inference tutorial (#2106) · 133d0065

Caroline Chen authored Dec 28, 2021

Summary:
demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: https://github.com/pytorch/audio/pull/2106

Reviewed By: mthrok

Differential Revision: D33340946

Pulled By: carolineechen

fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7

133d0065

05 Nov, 2021 4 commits

Update documentation top page (#1988) · e7ea820e

moto authored Nov 05, 2021

- Add link to index page on left
- Package Reference -> API Reference
- Update description.

e7ea820e

Port MVDR tutorial (#1983) · b9247022

moto authored Nov 05, 2021

b9247022

Port audio manipulation tutorial (#1970) · 8f061987

moto authored Nov 05, 2021

8f061987

Refactor tutorial organization (#1987) · 6cf84866

moto authored Nov 05, 2021

* Refactor tutorial organization

* Merge tutorial subdirectories under examples/gallery/tutorials
* Do not use index.rst generated by Sphinx-gallery
* Instead use flat structure so that all the tutorials are listed in left menu
* Use `_assets` dir for artifacts of tutorials

6cf84866

04 Nov, 2021 2 commits
- Port TTS tutorial (#1973) · b3c2cfce
  moto authored Nov 04, 2021
  
  b3c2cfce
- Add Sphinx-gallery to doc (#1967) · a3363539
  moto authored Nov 04, 2021
  
  a3363539
02 Nov, 2021 1 commit
- Add citation information in the documentation (#1962) · 8a93717c
  yangarbiter authored Nov 02, 2021
  
  8a93717c
15 Oct, 2021 1 commit

Move wav2vec2 pretrained models to pipelines module (#1876) · fad855cd

moto authored Oct 15, 2021

- Move wav2vec2 pretrained weights to `torchaudio.pipelines` namespace to align with #1872.
- Split `Wav2Vec2PretrainedModelBundle` into `Wav2Vec2Bundle` (for pre-training model) and  `Wav2Vec2ASRBundle` (for models fine-tuned for ASR).
- Update base URL

fad855cd

06 Oct, 2021 1 commit

Introduce Emformer (#1801) · 48cfbf2b

hwangjeff authored Oct 06, 2021

Adds an implementation of Emformer, a memory-efficient transformer architecture
introduced in https://ieeexplore.ieee.org/document/9414560 that targets low-latency
streaming speech recognition applications.

48cfbf2b

19 Aug, 2021 1 commit
- Move RNNT Loss out of prototype (#1711) · 2c115821
  Caroline Chen authored Aug 19, 2021
  
  2c115821
18 Aug, 2021 1 commit
- Move Tacotron2 out of prototype (#1714) · 352d63c5
  yangarbiter authored Aug 17, 2021
  
  352d63c5
12 Aug, 2021 1 commit
- Add prototype.tacotron2 page to docs (#1695) · 9c641849
  yangarbiter authored Aug 12, 2021
  
  9c641849
30 Apr, 2021 1 commit

Replace existing prototype RNNT Loss (#1479) · 0c263a93

Caroline Chen authored Apr 30, 2021

Replace the prototype RNNT implementation (using warp-transducer) with one without external library dependencies

0c263a93

11 Jan, 2021 1 commit
- add doc for rnnt loss (#1171) · b57f05c4
  Vincent QB authored Jan 11, 2021
  
  b57f05c4
19 Oct, 2020 1 commit

Update index.rst (#968) · ba1698ba

Brian Johnson authored Oct 19, 2020

Adds introductory context and links to the PyTorch Libraries to audio docs.

ba1698ba

20 Jul, 2020 1 commit

Update documentation and fix docstrings (#788) · 2381dd89

moto authored Jul 20, 2020

- Addresses #549 #638 #786 
- Add `torchaudio` top level module doc
- Separate `torchaudio` top level module doc from `index.html`
- Add `backend` module doc.
- Remove `-> None` from function signature as it adds noise to documentation
- Changed function argument name of `torchaudio.backend.sox_io_backend.save` from `tensor` to `src`, so that it matches the rest of the backends.
- Tweak a bunch of docstrings

2381dd89

16 Jul, 2020 1 commit

Add Torchscript sox effects (#760) · 60a8e23d

moto authored Jul 15, 2020

* Add sox_utils module

* Make init/shutdown thread safe

* Add sox effects implementation

* Add test for sox effects

* Update docstrings and add examples

60a8e23d

01 Aug, 2019 1 commit
- Removal of torchaudio.legacy · d8a47f4a
  jamarshon authored Aug 01, 2019
  
  d8a47f4a