Commits · faf8f1cc06995c56fe06237fd2e485ab7b571546 · OpenDAS / Torchaudio

23 Sep, 2022 1 commit

Introduce IO section to getting started tutorials (#2703) · faf8f1cc

moto authored Sep 23, 2022

Summary:
Since new tutorials for StreamWriter are being added, there are now more tutorials for media IO than for the rest.
So this commit introduces a sub-index for IO tutorials.

Pull Request resolved: https://github.com/pytorch/audio/pull/2703

Reviewed By: carolineechen

Differential Revision: D39769049

Pulled By: mthrok

fbshipit-source-id: 19a3981bc624fdce1d5d703c67e28a751a15e812

faf8f1cc

22 Sep, 2022 2 commits

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692) · 49b23e15

moto authored Sep 22, 2022

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

49b23e15

Update and fix tutorials (#2701) · 709b4439

moto authored Sep 22, 2022

Summary:
* Fix Sphinx warning
* Update asset management

Pull Request resolved: https://github.com/pytorch/audio/pull/2701

Reviewed By: carolineechen

Differential Revision: D39714126

Pulled By: mthrok

fbshipit-source-id: a5b04cfbf8bedce67c621b6bfe1dcd975b343313

709b4439

21 Sep, 2022 2 commits

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689) · 0b3ddec6

moto authored Sep 21, 2022

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

0b3ddec6

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690) · 30c7077b

moto authored Sep 20, 2022

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

30c7077b

14 Sep, 2022 1 commit

Move Hybrid Demucs pipeline to beta (#2673) · 60868748

Caroline Chen authored Sep 14, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673

Reviewed By: mthrok

Differential Revision: D39507612

Pulled By: carolineechen

fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53

60868748

13 Sep, 2022 1 commit

[Bootcamp] Fix Typo (#2661) · 697e15ab

Anthony Tao authored Sep 13, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2661

Fixed typo in `audio_data_augmentation_tutorial.py`

Reviewed By: malfet, mthrok

Differential Revision: D39352353

fbshipit-source-id: aea35dab03fb7422421948bd26716e10a8d65f92

697e15ab

18 Aug, 2022 3 commits

Update ASR inference tutorial (#2631) · 189edb1b

moto authored Aug 18, 2022

Summary:
* Use download_asset
* Remove notes around nightly
* Print versions first
* Remove duplicated import

Pull Request resolved: https://github.com/pytorch/audio/pull/2631

Reviewed By: carolineechen

Differential Revision: D38830395

Pulled By: mthrok

fbshipit-source-id: c9259df33562defe249734d1ed074dac0fddc2f6

189edb1b

Update notes around nightly build and third parties (#2632) · 55ce80b1

moto authored Aug 18, 2022

Summary:
Google Colab now has torchaudio 0.12 pre-installed.
This commit removes the note about nightly build.

Pull Request resolved: https://github.com/pytorch/audio/pull/2632

Reviewed By: carolineechen

Differential Revision: D38827632

Pulled By: mthrok

fbshipit-source-id: ac769780868b741c3012357d589ec0019d9af6eb

55ce80b1

Tweak tutorials (#2630) · cab2bb44

moto authored Aug 18, 2022

Summary:
Resolves the following warnings

```
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:195: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/asr_inference_with_ctc_decoder_tutorial.rst:446: WARNING: Unexpected indentation.
/torchaudio/docs/source/tutorials/audio_io_tutorial.rst:559: WARNING: Content block expected for the "note" directive; none found.
/torchaudio/docs/source/tutorials/mvdr_tutorial.rst:338: WARNING: Bullet list ends without a blank line; unexpected unindent.
```

Pull Request resolved: https://github.com/pytorch/audio/pull/2630

Reviewed By: nateanl

Differential Revision: D38816632

Pulled By: mthrok

fbshipit-source-id: 135ded4e064d136be67ce24439e96f5e9c9ce635

cab2bb44

05 Aug, 2022 1 commit

Add note for lexicon free decoder output (#2603) · 33485b8c

Caroline Chen authored Aug 05, 2022

Summary:
``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon free usage. This PR adds a note in both docs and tutorial.

Followup: determine if we want to modify the behavior of ``words`` in the lexicon free case. One option is to merge and then split the generated tokens by the input silent token to populate the words field, but this is tricky since the meaning of a "word" in the lexicon free case can be vague and not all languages have whitespaces between words, etc
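
The follow-up option described above (merge the generated tokens, then split them on the silence token) could look roughly like the sketch below. The helper name `tokens_to_words` and the `"|"` silence token are hypothetical, chosen only for illustration; this is not the decoder's current behavior.

```python
from typing import List


def tokens_to_words(tokens: List[str], sil_token: str = "|") -> List[str]:
    """Join emitted tokens and split them on the silence token.

    Hypothetical helper: `sil_token="|"` is an assumed separator.
    """
    merged = "".join(tokens)
    # Drop empty strings produced by leading, trailing, or repeated separators.
    return [word for word in merged.split(sil_token) if word]


tokens = ["|", "h", "e", "l", "l", "o", "|", "w", "o", "r", "l", "d", "|"]
print(tokens_to_words(tokens))  # ['hello', 'world']
```

As the summary notes, this only makes sense for languages where a separator token marks word boundaries.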

Pull Request resolved: https://github.com/pytorch/audio/pull/2603

Reviewed By: mthrok

Differential Revision: D38459709

Pulled By: carolineechen

fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934

33485b8c

01 Aug, 2022 1 commit

Update data augmentation tutorial (#2595) · f1443b8f

moto authored Aug 01, 2022

Summary:
In https://github.com/pytorch/audio/pull/2285, the SNR calculation was fixed,
but there was still one that was not fixed. This commit fixes it.

Also following the feedback https://github.com/pytorch/tutorials/issues/1930#issuecomment-1199741336, update the variable name.

Pull Request resolved: https://github.com/pytorch/audio/pull/2595

Reviewed By: carolineechen

Differential Revision: D38314672

Pulled By: mthrok

fbshipit-source-id: b2015e2709729190d97264aa191651b3af4ba856

f1443b8f

29 Jul, 2022 2 commits

Update forced alignment tutorial (#2544) · c26b38b2

moto authored Jul 29, 2022

Summary:
1. Fix initialization.
Previously, the SOS token score was initialized to 0 across the time axis.
This was biasing the alignment to delay the start.
The proper way to delay the SOS is via blank token.
The new initialization takes the cumulative sum of blank scores.
2. Fill the end of trellis with Inf
Similar to the start, at the end, where the number of remaining time frames is less
than the number of tokens, it is no longer possible to align the text, thus
we fill with Inf for better visualization.
3. Clean up asset management code.
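
Points 1 and 2 can be illustrated with a minimal sketch of the first trellis column (the column in which no token has been emitted yet). It uses plain Python lists rather than tensors, and the name `init_first_column` is hypothetical; the tutorial's actual code may differ in layout and indexing.

```python
import math
from itertools import accumulate
from typing import List


def init_first_column(blank_scores: List[float], num_tokens: int) -> List[float]:
    """Build the first trellis column from per-frame blank log-scores.

    Assumes 1 <= num_tokens <= len(blank_scores).
    """
    num_frames = len(blank_scores)
    assert 1 <= num_tokens <= num_frames
    # Staying in the first column until frame t costs the cumulative sum
    # of blank scores over frames 1..t (frame 0 starts at 0).
    column = [0.0] + list(accumulate(blank_scores[1:]))
    # In the last (num_tokens - 1) frames there are too few frames left
    # to emit the remaining tokens, so those cells are filled with Inf.
    for t in range(num_frames - num_tokens + 1, num_frames):
        column[t] = math.inf
    return column


print(init_first_column([-0.1, -0.2, -0.3, -0.4, -0.5], num_tokens=3))
```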

Pull Request resolved: https://github.com/pytorch/audio/pull/2544

Reviewed By: nateanl

Differential Revision: D38276478

Pulled By: mthrok

fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299

c26b38b2

Improve speech enhancement tutorial (#2527) · d6267031

Zhaoheng Ni authored Jul 29, 2022

Summary:
- The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of mixture speech.
- Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
- Fix the figure in `rtf_power` subsection.
    - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
- Print PESQ, STOI, and SDR metric scores.

Pull Request resolved: https://github.com/pytorch/audio/pull/2527

Reviewed By: mthrok

Differential Revision: D38190218

Pulled By: nateanl

fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de

d6267031

28 Jul, 2022 1 commit

Create tutorial for HDemucs (#2572) · 919fd0c4

Sean Kim authored Jul 28, 2022

Summary:
Add tutorial Python file. Draft PR; will continue to modify according to feedback.

Future plan: modify spectrogram and bottom audio design and work on finding best audio track and segments

Pull Request resolved: https://github.com/pytorch/audio/pull/2572

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D38234001

Pulled By: skim0514

fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5

919fd0c4

08 Jun, 2022 2 commits

Update HW decoding tutorial and add notes about unseekable object (#2408) · 711d6016

moto authored Jun 08, 2022

Summary:
https://output.circle-artifacts.com/output/job/75187a52-b0d8-4cac-89f3-24e10889a36a/artifacts/0/docs/hw_acceleration_tutorial.html

1. Update HW decoding tutorial to include file-like object
1. Add note about unseekable object in streaming API tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/2408

Reviewed By: hwangjeff

Differential Revision: D36632702

Pulled By: mthrok

fbshipit-source-id: 17be2fb8528cb1d2d1ee11901b6a95c512466feb

711d6016

Split Streaming API tutorials into two (#2446) · 2d846263

moto authored Jun 07, 2022

Summary:
The Streaming API tutorial has gotten long, so this commit splits it into two.

Pull Request resolved: https://github.com/pytorch/audio/pull/2446

Reviewed By: hwangjeff

Differential Revision: D36987513

Pulled By: mthrok

fbshipit-source-id: 13e3aad74c0d0e654c39c0eeceffca1a00b0dac4

2d846263

07 Jun, 2022 2 commits

Remove CTC decoder prototype message (#2459) · da3ffe9b

Caroline Chen authored Jun 07, 2022

Summary:
The CTC decoder has been moved to beta, so remove the prototype message from the tutorial.

(this is done on the release branch in https://github.com/pytorch/audio/issues/2457)

Pull Request resolved: https://github.com/pytorch/audio/pull/2459

Reviewed By: hwangjeff

Differential Revision: D36978417

Pulled By: carolineechen

fbshipit-source-id: e580c1e8475a1a0aa924d44deea3852adc332a86

da3ffe9b

Update audio I/O tutorials (#2385) · 4c19e2cb

moto authored Jun 07, 2022

Summary:
- Adopt `torchaudio.utils.download_asset` to simplify asset management.
- Break down the first section about helper functions.
- Use tempfile so that executing tutorial won't leave any artifacts on local file system.
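
The tempfile point can be sketched with the standard library alone. The function name `save_and_inspect` and the file name are hypothetical; the sketch only shows that files written inside a `tempfile.TemporaryDirectory` disappear once the block exits, which is the property the tutorial change relies on.

```python
import os
import tempfile
from typing import Tuple


def save_and_inspect(data: bytes, filename: str = "example.wav") -> Tuple[int, str]:
    """Write `data` into a temporary directory and report its size on disk.

    Returns the size and the (now deleted) path.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, filename)
        with open(path, "wb") as f:
            f.write(data)
        size = os.path.getsize(path)
    # The directory and everything in it are removed when the block exits.
    return size, path


size, path = save_and_inspect(b"\x00" * 16)
print(size, os.path.exists(path))
```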

Example: https://output.circle-artifacts.com/output/job/b11a0087-8bf9-4999-a74f-b53798eaa77f/artifacts/0/docs/tutorials/audio_io_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2385

Reviewed By: hwangjeff

Differential Revision: D36404399

Pulled By: mthrok

fbshipit-source-id: 106af34e8ddd22a061aa12767b444b32aef07bad

4c19e2cb

03 Jun, 2022 3 commits

Update audio data augmentation tutorial (#2388) · 41082eb0

moto authored Jun 03, 2022

Summary:
- Adopt `torchaudio.utils.download_asset` to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

https://output.circle-artifacts.com/output/job/d7dd1b93-6dfe-46da-a080-109bfdc63881/artifacts/0/docs/tutorials/audio_data_augmentation_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2388

Reviewed By: carolineechen

Differential Revision: D36404405

Pulled By: mthrok

fbshipit-source-id: f460ed810519797fce6e2fa7baaee110bddd1d06

41082eb0

Update audio resampling tutorial (#2386) · fd2be89a

moto authored Jun 03, 2022

Summary:
- Replace mis-use of plot_specgram with plot_sweep, and remove plot_specgram
- Move `benchmark_resample` to later section

https://output.circle-artifacts.com/output/job/9f7af187-777d-4d75-840f-2630a36295b7/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2386

Reviewed By: carolineechen

Differential Revision: D36404403

Pulled By: mthrok

fbshipit-source-id: f9df8453e3f531bdc4549b0134e5dbba90653bf7

fd2be89a

Update audio feature extraction tutorial (#2391) · 8e20d546

moto authored Jun 03, 2022

Summary:
- Adopt torchaudio.utils.download_asset to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

Pull Request resolved: https://github.com/pytorch/audio/pull/2391

Reviewed By: carolineechen, nateanl

Differential Revision: D36885626

Pulled By: mthrok

fbshipit-source-id: 1306f22ab70ab1e7f74ed7e43bf43150015448b6

8e20d546

02 Jun, 2022 1 commit

Update MVDR beamforming tutorial (#2398) · d01f5891

Zhaoheng Ni authored Jun 01, 2022

Summary:
- Use `download_asset` to download audios.
- Replace `MVDR` module with newly added `SoudenMVDR` and `RTFMVDR` modules.
- Benchmark performances of `F.rtf_evd` and `F.rtf_power` for RTF computation.
- Visualize the spectrograms and masks.

Pull Request resolved: https://github.com/pytorch/audio/pull/2398

Reviewed By: carolineechen

Differential Revision: D36549402

Pulled By: nateanl

fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5

d01f5891

01 Jun, 2022 1 commit

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments is changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.
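
As an illustration of the interface described above, here is a minimal sketch of two file-like objects: one that only implements `read(self, n)` and one that also implements `seek(self, offset, whence)`. The class names are hypothetical and the sketch does not call the Streaming API itself.

```python
import io


class ReadOnlySource:
    """Unseekable file-like object: implements only `read(n)`."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return up to n bytes; an empty bytes object signals end of data.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Adds `seek`, which some formats need in order to be decoded properly."""

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._buffer.seek(offset, whence)


src = SeekableSource(b"0123456789")
print(src.read(4))   # b'0123'
src.seek(0, io.SEEK_SET)
print(src.read(2))   # b'01'
```

An object like `ReadOnlySource` corresponds to the unseekable case, for which the summary suggests passing the `decoder` option explicitly.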

## Code structure

The approach is very similar to how file-like object is supported in sox-based I/O.
In Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind;
if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.
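
The dispatch rule can be sketched as follows. The function `select_binding` and its return labels are hypothetical stand-ins; the real code constructs the bound implementation rather than returning a string.

```python
import io


def select_binding(src) -> str:
    """Pick a binding the way the description above does (illustrative only)."""
    if isinstance(src, str):
        # Paths and URLs go to the TorchBind-bound implementation.
        return "torchbind"
    if hasattr(src, "read"):
        # File-like objects go to the PyBind11-bound implementation.
        return "pybind11"
    raise TypeError("src must be a string or an object with a `read` method")


print(select_binding("audio.mp4"))
print(select_binding(io.BytesIO(b"")))
```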

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at the binding layer. But since they work fine with PyBind11, Streamer core methods deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

13 May, 2022 1 commit

Move Streamer API out of prototype (#2378) · 72b712a1

moto authored May 13, 2022

Summary:
This commit moves the Streaming API out of prototype module.

* The related classes are renamed as follows

  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is a preemptive measure for the possibility of adding a
`StreamWriter` API.

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg, so USE_FFMPEG is more appropriate

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328

72b712a1

12 May, 2022 1 commit

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

28 Apr, 2022 1 commit

Add BUILD_MAD option and default to OFF (#2354) · a71e3a40

moto authored Apr 28, 2022

Summary:
libmad integration should be enabled only from source-build

Pull Request resolved: https://github.com/pytorch/audio/pull/2354

Reviewed By: nateanl

Differential Revision: D36012035

Pulled By: mthrok

fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7

a71e3a40

26 Apr, 2022 1 commit

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7

97ed428d

21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.
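
The change of representation can be sketched as below. The two fields (`tokens`, `score`) and the accessor names are hypothetical simplifications, not TorchAudio's actual `Hypothesis` layout; the point is only that a plain tuple with accessor functions carries the same data as a namedtuple.

```python
from typing import List, NamedTuple, Tuple


class NamedHypothesis(NamedTuple):
    """Before: a custom namedtuple class (fields are illustrative)."""
    tokens: List[int]
    score: float


# After: a plain tuple type alias, with positions documented by accessors.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    return hypo[1]


def to_plain(hypo: NamedHypothesis) -> Hypothesis:
    """Convert the named form to the plain tuple form."""
    return (hypo.tokens, hypo.score)


plain = to_plain(NamedHypothesis(tokens=[1, 2, 3], score=-4.5))
print(get_tokens(plain), get_score(plain))
```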

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

6b242c29

13 Apr, 2022 1 commit

Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc

hwangjeff authored Apr 12, 2022

Summary:
Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly PyTorch and TorchAudio builds as a comment that users can copy and run locally.

Pull Request resolved: https://github.com/pytorch/audio/pull/2325

Reviewed By: xiaohui-zhang

Differential Revision: D35597753

Pulled By: hwangjeff

fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c

fb51cecc

25 Mar, 2022 1 commit

Add Pretrained LM Support for Decoder (#2275) · 34c0d115

Caroline Chen authored Mar 24, 2022

Summary:
add function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, tests, and updated tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/2275

Reviewed By: mthrok

Differential Revision: D35115418

Pulled By: carolineechen

fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0

34c0d115

24 Mar, 2022 2 commits

Update CTC decoder docs and add citation (#2278) · 05592dff

Caroline Chen authored Mar 24, 2022

Summary:
rendered:
- [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html)
- [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2278

Reviewed By: mthrok

Differential Revision: D35097734

Pulled By: carolineechen

fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c

05592dff

Add notes about prototype features in tutorials (#2288) · 8844fbb7

moto authored Mar 23, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288

Reviewed By: hwangjeff

Differential Revision: D35099492

Pulled By: mthrok

fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f

8844fbb7

22 Mar, 2022 1 commit

Fix calculation of SNR value in tutorial (#2285) · 8395fe65

Hagen Wierstorf authored Mar 22, 2022

Summary:
The calculation of the SNR in the data augmentation examples seems to be wrong to me:

![image](https://user-images.githubusercontent.com/173624/159487032-c60470c6-ef8e-48a0-ad5e-a117fcb8d606.png)

If we start from the definition of the signal-to-noise ratio using the root mean square value we get:

```
SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
```
this can be transformed to
```
scale = 10^(SNR/20) rms(noise) / rms(speech)
```
The example uses `lambda x: x.norm(p=2)` rather than `rms`, but since the speech and noise signals have the same length, we have
```
rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
```
this would lead us to:
```
10^(SNR/20) = e^(SNR / 10)
```
which is not true.

Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`.

For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.
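
The numbers above can be reproduced with a short check. This is a minimal sketch using only the two exponent expressions from the summary; the function names are hypothetical and the norm ratio is left out because it is the same in both versions.

```python
import math


def scale_before(snr_db: float) -> float:
    """Exponent term used before the fix: e^(SNR / 10)."""
    return math.exp(snr_db / 10)


def scale_after(snr_db: float) -> float:
    """Exponent term used after the fix: 10^(SNR / 20)."""
    return 10 ** (snr_db / 20)


for snr_db in (20, 10, 3):
    print(f"{snr_db} dB: {scale_before(snr_db):.2f} -> {scale_after(snr_db):.2f}")
# 20 dB: 7.39 -> 10.00
# 10 dB: 2.72 -> 3.16
# 3 dB: 1.35 -> 1.41
```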

Pull Request resolved: https://github.com/pytorch/audio/pull/2285

Reviewed By: nateanl

Differential Revision: D35047737

Pulled By: mthrok

fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3

8395fe65

17 Mar, 2022 1 commit

[Doc] fix typo and backlink (#2281) · 1c3403ea

moto authored Mar 17, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281

Reviewed By: carolineechen

Differential Revision: D34939494

Pulled By: mthrok

fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d

1c3403ea

10 Mar, 2022 1 commit

Fix typos and remove comments (#2270) · 4b47412e

moto authored Mar 10, 2022

Summary:
Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202

Pull Request resolved: https://github.com/pytorch/audio/pull/2270

Reviewed By: hwangjeff

Differential Revision: D34793460

Pulled By: mthrok

fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723

4b47412e

26 Feb, 2022 1 commit

Improve device streaming (#2202) · 365313ed

moto authored Feb 25, 2022

Summary:
This commit adds a tutorial for device ASR, and updates the API for device streaming.

The changes for the interface are
1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods.
2. Move `fill_buffer` method to private.

When dealing with a device stream, there are situations where the device buffer is not
ready and the system returns `EAGAIN`. In such a case, the previous implementation of
`process_packet` method raised an exception in the Python layer, but for device ASR,
this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
The new `timeout` parameter serves this purpose.
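
The retry idea can be sketched in Python, even though the commit implements it in the C++ layer. Everything here is an assumption made for illustration: the names `call_with_retry` and `ResourceNotReady`, the use of seconds as the unit, and the exact meaning of `timeout=None` (retry indefinitely).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ResourceNotReady(Exception):
    """Stand-in for the `EAGAIN` condition (hypothetical name)."""


def call_with_retry(
    fetch: Callable[[], T],
    timeout: Optional[float] = None,
    backoff: float = 0.01,
) -> T:
    """Call `fetch` until it succeeds, sleeping `backoff` seconds between tries.

    Re-raises once `timeout` seconds have passed; `None` retries forever.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return fetch()
        except ResourceNotReady:
            if deadline is not None and time.monotonic() >= deadline:
                raise
            time.sleep(backoff)
```

Keeping the loop inside one blocking call avoids raising and catching an exception in user code for every packet that is not ready yet.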

Pull Request resolved: https://github.com/pytorch/audio/pull/2202

Reviewed By: nateanl

Differential Revision: D34475829

Pulled By: mthrok

fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59

365313ed

17 Feb, 2022 1 commit

Update online ASR tutorial (#2226) · c5c4bbfd

moto authored Feb 16, 2022

Summary:
https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html

1. Add figure to explain the caching
2. Fix the initialization of stream iterator

Pull Request resolved: https://github.com/pytorch/audio/pull/2226

Reviewed By: carolineechen

Differential Revision: D34265971

Pulled By: mthrok

fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4

c5c4bbfd

15 Feb, 2022 1 commit

Update context building to not delay the inference (#2213) · 8e3c6144

moto authored Feb 14, 2022

Summary:
Update the context cacher so that the fetched audio chunk is used for inference immediately.

https://github.com/pytorch/audio/pull/2202#discussion_r802838174
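
One plausible shape for such a cacher is sketched below with plain Python lists. It is an assumption-laden illustration, not the tutorial's code: the class layout, the zero padding, and the parameter names `segment_length` and `context_length` are all hypothetical. Each fetched chunk is returned right away, with the tail of the previous chunk prepended as context.

```python
from typing import List


class ContextCacher:
    """Prepend cached context to each chunk and return it immediately."""

    def __init__(self, segment_length: int, context_length: int):
        assert 0 <= context_length <= segment_length
        self.segment_length = segment_length
        self.context_length = context_length
        # Before any audio has been seen, the context is silence.
        self.context: List[float] = [0.0] * context_length

    def __call__(self, chunk: List[float]) -> List[float]:
        # Pad a short (for example, final) chunk to the full segment length.
        if len(chunk) < self.segment_length:
            chunk = chunk + [0.0] * (self.segment_length - len(chunk))
        chunk_with_context = self.context + chunk
        # Keep the tail of this chunk as context for the next call.
        self.context = chunk[len(chunk) - self.context_length:]
        return chunk_with_context


cacher = ContextCacher(segment_length=4, context_length=2)
print(cacher([1.0, 2.0, 3.0, 4.0]))
print(cacher([5.0, 6.0]))
```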

Pull Request resolved: https://github.com/pytorch/audio/pull/2213

Reviewed By: hwangjeff

Differential Revision: D34235230

Pulled By: mthrok

fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e

8e3c6144