Commits · 265c09d80e22c7c26063770ea2d11c0095270fc0 · OpenDAS / Torchaudio

10 Aug, 2022 1 commit

Fix bug in Conformer RNN-T recipe (#2611) · cd4d6607

hwangjeff authored Aug 10, 2022

Summary:
https://github.com/pytorch/audio/issues/2535 modified the Conformer RNN-T Lightning module to accept a SentencePiece model instance rather than a file path. This PR makes changes to account for this in the train script.

Pull Request resolved: https://github.com/pytorch/audio/pull/2611

Reviewed By: carolineechen

Differential Revision: D38578892

Pulled By: hwangjeff

fbshipit-source-id: ec3b9823ad30ffb730baa13d10d8b79020866aac

cd4d6607

05 Aug, 2022 1 commit

Add note for lexicon free decoder output (#2603) · 33485b8c

Caroline Chen authored Aug 05, 2022

Summary:
``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon free usage. This PR adds a note in both docs and tutorial.

Followup: determine if we want to modify the behavior of ``words`` in the lexicon free case. One option is to merge and then split the generated tokens by the input silent token to populate the words field, but this is tricky since the meaning of a "word" in the lexicon free case can be vague and not all languages have whitespaces between words, etc

Pull Request resolved: https://github.com/pytorch/audio/pull/2603

Reviewed By: mthrok

Differential Revision: D38459709

Pulled By: carolineechen

fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934

33485b8c

01 Aug, 2022 1 commit

Update data augmentation tutorial (#2595) · f1443b8f

moto authored Aug 01, 2022

Summary:
In https://github.com/pytorch/audio/pull/2285, the SNR calculation was fixed,
but there was still one that was not fixed. This commit fixes it.

Also following the feedback https://github.com/pytorch/tutorials/issues/1930#issuecomment-1199741336, update the variable name.

Pull Request resolved: https://github.com/pytorch/audio/pull/2595

Reviewed By: carolineechen

Differential Revision: D38314672

Pulled By: mthrok

fbshipit-source-id: b2015e2709729190d97264aa191651b3af4ba856

f1443b8f

29 Jul, 2022 2 commits

Update forced alignment tutorial (#2544) · c26b38b2

moto authored Jul 29, 2022

Summary:
1. Fix initialization.
Previously, the SOS token score was initialized to 0 across the time axis.
This was biasing the alignment to delay the start.
The proper way to delay the SOS is via blank token.
The new initilization takes the cumulated sum of blank scores.
2. Fill the end of trellis with Inf
Similar to the start, at the end where there remaining time frame is less
than the number of tokens, it is no longer possible to align the text, thus
we fill with Inf for better visualization.
3. Clean up asset management code.

Pull Request resolved: https://github.com/pytorch/audio/pull/2544

Reviewed By: nateanl

Differential Revision: D38276478

Pulled By: mthrok

fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299

c26b38b2

Improve speech enhancement tutorial (#2527) · d6267031

Zhaoheng Ni authored Jul 29, 2022

Summary:
- The "speech + noise" mixture still has a high SNR, which can't show the effectiveness of MVDR beamforming. To make the task more challenging, amplify the noise waveform to reduce the SNR of mixture speech.
- Show the Si-SNR score of mixture speech when visualizing the mixture spectrogram.
- FIx the figure in `rtf_power` subsection.
    - The description of enhanced spectrogram by `rtf_power` is wrong. Correct it to `rtf_power`.
- Print PESQ, STOI, and SDR metric scores.

Pull Request resolved: https://github.com/pytorch/audio/pull/2527

Reviewed By: mthrok

Differential Revision: D38190218

Pulled By: nateanl

fbshipit-source-id: 39562850a67f58a16e0a2866ed95f78c3f4dc7de

d6267031

28 Jul, 2022 2 commits

Create tutorial for HDemucs (#2572) · 919fd0c4

Sean Kim authored Jul 28, 2022

Summary:
Add tutorial python file, draft PR, will continue to modify accordingly to feedback.

Future plan: modify spectrogram and bottom audio design and work on finding best audio track and segments

Pull Request resolved: https://github.com/pytorch/audio/pull/2572

Reviewed By: carolineechen, nateanl, mthrok

Differential Revision: D38234001

Pulled By: skim0514

fbshipit-source-id: fe9207864f354dec5cf5ff52bf7d9ddcf4a001d5

919fd0c4

Fix hubert fine-tuning recipe bugs (#2588) · 0092aa3c

Zhaoheng Ni authored Jul 28, 2022

Summary:
- The optimizer in fine-tuning recipe should also be `AdamW`. See https://github.com/pytorch/audio/pull/2412
- Fix the import of `DistributedBatchSampler` in hubert dataset
- Fix `dataset_path` in fine-tuning module.

Pull Request resolved: https://github.com/pytorch/audio/pull/2588

Reviewed By: carolineechen

Differential Revision: D38243423

Pulled By: nateanl

fbshipit-source-id: badc88ce9eddfd71270201a65ae89433fae2733f

0092aa3c

11 Jul, 2022 1 commit

Revise LibriSpeech Conformer RNN-T recipe (#2535) · a7d1b31c

Jeff Hwang authored Jul 11, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2535

Modifies LibriSpeech Conformer RNN-T example recipe to make the Lightning module and datamodule more generic and reusable.

Reviewed By: mthrok

Differential Revision: D36731576

fbshipit-source-id: 4643e86fac78f3c2bacc15f5d385bc7b10f410a2

a7d1b31c

23 Jun, 2022 1 commit

[AutoAccept][Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK` · fee994ce

CodemodService FBSourceBlackLinterBot authored Jun 23, 2022

Summary:
Meta:
**If you take no action, this diff will be automatically accepted on 2022-06-23.**
(To remove yourself from auto-accept diffs and just let them all land, add yourself to [this Butterfly rule](https://www.internalfb.com/butterfly/rule/904302247110220))

Produced by `tools/arcanist/lint/codemods/black-fbsource`.

#nocancel

Rules run:
- CodemodTransformerSimpleShell

Config Oncall: [lint](https://our.intern.facebook.com/intern/oncall3/?shortname=lint)
CodemodConfig: [CodemodConfigFBSourceBlackLinter](https://www.internalfb.com/code/www/flib/intern/codemod_service/config/fbsource_arc_f/CodemodConfigFBSourceBlackLinter.php)
ConfigType: php
Sandcastle URL: https://www.internalfb.com/intern/sandcastle/job/13510799586951394/
This diff was automatically created with CodemodService.
To learn more about CodemodService, check out the [CodemodService wiki](https://fburl.com/CodemodService).

_____

## Questions / Comments / Feedback?

**[Click here to give feedback about this diff](https://www.internalfb.com/codemod_service/feedback?sandcastle_job_id=13510799586951394).**

* Returning back to author or abandoning this diff will only cause the diff to be regenerated in the future.
* Do **NOT** post in the CodemodService Feedback group about this specific diff.

drop-conflicts

Reviewed By: adamjernst

Differential Revision: D37375235

fbshipit-source-id: 3d7eb39e5c0539a78d1412f37562dec90b0fc759

fee994ce

17 Jun, 2022 1 commit

Make lazy import for joblib (#2498) · 10195316

Zhaoheng Ni authored Jun 17, 2022

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2498

Reviewed By: mthrok

Differential Revision: D37224024

Pulled By: nateanl

fbshipit-source-id: 5d5d561c43d1ee323ae0cc599ffa1479208ea09a

10195316

08 Jun, 2022 2 commits

Update HW decoding tutorial and add notes about unseekable object (#2408) · 711d6016

moto authored Jun 08, 2022

Summary:
https://output.circle-artifacts.com/output/job/75187a52-b0d8-4cac-89f3-24e10889a36a/artifacts/0/docs/hw_acceleration_tutorial.html

1. Update HW decoding tutorial to include file-like object
1. Add note about unseekable object int streaming API tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/2408

Reviewed By: hwangjeff

Differential Revision: D36632702

Pulled By: mthrok

fbshipit-source-id: 17be2fb8528cb1d2d1ee11901b6a95c512466feb

711d6016

Split Streaming API tutorials into two (#2446) · 2d846263

moto authored Jun 07, 2022

Summary:
The Streaming API tutorial has gotten long, so this commit split it into two.

Pull Request resolved: https://github.com/pytorch/audio/pull/2446

Reviewed By: hwangjeff

Differential Revision: D36987513

Pulled By: mthrok

fbshipit-source-id: 13e3aad74c0d0e654c39c0eeceffca1a00b0dac4

2d846263

07 Jun, 2022 3 commits

Remove CTC decoder prototype message (#2459) · da3ffe9b

Caroline Chen authored Jun 07, 2022

Summary:
ctc decoder has been moved to beta, remove prototype message from tutorial

(this is done on the release branch in https://github.com/pytorch/audio/issues/2457)

Pull Request resolved: https://github.com/pytorch/audio/pull/2459

Reviewed By: hwangjeff

Differential Revision: D36978417

Pulled By: carolineechen

fbshipit-source-id: e580c1e8475a1a0aa924d44deea3852adc332a86

da3ffe9b

Add HuBERT fine-tuning recipe (#2352) · ab5edfcd

Zhaoheng Ni authored Jun 07, 2022

Summary:
The PR contains the CTC fine-tuning recipe of HuBERT Base model.
The files include:
- lightning module
- training script
- README and the result table
- evaluation scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2352

Reviewed By: hwangjeff

Differential Revision: D36915712

Pulled By: nateanl

fbshipit-source-id: 0249635ad5e81a8aa2d228c1d5fe84d78b62a15b

ab5edfcd

Update audio I/O tutorials (#2385) · 4c19e2cb

moto authored Jun 07, 2022

Summary:
- Adopt `torchaudio.utils.download_asset` to simplify asset management.
- Break down the first section about helper functions.
- Use tempfile so that executing tutorial won't leave any artifacts on local file system.

Example: https://output.circle-artifacts.com/output/job/b11a0087-8bf9-4999-a74f-b53798eaa77f/artifacts/0/docs/tutorials/audio_io_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2385

Reviewed By: hwangjeff

Differential Revision: D36404399

Pulled By: mthrok

fbshipit-source-id: 106af34e8ddd22a061aa12767b444b32aef07bad

4c19e2cb

04 Jun, 2022 1 commit

Refactor LibriSpeech Lightning datamodule to accommodate different dataset implementations (#2437) · a63629b6

Jeff Hwang authored Jun 04, 2022

Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2437

Refactors LibriSpeech Lightning datamodule to accommodate different dataset implementations.

Reviewed By: carolineechen, nateanl

Differential Revision: D36731577

fbshipit-source-id: 4ba91044311fa3f99a928aef6ef411316955f6b5

a63629b6

03 Jun, 2022 3 commits

Update audio data augmentation tutorial (#2388) · 41082eb0

moto authored Jun 03, 2022

Summary:
- Adopt `torchaudio.utils.download_asset` to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

https://output.circle-artifacts.com/output/job/d7dd1b93-6dfe-46da-a080-109bfdc63881/artifacts/0/docs/tutorials/audio_data_augmentation_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2388

Reviewed By: carolineechen

Differential Revision: D36404405

Pulled By: mthrok

fbshipit-source-id: f460ed810519797fce6e2fa7baaee110bddd1d06

41082eb0

Update audio resampling tutorial (#2386) · fd2be89a

moto authored Jun 03, 2022

Summary:
- Replace mis-use of plot_specgram with plot_sweep, and remove plot_specgram
- Move `benchmark_resample` to later section

https://output.circle-artifacts.com/output/job/9f7af187-777d-4d75-840f-2630a36295b7/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2386

Reviewed By: carolineechen

Differential Revision: D36404403

Pulled By: mthrok

fbshipit-source-id: f9df8453e3f531bdc4549b0134e5dbba90653bf7

fd2be89a

Update audio feature extraction tutorial (#2391) · 8e20d546

moto authored Jun 03, 2022

Summary:
- Adopt torchaudio.utils.download_asset to simplify asset management.
- Break down the first section about helper functions.
- Reduce the number of helper functions

Pull Request resolved: https://github.com/pytorch/audio/pull/2391

Reviewed By: carolineechen, nateanl

Differential Revision: D36885626

Pulled By: mthrok

fbshipit-source-id: 1306f22ab70ab1e7f74ed7e43bf43150015448b6

8e20d546

02 Jun, 2022 1 commit

Update MVDR beamforming tutorial (#2398) · d01f5891

Zhaoheng Ni authored Jun 01, 2022

Summary:
- Use `download_asset` to download audios.
- Replace `MVDR` module with new-added `SoudenMVDR` and `RTFMVDR` modules.
- Benchmark performances of `F.rtf_evd` and `F.rtf_power` for RTF computation.
- Visualize the spectrograms and masks.

Pull Request resolved: https://github.com/pytorch/audio/pull/2398

Reviewed By: carolineechen

Differential Revision: D36549402

Pulled By: nateanl

fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5

d01f5891

01 Jun, 2022 1 commit

Move CTC beam search decoder to beta (#2410) · 93024ace

Caroline Chen authored May 31, 2022

Summary:
Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module.

hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well??

Pull Request resolved: https://github.com/pytorch/audio/pull/2410

Reviewed By: mthrok

Differential Revision: D36784521

Pulled By: carolineechen

fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed

93024ace

26 May, 2022 1 commit
- change Adam to AdamW (#2412) · 752de3e4
  nateanl authored May 26, 2022
  
  752de3e4
23 May, 2022 1 commit

Add recipe for HuBERT model pre-training (#2198) · 48a0c17a

Zhaoheng Ni authored May 23, 2022

Summary:
Replace https://github.com/pytorch/audio/issues/2129

Pull Request resolved: https://github.com/pytorch/audio/pull/2198

Reviewed By: carolineechen

Differential Revision: D36544163

Pulled By: nateanl

fbshipit-source-id: 3f19ba5b0f2c2b9e93b0603c3b4491c1dbc40ef8

48a0c17a

21 May, 2022 1 commit

Add file-like object support to Streaming API (#2400) · a984872d

moto authored May 21, 2022

Summary:
This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
  - To work around this, one can use the existing `decoder` option to tell what decoder it should use.
  - The set of `decoder` and `decoder_option` arguments were added to `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
  - So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments are changed.
  - Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.

## Code structure

The approach is very similar to how file-like object is supported in sox-based I/O.
In Streaming API if the input src is string, it is passed to the implementation bound with TorchBind,
if the src has `read` attribute, it is passed to the same implementation bound via PyBind 11.

![Untitled drawing](https://user-images.githubusercontent.com/855818/169098391-6116afee-7b29-460d-b50d-1037bb8a359d.png)

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
  - Some implementation in the original TorchBind surface layer is converted to Wrapper class so that they can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
  - `add_basic_[audio|video]_stream` methods were removed from C++ layer as it was just constructing string and passing it to `add_[audio|video]_stream` method, which is simpler to do in Python.
  - The original core Streamer implementation kept the use of types in `c10` namespace minimum. All the `c10::optional` and `c10::Dict` were converted to the equivalents of `std` at binding layer. But since they work fine with PyBind11, Streamer core methods deal them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).

Pull Request resolved: https://github.com/pytorch/audio/pull/2400

Reviewed By: carolineechen

Differential Revision: D36520073

Pulled By: mthrok

fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6

a984872d

15 May, 2022 1 commit

[codemod][usort] apply import merging for fbcode (8 of 11) · d62875cc

John Reese authored May 15, 2022

Summary:
Applies new import merging and sorting from µsort v1.0.

When merging imports, µsort will make a best-effort to move associated
comments to match merged elements, but there are known limitations due to
the diynamic nature of Python and developer tooling. These changes should
not produce any dangerous runtime changes, but may require touch-ups to
satisfy linters and other tooling.

Note that µsort uses case-insensitive, lexicographical sorting, which
results in a different ordering compared to isort. This provides a more
consistent sorting order, matching the case-insensitive order used when
sorting import statements by module name, and ensures that "frog", "FROG",
and "Frog" always sort next to each other.

For details on µsort's sorting and merging semantics, see the user guide:
https://usort.readthedocs.io/en/stable/guide.html#sorting

Reviewed By: lisroach

Differential Revision: D36402214

fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c

d62875cc

13 May, 2022 1 commit

Move Streamer API out of prototype (#2378) · 72b712a1

moto authored May 13, 2022

Summary:
This commit moves the Streaming API out of prototype module.

* The related classes are renamed as following

  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

This change is preemptive measurement for the possibility to add
`StreamWriter` API.

* Replace BUILD_FFMPEG build arg with USE_FFMPEG

We are not building FFmpeg, so USE_FFMPEG is more appropriate

 ---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
   - Rename `Streamer` to `StreamReader`.
   - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
   - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
   - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378

Reviewed By: carolineechen

Differential Revision: D36359299

Pulled By: mthrok

fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328

72b712a1

12 May, 2022 2 commits

Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680

Zhaoheng Ni authored May 12, 2022

Summary:
- When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which result in an empty label. The training example will hurt the performance after zero-padding (i.e., the labels are all zero for the input waveform).
This PR fixes the bug by checking if `label_start` is negative, and change it to zero if so.
- If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/2296

Reviewed By: mthrok

Differential Revision: D36323217

Pulled By: nateanl

fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423

09639680

[black][codemod] formatting changes from black 22.3.0 · 595dc5d3

John Reese authored May 11, 2022

Summary:
Applies the black-fbsource codemod with the new build of pyfmt.

paintitblack

Reviewed By: lisroach

Differential Revision: D36324783

fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc

595dc5d3

11 May, 2022 1 commit

Refactor LibriSpeech Conformer RNN-T recipe (#2366) · 69467ea5

hwangjeff authored May 10, 2022

Summary:
Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module).
- Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies training script to allow for specifying path model checkpoint to restart training from.

Pull Request resolved: https://github.com/pytorch/audio/pull/2366

Reviewed By: mthrok

Differential Revision: D36305028

Pulled By: hwangjeff

fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba

69467ea5

28 Apr, 2022 1 commit

Add BUILD_MAD option and default to OFF (#2354) · a71e3a40

moto authored Apr 28, 2022

Summary:
libmad integration should be enabled only from source-build

Pull Request resolved: https://github.com/pytorch/audio/pull/2354

Reviewed By: nateanl

Differential Revision: D36012035

Pulled By: mthrok

fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7

a71e3a40

26 Apr, 2022 1 commit

Add lexicon free CTC decoder (#2342) · 97ed428d

Caroline Chen authored Apr 26, 2022

Summary:
Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342

Reviewed By: nateanl

Differential Revision: D35856104

Pulled By: carolineechen

fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7

97ed428d

22 Apr, 2022 1 commit

Introduce DistributedBatchSampler (#2299) · 6411c9ad

Zhaoheng Ni authored Apr 22, 2022

Summary:
When using customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler onto it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode.

The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and current `epoch`.

The `shuffle` only happens in the initialization, and won't be changed if user don't reset it. The reason is shuffling `BucketizeBatchSampler` may have a different length than before, do shuffling in ``__iter__`` may result in mismatch between ``__len__`` and the real length value.
Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer. Then the value of ``__len__`` and the real length is the same.

Pull Request resolved: https://github.com/pytorch/audio/pull/2299

Reviewed By: hwangjeff

Differential Revision: D35781538

Pulled By: nateanl

fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a

6411c9ad

21 Apr, 2022 1 commit

Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29

hwangjeff authored Apr 21, 2022

Summary:
PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2339

Reviewed By: nateanl

Differential Revision: D35806529

Pulled By: hwangjeff

fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5

6b242c29

13 Apr, 2022 2 commits

Add Conformer RNN-T LibriSpeech training recipe (#2329) · c262758b

hwangjeff authored Apr 13, 2022

Summary:
Adds Conformer RNN-T LibriSpeech training recipe to examples directory.

Produces 30M-parameter model that achieves the following WER:

|                     |          WER |
|:-------------------:|-------------:|
| test-clean          |       0.0310 |
| test-other          |       0.0805 |
| dev-clean           |       0.0314 |
| dev-other           |       0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329

Reviewed By: xiaohui-zhang

Differential Revision: D35578727

Pulled By: hwangjeff

fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d

c262758b

Add nightly build installation code snippet to prototype feature tutorials (#2325) · fb51cecc

hwangjeff authored Apr 12, 2022

Summary:
Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab due to its runtime's not having nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly Pytorch and TorchAudio builds as a comment that users can copy and run locally.

Pull Request resolved: https://github.com/pytorch/audio/pull/2325

Reviewed By: xiaohui-zhang

Differential Revision: D35597753

Pulled By: hwangjeff

fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c

fb51cecc

05 Apr, 2022 1 commit

Disable multiprocessing when dumping features in hubert preprocessing (#2311) · f7afe29e

Zhaoheng Ni authored Apr 05, 2022

Summary:
The multi-processing works well on MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Change it to for-loop resolves the issue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2311

Reviewed By: mthrok

Differential Revision: D35393813

Pulled By: nateanl

fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11

f7afe29e

04 Apr, 2022 2 commits

Use pretrained LM API for decoder example (#2317) · 66185e00

Caroline Chen authored Apr 04, 2022

Summary:
update example ASR pipeline to use the recently added pretrained LM API for decoding

Pull Request resolved: https://github.com/pytorch/audio/pull/2317

Reviewed By: mthrok

Differential Revision: D35361354

Pulled By: carolineechen

fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46

66185e00

Fix arguments in CTC decoding script (#2315) · 4a749e2d

Zhaoheng Ni authored Apr 04, 2022

Summary:
Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser.

Pull Request resolved: https://github.com/pytorch/audio/pull/2315

Reviewed By: carolineechen

Differential Revision: D35357678

Pulled By: nateanl

fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92

4a749e2d

01 Apr, 2022 1 commit

Fix loading checkpoint in hubert preprocessing (#2310) · 87f0d198

Zhaoheng Ni authored Apr 01, 2022

Summary:
When checkpoint is on GPU device and preprocessing is on CPU, the script will throw an exception error. Fix it to load the model state dictionary into CPU by default.

Pull Request resolved: https://github.com/pytorch/audio/pull/2310

Reviewed By: mthrok

Differential Revision: D35316903

Pulled By: nateanl

fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed

87f0d198

25 Mar, 2022 1 commit

Add Pretrained LM Support for Decoder (#2275) · 34c0d115

Caroline Chen authored Mar 24, 2022

Summary:
add function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, tests, and updated tutorial

Pull Request resolved: https://github.com/pytorch/audio/pull/2275

Reviewed By: mthrok

Differential Revision: D35115418

Pulled By: carolineechen

fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0

34c0d115