- 08 Sep, 2023 1 commit
-
-
Pingchuan Ma authored
* Simplify training step in av-asr recipe
* Run pre-commit
-
- 25 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR makes a few changes to the AV-ASR recipe: better results, a faster face detector (MediaPipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe. Pull Request resolved: https://github.com/pytorch/audio/pull/3493 Reviewed By: mthrok Differential Revision: D47758072 Pulled By: mpc001 fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc
-
- 24 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489 Reviewed By: mthrok Differential Revision: D47726448 Pulled By: mpc001 fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841
-
- 06 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410 Differential Revision: D46496786 Pulled By: mthrok fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
-
- 25 May, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds an AV-ASR recipe that contains sample implementations of training and evaluation pipelines for RNN-T-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. It includes both streaming and non-streaming modes. CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff Pull Request resolved: https://github.com/pytorch/audio/pull/3278 Reviewed By: nateanl Differential Revision: D46121550 Pulled By: mpc001 fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
-
- 19 Jan, 2023 1 commit
-
-
hwangjeff authored
Summary: In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead. Pull Request resolved: https://github.com/pytorch/audio/pull/2981 Reviewed By: mthrok Differential Revision: D42507228 Pulled By: hwangjeff fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
-
- 11 Jul, 2022 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2535 Modifies LibriSpeech Conformer RNN-T example recipe to make the Lightning module and datamodule more generic and reusable. Reviewed By: mthrok Differential Revision: D36731576 fbshipit-source-id: 4643e86fac78f3c2bacc15f5d385bc7b10f410a2
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
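The ordering difference described in this summary can be illustrated with a minimal sketch. It uses Python's built-in `sorted` with `str.casefold` as the key, which is an assumption chosen for illustration and not µsort's actual implementation; the `names` list is hypothetical.

```python
# Hypothetical names, chosen only to show the effect of case on ordering.
names = ["frog", "Zebra", "FROG", "apple", "Frog"]

# Plain code-point sorting: every uppercase letter sorts before every lowercase one,
# so the three spellings of "frog" end up scattered.
case_sensitive = sorted(names)

# Case-insensitive sorting: compare case-folded strings. Python's sort is stable,
# so equal keys keep their original relative order.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)
print(case_insensitive)
```

With the case-insensitive key, the three spellings of "frog" stay adjacent, which is the property the summary highlights.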
-
- 11 May, 2022 1 commit
-
-
hwangjeff authored
Summary: Modifies the example LibriSpeech Conformer RNN-T recipe as follows:
- Moves data loading and transforms logic from the Lightning module to the data module (improves generalizability and reusability of both modules).
- Moves transforms logic from the dataloader collator function to the dataset (resolves dataloader multiprocessing issues on certain platforms).
- Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments).
- Modifies the training script to allow specifying the path to a model checkpoint from which to restart training.
Pull Request resolved: https://github.com/pytorch/audio/pull/2366 Reviewed By: mthrok Differential Revision: D36305028 Pulled By: hwangjeff fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba
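The pickling issue behind the lambda-to-`partial` change can be reproduced with a minimal sketch using only the standard library. The names `parse_binary_lambda` and `parse_binary_partial` are hypothetical and do not come from the recipe; the recipe's own callables are not shown in this summary.

```python
import pickle
from functools import partial

# A lambda has no importable name, so the standard pickler cannot serialize it.
parse_binary_lambda = lambda text: int(text, base=2)
try:
    pickle.dumps(parse_binary_lambda)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False

# A partial stores a reference to the wrapped callable plus its bound arguments,
# all of which pickle as long as the wrapped callable is importable by name.
parse_binary_partial = partial(int, base=2)
restored = pickle.loads(pickle.dumps(parse_binary_partial))

print(lambda_picklable)
print(restored("101"))
```

This matters for dataloaders because worker processes started with the spawn method receive the dataset and collate function through pickling.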
-
- 21 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
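The general pattern of replacing a custom container class with a plain tuple can be sketched as follows. The two-field layout (token IDs and a score) and the accessor names `get_tokens` and `get_score` are assumptions for illustration; the summary does not state the actual fields of TorchAudio's `Hypothesis`.

```python
from typing import List, Tuple

# Hypothetical, simplified layout: (token IDs, score). A type alias over a plain
# tuple introduces no custom class, unlike a namedtuple.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    """Positional accessor standing in for the former `.tokens` attribute."""
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    """Positional accessor standing in for the former `.score` attribute."""
    return hypo[1]


hypos: List[Hypothesis] = [([1, 5, 9], -3.2), ([1, 5], -4.0)]
best = max(hypos, key=get_score)

print(type(best).__name__)
print(get_tokens(best))
```

Accessor functions keep call sites readable after field names are lost, at the cost of having to keep the positional layout in sync by hand.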
-
- 13 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds a Conformer RNN-T LibriSpeech training recipe to the examples directory. Produces a 30M-parameter model that achieves the following WER:

| | WER |
|:-------------------:|-------------:|
| test-clean | 0.0310 |
| test-other | 0.0805 |
| dev-clean | 0.0314 |
| dev-other | 0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329 Reviewed By: xiaohui-zhang Differential Revision: D35578727 Pulled By: hwangjeff fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
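For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below computes it for a single utterance pair; the function name `word_error_rate` and the example sentences are hypothetical, and the summary does not say how the recipe itself computes or aggregates WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumes a non-empty reference and whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 4))
```

Under this definition, a WER of 0.0310 corresponds to roughly three word errors per hundred reference words.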
-