- 03 Oct, 2023 1 commit
-
-
Oren Amsalem authored
* Update transforms.py
* Update train.py
-
- 08 Sep, 2023 1 commit
-
-
Pingchuan Ma authored
* Simplify training step in av-asr recipe
* Run pre-commit
-
- 25 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR makes a few changes to the AV-ASR recipe: better results, a faster face detector (Mediapipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.
Pull Request resolved: https://github.com/pytorch/audio/pull/3493
Reviewed By: mthrok
Differential Revision: D47758072
Pulled By: mpc001
fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc
-
- 24 Jul, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3489
Reviewed By: mthrok
Differential Revision: D47726448
Pulled By: mpc001
fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841
-
- 16 Jun, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The extracted video is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset. This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves code readability.
Pull Request resolved: https://github.com/pytorch/audio/pull/3421
Reviewed By: mpc001
Differential Revision: D46799748
Pulled By: mthrok
fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
-
- 06 Jun, 2023 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3410
Differential Revision: D46496786
Pulled By: mthrok
fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
-
- 25 May, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds an AV-ASR recipe containing sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. The recipe supports both streaming and non-streaming modes.
CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
Pull Request resolved: https://github.com/pytorch/audio/pull/3278
Reviewed By: nateanl
Differential Revision: D46121550
Pulled By: mpc001
fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
-