- 24 Jul, 2023 1 commit
Pingchuan Ma authored
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3489
Reviewed By: mthrok
Differential Revision: D47726448
Pulled By: mpc001
fbshipit-source-id: 3d5aa7646c6bb816dcbbf70c61e98404bb148841
- 16 Jun, 2023 1 commit
Pingchuan Ma authored
Summary: This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The extracted video is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset. This PR also updates the word error rate (WER) for the AV-ASR LRS3 models and improves code readability.
Pull Request resolved: https://github.com/pytorch/audio/pull/3421
Reviewed By: mpc001
Differential Revision: D46799748
Pulled By: mthrok
fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
- 06 Jun, 2023 1 commit
moto authored
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/3410
Differential Revision: D46496786
Pulled By: mthrok
fbshipit-source-id: e517b273c40b340f39ce7db7ab1be1c3eb5f2059
- 25 May, 2023 1 commit
Pingchuan Ma authored
Summary: This PR adds an AV-ASR recipe that contains sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. The recipe supports both streaming and non-streaming modes.
CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
Pull Request resolved: https://github.com/pytorch/audio/pull/3278
Reviewed By: nateanl
Differential Revision: D46121550
Pulled By: mpc001
fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6