1. 03 Oct, 2023 1 commit
  2. 08 Sep, 2023 1 commit
  3. 25 Jul, 2023 1 commit
    • Update avsr recipe (#3493) · d4644793
      Pingchuan Ma authored
      Summary:
      This PR makes a few changes to the AV-ASR recipe: better results, a faster face detector (MediaPipe), renamed variables, a streamlined dataloader, and a few illustrated examples. These changes improve the usability of the recipe.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3493
      
      Reviewed By: mthrok
      
      Differential Revision: D47758072
      
      Pulled By: mpc001
      
      fbshipit-source-id: 4533587776f3a7a74f3f11b0ece773a0934bacdc
  4. 24 Jul, 2023 1 commit
  5. 16 Jun, 2023 1 commit
    • Add LRS3 data preparation (#3421) · 77cdd160
      Pingchuan Ma authored
      Summary:
      This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.
      
      This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3421
      
      Reviewed By: mpc001
      
      Differential Revision: D46799748
      
      Pulled By: mthrok
      
      fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
  6. 06 Jun, 2023 1 commit
  7. 25 May, 2023 1 commit
    • Add LRS3 AV-ASR recipe (#3278) · c6624fa6
      Pingchuan Ma authored
      Summary:
      This PR adds an AV-ASR recipe containing sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. It includes both streaming and non-streaming modes.
      
      CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3278
      
      Reviewed By: nateanl
      
      Differential Revision: D46121550
      
      Pulled By: mpc001
      
      fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6