    Add LRS3 data preparation (#3421) · 77cdd160
    Pingchuan Ma authored
    Summary:
    This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The extracted video then serves as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.
    
    This PR also updates the reported word error rate (WER) for the AV-ASR LRS3 models and improves code readability.
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/3421
    
    Reviewed By: mpc001
    
    Differential Revision: D46799748
    
    Pulled By: mthrok
    
    fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
utils.py 2.38 KB