1. 12 Jul, 2023 5 commits
    • Resolve some compilation warnings (#3471) · a6d1fec0
      moto authored
      Summary:
      - FFmpeg 6 deprecated attributes
      - Guard CUDA specific functions not used in CPU builds
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3471
      
      Differential Revision: D47402174
      
      Pulled By: mthrok
      
      fbshipit-source-id: 00c0719ab1849b50c0b56b03d8fb38bc7aa74538
    • Fix resampling to support dynamic input lengths for onnx exports. (#3473) · a3b6bfb6
      Bogdan Teleaga authored
      Summary:
      This is a port of https://github.com/adefossez/julius/pull/17 for torchaudio.
      
      Not sure if it's possible/desirable to add tests to test the functionality of ONNX exports, but I did a quick test on my machine to ensure this works. The logic is a bit simpler compared to the other PR because the torchaudio version does not support the additional flags available in julius.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3473
      
      Differential Revision: D47401988
      
      Pulled By: mthrok
      
      fbshipit-source-id: 62fa1e4388923f6a62cef2c0f902a79ea179cec4
    • Use FFmpeg6 in build doc (#3475) · 989702b3
      moto authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3475
      
      Differential Revision: D47403772
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5cdde521dbbbbf33856470a9dc79419b4a3a1683
    • Fix FFmpeg initialization logic (#3474) · 49e269ab
      Moto Hira authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3474
      
      Differential Revision: D47398447
      
      fbshipit-source-id: f77b685d54ddfc222b806475707d4a10239872f5
    • Support multiple FFmpeg versions (#3464) · 786066b4
      moto authored
      Summary:
      This commit introduces support for multiple FFmpeg versions for OSS binary distributions.
      
      Currently, torchaudio only works with FFmpeg 4, which is inconvenient at every stage from installation to runtime linking.
      This commit allows torchaudio to pick FFmpeg 4, 5, or 6 at runtime instead of looking only for version 4.
      
      The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all three.
      At runtime, we look for a specific version of libavutil and, when one is found, load the corresponding FFmpeg extension.
      The order of preference is 6, 5, then 4.
      
      To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
      They are LGPL builds, downloaded from S3 at build time instead of being compiled on every build.
      
      Using pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
      a single-FFmpeg-version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
      so that they support only one specific version of FFmpeg.
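
The runtime selection described above can be sketched as follows. This is a minimal illustration, not torchaudio's actual loader: the function names `pick_ffmpeg_version` and `try_dlopen` are hypothetical, and the Linux-style `libavutil.so.N` file names are an assumption. The mapping of FFmpeg 4, 5, and 6 to libavutil major versions 56, 57, and 58 reflects the upstream FFmpeg releases.

```python
import ctypes
from typing import Callable, Optional

# libavutil major version shipped with each FFmpeg major release.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def try_dlopen(lib_name: str) -> bool:
    """Return True if the dynamic loader can open lib_name (hypothetical helper)."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


def pick_ffmpeg_version(can_load: Callable[[str], bool] = try_dlopen) -> Optional[int]:
    """Return the first FFmpeg major version whose libavutil can be loaded."""
    for ffmpeg_major in (6, 5, 4):  # order of preference from the commit message
        # Assumption: Linux-style shared library naming.
        lib_name = f"libavutil.so.{LIBAVUTIL_MAJOR[ffmpeg_major]}"
        if can_load(lib_name):
            return ffmpeg_major
    return None  # no supported FFmpeg found


# Example with a fake probe: only FFmpeg 5 and 4 are "installed".
installed = {"libavutil.so.57", "libavutil.so.56"}
chosen = pick_ffmpeg_version(lambda name: name in installed)
```

Passing the probe as a parameter keeps the preference order testable without any FFmpeg installation; a real loader would then import the extension built against the chosen version.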
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3464
      
      Differential Revision: D47300223
      
      Pulled By: mthrok
      
      fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
  2. 11 Jul, 2023 4 commits
  3. 10 Jul, 2023 1 commit
  4. 07 Jul, 2023 3 commits
  5. 06 Jul, 2023 2 commits
  6. 05 Jul, 2023 4 commits
  7. 03 Jul, 2023 1 commit
  8. 28 Jun, 2023 2 commits
  9. 26 Jun, 2023 1 commit
  10. 21 Jun, 2023 2 commits
  11. 16 Jun, 2023 1 commit
    • Add LRS3 data preparation (#3421) · 77cdd160
      Pingchuan Ma authored
      Summary:
      This PR adds a data preparation recipe that uses the ultra face detector to extract full-face video. The resulting video output is then used as input for training and evaluating RNNT-based models for automatic speech recognition (ASR), visual speech recognition (VSR), and audio-visual ASR (AV-ASR) on the LRS3 dataset.
      
      This PR also updates the word error rate (WER) for AV-ASR LRS3 models and improves the code readability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3421
      
      Reviewed By: mpc001
      
      Differential Revision: D46799748
      
      Pulled By: mthrok
      
      fbshipit-source-id: 97af3feac0592b240617faaffa4c0ac8cef614a9
  12. 15 Jun, 2023 1 commit
    • Update forced alignment tutorial (#3440) · 18601691
      moto authored
      Summary:
      * Fix backtrack visualization (the coordinate was off by one).
      * Add note about the simplification and the new align API
      * Explicitly handle SOS and EOS
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3440
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D46761282
      
      Pulled By: mthrok
      
      fbshipit-source-id: b0b6c9754674e8e23543e9f002e29b55102c92f8
  13. 14 Jun, 2023 1 commit
  14. 13 Jun, 2023 2 commits
  15. 12 Jun, 2023 1 commit
  16. 09 Jun, 2023 3 commits
  17. 08 Jun, 2023 6 commits
    • Introduce chroma filter bank function (#3395) · dfd0c5fd
      Jeff Hwang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3395
      
      Adds chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.
      
      Reviewed By: mthrok
      
      Differential Revision: D46307672
      
      fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
    • [Nova] Add cache ffmpeg before building #2 (#3423) · 25e96f42
      atalman authored
      Summary:
      [Nova] Add cache ffmpeg before building - 2
      Follow up after https://github.com/pytorch/audio/pull/3417, need to pass new arguments to test-infra workflows
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3423
      
      Reviewed By: mthrok
      
      Differential Revision: D46559344
      
      Pulled By: atalman
      
      fbshipit-source-id: fa5cccc3bfb052688de4a05cc3b4f37fcbe3a6f5
    • Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca
      moto authored
      Summary:
      The StreamReader decoding process is composed of three steps:
      
      1. Decode the incoming AVPacket into AVFrame
      2. Pass AVFrame through AVFilter to perform post-processing
      3. Convert the resulting AVFrame
      
      The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.
      
      For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
      However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor therefore introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405
      
      AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in before finalizing the frame shape.
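
The idea of deferring shape finalization until the first frame arrives can be sketched in a few lines. This is only an illustration of the lazy-initialization pattern, not the C++ converter changed in this commit: the class `LazyFrameConverter` is hypothetical, frames are modeled as nested Python lists, and rejecting later shape changes is an assumption of the sketch.

```python
from typing import List, Optional, Tuple


class LazyFrameConverter:
    """Hypothetical converter whose output shape is unknown until the first frame."""

    def __init__(self) -> None:
        # Nothing is known at construction time; the decoder may resize frames.
        self.shape: Optional[Tuple[int, int]] = None

    def convert(self, frame: List[List[int]]) -> List[int]:
        height, width = len(frame), len(frame[0])
        if self.shape is None:
            # First frame: finalize the shape from what actually arrived.
            self.shape = (height, width)
        elif self.shape != (height, width):
            # Assumption of this sketch: later shape changes are rejected.
            raise ValueError(f"expected {self.shape}, got {(height, width)}")
        # Flatten row by row, standing in for the real frame-to-tensor conversion.
        return [pixel for row in frame for pixel in row]


converter = LazyFrameConverter()
shape_before = converter.shape
flat = converter.convert([[1, 2, 3], [4, 5, 6]])
shape_after = converter.shape
```

Because the shape is read from the first frame rather than from the stream configuration, a decoder that resizes frames in step 1 no longer needs to report its output size up front.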
      
      Fix https://github.com/pytorch/audio/issues/3405
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3419
      
      Differential Revision: D46557505
      
      Pulled By: mthrok
      
      fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
    • Remove CCI badge from README (#3420) · a7fea8a6
      moto authored
      Summary:
      CI jobs have been migrated from CircleCI (CCI) to GitHub Actions (GHA).
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3420
      
      Differential Revision: D46548562
      
      Pulled By: mthrok
      
      fbshipit-source-id: d7e17201e8b256efaa54543e445a0f139aa549b2
    • Clean up CI scripts (#3407) · f0803152
      moto authored
      Summary:
      - Move the unit test scripts from .circleci to .github
      - Remove the Docker file for the unit test base
      - Use the Conda from the Docker image in Linux jobs
      
      Remaining follow-up items
      
      - Reuse the unit test script in the Linux GPU job, as is done in the Linux CPU job.
      
      The unit test script needs to be fixed before it can be used for the Linux GPU job
      in the new GHA workflow. This is kept as a separate follow-up work item.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3407
      
      Differential Revision: D46498263
      
      Pulled By: mthrok
      
      fbshipit-source-id: d8256717a55bb4257151d819d3b2ebd453601eac
    • Optimize Torchaudio Vad (#3382) · 1e117f57
      Kuba Rad authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3382
      
      The voice activity detector function was unoptimized, confusingly written, and buggy.
      
      The optimizations in this change make the function run roughly 17x faster.
      The main optimization was to loop over windows of audio rather than individual audio samples. Reducing the number of copies also helped.
      
      There was also an off-by-one error: the array slice referenced was [1:16001] (for the default settings) instead of [0:16000].
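
Both points can be illustrated with plain Python lists. This is a sketch of the slicing error and of window-wise iteration in general, not the torchaudio `vad` implementation; the helper `window_energies` and the synthetic signal are hypothetical.

```python
from typing import List

# Synthetic signal whose value equals its index, so slices are easy to inspect.
samples = list(range(16005))

buggy = samples[1:16001]  # skips sample 0 and reaches one sample too far
fixed = samples[0:16000]  # the intended first 16000 samples


def window_energies(signal: List[float], window: int) -> List[float]:
    """Sum of squares per non-overlapping window (hypothetical helper)."""
    # One loop iteration per window instead of one per sample.
    return [
        sum(x * x for x in signal[start:start + window])
        for start in range(0, len(signal) - window + 1, window)
    ]


energies = window_energies([1, 2, 3, 4, 5], window=2)
```

Both slices have the same length, which is why an error like this can go unnoticed: only the first and last elements differ.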
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44749359
      
      fbshipit-source-id: c76c9412e70cdc6fcd527d113603c88f78480558