1. 01 Jun, 2023 7 commits
  2. 31 May, 2023 6 commits
  3. 30 May, 2023 3 commits
  4. 29 May, 2023 1 commit
  5. 27 May, 2023 1 commit
    • Fix AudioEffector for mulaw (#3372) · af932cc7
      moto authored
      Summary:
      When encoding audio with mulaw, the resulting data does not have a header, and the StreamReader defaults to 16 kHz, which can stretch or shrink the resulting waveform.
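
To illustrate why a wrong default rate stretches or shrinks the waveform, here is a minimal arithmetic sketch. It does not use torchaudio; the function names are hypothetical and chosen only for this example.

```python
def apparent_duration(num_samples: int, assumed_rate_hz: int) -> float:
    """Duration in seconds a reader reports when it assumes `assumed_rate_hz`."""
    return num_samples / assumed_rate_hz


def playback_speed_factor(true_rate_hz: int, assumed_rate_hz: int) -> float:
    """Values above 1.0 mean the audio plays faster (shrinks), below 1.0 slower (stretches)."""
    return assumed_rate_hz / true_rate_hz


# One second of headerless audio that was really sampled at 8 kHz ...
num_samples = 8000
true_duration = apparent_duration(num_samples, 8000)
# ... read back under an assumed default of 16 kHz.
misread_duration = apparent_duration(num_samples, 16000)
factor = playback_speed_factor(8000, 16000)
```

Under these assumptions the same samples span half the time, so the waveform appears shrunk by a factor of two. Passing the true rate to the reader explicitly avoids the mismatch.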
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3372
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46234772
      
      Pulled By: mthrok
      
      fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552
  6. 26 May, 2023 6 commits
    • Fix encoding g722 format (#3373) · 1b05ca7e
      moto authored
      Summary:
      The g722 format only supports 16 kHz, but AVCodec does not list this constraint. The implementation does not insert resampling, so the resulting audio can be slowed down or sped up.
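
One way to picture the missing step is a check that decides whether resampling is needed before encoding. The sketch below is an assumption for illustration only: the table `FIXED_RATE_FORMATS` and the function `plan_encoding` are hypothetical and are not torchaudio or FFmpeg APIs.

```python
from typing import Optional, Tuple

# Hypothetical table of formats that accept exactly one sample rate.
# Only the g722 entry comes from the commit message above.
FIXED_RATE_FORMATS = {"g722": 16000}


def plan_encoding(fmt: str, src_rate: int) -> Tuple[int, bool]:
    """Return (rate to encode at, whether the input must be resampled first)."""
    required: Optional[int] = FIXED_RATE_FORMATS.get(fmt)
    if required is None or required == src_rate:
        # No fixed-rate constraint, or the input already satisfies it.
        return src_rate, False
    return required, True


g722_from_8k = plan_encoding("g722", 8000)
g722_from_16k = plan_encoding("g722", 16000)
```

With this kind of check, 8 kHz input destined for g722 is flagged for resampling to 16 kHz instead of being encoded as-is and played back at the wrong speed.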
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3373
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46233181
      
      Pulled By: mthrok
      
      fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c
    • Use the same CUDNN version on Windows as PyTorch (#3380) · c120f316
      Huy Do authored
      Summary:
      CUDA 11.7 uses cuDNN 8.5.0; 11.8 uses 8.7.0; 12.1 uses 8.8.1. Otherwise, the Windows vision job (8.5.0) would overwrite the cuDNN version set up by PyTorch (8.7.0), leading to flaky failures such as https://github.com/pytorch/pytorch/actions/runs/5088860652/jobs/9146641450
      
      ```
      RuntimeError: cuDNN version incompatibility: PyTorch was compiled  against (8, 7, 0) but found runtime version (8, 5, 0). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN.
      ```
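
The version pairing stated in the summary can be written as a small lookup with a consistency check. This is a sketch under assumptions: the dictionary mirrors only the three pairs listed above, and `cudnn_matches` is a hypothetical helper, not part of PyTorch or the CI scripts.

```python
from typing import Dict, Tuple

Version = Tuple[int, int, int]

# CUDA version -> cuDNN version, as listed in the commit summary.
EXPECTED_CUDNN: Dict[str, Version] = {
    "11.7": (8, 5, 0),
    "11.8": (8, 7, 0),
    "12.1": (8, 8, 1),
}


def cudnn_matches(cuda_version: str, runtime_cudnn: Version) -> bool:
    """Return True when the runtime cuDNN equals the one expected for this CUDA version."""
    expected = EXPECTED_CUDNN.get(cuda_version)
    if expected is None:
        raise KeyError(f"no expected cuDNN version recorded for CUDA {cuda_version}")
    return runtime_cudnn == expected


# The failing case from the error message: built against 8.7.0, found 8.5.0 at runtime.
mismatch_ok = cudnn_matches("11.8", (8, 5, 0))
```

Here `mismatch_ok` is `False`, which corresponds to the incompatibility reported in the error above.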
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3380
      
      Reviewed By: atalman
      
      Differential Revision: D46236286
      
      Pulled By: huydhn
      
      fbshipit-source-id: 9ca12d5068c3029688347d52c5c284488f33728d
    • Use cuda 11.8 for circleci tests (#3381) · 5c0249b0
      atalman authored
      Summary:
      Use CUDA 11.8 for CircleCI tests.
      CUDA 11.7 was deprecated.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3381
      
      Reviewed By: osalpekar
      
      Differential Revision: D46236223
      
      Pulled By: atalman
      
      fbshipit-source-id: 6d6a8e09603807a07241f31c1bd1e6d3a2b67d9d
    • Temporarily remove test for extract_features (#3378) · 05649ca3
      Zhaoheng Ni authored
      Summary:
      The tests failed for several bundles. Remove them for now; they will be re-added once the root cause is figured out.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3378
      
      Reviewed By: atalman
      
      Differential Revision: D46230884
      
      Pulled By: nateanl
      
      fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92
    • Revert "Upgrade to FFmpeg5 (#3298)" (#3377) · 37779ef9
      atalman authored
      Summary:
      This reverts commit d38a7854.
      
      This is a temporary revert to unblock the unit test migration from CircleCI to GitHub.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3377
      
      Reviewed By: mthrok
      
      Differential Revision: D46230498
      
      Pulled By: atalman
      
      fbshipit-source-id: 000d8a9ca00750fc1ca61f4c2cdd6e930a5ce46d
    • Improve RNN-T streaming decoding (#3295) · 9fc0dcaa
      Lakshmi Krishnan authored
      Summary:
      This commit fixes the following issues affecting streaming decoding quality:
      1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
      2. Allows the decoder to receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
      3. Fixes some minor errors in shape checking for length.
      
      This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in transcript.
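
The idea of carrying the top-K hypotheses across chunks can be sketched with a toy beam search. This is not torchaudio's RNN-T decoder: it has no blank token, joiner, or predictor state, and `decode_chunk` and its score format are hypothetical, chosen only to show the carry-over.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# (tokens emitted so far, cumulative log-probability)
Hypothesis = Tuple[Tuple[str, ...], float]


def decode_chunk(
    step_scores: List[Dict[str, float]],
    init_hyps: Optional[List[Hypothesis]] = None,
    beam_width: int = 2,
) -> List[Hypothesis]:
    """Extend hypotheses over one chunk and keep the `beam_width` best."""
    # Start from an empty hypothesis only when nothing was carried over.
    hyps: List[Hypothesis] = list(init_hyps) if init_hyps else [((), 0.0)]
    for scores in step_scores:
        candidates = [
            (tokens + (tok,), score + logp)
            for tokens, score in hyps
            for tok, logp in scores.items()
        ]
        hyps = heapq.nlargest(beam_width, candidates, key=lambda h: h[1])
    return hyps


chunk1 = [{"a": -1.0, "b": -2.0}]
chunk2 = [{"c": -0.5}]

first = decode_chunk(chunk1, beam_width=2)
# Pass all surviving hypotheses forward, not just first[0].
second = decode_chunk(chunk2, init_hyps=first, beam_width=2)
```

Each returned hypothesis holds every token emitted since the start of the stream, which mirrors the note above that the output is the full transcript up to that time step rather than an incremental change.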
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3295
      
      Reviewed By: nateanl
      
      Differential Revision: D46216113
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
  7. 25 May, 2023 1 commit
    • Add LRS3 AV-ASR recipe (#3278) · c6624fa6
      Pingchuan Ma authored
      Summary:
      This PR adds an AV-ASR recipe, which contains sample implementations of training and evaluation pipelines for RNNT-based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. It includes both streaming and non-streaming modes.
      
      CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3278
      
      Reviewed By: nateanl
      
      Differential Revision: D46121550
      
      Pulled By: mpc001
      
      fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
  8. 24 May, 2023 6 commits
  9. 23 May, 2023 6 commits
  10. 22 May, 2023 3 commits