1. 19 Sep, 2023 1 commit
  2. 05 Sep, 2023 1 commit
    • Fix backward compatibility layer in backend module (#3595) · 931598c1
      moto authored
      Summary:
      The PR https://github.com/pytorch/audio/issues/3549 re-organized the backend implementations and deprecated the direct access to torchaudio.backend.
      
      The change was supposed to be backward compatible while issuing a warning to users, but the implementation of module-level `__getattr__` was not quite right.
      
      See an issue https://github.com/pyannote/pyannote-audio/pull/1456.
      
      This commit fixes it so that the following imports work:
      
      ```python
      from torchaudio.backend.common import AudioMetaData
      
      from torchaudio.backend import sox_io_backend
      from torchaudio.backend.sox_io_backend import save, load, info
      
      from torchaudio.backend import no_backend
      from torchaudio.backend.no_backend import save, load, info
      
      from torchaudio.backend import soundfile_backend
      from torchaudio.backend.soundfile_backend import save, load, info
      ```
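
A module-level `__getattr__` shim (PEP 562) of the kind described above could look roughly like the following minimal sketch. The module names `legacy_backend` and `_new_impl` and the `load` stand-in are hypothetical and are not taken from torchaudio; the point is only that the hook must raise `AttributeError` for unknown names so that `from ... import ...` and `hasattr` keep working.

```python
import sys
import types
import warnings

# Hypothetical stand-in for the new, private implementation module.
_impl = types.ModuleType("_new_impl")
_impl.load = lambda path: f"loaded {path}"

# Hypothetical stand-in for the deprecated public module.
legacy = types.ModuleType("legacy_backend")


def _legacy_getattr(name):
    """Module-level __getattr__ (PEP 562): called only when normal lookup fails."""
    if hasattr(_impl, name):
        warnings.warn(
            f"legacy_backend.{name} is deprecated; use the new API instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(_impl, name)
    # Raise AttributeError (rather than returning None) so that hasattr()
    # and "from module import name" behave correctly for unknown names.
    raise AttributeError(f"module 'legacy_backend' has no attribute {name!r}")


legacy.__getattr__ = _legacy_getattr
sys.modules["legacy_backend"] = legacy

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    from legacy_backend import load  # resolved through __getattr__

result = load("a.wav")
```

In this sketch the old import path still resolves, and the only visible difference for the caller is the `DeprecationWarning`.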
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3595
      
      Reviewed By: nateanl
      
      Differential Revision: D48957446
      
      Pulled By: mthrok
      
      fbshipit-source-id: ebb256461dd3032025fd27d0455ce980888f7778
  3. 04 Sep, 2023 1 commit
  4. 20 Aug, 2023 1 commit
    • Fix I/O test (#3568) · 0688863c
      moto authored
      Summary:
      It turned out that FFmpeg 5 installed via conda reports a video frame rate of -1. FFmpeg 4 and 6 are fine. This is either a regression in FFmpeg or in the underlying decoding library.
      
      Make the reference value adaptive.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3568
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48499621
      
      Pulled By: mthrok
      
      fbshipit-source-id: fb64187bcf0dc57b753cb6c05f04d436238f5c51
  5. 14 Aug, 2023 1 commit
  6. 11 Aug, 2023 1 commit
  7. 10 Aug, 2023 2 commits
    • Add Frechet distance function (#3545) · 06301c0a
      Jeff Hwang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3545
      
      Adds a function for computing the Fréchet distance between two multivariate normal distributions.
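
The commit message does not spell out the formula. Assuming the standard closed form for Gaussians, ||mu_x - mu_y||^2 + Tr(S_x + S_y - 2 (S_x S_y)^(1/2)), the sketch below evaluates it for the special case of diagonal covariance matrices, where the matrix square root reduces to element-wise square roots. The function name `frechet_distance_diag` is hypothetical and this is not the torchaudio implementation.

```python
import math


def frechet_distance_diag(mu_x, var_x, mu_y, var_y):
    """Frechet distance between two Gaussians with diagonal covariances.

    mu_* are mean vectors; var_* are the diagonal entries (variances).
    """
    # Squared Euclidean distance between the means.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))
    # Tr(S_x + S_y - 2 (S_x S_y)^(1/2)) for diagonal S_x, S_y.
    cov_term = sum(vx + vy - 2.0 * math.sqrt(vx * vy) for vx, vy in zip(var_x, var_y))
    return mean_term + cov_term


same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
wider = frechet_distance_diag([0.0], [1.0], [0.0], [4.0])
```

Full covariance matrices would need a proper matrix square root, which is omitted here.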
      
      Reviewed By: mthrok
      
      Differential Revision: D48126102
      
      fbshipit-source-id: e4e122b831e1e752037c03f5baa9451e81ef1697
    • Move backend initialization to toplevel (#3548) · 6fb21ab1
      moto authored
      Summary:
      The backend dispatcher is implemented in `torchaudio._backend`, while the legacy backend is implemented in `torchaudio.backend`.
      
      The initialization happens in `torchaudio._backend`.
      This commit moves it to `torchaudio.__init__`, so that `backend` and `_backend` are more independent.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3548
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48219244
      
      Pulled By: mthrok
      
      fbshipit-source-id: e694cb232794f90902a60ee51c7bf11b7f0548a0
  8. 09 Aug, 2023 1 commit
  9. 07 Aug, 2023 1 commit
    • Add merge_tokens / TokenSpan (#3535) · 30668afb
      moto authored
      Summary:
      This commit adds the `merge_tokens` function, which removes repeated tokens from CTC token sequences returned from `forced_align`.
      
      Resolving repeated tokens is a necessary and almost universal step, so it makes sense to have such a helper function in torchaudio.
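
As a rough illustration of what resolving repeated tokens involves, the sketch below collapses a frame-level token sequence into spans. It is not the torchaudio implementation: the `Span` class and `merge_repeated` function are hypothetical, and dropping blank tokens (index 0 here) is an assumption based on the usual CTC convention.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Span:
    """Hypothetical span type for this sketch."""
    token: int
    start: int  # first frame index (inclusive)
    end: int    # one past the last frame index (exclusive)


def merge_repeated(tokens, blank=0):
    """Collapse runs of identical tokens into spans, skipping blanks."""
    spans = []
    frame = 0
    for token, run in groupby(tokens):
        length = len(list(run))
        if token != blank:
            spans.append(Span(token, frame, frame + length))
        frame += length
    return spans


aligned = [0, 3, 3, 0, 0, 5, 5, 5, 3]
spans = merge_repeated(aligned)
# spans == [Span(3, 1, 3), Span(5, 5, 8), Span(3, 8, 9)]
```

Each span keeps the frame range it covers, so token-level timestamps can be derived from the frame rate.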
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3535
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48111202
      
      Pulled By: mthrok
      
      fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24
  10. 03 Aug, 2023 1 commit
  11. 01 Aug, 2023 1 commit
  12. 31 Jul, 2023 1 commit
  13. 29 Jul, 2023 1 commit
    • Refactor compat (#3518) · 8497ee91
      moto authored
      Summary:
      The I/O functions in the `_compat` module were introduced there so that
      everything related to FFmpeg is in torchaudio.io and FFmpeg library
      initialization can be carried out in `torchaudio.io.__init__`.
      
      Now that this constraint is removed (all the initialization happens
      at `torchaudio._extension.__init__`) and `_compat` is only used by
      the FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
      for better locality.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3518
      
      Reviewed By: huangruizhe
      
      Differential Revision: D47877412
      
      Pulled By: mthrok
      
      fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f
  14. 28 Jul, 2023 2 commits
  15. 26 Jul, 2023 1 commit
  16. 25 Jul, 2023 1 commit
  17. 17 Jul, 2023 1 commit
  18. 12 Jul, 2023 1 commit
    • Support multiple FFmpeg versions (#3464) · 786066b4
      moto authored
      Summary:
      This commit introduces support for multiple FFmpeg versions for OSS binary distributions.
      
      Currently torchaudio only works with FFmpeg 4. This is inconvenient at every stage, from installation to runtime linking.
      This commit allows picking FFmpeg 4, 5 or 6 at runtime, instead of just looking for v4.
      
      The way it works is that we compile the FFmpeg extension three times, once against each FFmpeg version, and ship all of them.
      At runtime, we look for a specific version of libavutil, and when one is found, load the corresponding FFmpeg extension.
      The order of preference is 6, 5, then 4.
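
The selection order described above can be sketched as follows. This is an illustration only: `pick_ffmpeg_version` and the `has_libavutil` probe are hypothetical names, and the real probing and extension loading are not shown. The mapping reflects the libavutil major versions shipped with FFmpeg 4, 5 and 6.

```python
# libavutil major version shipped with each FFmpeg major release.
LIBAVUTIL_MAJOR = {6: 58, 5: 57, 4: 56}


def pick_ffmpeg_version(has_libavutil, preference=(6, 5, 4)):
    """Return the first FFmpeg major version whose libavutil is available.

    has_libavutil is a hypothetical probe: it takes a libavutil major
    version and returns True if that library can be found on the system.
    """
    for ffmpeg_major in preference:
        if has_libavutil(LIBAVUTIL_MAJOR[ffmpeg_major]):
            return ffmpeg_major
    return None  # no supported FFmpeg found


# Simulate a system that has FFmpeg 4 and 5 installed, but not 6.
installed = {56, 57}
chosen = pick_ffmpeg_version(lambda major: major in installed)
```

With both FFmpeg 4 and 5 present, the sketch selects 5, matching the stated preference for newer versions.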
      
      To make the build process simple and reproducible, we use pre-built binaries of FFmpeg during the build.
      They are LGPL builds and are downloaded from S3 at build time, instead of being built every time.
      
      The use of pre-built binaries as scaffolding limits the systems that can build torchaudio, so this commit also introduces
      a single FFmpeg version support mode. Setting FFMPEG_ROOT during the build changes the way binaries are built
      so that only one specific version of FFmpeg is supported.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3464
      
      Differential Revision: D47300223
      
      Pulled By: mthrok
      
      fbshipit-source-id: 560c7968315e4c8922afa11a4693f648c0356d04
  19. 10 Jul, 2023 1 commit
  20. 05 Jul, 2023 1 commit
  21. 21 Jun, 2023 1 commit
  22. 14 Jun, 2023 1 commit
  23. 09 Jun, 2023 1 commit
  24. 08 Jun, 2023 2 commits
    • Introduce chroma filter bank function (#3395) · dfd0c5fd
      Jeff Hwang authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/3395
      
      Adds the chroma filter bank function `chroma_filterbank` to `torchaudio.prototype.functional`.
      
      Reviewed By: mthrok
      
      Differential Revision: D46307672
      
      fbshipit-source-id: c5d8104a8bb03da70d0629b5cc224e0d897148d5
    • Delay the initialization of CUDA tensor converter (#3419) · 7dff24ca
      moto authored
      Summary:
      The StreamReader decoding process is composed of three steps:
      
      1. Decode the incoming AVPacket into AVFrame
      2. Pass AVFrame through AVFilter for post-processing
      3. Convert the resulting AVFrame
      
      The internals of StreamReader were refactored in https://github.com/pytorch/audio/issues/3188 so that the above pipeline is initialized when the output stream is defined, which allows the output stream shape to be retrieved.
      
      For the CPU decoder, this works fine because resizing happens in step 2, and the resulting shape can be retrieved.
      However, this is problematic for the GPU decoder, as resizing is currently done using a GPU decoder option (step 1) and there seems to be no interface to retrieve the output shape. The refactor introduced a regression, which is described in https://github.com/pytorch/audio/issues/3405
      
      AVFilter internally adapts to changes in the input frame size. This commit changes the conversion process to behave similarly, so that it waits until the first frame comes in to finalize the frame shape.
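
The idea of deferring shape finalization until the first frame arrives can be sketched in plain Python as below. This is only an analogy for the approach described above, not the StreamReader code: `LazyConverter` is a hypothetical class, and frames are modeled as nested lists rather than AVFrame objects.

```python
class LazyConverter:
    """Finalizes its output shape from the first frame it receives."""

    def __init__(self):
        # Unknown until the first frame arrives.
        self.shape = None

    def convert(self, frame):
        shape = (len(frame), len(frame[0]))  # (height, width)
        if self.shape is None:
            # First frame: the shape is decided here, not at construction.
            self.shape = shape
        elif shape != self.shape:
            raise ValueError(f"expected shape {self.shape}, got {shape}")
        # Stand-in for the real conversion: copy the frame data.
        return [list(row) for row in frame]


converter = LazyConverter()
shape_before = converter.shape
converted = converter.convert([[1, 2, 3], [4, 5, 6]])
shape_after = converter.shape
```

The trade-off in this sketch is that the shape cannot be queried before any frame has been processed.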
      
      Fix https://github.com/pytorch/audio/issues/3405
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3419
      
      Differential Revision: D46557505
      
      Pulled By: mthrok
      
      fbshipit-source-id: 46ad2d82c8c30f368ebfbaf6947718a5036c7dc6
  25. 07 Jun, 2023 1 commit
  26. 06 Jun, 2023 3 commits
  27. 02 Jun, 2023 1 commit
    • [BC-Breaking] Remove compute_kaldi_pitch (#3368) · 5bbbb1d5
      moto authored
      Summary:
      This commit removes the `compute_kaldi_pitch` function and the underlying Kaldi integration from torchaudio.
      
      The Kaldi pitch function was added in a short period of time by integrating the original Kaldi implementation, instead of reimplementing it in PyTorch.
      
      The Kaldi integration employed a hack which replaces the base vector/matrix implementation of Kaldi with PyTorch Tensor so that there is only one BLAS library within torchaudio.
      
      We have recently been making torchaudio leaner, and we have not seen wide adoption of the kaldi_pitch feature, so we decided to remove it.
      
      See some of the discussion https://github.com/pytorch/audio/issues/1269
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3368
      
      Differential Revision: D46406176
      
      Pulled By: mthrok
      
      fbshipit-source-id: ee5e24d825188f379979ddccd680c7323b119b1e
  28. 01 Jun, 2023 3 commits
  29. 30 May, 2023 1 commit
  30. 27 May, 2023 1 commit
    • Fix AudioEffector for mulaw (#3372) · af932cc7
      moto authored
      Summary:
      When encoding audio with mulaw, the resulting data does not have a header, and the StreamReader defaults to 16 kHz, which can stretch/shrink the resulting waveform.
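
The effect of a wrong assumed sample rate is simple arithmetic, as the sketch below shows. The 8 kHz source rate is an assumed example value and `duration_seconds` is a hypothetical helper; only the 16 kHz default comes from the description above.

```python
def duration_seconds(num_samples, sample_rate):
    """Duration implied by interpreting num_samples at sample_rate."""
    return num_samples / sample_rate


true_rate = 8_000      # assumed rate of the original audio (example value)
default_rate = 16_000  # rate assumed when the headerless data is read back

num_samples = 2 * true_rate  # two seconds of audio at the true rate
actual = duration_seconds(num_samples, true_rate)       # 2.0 seconds
perceived = duration_seconds(num_samples, default_rate)  # 1.0 second
```

Interpreted at twice the true rate, the same samples span half the time, so the waveform appears shrunk by a factor of two.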
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3372
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46234772
      
      Pulled By: mthrok
      
      fbshipit-source-id: 942c89a8cfe29b0b6f57b3e5b6c9dfd3524ca552
  31. 26 May, 2023 3 commits
    • Fix encoding g722 format (#3373) · 1b05ca7e
      moto authored
      Summary:
      The g722 format only supports 16 kHz, but AVCodec does not list this. The implementation does not insert resampling, so the resulting audio can be slowed down or sped up.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3373
      
      Reviewed By: hwangjeff
      
      Differential Revision: D46233181
      
      Pulled By: mthrok
      
      fbshipit-source-id: 902b3f862a8f7269dc35bc871e868b0e78326c6c
    • Temporarily remove test for extract_features (#3378) · 05649ca3
      Zhaoheng Ni authored
      Summary:
      The tests failed for several bundles. Remove them for now; they will be re-added once the root cause is figured out.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3378
      
      Reviewed By: atalman
      
      Differential Revision: D46230884
      
      Pulled By: nateanl
      
      fbshipit-source-id: 42056a29b2ec2335268b273d3e37fb517035be92
    • Improve RNN-T streaming decoding (#3295) · 9fc0dcaa
      Lakshmi Krishnan authored
      Summary:
      This commit fixes the following issues affecting streaming decoding quality:
      1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
      2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically affects decoding quality, especially for speech with long pauses and disfluencies.
      3. Some minor errors regarding shape checking for length.
      
      This also means that the resulting output is the entire transcript up until that time step, instead of just the incremental change in the transcript.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3295
      
      Reviewed By: nateanl
      
      Differential Revision: D46216113
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0