1. 07 Aug, 2023 2 commits
    • Add merge_tokens / TokenSpan (#3535) · 30668afb
      moto authored
      Summary:
      This commit adds the `merge_tokens` function, which removes repeated tokens from the CTC token sequences returned by `forced_align`.
      
      Resolving repeated tokens is a necessary and almost universal step, so it makes sense to provide such a helper function in torchaudio.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3535
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48111202
      
      Pulled By: mthrok
      
      fbshipit-source-id: 25354bfa210aa5c03f8c1d3e201f253ca3761b24
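The merging step described in the summary above can be sketched in plain Python. This is an illustrative sketch only, not torchaudio's implementation: the names `Span` and `merge_repeated`, the half-open frame range, the averaging of per-frame scores, and the default blank index of 0 are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class Span:
    """Hypothetical record for one merged token (not torchaudio's TokenSpan)."""
    token: int    # token index
    start: int    # first frame of the run (inclusive)
    end: int      # one past the last frame of the run (exclusive)
    score: float  # mean of the per-frame scores in the run


def merge_repeated(tokens: Sequence[int], scores: Sequence[float], blank: int = 0) -> List[Span]:
    """Collapse consecutive repeats of a frame-level CTC path and drop blanks."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    spans: List[Span] = []
    pos = 0
    # groupby yields one group per run of identical consecutive tokens, so the
    # same token separated by a blank produces two separate spans.
    for token, run in groupby(tokens):
        length = sum(1 for _ in run)
        if token != blank:
            mean_score = sum(scores[pos:pos + length]) / length
            spans.append(Span(token, pos, pos + length, mean_score))
        pos += length
    return spans


# Frame-level path: blank, blank, 3, 3, blank, 5, 5, 5, blank
path = [0, 0, 3, 3, 0, 5, 5, 5, 0]
frame_scores = [1.0, 1.0, 0.5, 1.0, 1.0, 0.25, 0.25, 0.25, 1.0]
for span in merge_repeated(path, frame_scores):
    print(span)
```

Keeping the frame range on each span lets a caller map a token back to a time interval once the frame rate is known.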
    • Make target_lengths/input_lengths in forced_align optional (#3533) · cd80976e
      moto authored
      Summary:
      Currently, the `torchaudio.functional.forced_align` function requires the input and target lengths to be specified in full.
      When performing non-batched alignment, these can be inferred from the sizes of the Tensors.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3533
      
      Reviewed By: nateanl
      
      Differential Revision: D48111041
      
      Pulled By: mthrok
      
      fbshipit-source-id: fbf07124d3959c5cc5533dcd86296851587082fb
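The inference described in the summary above can be sketched on plain shape tuples. This is a sketch under stated assumptions, not the code from the pull request: `resolve_lengths` is a hypothetical helper, the shapes are assumed to be `(batch, frames, classes)` for the emission and `(batch, target_len)` for the targets, and refusing to infer lengths when the batch size is larger than 1 is a choice made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_lengths(
    emission_shape: Tuple[int, int, int],
    targets_shape: Tuple[int, int],
    input_lengths: Optional[Sequence[int]] = None,
    target_lengths: Optional[Sequence[int]] = None,
) -> Tuple[List[int], List[int]]:
    """Return (input_lengths, target_lengths), inferring any that are omitted."""
    batch, num_frames, _ = emission_shape
    target_batch, num_targets = targets_shape
    if batch != target_batch:
        raise ValueError("emission and targets must have the same batch size")

    def infer(given: Optional[Sequence[int]], full: int, name: str) -> List[int]:
        if given is not None:
            return list(given)  # explicit lengths always win
        if batch != 1:
            # With padding in a batch, the true lengths are not recoverable
            # from the shape alone (assumption of this sketch).
            raise ValueError(f"{name} is required when batch size is {batch}")
        return [full]  # non-batched: the whole sequence is valid

    return (
        infer(input_lengths, num_frames, "input_lengths"),
        infer(target_lengths, num_targets, "target_lengths"),
    )


# Non-batched call: both lengths are taken from the shapes.
print(resolve_lengths((1, 50, 29), (1, 7)))
```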
  2. 04 Aug, 2023 2 commits
  3. 03 Aug, 2023 2 commits
  4. 02 Aug, 2023 1 commit
  5. 01 Aug, 2023 3 commits
  6. 31 Jul, 2023 2 commits
  7. 29 Jul, 2023 1 commit
    • Refactor compat (#3518) · 8497ee91
      moto authored
      Summary:
      The I/O functions in the `_compat` module were placed there so that
      everything related to FFmpeg was in `torchaudio.io` and the FFmpeg library
      initialization could be carried out in `torchaudio.io.__init__`.
      
      Now that this constraint is removed (all the initialization happens
      at `torchaudio._extension.__init__`) and `_compat` is only used by the
      FFmpeg dispatcher backend, we move the module to `torchaudio._backend`
      for better locality.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3518
      
      Reviewed By: huangruizhe
      
      Differential Revision: D47877412
      
      Pulled By: mthrok
      
      fbshipit-source-id: aa18c8cb6e5d5360950df5158c33c653e37c565f
  8. 28 Jul, 2023 5 commits
  9. 27 Jul, 2023 3 commits
  10. 26 Jul, 2023 3 commits
  11. 25 Jul, 2023 7 commits
  12. 24 Jul, 2023 1 commit
  13. 18 Jul, 2023 1 commit
  14. 17 Jul, 2023 1 commit
  15. 15 Jul, 2023 2 commits
  16. 14 Jul, 2023 1 commit
    • Update the logic to fetch pixel format from filter graph (#3479) · cf53a486
      moto authored
      Summary:
      When using the GPU decoder in some environments, attempting to read the output formats from the filter graph caused an issue in which the software pixel format could not be determined.
      
      We do not know the exact cause, but when it happens, the input link of the buffer sink does not have a HW frames context.
      
      Since currently no filter can convert the pixel format of a CUDA frame, we resort to the HW frames context of the output link of the buffer source.
      
      Environments in which this was observed:
      
      Env1
      - OS: Fedora 36 (x86_64)
      - GCC 12.2.1
      - Python 3.10.12
      - GPU: GeForce RTX 3070 Ti Laptop GPU
      - FFmpeg: 5.1.3
      - nv-codec-header: n11.1.5.2
      - CUDA: 12.1
      
      Env2
      - OS: Ubuntu 20.04.4 LTS (x86_64)
      - GCC 9.4.0
      - Python 3.11.3
      - GPU: Quadro GV100
      - FFmpeg: 5.1.3
      - nv-codec-header: n11.1.5.2
      - CUDA: 11.4
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3479
      
      Differential Revision: D47482407
      
      Pulled By: mthrok
      
      fbshipit-source-id: 1c53096b27824453b260138ab64e1948afeeefc7
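The fallback described in the summary above can be modelled in a few lines of Python. This is a toy model, not the actual C++ change: `HwFramesContext`, `Link`, and `resolve_sw_format` are hypothetical stand-ins for the FFmpeg structures, and trying the buffer sink first before falling back to the buffer source is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HwFramesContext:
    """Hypothetical stand-in for a HW frames context."""
    sw_format: str  # software pixel format behind the HW frames, e.g. "nv12"


@dataclass
class Link:
    """Hypothetical stand-in for a filter graph link."""
    hw_frames_ctx: Optional[HwFramesContext] = None


def resolve_sw_format(sink_input: Link, source_output: Link) -> str:
    """Pick the software pixel format for frames leaving the filter graph."""
    if sink_input.hw_frames_ctx is not None:
        return sink_input.hw_frames_ctx.sw_format
    # Fallback: no filter converts the pixel format of a CUDA frame, so the
    # format at the buffer source output still describes the frames at the sink.
    if source_output.hw_frames_ctx is not None:
        return source_output.hw_frames_ctx.sw_format
    raise RuntimeError("no HW frames context available on either link")


# The problematic case: the sink's input link carries no HW frames context.
print(resolve_sw_format(Link(), Link(HwFramesContext("nv12"))))
```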
  17. 13 Jul, 2023 2 commits
  18. 12 Jul, 2023 1 commit