1. 23 Aug, 2022 1 commit
  2. 22 Aug, 2022 2 commits
  3. 20 Aug, 2022 1 commit
  4. 19 Aug, 2022 2 commits
    • Refactor sox pybind source code (#2636) · 789adf07
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2636
      
      In the early stage of the torchaudio extension module, the
      `torchaudio/csrc/pybind` directory was created so that
      all the code defining the Python interface would be placed
      there and there would be only one extension module, called
      `torchaudio._torchaudio`.
      
      However, the codebase has evolved so that separate
      extensions are defined for each feature (third-party
      dependency) for the sake of more modular file organization.
      
      What is left in `csrc/pybind` is the libsox Python bindings.
      This commit moves them under `csrc/sox`.
      
      Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.
      
      Reviewed By: carolineechen
      
      Differential Revision: D38829253
      
      fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d
    • Update README.md (#2633) · 0b7f2fba
      moto authored
      Summary:
      Update compatibility matrix
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2633
      
      Reviewed By: nateanl
      
      Differential Revision: D38827670
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100
  5. 18 Aug, 2022 6 commits
  6. 16 Aug, 2022 4 commits
  7. 15 Aug, 2022 3 commits
  8. 12 Aug, 2022 1 commit
  9. 11 Aug, 2022 1 commit
  10. 10 Aug, 2022 3 commits
  11. 09 Aug, 2022 1 commit
    • Add NNLM support to CTC Decoder (#2528) · 03a0d68e
      Caroline Chen authored
      Summary:
      Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.
      
      The `ctc_decoder` API is as follows:
      - To decode with KenLM, pass the KenLM language model path to the `lm` variable.
      - To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` variable. Additionally, create a file of LM words listed in order of the LM index, one word per line, and pass the file to `lm_path`.
      - To decode without a language model, set `lm` to `None` (default)
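
The word-list file described for the custom LM case can be produced with a few lines of plain Python. This is a minimal sketch and not code from the PR: the helper name `write_lm_word_list` and the `word_to_index` mapping are hypothetical, and the requirement that indices be contiguous from 0 is an assumption made so that line number and LM index coincide.

```python
def write_lm_word_list(word_to_index, path):
    """Write one word per line, ordered by LM index (hypothetical helper).

    word_to_index: dict mapping each LM word to its integer index.
    Assumption: indices are contiguous and start at 0, so that the word on
    line i of the file is the word with LM index i.
    """
    ordered = sorted(word_to_index.items(), key=lambda item: item[1])
    indices = [index for _, index in ordered]
    if indices != list(range(len(indices))):
        raise ValueError("LM indices must be contiguous and start at 0")
    with open(path, "w", encoding="utf-8") as f:
        for word, _ in ordered:
            f.write(word + "\n")


if __name__ == "__main__":
    import os
    import tempfile

    # Toy vocabulary, for illustration only.
    vocab = {"cat": 1, "a": 0, "sat": 2}
    out_path = os.path.join(tempfile.mkdtemp(), "lm_words.txt")
    write_lm_word_list(vocab, out_path)
    with open(out_path, encoding="utf-8") as f:
        print(f.read())
```

The resulting path is what the commit message says to pass as `lm_path` alongside the custom `CTCDecoderLM` subclass.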
      
      Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests that validate custom implementations of ZeroLM and KenLM, as well as decoding with a biased LM.
      
      Follow ups:
      - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory
      
      cc jacobkahn
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2528
      
      Reviewed By: mthrok
      
      Differential Revision: D38243802
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
  12. 08 Aug, 2022 1 commit
  13. 05 Aug, 2022 4 commits
  14. 04 Aug, 2022 1 commit
  15. 03 Aug, 2022 2 commits
    • Add HDEMUCS_HIGH_MUSDB (#2601) · 6ecc11c2
      Sean Kim authored
      Summary:
      Add new model pretrained weights and tests
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2601
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38396673
      
      Pulled By: skim0514
      
      fbshipit-source-id: e06f97d28508543bc18e671344386a947bc870c1
    • An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
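
To make the role of the internal constants concrete, here is a minimal sketch of the two-stage gating step from the recommendation, written in plain Python. It is not the code from this PR. It assumes the input has already been K-weighted and split into overlapping blocks, so each block is just a list of per-channel mean-square energies. The function names are hypothetical, and the numeric constants (the -0.691 offset, the -70 LKFS absolute threshold, the -10 LU relative offset) are taken from my reading of BS.1770-4.

```python
import math

ABSOLUTE_THRESHOLD = -70.0  # LKFS, the absolute gate
RELATIVE_OFFSET = -10.0     # LU, relative gate measured from stage-1 loudness


def block_loudness(energies, weights):
    """Loudness (LKFS) of one block from per-channel mean-square energies."""
    total = sum(g * z for g, z in zip(weights, energies))
    if total <= 0.0:
        return float("-inf")
    return -0.691 + 10.0 * math.log10(total)


def gated_loudness(blocks, weights):
    """Two-stage gated loudness over a list of blocks (hypothetical helper).

    blocks:  list of blocks, each a list of per-channel mean-square energies
             (assumed to be computed from K-weighted audio).
    weights: per-channel weights, e.g. 1.0 for left/right/centre.
    """
    def mean_energies(selected):
        n = len(selected)
        return [sum(b[c] for b in selected) / n for c in range(len(weights))]

    # Stage 1: drop blocks below the absolute threshold.
    stage1 = [b for b in blocks if block_loudness(b, weights) > ABSOLUTE_THRESHOLD]
    if not stage1:
        return float("-inf")

    # Stage 2: drop blocks more than 10 LU below the stage-1 loudness.
    relative_threshold = block_loudness(mean_energies(stage1), weights) + RELATIVE_OFFSET
    stage2 = [b for b in stage1 if block_loudness(b, weights) > relative_threshold]
    if not stage2:
        return float("-inf")
    return block_loudness(mean_energies(stage2), weights)
```

In this sketch, near-silent blocks fall below the absolute gate and do not pull the integrated loudness down, which is the purpose of the threshold mentioned above.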
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  16. 02 Aug, 2022 1 commit
  17. 01 Aug, 2022 3 commits
  18. 30 Jul, 2022 1 commit
  19. 29 Jul, 2022 2 commits
    • Update forced alignment tutorial (#2544) · c26b38b2
      moto authored
      Summary:
      1. Fix initialization.
      Previously, the SOS token score was initialized to 0 across the time axis.
      This was biasing the alignment to delay the start.
      The proper way to delay the SOS is via the blank token.
      The new initialization takes the cumulative sum of blank scores.
      2. Fill the end of the trellis with Inf.
      Similar to the start, at the end, where the number of remaining time frames
      is less than the number of tokens, it is no longer possible to align the text,
      so we fill with Inf for better visualization.
      3. Clean up asset management code.
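
The first two points can be illustrated with a small, dependency-free sketch of a trellis builder. This is written from the description above, not copied from the tutorial: the function name `build_trellis`, the list-of-lists layout, and the choice to mark unreachable cells after the recursion are all assumptions made for illustration.

```python
import math


def build_trellis(emission, tokens, blank_id=0):
    """Build an alignment trellis (illustrative sketch).

    emission: emission[t][c] is the log-probability of class c at frame t.
    tokens:   list of class indices of the transcript.
    Returns a (num_frames + 1) x (num_tokens + 1) table; column 0 stands for
    the start-of-sentence state.
    """
    num_frames = len(emission)
    num_tokens = len(tokens)
    trellis = [[0.0] * (num_tokens + 1) for _ in range(num_frames + 1)]

    # Point 1: the SOS column is the cumulative sum of blank scores,
    # instead of 0 across the time axis.
    for t in range(num_frames):
        trellis[t + 1][0] = trellis[t][0] + emission[t][blank_id]

    # Before any frame is consumed, no token can have been emitted.
    for j in range(1, num_tokens + 1):
        trellis[0][j] = -math.inf

    for t in range(num_frames):
        for j in range(1, num_tokens + 1):
            stay = trellis[t][j] + emission[t][blank_id]
            change = trellis[t][j - 1] + emission[t][tokens[j - 1]]
            trellis[t + 1][j] = max(stay, change)

    # Point 2: where fewer frames remain than tokens still to be emitted,
    # alignment is impossible; mark those cells with Inf for visualization.
    for t in range(num_frames + 1):
        for j in range(num_tokens + 1):
            if num_frames - t < num_tokens - j:
                trellis[t][j] = math.inf
    return trellis
```

Cells that can still complete the alignment depend only on other such cells, so marking the impossible region after the recursion does not change any usable score.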
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2544
      
      Reviewed By: nateanl
      
      Differential Revision: D38276478
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6d934cc850a0790b8c463a4f69f8f1143633d299
    • Enable CTC decoder in Windows (#2587) · 67cb420d
      moto authored
      Summary:
      This commit enables CTC decoder on Windows.
      
      The functionality seems to work fine.
      The tests pass and the decoding tutorial runs fine.
      
      The only difference from the Linux/macOS version is that
      loading a model in XZ compression format is not supported.
      
      ![289961785_399620772041679_7768117002438616376_n](https://user-images.githubusercontent.com/855818/181420923-cfbd8402-20de-4e63-b9e4-e39f9aa9fc50.png)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2587
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D38276490
      
      Pulled By: mthrok
      
      fbshipit-source-id: f2203b2235c5bbb0220fe560aaaf0e1d5530347a