  1. 10 Oct, 2022 1 commit
    • Add unit test for LibriMix dataset (#2659) · c5b8e585
      Zhaoheng Ni authored
      Summary:
      Besides the unit test, the PR also addresses these issues:
      - The original `LibriMix` dataset only supports "min" mode, in which each mixture is trimmed to the length of the shortest clean source. This is the default for the source separation task. Users may also want the "max" mode, which enables end-to-end separation and recognition. The PR adds a ``mode`` argument so that users can choose which version of the dataset to use.
      - If the task is ``"enh_both"``, the target should be the audio in ``mix_clean`` rather than the separate clean sources. The PR fixes this by using ``mix_clean`` as the target.
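      The difference between the two modes can be sketched in plain Python. This is a minimal illustration, not the `LibriMix` implementation: `combine_sources` and `mix` are hypothetical helpers, and the trimming and zero-padding are done on the fly here purely to show what each mode means for the mixture length.

```python
from typing import List


def combine_sources(sources: List[List[float]], mode: str = "min") -> List[List[float]]:
    """Align clean sources to a common length (hypothetical helper).

    "min": truncate every source to the shortest one.
    "max": zero-pad every source to the longest one.
    """
    lengths = [len(s) for s in sources]
    if mode == "min":
        target = min(lengths)
        return [s[:target] for s in sources]
    if mode == "max":
        target = max(lengths)
        return [s + [0.0] * (target - len(s)) for s in sources]
    raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")


def mix(sources: List[List[float]], mode: str = "min") -> List[float]:
    """Sum the aligned sources sample by sample to form the mixture."""
    aligned = combine_sources(sources, mode)
    return [sum(samples) for samples in zip(*aligned)]


if __name__ == "__main__":
    speech_a = [0.1, 0.2, 0.3, 0.4]
    speech_b = [0.5, 0.5]
    print(len(mix([speech_a, speech_b], mode="min")))  # 2
    print(len(mix([speech_a, speech_b], mode="max")))  # 4
```

      In "max" mode no speech is discarded, which is why it suits joint separation and recognition, where every word of each speaker must be transcribed.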
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2659
      
      Reviewed By: carolineechen
      
      Differential Revision: D40229227
      
      Pulled By: nateanl
      
      fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235
  2. 09 Oct, 2022 1 commit
  3. 07 Oct, 2022 1 commit
  4. 21 Sep, 2022 1 commit
  5. 12 Sep, 2022 1 commit
  6. 01 Sep, 2022 1 commit
  7. 24 Aug, 2022 1 commit
    • Add StreamWriter (#2628) · 72404de9
      moto authored
      Summary:
      This commit adds StreamWriter, an FFmpeg-based encoder class.
      StreamWriter is essentially the counterpart of the StreamReader class, and
      it supports:
      
      * Encoding audio / still image / video
      * Exporting to local files / streaming protocols / devices, etc.
      * File-like object support (in later commit)
      * HW video encoding (in later commit)
      
      See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2628
      
      Reviewed By: nateanl
      
      Differential Revision: D38816650
      
      Pulled By: mthrok
      
      fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
  8. 11 Aug, 2022 1 commit
  9. 09 Aug, 2022 1 commit
    • Add NNLM support to CTC Decoder (#2528) · 03a0d68e
      Caroline Chen authored
      Summary:
      Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.
      
      The `ctc_decoder` API is as follows:
      - To decode with KenLM, pass the KenLM language model path to the `lm` argument
      - To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM` and pass it to the `lm` argument. Additionally, create a file listing the LM words in order of their LM index, one word per line, and pass the file to `lm_path`.
      - To decode without a language model, set `lm` to `None` (default)
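      The word file for a custom LM follows a simple layout: line i holds the word whose LM index is i. A minimal sketch of writing and reading such a file is below; the helper names, file name, and words are hypothetical, and which tokens the file must contain depends on the LM you trained.

```python
import tempfile
from pathlib import Path
from typing import Dict, Sequence


def write_lm_words(words: Sequence[str], path: Path) -> None:
    """Write the LM vocabulary so that line i holds the word with LM index i."""
    for word in words:
        # An empty word or embedded whitespace would break the one-word-per-line layout.
        if not word or any(ch.isspace() for ch in word):
            raise ValueError(f"invalid LM word: {word!r}")
    path.write_text("\n".join(words) + "\n", encoding="utf-8")


def read_lm_words(path: Path) -> Dict[str, int]:
    """Map each word back to its LM index, i.e. its 0-based line number."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {word: index for index, word in enumerate(lines)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words_file = Path(tmp) / "lm_words.txt"  # hypothetical file name
        write_lm_words(["the", "cat", "sat"], words_file)
        print(read_lm_words(words_file))
```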
      
      Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests that validate custom implementations of ZeroLM and KenLM, as well as decoding with a biased LM.
      
      Follow ups:
      - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory
      
      cc jacobkahn
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2528
      
      Reviewed By: mthrok
      
      Differential Revision: D38243802
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
  10. 05 Aug, 2022 1 commit
    • Add convolution operator (#2602) · b396157d
      hwangjeff authored
      Summary:
      Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
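      The two approaches should agree up to floating-point error, which a NumPy sketch can check. `convolve_direct` and `convolve_fft` are hypothetical stand-ins, not torchaudio's implementations; they assume "full" output of length N + M - 1 and matching leading dimensions (the library functions may broadcast differently).

```python
import numpy as np


def convolve_direct(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Full linear convolution of x and y along the last axis, computed directly."""
    if x.shape[:-1] != y.shape[:-1]:
        raise ValueError("leading dimensions of x and y must match")
    n = x.shape[-1] + y.shape[-1] - 1
    # Flatten leading dims, convolve row by row, then restore the leading shape.
    rows_x = x.reshape(-1, x.shape[-1])
    rows_y = y.reshape(-1, y.shape[-1])
    out = np.stack([np.convolve(a, b, mode="full") for a, b in zip(rows_x, rows_y)])
    return out.reshape(x.shape[:-1] + (n,))


def convolve_fft(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Same result via FFT: zero-pad to n so circular convolution equals linear convolution."""
    if x.shape[:-1] != y.shape[:-1]:
        raise ValueError("leading dimensions of x and y must match")
    n = x.shape[-1] + y.shape[-1] - 1
    spectrum = np.fft.rfft(x, n=n, axis=-1) * np.fft.rfft(y, n=n, axis=-1)
    return np.fft.irfft(spectrum, n=n, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal((2, 16000))
    kernel = rng.standard_normal((2, 512))
    print(np.allclose(convolve_direct(signal, kernel), convolve_fft(signal, kernel)))
```

      Direct convolution costs O(N·M) per row, while the FFT route costs O((N + M) log(N + M)), so the FFT version usually wins once both signals are long.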
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2602
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D38450771
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b
  11. 03 Aug, 2022 1 commit
    • An implementation of the ITU-R BS.1770-4 loudness recommendation (#2472) · 946b180a
      bshall authored
      Summary:
      I took a stab at implementing the ITU-R BS.1770-4 loudness recommendation (closes https://github.com/pytorch/audio/issues/1205). To give some more details:
      - I've implemented K-weighting following csteinmetz1 instead of BrechtDeMan since it fit well with torchaudio's already implemented filters (`treble_biquad` and `highpass_biquad`).
      - I've added four audio files to test compliance with the recommendation. These are linked in [this pdf](https://www.itu.int/dms_pub/itu-r/opb/rep/R-REP-BS.2217-2-2016-PDF-E.pdf). There are many more test files there but I didn't want to bog down the assets directory with too many files. Let me know if I should add or remove anything.
      - I've kept many of the constants internal to the function (e.g. the block duration, overlap, and the absolute threshold gamma). I'm not sure if these should be exposed in the signature.
      - I've implemented support for up to 5 channels (following both csteinmetz1 and BrechtDeMan). The recommendation includes weights for up to 24 channels. Is there any convention for how many channels to support?
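      The gating stage of BS.1770-4, which is where the block duration, overlap, and absolute threshold mentioned above come in, can be sketched in plain Python. This is a minimal sketch, not the code from this PR: it assumes K-weighting and the 400 ms / 75%-overlap blocking have already produced per-block, per-channel mean-square values, and the function names are hypothetical. The channel weights are 1.0 for left, right, and centre and 1.41 for the two surround channels; the LFE channel is excluded.

```python
import math
from typing import Sequence

ABSOLUTE_GATE_LUFS = -70.0
RELATIVE_GATE_LU = -10.0
SURROUND_5_WEIGHTS = (1.0, 1.0, 1.0, 1.41, 1.41)  # L, R, C, Ls, Rs (LFE excluded)


def block_loudness(powers: Sequence[float], weights: Sequence[float]) -> float:
    """Loudness of one block from per-channel mean squares of the K-weighted signal."""
    total = sum(g * z for g, z in zip(weights, powers))
    return -0.691 + 10.0 * math.log10(total) if total > 0 else -math.inf


def integrated_loudness(blocks: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """Two-stage gated loudness (LUFS) from per-block, per-channel mean squares."""
    # Stage 1: drop blocks at or below the absolute gate of -70 LUFS.
    kept = [b for b in blocks if block_loudness(b, weights) > ABSOLUTE_GATE_LUFS]
    if not kept:
        return -math.inf
    # Relative threshold: loudness of the absolutely gated blocks minus 10 LU.
    mean_abs = [sum(col) / len(kept) for col in zip(*kept)]
    relative_gate = block_loudness(mean_abs, weights) + RELATIVE_GATE_LU
    # Stage 2: keep blocks above both gates, then average per channel again.
    kept = [b for b in kept if block_loudness(b, weights) > relative_gate]
    if not kept:
        return -math.inf
    mean_rel = [sum(col) / len(kept) for col in zip(*kept)]
    return block_loudness(mean_rel, weights)


if __name__ == "__main__":
    # Hypothetical stereo input: four loud blocks and one near-silent block.
    stereo_weights = [1.0, 1.0]
    blocks = [[0.5, 0.5]] * 4 + [[1e-9, 1e-9]]
    print(integrated_loudness(blocks, stereo_weights))
```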
      
      I hope this is helpful! Looking forward to hearing from you.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2472
      
      Reviewed By: hwangjeff
      
      Differential Revision: D38389155
      
      Pulled By: carolineechen
      
      fbshipit-source-id: fcc86d864c04ab2bedaa9acd941ebc4478ca6904
  12. 28 Jul, 2022 2 commits
  13. 19 Jul, 2022 1 commit
  14. 12 Jul, 2022 1 commit
  15. 07 Jul, 2022 1 commit
  16. 06 Jul, 2022 1 commit
    • Fix fluent test for windows (#2510) · 09daa438
      Caroline Chen authored
      Summary:
      The Fluent Speech Commands dataset test currently fails on Windows due to the newlines generated by the CSV writer in the test and incorrect path parsing in the dataset implementation.
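      The commit message does not show the exact change, so the following is only a sketch of the usual remedy for both symptoms. It assumes the test writes its metadata with Python's `csv` module and that paths stored in the CSV use forward slashes; the helper names and column values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path, PurePosixPath
from typing import Iterable, Sequence


def write_rows(path: Path, rows: Iterable[Sequence[str]]) -> None:
    """Write CSV rows with exactly one CRLF terminator per row on every OS."""
    # Without newline="", Windows text mode turns the writer's "\r\n" into "\r\r\n",
    # which shows up as blank rows when the file is read back.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def stem_from_csv_path(stored_path: str) -> str:
    """Parse a forward-slash path stored in the CSV independently of os.sep."""
    return PurePosixPath(stored_path).stem


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_file = Path(tmp) / "data.csv"
        write_rows(csv_file, [["path", "label"], ["wavs/speaker1/utt1.wav", "go"]])
        print(csv_file.read_bytes())
    print(stem_from_csv_path("wavs/speaker1/utt1.wav"))
```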
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2510
      
      Reviewed By: carolineechen
      
      Differential Revision: D37573203
      
      Pulled By: mthrok
      
      fbshipit-source-id: 4868bc649690c7e596b002686c6128ce735d3564
  17. 28 Jun, 2022 1 commit
    • Refactor AVDictionary clean up (#2507) · 0ad03adf
      moto authored
      Summary:
      Small cleanup in the FFmpeg binding code.
      
      1. Make `get_option_dict` and `clean_up_dict` public utilities.
      2. Merge the exception into `clean_up_dict`.
      3. Remove the custom string join function and use `c10::Join` instead.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2507
      
      Reviewed By: hwangjeff
      
      Differential Revision: D37466022
      
      Pulled By: mthrok
      
      fbshipit-source-id: 44b769ac6ff1ab20e6d6ae086cd1447deacb5969
  18. 27 Jun, 2022 3 commits
  19. 23 Jun, 2022 1 commit
  20. 21 Jun, 2022 1 commit
    • Create musdb handler and tests (#2484) · b92a8a09
      Sean Kim authored
      Summary:
      Create a dataset handler and tests for the new MUSDB dataset. Validated with manual testing and unit tests. Ran pre-commit for style checks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2484
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D37250556
      
      Pulled By: skim0514
      
      fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
  21. 20 Jun, 2022 1 commit
  22. 13 Jun, 2022 1 commit
  23. 10 Jun, 2022 1 commit
  24. 08 Jun, 2022 2 commits
  25. 04 Jun, 2022 1 commit
    • Make FFmpeg log level configurable (#2439) · 877a88c5
      moto authored
      Summary:
      Undesired logs are one of the loudest UX complaints we get.
      Yet loading media files involves uncertainty that is
      difficult to debug without debug logs.
      
      This commit introduces utility functions to configure the logging level
      so that we can ask users to enable it when they encounter an issue,
      while defaulting to a non-verbose option.
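      The "quiet by default, verbose on request" workflow can be sketched as a context manager that raises FFmpeg's verbosity while an issue is reproduced and then restores the previous level. The numeric values are FFmpeg's `AV_LOG_*` constants. The getter and setter are injected, so the sketch does not depend on a particular binding; in released torchaudio versions they may correspond to `torchaudio.utils.ffmpeg_utils.get_log_level` and `set_log_level`, but that is an assumption to check against the version you use.

```python
import contextlib
from typing import Callable, Iterator

# FFmpeg's numeric log levels (AV_LOG_* constants in libavutil/log.h).
AV_LOG_LEVELS = {
    "quiet": -8, "panic": 0, "fatal": 8, "error": 16,
    "warning": 24, "info": 32, "verbose": 40, "debug": 48, "trace": 56,
}


@contextlib.contextmanager
def ffmpeg_log_level(
    name: str,
    get_level: Callable[[], int],
    set_level: Callable[[int], None],
) -> Iterator[None]:
    """Temporarily switch to the named level, restoring the previous one even on error."""
    previous = get_level()
    set_level(AV_LOG_LEVELS[name])
    try:
        yield
    finally:
        set_level(previous)


if __name__ == "__main__":
    # Stand-in for a real binding (hypothetical); swap in the library's getter/setter.
    current = {"level": AV_LOG_LEVELS["error"]}
    with ffmpeg_log_level("debug", lambda: current["level"], lambda v: current.update(level=v)):
        print("while reproducing the issue:", current["level"])
    print("afterwards:", current["level"])
```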
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2439
      
      Reviewed By: hwangjeff, xiaohui-zhang
      
      Differential Revision: D36903763
      
      Pulled By: mthrok
      
      fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987
  26. 03 Jun, 2022 1 commit
  27. 02 Jun, 2022 3 commits
  28. 01 Jun, 2022 3 commits
  29. 31 May, 2022 1 commit
  30. 29 May, 2022 1 commit
    • Update source info (#2418) · bb77cbeb
      moto authored
      Summary:
      Add num_frames and bits_per_sample to match the current
      `torchaudio.info` capability.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2418
      
      Reviewed By: carolineechen
      
      Differential Revision: D36749077
      
      Pulled By: mthrok
      
      fbshipit-source-id: 7b368ee993cf5ed63ff2f53c9e3b1f50fcce7713
  31. 23 May, 2022 2 commits
    • Add assertion checks to multi-channel functions (#2401) · 38e530d7
      Zhaoheng Ni authored
      Summary:
      - The multi-channel functions only support complex-valued tensors for spectrogram and PSD matrices.
      - The mask can be real-valued or complex-valued, hence there is no explicit assertion for mask.
      - The shapes of the input Tensors need to be verified before the computation. For example, the shape of the PSD matrix must be `(..., freq, channel, channel)`, the shape of the mask must be `(..., freq, time)`, etc.
      - The autograd unit test of `apply_beamforming` used wrong dimensions for `beamform_weights`, which the new assertion check detected. This PR fixes it.
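      The checks described above can be sketched with NumPy arrays standing in for tensors. The function names are hypothetical, the spectrogram layout `(..., channel, freq, time)` is an assumption consistent with the PSD and mask shapes quoted above, and this is not the code from the PR. Note that the mask check deliberately skips any dtype test, since masks may be real or complex.

```python
import numpy as np


def check_specgram(specgram: np.ndarray) -> None:
    """Multi-channel STFT: complex-valued, shape (..., channel, freq, time)."""
    if not np.iscomplexobj(specgram):
        raise ValueError("specgram must be complex-valued")
    if specgram.ndim < 3:
        raise ValueError(f"specgram must have shape (..., channel, freq, time), got {specgram.shape}")


def check_psd(psd: np.ndarray) -> None:
    """PSD matrices: complex-valued, shape (..., freq, channel, channel)."""
    if not np.iscomplexobj(psd):
        raise ValueError("psd must be complex-valued")
    if psd.ndim < 3 or psd.shape[-1] != psd.shape[-2]:
        raise ValueError(f"psd must have shape (..., freq, channel, channel), got {psd.shape}")


def check_mask(mask: np.ndarray, specgram: np.ndarray) -> None:
    """Mask: real or complex (no dtype check), shape (..., freq, time) matching specgram."""
    if mask.ndim < 2 or mask.shape[-2:] != specgram.shape[-2:]:
        raise ValueError(
            f"mask shape {mask.shape} does not match (..., freq, time) of specgram {specgram.shape}"
        )
```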
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2401
      
      Reviewed By: carolineechen
      
      Differential Revision: D36597689
      
      Pulled By: nateanl
      
      fbshipit-source-id: 6ad1adebe3726851cc1d865650bdf177a98985f6
    • Add LibriLightLimited dataset (#2302) · af9cab3b
      Zhaoheng Ni authored
      Summary:
      The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the supervised subset from the unsupervised one, it is clearer to put it in a separate dataset class for fine-tuning.
      It contains "10 min", "1 hour", and "10 hour" splits.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2302
      
      Reviewed By: mthrok
      
      Differential Revision: D36388188
      
      Pulled By: nateanl
      
      fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249