1. 21 Jun, 2022 1 commit
    • Create musdb handler and tests (#2484) · b92a8a09
      Sean Kim authored
      Summary:
      Create a dataset handler and tests for the new musdb dataset. Validated with both manual and unit testing; pre-commit was run for style checks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2484
      
      Reviewed By: carolineechen, nateanl
      
      Differential Revision: D37250556
      
      Pulled By: skim0514
      
      fbshipit-source-id: d2c8d73d22fd9d7282026265676f3eab1e178d51
  2. 20 Jun, 2022 1 commit
  3. 08 Jun, 2022 2 commits
  4. 04 Jun, 2022 1 commit
    • Make FFmpeg log level configurable (#2439) · 877a88c5
      moto authored
      Summary:
      Undesired logs are one of the loudest UX complaints we get.
      Yet loading media files involves uncertainty that is
      difficult to debug without debug logs.
      
      This commit introduces utility functions to configure the logging level,
      so that we can ask users to enable verbose logging when they encounter an issue,
      while defaulting to a non-verbose option.
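      A minimal sketch of the idea, assuming a thin Python-side wrapper: named levels map onto FFmpeg's numeric `AV_LOG_*` constants, and the default stays quiet. `FFmpegLogConfig` and its methods are hypothetical (not torchaudio's actual utility functions), and choosing `"error"` as the default is an assumption.

```python
# Numeric values of FFmpeg's AV_LOG_* constants (libavutil/log.h).
FFMPEG_LOG_LEVELS = {
    "quiet": -8,
    "panic": 0,
    "fatal": 8,
    "error": 16,
    "warning": 24,
    "info": 32,
    "verbose": 40,
    "debug": 48,
    "trace": 56,
}


class FFmpegLogConfig:
    """Hypothetical holder for the log level handed to FFmpeg.

    Defaults to a non-verbose level; users opt into debug output
    only when they need to investigate a problem.
    """

    def __init__(self, level="error"):
        self.level = self._resolve(level)

    @staticmethod
    def _resolve(level):
        # Accept either a level name or a raw numeric FFmpeg level.
        if isinstance(level, int):
            return level
        try:
            return FFMPEG_LOG_LEVELS[level.lower()]
        except KeyError:
            raise ValueError(f"Unknown FFmpeg log level: {level!r}") from None

    def set_level(self, level):
        self.level = self._resolve(level)

    def is_enabled(self, message_level):
        # FFmpeg prints a message when its level is <= the configured level.
        return self._resolve(message_level) <= self.level
```

      With this sketch, a user debugging a loading failure would call `set_level("debug")`, while everyone else keeps the quiet default.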
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2439
      
      Reviewed By: hwangjeff, xiaohui-zhang
      
      Differential Revision: D36903763
      
      Pulled By: mthrok
      
      fbshipit-source-id: f4ddd9915b13197c2a2eb97e965005b8b5b8d987
  5. 01 Jun, 2022 2 commits
  6. 24 May, 2022 2 commits
  7. 23 May, 2022 1 commit
    • Add LibriLightLimited dataset (#2302) · af9cab3b
      Zhaoheng Ni authored
      Summary:
      The `LibriLightLimited` dataset is created for fine-tuning SSL models, such as Wav2Vec2 and HuBERT. It is a supervised subset of the [Libri-Light](https://github.com/facebookresearch/libri-light) dataset. To distinguish the supervised subset from the unsupervised one, it is clearer to put it in a separate dataset class for fine-tuning purposes.
      It contains "10 min", "1 hour", and "10 hour" splits.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2302
      
      Reviewed By: mthrok
      
      Differential Revision: D36388188
      
      Pulled By: nateanl
      
      fbshipit-source-id: ba49f1c9996be17db5db41127d8ca96224c94249
  8. 20 May, 2022 1 commit
  9. 17 May, 2022 1 commit
  10. 13 May, 2022 1 commit
    • Move Streamer API out of prototype (#2378) · 72b712a1
      moto authored
      Summary:
      This commit moves the Streaming API out of the prototype module.
      
      * The related classes are renamed as follows:
      
        - `Streamer` -> `StreamReader`
        - `SourceStream` -> `StreamReaderSourceStream`
        - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
        - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
        - `OutputStream` -> `StreamReaderOutputStream`
      
      This change preemptively makes room for a possible
      `StreamWriter` API.
      
      * Replace the `BUILD_FFMPEG` build argument with `USE_FFMPEG`
      
      We are not building FFmpeg, so `USE_FFMPEG` is the more appropriate name.
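      For code written against the prototype names, the class renames amount to a whole-identifier substitution. The helper below is a hypothetical migration aid, not part of torchaudio; it only rewrites class names and leaves import paths untouched.

```python
import re

# Old prototype class names -> new names, as listed in this commit.
RENAMES = {
    "Streamer": "StreamReader",
    "SourceStream": "StreamReaderSourceStream",
    "SourceAudioStream": "StreamReaderSourceAudioStream",
    "SourceVideoStream": "StreamReaderSourceVideoStream",
    "OutputStream": "StreamReaderOutputStream",
}

# Match whole identifiers only, in a single pass, so "SourceStream" inside an
# already-renamed "StreamReaderSourceStream" is never rewritten again.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate_names(source: str) -> str:
    """Rewrite old Streaming API class names in a piece of source text."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1)], source)
```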
      
       ---
      
      After https://github.com/pytorch/audio/issues/2377
      
      Remaining TODOs: (different PRs)
      - [ ] Introduce `is_ffmpeg_binding_available` function.
      - [ ] Refactor C++ code:
         - Rename `Streamer` to `StreamReader`.
         - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
         - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
         - Introduce `stream_reader` directory.
      - [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2378
      
      Reviewed By: carolineechen
      
      Differential Revision: D36359299
      
      Pulled By: mthrok
      
      fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
  11. 10 May, 2022 4 commits
    • Add ConvEmformer module (#2358) · 2c79b55a
      hwangjeff authored
      Summary:
      Adds an implementation of the convolution-augmented streaming transformer (effectively Emformer with a convolution block) described in https://arxiv.org/abs/2110.05241.
      
      Continuation of https://github.com/pytorch/audio/issues/2324.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2358
      
      Reviewed By: nateanl, xiaohui-zhang
      
      Differential Revision: D36137992
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9c7a7c233944fe9ef15b9ba397d7f0809da1f063
    • Add RTFMVDR module (#2368) · 4b021ae3
      Zhaoheng Ni authored
      Summary:
      Add a new design of the MVDR module.
      The `RTFMVDR` module supports the method based on the relative transfer function (RTF) of the target speech and the power spectral density (PSD) matrix of the noise.
      The input arguments are:
      - the multi-channel spectrum;
      - the RTF vector of the target speech;
      - the PSD matrix of the noise;
      - the reference channel in the microphone array;
      - `diagonal_loading`, which enables or disables diagonal loading in the matrix inverse computation;
      - `diag_eps`, the diagonal-loading coefficient used when computing the matrix inverse;
      - `eps`, a small value added to the denominator when computing the beamforming weight.
      The output of the module is the single-channel complex-valued spectrum of the enhanced speech.
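      The RTF-based MVDR weight for one frequency bin is commonly written as w = inv(Phi_N) h / (h^H inv(Phi_N) h + eps), where h is the RTF vector and Phi_N is the noise PSD matrix. The pure-Python sketch below computes it for a single bin. The function names, the trace-scaled form of diagonal loading, and the rescaling by the conjugated RTF entry at the reference channel are assumptions, not a copy of the `RTFMVDR` implementation.

```python
def _solve(a, b):
    """Solve a @ x = b for a small complex matrix via Gaussian elimination."""
    n = len(a)
    # Augmented matrix [a | b]; copied so the inputs are not modified.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest-magnitude entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def rtf_mvdr_weights(rtf, psd_n, reference_channel=None,
                     diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """MVDR weights for one frequency bin from an RTF vector and noise PSD."""
    c = len(rtf)
    if diagonal_loading:
        # Assumed form: load the diagonal in proportion to the trace.
        load = sum(psd_n[i][i] for i in range(c)).real * diag_eps + eps
        psd_n = [[psd_n[i][j] + (load if i == j else 0) for j in range(c)]
                 for i in range(c)]
    num = _solve(psd_n, rtf)                                    # inv(Phi_N) h
    denom = sum(rtf[i].conjugate() * num[i] for i in range(c))  # h^H inv(Phi_N) h
    w = [x / (denom + eps) for x in num]
    if reference_channel is not None:
        # Rescale so the output equals the target as seen at the reference mic.
        scale = rtf[reference_channel].conjugate()
        w = [x * scale for x in w]
    return w
```

      The distortionless property gives a quick check: without a reference channel, `sum(conj(w_c) * h_c)` is approximately 1.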
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2368
      
      Reviewed By: carolineechen
      
      Differential Revision: D36214940
      
      Pulled By: nateanl
      
      fbshipit-source-id: 5f29f778663c96591e1b520b15f7876d07116937
    • Add SoudenMVDR module (#2367) · aed5eb88
      Zhaoheng Ni authored
      Summary:
      Add a new design of the MVDR module.
      The `SoudenMVDR` module supports the method proposed by [Souden et al.](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.725.673&rep=rep1&type=pdf).
      The input arguments are:
      - the multi-channel spectrum;
      - the PSD matrix of the target speech;
      - the PSD matrix of the noise;
      - the reference channel in the microphone array;
      - `diagonal_loading`, which enables or disables diagonal loading in the matrix inverse computation;
      - `diag_eps`, the diagonal-loading coefficient used when computing the matrix inverse;
      - `eps`, a small value added to the denominator when computing the beamforming weight.
      
      The output of the module is the single-channel complex-valued spectrum of the enhanced speech.
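      The Souden formulation does not need an explicit RTF: it normalizes inv(Phi_N) Phi_S by its trace and takes the column for the reference channel. A single-bin pure-Python sketch follows. The helper names and the trace-scaled diagonal loading are assumptions, not the `SoudenMVDR` source.

```python
def _solve_matrix(a, b):
    """Solve a @ X = b for square complex matrices via Gauss-Jordan."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Partial pivoting, then normalize the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * u for v, u in zip(m[r], m[col])]
    return [row[n:] for row in m]


def souden_mvdr_weights(psd_s, psd_n, reference_channel,
                        diagonal_loading=True, diag_eps=1e-7, eps=1e-8):
    """Souden-style MVDR weights for one frequency bin."""
    c = len(psd_s)
    if diagonal_loading:
        # Assumed form: load the diagonal in proportion to the trace.
        load = sum(psd_n[i][i] for i in range(c)).real * diag_eps + eps
        psd_n = [[psd_n[i][j] + (load if i == j else 0) for j in range(c)]
                 for i in range(c)]
    num = _solve_matrix(psd_n, psd_s)  # inv(Phi_N) Phi_S
    trace = sum(num[i][i] for i in range(c))
    # Reference-channel column of the trace-normalized matrix.
    return [num[i][reference_channel] / (trace + eps) for i in range(c)]
```

      For a rank-one speech PSD Phi_S = h h^H, these weights pass the target exactly as it appears at the reference microphone, which makes a convenient sanity check.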
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2367
      
      Reviewed By: hwangjeff
      
      Differential Revision: D36198015
      
      Pulled By: nateanl
      
      fbshipit-source-id: 4027f4752a84aaef730ef3ea8c625e801cc35527
    • Add citations for datasets (#2371) · 638120ca
      Caroline Chen authored
      Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2371
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D36246167
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 23042a1c393711864a18c9815d248c18d1d258b4
  12. 26 Apr, 2022 2 commits
  13. 21 Apr, 2022 1 commit
    • Change underlying implementation of RNN-T hypothesis to tuple (#2339) · 6b242c29
      hwangjeff authored
      Summary:
      PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers that hold custom classes. Because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves the incompatibility by changing the underlying implementation of `Hypothesis` to a plain tuple.
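      A simplified illustration of the change: a `NamedTuple` defines a custom class, whereas a plain `Tuple` alias does not, so fields are read positionally through small helpers. This sketch keeps only two fields and uses hypothetical helper names; torchaudio's actual `Hypothesis` carries more fields.

```python
from typing import List, NamedTuple, Tuple


# Before: a NamedTuple, i.e. a custom class that PyTorch Lite cannot
# hold inside containers such as List[Hypothesis].
class NamedHypothesis(NamedTuple):
    tokens: List[int]
    score: float


# After: a plain tuple alias; fields are accessed through helpers.
Hypothesis = Tuple[List[int], float]


def get_hypo_tokens(hypo: Hypothesis) -> List[int]:
    return hypo[0]


def get_hypo_score(hypo: Hypothesis) -> float:
    return hypo[1]


def best_hypothesis(hypos: List[Hypothesis]) -> Hypothesis:
    # Code that used attribute access (hypo.score) switches to the helpers.
    return max(hypos, key=get_hypo_score)
```

      Since a `NamedTuple` instance is still a tuple, the helpers also work on old-style values during a migration.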
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2339
      
      Reviewed By: nateanl
      
      Differential Revision: D35806529
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
  14. 18 Apr, 2022 1 commit
  15. 12 Apr, 2022 1 commit
    • Add Conformer RNN-T model prototype (#2322) · b0c8e239
      hwangjeff authored
      Summary:
      Adds the Conformer RNN-T model as a prototype feature via the factory functions `conformer_rnnt_model` and `conformer_rnnt_base`, the latter of which instantiates a baseline version of the model. Also includes the following:
      - Modifies `Conformer` to accept arguments `use_group_norm` and `convolution_first` to pass to each of its `ConformerLayer` instances.
      - Makes `_Predictor` an abstract class and introduces `_EmformerEncoder` and `_ConformerEncoder`.
      - Introduces tests for `conformer_rnnt_model`.
      - Adds docs.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2322
      
      Reviewed By: xiaohui-zhang
      
      Differential Revision: D35565987
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: cb37bb0477ae3d5fcf0b7124f334f4cbb89b5789
  16. 08 Apr, 2022 1 commit
    • Add devices/properties badges (#2321) · 72ae755a
      moto authored
      Summary:
      Add badges for supported properties and devices to functionals and transforms.
      
      This commit adds `.. devices::` and `.. properties::` directives to Sphinx.
      
      APIs with these directives will have badges (based on shields.io) that link to the
      page describing these features.
      
      Continuation of https://github.com/pytorch/audio/issues/2316
      Dtypes are excluded for now and left for future improvement; badges are added to most functionals and transforms.
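      As a rough example of how an API docstring might carry the new directives, here is a placeholder function. The function itself is hypothetical, and the badge values (`CPU CUDA`, `Autograd TorchScript`) are assumptions about typical usage rather than a list of supported options.

```python
def complex_power(values):
    """Return the squared magnitude of each complex value.

    .. devices:: CPU CUDA

    .. properties:: Autograd TorchScript

    Args:
        values: An iterable of complex numbers.

    Returns:
        A list with ``abs(v) ** 2`` for each input value.
    """
    return [abs(v) ** 2 for v in values]
```

      Sphinx renders each directive as a row of badges in place of the raw directive text.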
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2321
      
      Reviewed By: hwangjeff
      
      Differential Revision: D35489063
      
      Pulled By: mthrok
      
      fbshipit-source-id: f68a70ebb22df29d5e9bd171273bd19007a81762
  17. 26 Mar, 2022 1 commit
  18. 24 Mar, 2022 1 commit
  19. 26 Feb, 2022 2 commits
    • Add apply_beamforming to torchaudio.functional (#2232) · 9c56ffb4
      Zhaoheng Ni authored
      Summary:
      This PR adds the ``apply_beamforming`` method to ``torchaudio.functional``.
      The method applies the beamforming weights to the multi-channel noisy spectrum to obtain the single-channel enhanced spectrum.
      The input arguments are the complex-valued beamforming weight Tensor and the multi-channel noisy spectrum.
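      In formula form, the enhanced spectrum is Y(f, t) = sum over channels c of conj(w(f, c)) * X(c, f, t). The pure-Python sketch below spells this out with nested lists. `apply_beamforming_sketch` is hypothetical, and the (freq, channel) weight layout and (channel, freq, time) spectrum layout are assumptions.

```python
def apply_beamforming_sketch(weights, specgram):
    """Combine channels: weights[f][c], specgram[c][f][t] -> enhanced[f][t]."""
    n_chan = len(specgram)
    n_freq = len(specgram[0])
    n_time = len(specgram[0][0])
    if len(weights) != n_freq or any(len(w) != n_chan for w in weights):
        raise ValueError("weights must have shape (freq, channel)")
    # Each output bin is the conjugate-weighted sum over channels (w^H x).
    return [
        [
            sum(weights[f][c].conjugate() * specgram[c][f][t]
                for c in range(n_chan))
            for t in range(n_time)
        ]
        for f in range(n_freq)
    ]
```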
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2232
      
      Reviewed By: mthrok
      
      Differential Revision: D34474561
      
      Pulled By: nateanl
      
      fbshipit-source-id: 2910251a8f111e65375dfb50495b6a415113f06d
    • Improve device streaming (#2202) · 365313ed
      moto authored
      Summary:
      This commit adds a tutorial for device ASR and updates the API for device streaming.
      
      The interface changes are:
      1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
      2. Make the `fill_buffer` method private.
      
      When dealing with a device stream, there are situations where the device buffer is not
      ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
      `process_packet` method raised an exception in the Python layer, but for device ASR
      this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
      The new `timeout` parameter serves this purpose.
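      The retry behavior can be sketched in Python, although here it runs in the C++ layer. `read_packet` stands in for a call that fails with `EAGAIN` (surfaced in Python as `BlockingIOError`). The helper is hypothetical, and treating `timeout=None` as "no retry", with both values in milliseconds, is an assumption.

```python
import time


def process_with_retry(read_packet, timeout=None, backoff=10.0):
    """Call read_packet(), retrying while the device reports EAGAIN.

    timeout: milliseconds to keep retrying; None re-raises immediately.
    backoff: milliseconds to sleep between attempts.
    """
    deadline = None
    if timeout is not None:
        deadline = time.monotonic() + timeout / 1000.0
    while True:
        try:
            return read_packet()
        except BlockingIOError:
            # Python maps EAGAIN/EWOULDBLOCK to BlockingIOError.
            if deadline is None or time.monotonic() >= deadline:
                raise
            time.sleep(backoff / 1000.0)
```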
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2202
      
      Reviewed By: nateanl
      
      Differential Revision: D34475829
      
      Pulled By: mthrok
      
      fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
  20. 25 Feb, 2022 5 commits
  21. 16 Feb, 2022 1 commit
    • Add EMFORMER_RNNT_BASE_MUSTC bundle to torchaudio.prototype (#2241) · 99b5ef5c
      Zhaoheng Ni authored
      Summary:
      This PR provides an RNNTBundle pre-trained on the MuST-C release v2.0 dataset.
      Casing and punctuation in the transcripts are preserved when training the SentencePiece model.
      
      Word error rate (WER) on the dev and test sets of MuST-C 2.0:
      | Split             |          WER |
      |:-----------------:|-------------:|
      | dev               |       0.190  |
      | tst-COMMON        |       0.213  |
      | tst-HE            |       0.186  |
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2241
      
      Reviewed By: mthrok
      
      Differential Revision: D34267792
      
      Pulled By: nateanl
      
      fbshipit-source-id: 67bca9f277e66d41a4530d01615f249b3cec7167
  22. 04 Feb, 2022 1 commit
  23. 03 Feb, 2022 1 commit
  24. 02 Feb, 2022 1 commit
  25. 01 Feb, 2022 3 commits
  26. 27 Jan, 2022 1 commit
    • Add no lm support for CTC decoder (#2174) · 4c3fa875
      Caroline Chen authored
      Summary:
      Add support for the CTC lexicon decoder without a language model by adding `ZeroLM`, a placeholder LM that returns a score of 0 for everything. Generalize the decoder class/API slightly to support this, adding it as an option to the KenLM decoder for now (it will likely be separated from KenLM when support for other kinds of LMs is added).
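      Conceptually, a lexicon decoder ranks hypotheses by an acoustic score plus a weighted LM score, so an LM that always returns 0 leaves the acoustic score unchanged. The sketch below shows this. The `start`/`score`/`finish` interface, `hypothesis_score`, and the default `lm_weight` are hypothetical and are not the torchaudio decoder API.

```python
from typing import List, Tuple


class ZeroLM:
    """Stand-in language model that assigns score 0 to every word."""

    def start(self) -> int:
        return 0  # single dummy state; there is nothing to track

    def score(self, state: int, word: str) -> Tuple[int, float]:
        return state, 0.0

    def finish(self, state: int) -> Tuple[int, float]:
        return state, 0.0


def hypothesis_score(acoustic_score: float, words: List[str], lm,
                     lm_weight: float = 2.0) -> float:
    """Combine an acoustic log-score with the weighted LM score of the words."""
    state = lm.start()
    lm_total = 0.0
    for word in words:
        state, s = lm.score(state, word)
        lm_total += s
    state, s = lm.finish(state)
    lm_total += s
    return acoustic_score + lm_weight * lm_total
```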
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2174
      
      Reviewed By: hwangjeff, nateanl
      
      Differential Revision: D33798674
      
      Pulled By: carolineechen
      
      fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde