1. 07 Sep, 2022 1 commit
    • 
      Tweak documentation (#2656) · 8a0d7b36
      moto authored
      Summary:
      1. Override class `__module__` attribute in `conf.py` so that no manual override is necessary
      2. Fix SourceSeparationBundle member attribute
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2656
      
      Reviewed By: carolineechen
      
      Differential Revision: D39293053
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2b8d6be1aee517d0e692043c26ac2438a787adc6
  2. 06 Sep, 2022 3 commits
    • 
      Fix random Gaussian generation (#2639) · 3430fd68
      Ravi Makhija authored
      Summary:
      This PR is meant to address the bug raised in issue https://github.com/pytorch/audio/issues/2634.
      
      In particular, the Box-Muller transform was previously used to generate Gaussian variates for dithering from `torch.rand` uniform variates, but it was implemented incorrectly (the same uniform variate was fed to both inputs of the transform, rather than two independent uniform variates), which produced a different, non-Gaussian distribution. This PR instead uses `torch.randn` to generate the Gaussian variates.
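      The distinction can be illustrated in plain Python (a hedged sketch, not torchaudio's actual code): the correct Box-Muller transform consumes two independent uniform variates, while the buggy form reused one.

      ```python
      import math
      import random

      def box_muller_pair(u1, u2):
          # Correct Box-Muller: two independent uniforms in (0, 1] yield
          # two independent standard-normal variates.
          r = math.sqrt(-2.0 * math.log(u1))
          return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

      def buggy_variate(u):
          # The bug described above: the SAME uniform feeds both slots of
          # the transform, collapsing the 2-D mapping onto a 1-D curve, so
          # the output distribution is not Gaussian.
          return math.sqrt(-2.0 * math.log(u)) * math.cos(2 * math.pi * u)

      random.seed(0)
      samples = []
      for _ in range(5000):
          # 1 - random() keeps the argument to log() strictly positive.
          z0, z1 = box_muller_pair(1 - random.random(), random.random())
          samples.extend([z0, z1])
      mean = sum(samples) / len(samples)
      var = sum((x - mean) ** 2 for x in samples) / len(samples)
      ```

      In torchaudio the fix simply replaces the hand-rolled transform with `torch.randn`, which generates the Gaussian variates directly.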
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2639
      
      Reviewed By: mthrok
      
      Differential Revision: D39101144
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 691e49679f6598ef0a1675f6f4ee721ef32215fd
    • 
      Add metadata function for LibriSpeech (#2653) · 08d3bb17
      Caroline Chen authored
      Summary:
      Adding support for metadata mode, requested in https://github.com/pytorch/audio/issues/2539, by adding a public `get_metadata()` function in the dataset. This function can be used directly by users to fetch metadata for individual dataset indices, or users can subclass the dataset and override `__getitem__` with `get_metadata` to create a dataset class that directly handles metadata mode.
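      The subclassing pattern described above can be sketched as follows. This is an illustrative stand-in, not the real torchaudio implementation: `LibriSpeechLike`, `load_audio`, and the entry layout are hypothetical.

      ```python
      class LibriSpeechLike:
          """Stand-in for a dataset exposing a public get_metadata()."""

          def __init__(self, entries):
              self._entries = entries  # e.g. (filepath, sample_rate, transcript)

          def get_metadata(self, n):
              # Cheap: returns bookkeeping fields without decoding any audio.
              return self._entries[n]

          def __getitem__(self, n):
              path, sample_rate, transcript = self.get_metadata(n)
              waveform = load_audio(path)  # expensive I/O happens only here
              return waveform, sample_rate, transcript

      class LibriSpeechMetadata(LibriSpeechLike):
          # Metadata mode: override __getitem__ with get_metadata, so that
          # indexing the dataset never touches the audio files.
          def __getitem__(self, n):
              return self.get_metadata(n)

      def load_audio(path):
          raise NotImplementedError("placeholder for actual audio loading")
      ```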
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2653
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D39105114
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 6f26f1402a053dffcfcc5d859f87271ed5923348
    • 
      Remove obsolete examples (#2655) · 4a20c412
      Peter Albert authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2655
      
      Removed obsolete example and the corresponding test
      
      Reviewed By: mthrok
      
      Differential Revision: D39260253
      
      fbshipit-source-id: 0bde71ffd75dd0c94a5cc4a9940f4648a5d61bd7
  3. 02 Sep, 2022 1 commit
  4. 01 Sep, 2022 1 commit
  5. 26 Aug, 2022 3 commits
  6. 25 Aug, 2022 1 commit
  7. 24 Aug, 2022 1 commit
    • 
      Add StreamWriter (#2628) · 72404de9
      moto authored
      Summary:
      This commit adds StreamWriter, an FFmpeg-based encoder class.
      StreamWriter is essentially the inverse of the StreamReader class, and
      it supports:
      
      * Encoding audio / still images / video
      * Exporting to local files / streaming protocols / devices, etc.
      * File-like object support (in a later commit)
      * HW video encoding (in a later commit)
      
      See also: https://fburl.com/gslide/z85kn5a9 (Meta internal)
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2628
      
      Reviewed By: nateanl
      
      Differential Revision: D38816650
      
      Pulled By: mthrok
      
      fbshipit-source-id: a9343b0d55755e186971dc96fb86eb52daa003c8
  8. 23 Aug, 2022 2 commits
  9. 22 Aug, 2022 2 commits
  10. 20 Aug, 2022 1 commit
  11. 19 Aug, 2022 2 commits
    • 
      Refactor sox pybind source code (#2636) · 789adf07
      Moto Hira authored
      Summary:
      Pull Request resolved: https://github.com/pytorch/audio/pull/2636
      
      At an early stage of the torchaudio extension module, the
      `torchaudio/csrc/pybind` directory was created so that all code
      defining the Python interface would live there, with a single
      extension module called `torchaudio._torchaudio`.
      
      However, the codebase has since evolved so that separate extensions
      are defined per feature (third-party dependency) for the sake of a
      more modular file organization.
      
      What remains in `csrc/pybind` is the libsox Python bindings.
      This commit moves them under `csrc/sox`.
      
      Follow-up: rename `torchaudio._torchaudio` to `torchaudio._torchaudio_sox`.
      
      Reviewed By: carolineechen
      
      Differential Revision: D38829253
      
      fbshipit-source-id: 3554af45a2beb0f902810c5548751264e093f28d
    • 
      Update README.md (#2633) · 0b7f2fba
      moto authored
      Summary:
      Update compatibility matrix
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2633
      
      Reviewed By: nateanl
      
      Differential Revision: D38827670
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5c66bf60a06e37919ee725a5f4adf571e6c89100
  12. 18 Aug, 2022 6 commits
  13. 16 Aug, 2022 4 commits
  14. 15 Aug, 2022 3 commits
  15. 12 Aug, 2022 1 commit
  16. 11 Aug, 2022 1 commit
  17. 10 Aug, 2022 3 commits
  18. 09 Aug, 2022 1 commit
    • 
      Add NNLM support to CTC Decoder (#2528) · 03a0d68e
      Caroline Chen authored
      Summary:
      Expose flashlight's LM and LMState classes to support decoding with custom language models, including NN LMs.
      
      The `ctc_decoder` API is as follows:
      - To decode with KenLM, pass the KenLM language model path to the `lm` variable.
      - To decode with a custom LM, create a Python class that subclasses `CTCDecoderLM`, and pass it to the `lm` variable. Additionally, create a file listing the LM words in order of LM index, one word per line, and pass the file to `lm_path`.
      - To decode without a language model, set `lm` to `None` (default).
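      For illustration, a minimal custom LM might look like the pure-Python sketch below. The method names (`start`, `score`, `finish`) follow the `CTCDecoderLM` interface, but this class does not subclass the real base: plain tuples stand in for `CTCDecoderLMState` objects, and the zero-scoring behavior is chosen only as the simplest example.

      ```python
      class ZeroLM:
          """Sketch of a custom LM that assigns zero score to every token.

          A real implementation would subclass torchaudio's CTCDecoderLM and
          return proper LM state objects; plain tuples stand in here.
          """

          def start(self, start_with_nothing):
              # Return the initial LM state (an empty token history).
              return ()

          def score(self, state, usr_token_idx):
              # Return (next_state, incremental_score) for consuming a token.
              return state + (usr_token_idx,), 0.0

          def finish(self, state):
              # Return (state, score) for ending a hypothesis in this state.
              return state, 0.0
      ```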
      
      Validated against the fairseq w2l decoder on a sample LibriSpeech dataset and LM. Code for validation can be found [here](https://github.com/facebookresearch/fairseq/compare/main...carolineechen:fairseq:ctc-decoder). Also added unit tests validating custom implementations of ZeroLM and KenLM, as well as decoding with a biased LM.
      
      Follow ups:
      - Train simple LM on LibriSpeech and demonstrate usage in tutorial or examples directory
      
      cc jacobkahn
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2528
      
      Reviewed By: mthrok
      
      Differential Revision: D38243802
      
      Pulled By: carolineechen
      
      fbshipit-source-id: 445e78f6c20bda655aabf819fc0f771fe68c73d7
  19. 08 Aug, 2022 1 commit
  20. 05 Aug, 2022 2 commits
    • 
      Add convolution operator (#2602) · b396157d
      hwangjeff authored
      Summary:
      Adds functions `convolve` and `fftconvolve`, which compute the convolution of two tensors along their trailing dimension. The former performs the convolution directly, whereas the latter performs it using FFT.
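      The two strategies can be sketched in plain Python for 1-D sequences. This is a simplified stand-in for the tensor versions, which operate along the trailing dimension and support batching; the FFT here is a textbook radix-2 Cooley-Tukey, not PyTorch's implementation.

      ```python
      import cmath

      def convolve_direct(x, y):
          # Full linear convolution: output length len(x) + len(y) - 1.
          n, m = len(x), len(y)
          out = [0.0] * (n + m - 1)
          for i, xi in enumerate(x):
              for j, yj in enumerate(y):
                  out[i + j] += xi * yj
          return out

      def _fft(a, invert=False):
          # Recursive radix-2 Cooley-Tukey; len(a) must be a power of two.
          n = len(a)
          if n == 1:
              return a[:]
          even = _fft(a[0::2], invert)
          odd = _fft(a[1::2], invert)
          sign = 1 if invert else -1
          out = [0j] * n
          for k in range(n // 2):
              t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
              out[k] = even[k] + t
              out[k + n // 2] = even[k] - t
          return out

      def fftconvolve(x, y):
          # Same result as convolve_direct, via pointwise multiplication in
          # the frequency domain (O(n log n) instead of O(n * m)).
          size = len(x) + len(y) - 1
          n = 1
          while n < size:
              n *= 2  # pad to the next power of two
          fx = _fft([complex(v) for v in x] + [0j] * (n - len(x)))
          fy = _fft([complex(v) for v in y] + [0j] * (n - len(y)))
          inv = _fft([a * b for a, b in zip(fx, fy)], invert=True)
          return [v.real / n for v in inv[:size]]
      ```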
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2602
      
      Reviewed By: nateanl, mthrok
      
      Differential Revision: D38450771
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: b2d1e063ba21eafeddf317d60749e7120b14292b
    • 
      Add note for lexicon free decoder output (#2603) · 33485b8c
      Caroline Chen authored
      Summary:
      The ``words`` field of CTCHypothesis is empty if no lexicon is provided, which produces confusing output (see issue https://github.com/pytorch/audio/issues/2584) when following our tutorial example with lexicon-free usage. This PR adds a note in both the docs and the tutorial.
      
      Follow-up: determine whether we want to modify the behavior of ``words`` in the lexicon-free case. One option is to merge the generated tokens and then split them on the input silence token to populate the words field, but this is tricky, since the meaning of a "word" in the lexicon-free case can be vague, not all languages separate words with whitespace, etc.
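      The merge-and-split option mentioned above could look like this hypothetical sketch; the `sil_token` value and token layout are illustrative assumptions, and the caveats about languages without whitespace word boundaries still apply.

      ```python
      def tokens_to_words(tokens, sil_token="|"):
          # Merge the decoded tokens into one string, then split on the
          # silence/word-boundary token to approximate words.
          merged = "".join(tokens)
          return [word for word in merged.split(sil_token) if word]
      ```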
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2603
      
      Reviewed By: mthrok
      
      Differential Revision: D38459709
      
      Pulled By: carolineechen
      
      fbshipit-source-id: d64ff186df4633f00e94c64afeaa6a50cebf2934