1. 26 Oct, 2023 1 commit
  2. 14 Aug, 2023 1 commit
  3. 07 Aug, 2023 1 commit
    • Add MMS FA Bundle (#3521) · 5e211d66
      moto authored
      Summary:
      Port the MMS FA model from the tutorial to the library, along with a post-processing module.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3521
      
      Reviewed By: huangruizhe
      
      Differential Revision: D48038285
      
      Pulled By: mthrok
      
      fbshipit-source-id: 571cf0fceaaab4790983be2719f1a85805b814f5
  4. 01 Aug, 2023 1 commit
  5. 28 Jul, 2023 1 commit
    • Move TorchAudio-Squim models to Beta (#3512) · b7d2d928
      Zhaoheng Ni authored
      Summary:
      This PR moves the `SquimObjective` and `SquimSubjective` models, along with the corresponding factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3512
      
      Reviewed By: mthrok
      
      Differential Revision: D47837434
      
      Pulled By: nateanl
      
      fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
  6. 17 May, 2023 1 commit
    • Fix for breadcrumbs displaying "Old version (stable)" on Nightly build (#3333) · 3ffd76c8
      Carl Parker authored
      Summary:
      Previously, `breadcrumbs.html` identified a nightly build version by the prefix "Nightly" which would normally be prepended to the version in `conf.py`. However, the version string is coming through without the "Nightly" prefix, so this change causes `breadcrumbs.html` to key on the substring "dev" instead.
      
      We apparently do not get "Nightly" because the environment variable `BUILD_VERSION` is set, so `conf.py` uses the value of that variable instead of the version string imported from the `torchaudio` module itself. The module's version string also appears to be incorrect; see below.
      
      If I install torchaudio using
      
          conda install torchaudio -c pytorch-nightly
      
      then `torchaudio.__version__` returns the incorrect version string:
      
          2.0.0.dev20230309
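
The rule described above can be sketched in Python as follows. This is only an illustration: the actual change lives in the `breadcrumbs.html` template, and the helper name `is_nightly` is hypothetical.

```python
def is_nightly(version: str) -> bool:
    """Return True if the version string looks like a nightly (dev) build.

    Hypothetical helper: it keys on the substring "dev" rather than on a
    "Nightly" prefix, mirroring the rule described in this commit message.
    """
    return "dev" in version


# The version string quoted in this commit message contains "dev".
print(is_nightly("2.0.0.dev20230309"))
# A plain release-style version string does not.
print(is_nightly("2.0.0"))
```

Keying on a substring is more tolerant than keying on a prefix, because it does not depend on how `conf.py` decorates the version string.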
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3333
      
      Reviewed By: mthrok
      
      Differential Revision: D45926466
      
      Pulled By: carljparker
      
      fbshipit-source-id: d5516f2d9f1716c2400d3e9b285bd5d32b4b3a77
  7. 28 Apr, 2023 1 commit
    • Add cuctc decoder (#3096) · 0a1801ed
      Yuekai Zhang authored
      Summary:
      This PR implements a CUDA-based CTC prefix beam search decoder.
      
      Several benchmark results, measured on a V100, are attached below:
      | decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
      |---|---|---|---|---|---|---|---|---|
      | cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
      | cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
      | cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |
      
      Note:
      1. The computation is parallelized along the batch and vocab axes, with a sequential loop over the frame axis. Compared with CPU implementations, it therefore favors shorter sequences and larger vocabularies.
      2. WER is the same as with CPU implementations. However, decoding with an LM is not supported yet.
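
Note 1 can be illustrated with a plain-Python CTC prefix beam search. This is a minimal CPU sketch under stated assumptions, not the CUDA kernel from this PR: the function name `ctc_prefix_beam_search`, the linear-domain probabilities, the single-utterance input (no batch axis), and the toy data are all illustrative. The outer loop runs sequentially over frames, and the inner loop over tokens is the vocab-axis work that the note describes as parallelized; here it simply runs serially.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=8, blank=0):
    """Decode one utterance.

    probs: list of frames, each a list of per-token probabilities.
    Returns (best_prefix, probability_of_best_prefix).
    """
    # Each prefix maps to (p_blank, p_non_blank): the probability of all
    # alignments of that prefix ending in blank / ending in a non-blank token.
    beams = {(): (1.0, 0.0)}
    for frame in probs:  # sequential loop over the frame axis
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for token, p in enumerate(frame):  # the vocab axis
                if token == blank:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and token == prefix[-1]:
                    # Repeated token: it collapses into the prefix unless
                    # the previous alignment ended in a blank.
                    next_beams[prefix][1] += p_nb * p
                    next_beams[prefix + (token,)][1] += p_b * p
                else:
                    next_beams[prefix + (token,)][1] += (p_b + p_nb) * p
        # Prune to the `beam_size` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: (pb, pnb) for prefix, (pb, pnb) in ranked[:beam_size]}
    best_prefix, (pb, pnb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix), pb + pnb


# Toy input: two frames, vocabulary {0: blank, 1: "a"}.
toy = [[0.6, 0.4], [0.6, 0.4]]
print(ctc_prefix_beam_search(toy, beam_size=4))
```

In the toy input, three of the four possible alignments collapse to the prefix `[1]`, so it wins with a probability of about 0.64 even though the blank is the most likely token in every frame.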
      
      Resolves: https://github.com/pytorch/audio/issues/2957.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3096
      
      Reviewed By: nateanl
      
      Differential Revision: D44709397
      
      Pulled By: mthrok
      
      fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
  8. 01 Apr, 2023 1 commit
    • Add AudioEffector (#3163) · a4036248
      moto authored
      Summary:
      This commit adds a new feature, AudioEffector, which can be used to
      apply various effects and codecs to waveform Tensors.
      
      Under the hood it uses StreamWriter and StreamReader to apply
      filters and encode/decode.
      
      This is going to replace the deprecated `apply_codec` and
      `apply_sox_effect_tensor` functions.
      
      It can also perform online, chunk-by-chunk filtering.
      
      Tutorial to follow.
      
      closes https://github.com/pytorch/audio/issues/3161
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3163
      
      Reviewed By: hwangjeff
      
      Differential Revision: D44576660
      
      Pulled By: mthrok
      
      fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
  9. 27 Mar, 2023 1 commit
    • Revise encoder config arg and docstrings (#3203) · b1de9f1a
      hwangjeff authored
      Summary:
      For `StreamWriter`,
      * Renames arg `config` to `codec_config`.
      * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
      * Adds docstrings for arg `codec_config`.
      * Updates `chunk` to `frames` in `write_*_chunk` methods.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3203
      
      Reviewed By: mthrok
      
      Differential Revision: D44350153
      
      Pulled By: hwangjeff
      
      fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
  10. 21 Mar, 2023 1 commit
    • Add SquimSubjective Model (#3189) · a8a16238
      Zhaoheng Ni authored
      Summary:
      Add the model architecture and factory functions for `SquimSubjective`, which predicts subjective evaluation metric scores (e.g. MOS) for speech enhancement tasks.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3189
      
      Reviewed By: mthrok
      
      Differential Revision: D44267255
      
      Pulled By: nateanl
      
      fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db
  11. 17 Mar, 2023 1 commit
  12. 15 Mar, 2023 1 commit
    • Enhance UX on TorchAudio pages to improve awareness of doc versioning (#3167) · 92f2ea89
      Carl Parker authored
      Summary:
      - Boldface the version-selection UX and increase size by three percent.
      - Add text to breadcrumbs to indicate version and stability.
      - New `breadcrumbs.html` in `_templates` overrides Sphinx version.
      
      I create a new variable in `conf.py`, **version_stable**, which has the version number for the most-recent stable release. I define this variable in the **html_context** dictionary so that it is visible to the templates.
      
      I use this approach because I was not able to find any other way of discerning the current stable release during the build. Note that the `versions.html` file--which identifies the current stable release--appears to be available only in the **gh-pages** branch and so it is not available at build time.
      
      However, this means that someone will need to update `conf.py` whenever the current stable release changes.
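
A minimal sketch of the `conf.py` addition described above. The version value is a placeholder, not the value used in this PR, and any other keys the real `html_context` may contain are omitted.

```python
# conf.py (fragment)

# Most recent stable release. This must be updated by hand whenever a new
# stable release is published. "X.Y.Z" is a placeholder, not the real value.
version_stable = "X.Y.Z"

# Sphinx passes every key of `html_context` to the HTML templates, so a
# template such as `breadcrumbs.html` can read `version_stable` directly.
html_context = {
    "version_stable": version_stable,
}
```

Keeping the value in one named variable makes the manual update a one-line change.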
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3167
      
      Reviewed By: mthrok
      
      Differential Revision: D44112224
      
      Pulled By: carljparker
      
      fbshipit-source-id: e76f5cb6734a784d161342964459577aa9b64cac
  13. 08 Mar, 2023 1 commit
    • Include format information after filter (#3155) · 146195d8
      moto authored
      Summary:
      This commit adds fields to OutputStream that show the result
      of filters, such as the width and height after filtering.
      
      Before
      
      ```
      OutputStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
      ```
      
      After
      
      ```
      OutputVideoStream(
          source_index=0,
          filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
          media_type='video',
          format='gray',
          width=320,
          height=320,
          frame_rate=3.0)
      ```
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/3155
      
      Reviewed By: nateanl
      
      Differential Revision: D43882399
      
      Pulled By: mthrok
      
      fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
  14. 24 Feb, 2023 1 commit
  15. 09 Feb, 2023 1 commit
  16. 01 Feb, 2023 1 commit
  17. 22 Jan, 2023 1 commit
    • Make StreamReader return PTS (#2975) · 0dd59e0d
      moto authored
      Summary:
      This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.
      
      Example
      
      ```python
      from torchaudio.io import StreamReader
      
      s = StreamReader(...)
      s.add_video_stream(...)
      for (video_chunk, ) in s.stream():
          # video_chunk is Torch tensor type but has extra attribute of PTS
          print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
      ```
      
      For backward compatibility, we introduce `_ChunkTensor`, which is a composition
      of a Tensor and metadata but works like a normal tensor in PyTorch operations.
      
      The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).
      
      Attaching metadata directly to the Tensor object was also suggested,
      but torchaudio's metadata could collide with new attributes introduced in
      PyTorch, so we use a Tensor subclass implementation instead.
      
      If any unexpected issue arises from a metadata attribute name collision, client code can
      fetch the bare Tensor and continue.
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2975
      
      Reviewed By: hwangjeff
      
      Differential Revision: D42526945
      
      Pulled By: mthrok
      
      fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
  18. 13 Jan, 2023 1 commit
  19. 10 Dec, 2022 1 commit
  20. 02 Nov, 2022 1 commit
  21. 28 Oct, 2022 1 commit
  22. 13 Oct, 2022 1 commit
    • Fix CTCDecoder doc (#2766) · 3e4b961d
      moto authored
      Summary:
      * Document `__call__` instead of `__init__`
      * List CTCHypothesis first as it is used in combination with CTCDecoder
      * Fix indentation of score method docstring
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2766
      
      Reviewed By: carolineechen
      
      Differential Revision: D40349388
      
      Pulled By: mthrok
      
      fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c
  23. 03 Oct, 2022 1 commit
  24. 22 Sep, 2022 1 commit
  25. 21 Sep, 2022 2 commits
  26. 16 Sep, 2022 3 commits
  27. 15 Aug, 2022 1 commit
  28. 24 May, 2022 2 commits
  29. 17 May, 2022 1 commit
  30. 26 Feb, 2022 1 commit
    • Improve device streaming (#2202) · 365313ed
      moto authored
      Summary:
      This commit adds a tutorial for device ASR and updates the API for device streaming.
      
      The interface changes are:
      1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
      2. Make the `fill_buffer` method private.
      
      When dealing with a device stream, there are situations where the device buffer is not
      ready and the system returns `EAGAIN`. In such cases, the previous implementation of the
      `process_packet` method raised an exception in the Python layer, but for device ASR
      this is inefficient. A better approach is to retry within the C++ layer in a blocking manner.
      The new `timeout` parameter serves this purpose.
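
The retry behaviour described above can be sketched in plain Python. This is a hedged illustration only: the real retry happens in the C++ layer, and the helper `read_with_retry`, the callable `read_once`, and the use of seconds as the unit are assumptions made for this example, not the torchaudio API.

```python
import time


def read_with_retry(read_once, timeout=None, backoff=0.01):
    """Call `read_once` until it succeeds or `timeout` seconds elapse.

    `read_once` is a hypothetical callable that raises BlockingIOError
    (the exception Python uses for EAGAIN) while the device buffer is
    not ready. `timeout=None` retries indefinitely; `backoff` is the
    sleep, in seconds, between attempts.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_once()
        except BlockingIOError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer not ready before timeout")
            time.sleep(backoff)  # wait before trying the device again


# Simulated device: not ready on the first two attempts.
attempts = {"n": 0}


def fake_device_read():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise BlockingIOError("EAGAIN")
    return b"packet"


print(read_with_retry(fake_device_read, timeout=1.0, backoff=0.001))
```

Retrying inside one blocking call keeps the caller's loop simple: it either receives data or gets a single timeout error, instead of handling an exception on every unready read.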
      
      Pull Request resolved: https://github.com/pytorch/audio/pull/2202
      
      Reviewed By: nateanl
      
      Differential Revision: D34475829
      
      Pulled By: mthrok
      
      fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
  31. 05 Nov, 2021 1 commit
    • Refactor tutorial organization (#1987) · 6cf84866
      moto authored
      * Refactor tutorial organization
      
      * Merge tutorial subdirectories under examples/gallery/tutorials
      * Do not use index.rst generated by Sphinx-gallery
      * Instead use flat structure so that all the tutorials are listed in left menu
      * Use `_assets` dir for artifacts of tutorials
  32. 04 Nov, 2021 2 commits
    • Fix colab URL (#1981) · a6bcd291
      moto authored
    • Add Colab/Download/Github link similar to tutorials (#1969) · 7c9402f1
      moto authored
      This commit adds Colab/download/source links to tutorials, as in the `pytorch/tutorials` repo.
      
      Since the upstream `pytorch-sphinx-theme` does not provide the interface for this,
      a hack to overwrite the URL is added.
      
      This hack might stop working if there is some update in `pytorch-sphinx-theme`.
  33. 17 Feb, 2021 1 commit