Commits · 2a0f4c067413411c9bd8cc88210320c8989580ff · OpenDAS / Torchaudio

26 Oct, 2023 1 commit

Fix doc on FA (#3679) · 2a0f4c06

moto authored Oct 25, 2023

2a0f4c06

14 Aug, 2023 1 commit

Update I/O and backend docs (#3555) · c0f25f21

moto authored Aug 14, 2023

Summary:
* Merge backend doc into torchaudio toplevel doc
* Update backend, dispatcher, installation doc

Pull Request resolved: https://github.com/pytorch/audio/pull/3555

Reviewed By: huangruizhe

Differential Revision: D48326812

Pulled By: mthrok

fbshipit-source-id: cc0d7326eacfebd341323b5d613ca1777255748b

c0f25f21

07 Aug, 2023 1 commit

Add MMS FA Bundle (#3521) · 5e211d66

moto authored Aug 07, 2023

Summary:
Ports the MMS FA model from the tutorial into the library, together with a post-processing module.

Pull Request resolved: https://github.com/pytorch/audio/pull/3521

Reviewed By: huangruizhe

Differential Revision: D48038285

Pulled By: mthrok

fbshipit-source-id: 571cf0fceaaab4790983be2719f1a85805b814f5

5e211d66

01 Aug, 2023 1 commit

Add pretrained VGGish inference pipeline (#3491) · cbfde17b

hwangjeff authored Jul 31, 2023

Summary:
Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset.

Pull Request resolved: https://github.com/pytorch/audio/pull/3491

Reviewed By: mthrok

Differential Revision: D47738130

Pulled By: hwangjeff

fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67

cbfde17b

28 Jul, 2023 1 commit

Move TorchAudio-Squim models to Beta (#3512) · b7d2d928

Zhaoheng Ni authored Jul 28, 2023

Summary:
This PR moves the `SquimObjective` and `SquimSubjective` models, along with their factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release.

Pull Request resolved: https://github.com/pytorch/audio/pull/3512

Reviewed By: mthrok

Differential Revision: D47837434

Pulled By: nateanl

fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8

b7d2d928

28 Apr, 2023 1 commit

Add cuctc decoder (#3096) · 0a1801ed

Yuekai Zhang authored Apr 28, 2023

Summary:
This PR implements a CUDA-based CTC prefix beam search decoder.

Several benchmark results on a V100 are attached below:

| decoder type | model | dataset | decoding time (s) | beam size | batch size | model unit | subsampling factor | vocab size |
|---|---|---|---|---|---|---|---|---|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:
1. The design parallelizes computation across the batch and vocab axes and loops sequentially over the frame axis. Compared with the CPU implementations, it therefore favors shorter sequences and larger vocabularies.
2. WER is the same as with the CPU implementations. However, decoding with an LM is not supported yet.
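
As a rough, CPU-only illustration of CTC prefix beam search (not the CUDA implementation from this PR), a minimal pure-Python sketch follows. The function name `ctc_prefix_beam_search`, the blank index `0`, and the toy probabilities are assumptions made for illustration. The outer loop walks the frame axis sequentially, while the inner loop over tokens is the part a GPU kernel could run in parallel.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_size=2, blank=0):
    """Toy CTC prefix beam search over per-frame probabilities.

    probs: list of frames, each a list of probabilities over the vocab.
    Returns the best prefix (tuple of token ids) and its probability.
    """
    # Each prefix keeps two scores: probability of paths ending in blank
    # and probability of paths ending in a non-blank token.
    beams = {(): (1.0, 0.0)}
    for frame in probs:  # sequential loop over the frame axis
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_non_blank) in beams.items():
            total = p_blank + p_non_blank
            for token, p in enumerate(frame):  # vocab axis
                if token == blank:
                    next_beams[prefix][0] += total * p
                elif prefix and prefix[-1] == token:
                    # Repeated token: stays in the same prefix unless a
                    # blank separated the two emissions.
                    next_beams[prefix][1] += p_non_blank * p
                    next_beams[prefix + (token,)][1] += p_blank * p
                else:
                    next_beams[prefix + (token,)][1] += total * p
        # Keep only the `beam_size` most probable prefixes.
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(scores) for prefix, scores in ranked[:beam_size]}
    best_prefix, scores = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best_prefix, sum(scores)


# Three frames over a toy vocab: 0 = blank, 1 and 2 = tokens.
toy_probs = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
best, prob = ctc_prefix_beam_search(toy_probs, beam_size=3)
print(best, prob)
```

Each surviving prefix scores the whole vocabulary at every frame, so the work per frame scales with batch size times beam size times vocab size, while the number of sequential steps equals the number of frames.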

Resolves: https://github.com/pytorch/audio/issues/2957.

Pull Request resolved: https://github.com/pytorch/audio/pull/3096

Reviewed By: nateanl

Differential Revision: D44709397

Pulled By: mthrok

fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155

0a1801ed

01 Apr, 2023 1 commit

Add AudioEffector (#3163) · a4036248

moto authored Mar 31, 2023

Summary:
This commit adds a new feature AudioEffector, which can be used to
apply various effects and codecs to waveforms in Tensor.

Under the hood it uses StreamWriter and StreamReader to apply
filters and encode/decode.

This is going to replace the deprecated `apply_codec` and
`apply_sox_effect_tensor` functions.

It can also perform online, chunk-by-chunk filtering.

Tutorial to follow.

closes https://github.com/pytorch/audio/issues/3161

Pull Request resolved: https://github.com/pytorch/audio/pull/3163

Reviewed By: hwangjeff

Differential Revision: D44576660

Pulled By: mthrok

fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb

a4036248

27 Mar, 2023 1 commit

Revise encoder config arg and docstrings (#3203) · b1de9f1a

hwangjeff authored Mar 27, 2023

Summary:
For `StreamWriter`,
* Renames arg `config` to `codec_config`.
* Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`.
* Adds docstrings for arg `codec_config`.
* Updates `chunk` to `frames` in `write_*_chunk` methods.

Pull Request resolved: https://github.com/pytorch/audio/pull/3203

Reviewed By: mthrok

Differential Revision: D44350153

Pulled By: hwangjeff

fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343

b1de9f1a

21 Mar, 2023 1 commit

Add SquimSubjective Model (#3189) · a8a16238

Zhaoheng Ni authored Mar 21, 2023

Summary:
Adds the model architecture and factory functions for `SquimSubjective`, which predicts subjective evaluation metric scores (e.g. MOS) for the speech enhancement task.

Pull Request resolved: https://github.com/pytorch/audio/pull/3189

Reviewed By: mthrok

Differential Revision: D44267255

Pulled By: nateanl

fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db

a8a16238

17 Mar, 2023 1 commit

Add EncodingConfig (#3179) · 9bb35070

moto authored Mar 16, 2023

Summary:
Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level.
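
To make the idea of an optional encoder configuration concrete, here is a minimal sketch of such a config object. `EncoderOptions` and `to_encoder_kwargs` are hypothetical names used only for illustration; they are not the `EncodingConfig` added by this commit, and the two fields simply mirror the examples mentioned above (bit rate and compression level).

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class EncoderOptions:
    """Hypothetical stand-in for an encoder configuration object."""

    bit_rate: Optional[int] = None           # target bits per second
    compression_level: Optional[int] = None  # codec-specific effort level


def to_encoder_kwargs(config: EncoderOptions) -> Dict[str, int]:
    """Drop unset fields so the encoder can fall back to its own defaults."""
    return {key: value for key, value in asdict(config).items() if value is not None}


options = EncoderOptions(bit_rate=128_000)
print(to_encoder_kwargs(options))
```

Leaving every field optional lets callers override only what they care about while the rest is left to the encoder.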

Pull Request resolved: https://github.com/pytorch/audio/pull/3179

Pull Request resolved: https://github.com/pytorch/audio/pull/3164

Reviewed By: mthrok

Differential Revision: D43861413

Pulled By: hwangjeff

fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161

9bb35070

08 Mar, 2023 1 commit

Include format information after filter (#3155) · 146195d8

moto authored Mar 08, 2023

Summary:
This commit adds fields to OutputStream that show the result
of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```
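
The new fields correspond to what the filter chain in `filter_description` produces. As a loose illustration of that relationship only, the sketch below splits a simple filter description into its parts. `parse_filter_description` is a hypothetical helper, not a torchaudio function, and it makes no claim about how the commit itself obtains the values.

```python
def parse_filter_description(description):
    """Split a simple FFmpeg-style filter chain into (name, options) pairs.

    Only handles the plain `name=key=value:key=value` form shown above;
    quoting, escaping and multi-input graphs are out of scope.
    """
    filters = []
    for item in description.split(","):
        name, _, raw_args = item.partition("=")
        options = {}
        for position, arg in enumerate(raw_args.split(":") if raw_args else []):
            key, sep, value = arg.partition("=")
            if sep:
                options[key] = value
            else:
                # Positional argument such as the `3` in `fps=3`.
                options[position] = key
        filters.append((name, options))
    return filters


chain = parse_filter_description("fps=3,scale=width=320:height=320,format=pix_fmts=gray")
for name, options in chain:
    print(name, options)
```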

Pull Request resolved: https://github.com/pytorch/audio/pull/3155

Reviewed By: nateanl

Differential Revision: D43882399

Pulled By: mthrok

fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d

146195d8

24 Feb, 2023 1 commit

Use autosummary for torchaudio.prototype.models documentation (#3084) · f6d1bc96

Zhaoheng Ni authored Feb 24, 2023

Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084

Reviewed By: mthrok

Differential Revision: D43550150

Pulled By: nateanl

fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690

f6d1bc96

09 Feb, 2023 1 commit

Follow-up on audio playback function (#3051) · 91b05e2e

moto authored Feb 09, 2023

Summary:
- Add documentation
- Tweak docstring
- Fix import

Pull Request resolved: https://github.com/pytorch/audio/pull/3051

Reviewed By: weiwangmeta, atalman, nateanl

Differential Revision: D43166081

Pulled By: mthrok

fbshipit-source-id: 7d77aa34a6318a64824626cff8372f8b9aebf6f9

91b05e2e

01 Feb, 2023 1 commit

Add C++ documentation (#2994) · f663cb28

moto authored Jan 31, 2023

Summary:
Adds C++ documentation. (The C++ APIs are categorized as prototype, though they are used by Python beta APIs.)

https://output.circle-artifacts.com/output/job/69654229-a99e-4b15-9ce0-7bc6bcf01101/artifacts/0/docs/libtorchaudio.html

<img width="1202" alt="Screenshot 2023-01-31 at 11 48 47 AM" src="https://user-images.githubusercontent.com/855818/215828167-d23032f8-9e40-4413-b5b1-5cbd12d705e9.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2994

Reviewed By: hwangjeff

Differential Revision: D42876621

Pulled By: mthrok

fbshipit-source-id: d8b8d610b87ec766501baa88b7506368a9905a6a

f663cb28

22 Jan, 2023 1 commit

Make StreamReader return PTS (#2975) · 0dd59e0d

moto authored Jan 22, 2023

Summary:
This commit makes `StreamReader` report PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is Torch tensor type but has extra attribute of PTS
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```
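
Since `pts` refers to the first frame of the chunk, timestamps of the remaining frames can be estimated when the stream has a constant frame rate. The helper below is a small sketch under that assumption; `frame_timestamps` and its parameters are hypothetical names, not part of `torchaudio`.

```python
def frame_timestamps(chunk_pts, num_frames, frame_rate):
    """Estimate per-frame timestamps (seconds) within one chunk.

    Assumes a constant frame rate; variable-frame-rate streams would
    need the real per-frame timestamps instead.
    """
    return [chunk_pts + index / frame_rate for index in range(num_frames)]


# A chunk of 3 frames starting at 2.0 s in a 4 fps stream.
print(frame_timestamps(2.0, 3, 4.0))
```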

For backward compatibility, we introduce `_ChunkTensor`, a composition
of a Tensor and metadata that works like a normal tensor in PyTorch operations.

The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

It was also suggested to attach metadata directly to the Tensor object.
However, torchaudio's metadata could collide with attributes that PyTorch introduces
in the future, so we use a Tensor subclass implementation instead.

If any unexpected issue arises from a metadata attribute name collision, client code can
fetch the bare Tensor and continue.
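
The composition idea can be sketched without PyTorch. The class below is a hypothetical, dependency-free analogue that wraps a plain list instead of a Tensor; `ChunkWithPts` and `unwrap` are illustrative names and do not reflect the actual `_ChunkTensor` implementation.

```python
class ChunkWithPts:
    """Hypothetical wrapper: holds a payload plus a `pts` attribute."""

    def __init__(self, data, pts):
        self._data = data  # wrapped object (a list here, a Tensor in torchaudio)
        self.pts = pts     # time stamp of the first frame, in seconds

    def __getattr__(self, name):
        # Only called when normal lookup fails, so `pts` is served by the
        # wrapper and every other attribute by the payload.
        if name == "_data":
            # Avoid infinite recursion if the payload is not set yet.
            raise AttributeError(name)
        return getattr(self._data, name)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def unwrap(self):
        """Return the bare payload, dropping the metadata."""
        return self._data


chunk = ChunkWithPts([10, 20, 20], pts=0.5)
print(chunk.pts, len(chunk), chunk.count(20), chunk.unwrap())
```

Keeping the metadata on the wrapper rather than on the payload means a name clash can always be sidestepped by unwrapping, which is the escape hatch described above.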

Pull Request resolved: https://github.com/pytorch/audio/pull/2975

Reviewed By: hwangjeff

Differential Revision: D42526945

Pulled By: mthrok

fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35

0dd59e0d

13 Jan, 2023 1 commit

Add XLS-R models (#2959) · a5664ca9

Zhaoheng Ni authored Jan 12, 2023

Summary:
XLSR (cross-lingual speech representation) is a family of self-supervised models for learning cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, where the model is trained on 53 languages (the so-called XLSR-53). This PR adds support for the XLS-R models from https://arxiv.org/pdf/2111.09296.pdf, which have more parameters (300M, 1B, 2B) and are trained on 128 languages.

Pull Request resolved: https://github.com/pytorch/audio/pull/2959

Reviewed By: mthrok

Differential Revision: D42397643

Pulled By: nateanl

fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce

a5664ca9

10 Dec, 2022 1 commit

Update model documentation structure (#2902) · 9912e54d

moto authored Dec 09, 2022

Summary:
Currently, the documentation page for `torchaudio.models` has separate sections for model definitions and factory functions.

The relationships between models and factory functions are not immediately clear.

This commit moves the list of factory functions to the list of models.

After:
 - https://output.circle-artifacts.com/output/job/242a9521-7460-4043-895b-9995bf5093b5/artifacts/0/docs/generated/torchaudio.models.Wav2Vec2Model.html

<img width="1171" alt="Screen Shot 2022-12-08 at 8 41 03 PM" src="https://user-images.githubusercontent.com/855818/206603743-74a6e368-c3cf-4b87-b854-518a95893f06.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2902

Reviewed By: carolineechen

Differential Revision: D41897800

Pulled By: mthrok

fbshipit-source-id: a3c01d28d80e755596a9bc37c951960eb84870b9

9912e54d

13 Oct, 2022 1 commit

Fix CTCDecoder doc (#2766) · 3e4b961d

moto authored Oct 13, 2022

Summary:
* Document `__call__` instead of `__init__`
* List CTCHypothesis first as it is used in combination with CTCDecoder
* Fix indentation of score method docstring

Pull Request resolved: https://github.com/pytorch/audio/pull/2766

Reviewed By: carolineechen

Differential Revision: D40349388

Pulled By: mthrok

fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c

3e4b961d

03 Oct, 2022 1 commit

Adopt :autosummary: to multiple modules (#2664) · ef1ba56f

moto authored Oct 03, 2022

Summary:
Adopt `:autosummary:` in various modules:

    * torchaudio.compliance.kaldi
    * torchaudio.sox_effects
    * torchaudio.utils

Pull Request resolved: https://github.com/pytorch/audio/pull/2664

Reviewed By: nateanl

Differential Revision: D39841873

Pulled By: mthrok

fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac

ef1ba56f

22 Sep, 2022 1 commit

Adopt `:autosummary:` in `torchaudio.datasets` module doc (#2692) · 49b23e15

moto authored Sep 22, 2022

Summary:
* Introduce the mini-index at `torchaudio.datasets` page.
* Standardize the format of return type docstring.

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html

<img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png">

https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict

<img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2692

Reviewed By: carolineechen

Differential Revision: D39687463

Pulled By: mthrok

fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df

49b23e15

21 Sep, 2022 2 commits

Adopt `:autosummary:` in `torchaudio.pipelines` module doc (#2689) · 0b3ddec6

moto authored Sep 21, 2022

Summary:
* Introduce the mini-index at `torchaudio.pipelines` page.
* Add introductions
* Update pipeline tutorials

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html

<img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png">

<img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png">

https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle

<img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2689

Reviewed By: carolineechen

Differential Revision: D39691253

Pulled By: mthrok

fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49

0b3ddec6

Adopt `:autosummary:` in `torchaudio.models` module doc (#2690) · 30c7077b

moto authored Sep 20, 2022

Summary:
* Introduce the mini-index at `torchaudio.models` page.

https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html

<img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png">

<img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2690

Reviewed By: carolineechen

Differential Revision: D39654948

Pulled By: mthrok

fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8

30c7077b

16 Sep, 2022 3 commits

Adopt `:autosummary:` in `torchaudio.transforms` module doc (#2683) · baf354a7

moto authored Sep 16, 2022

Summary:
* Introduce the mini-index at `torchaudio.transforms` page.
* Add "Augmentations" subsection.
* Also updated the overall introduction.

https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html

<img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png">

<img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2683

Reviewed By: carolineechen

Differential Revision: D39574255

Pulled By: mthrok

fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627

baf354a7

Adopt `:autosummary:` in `torchaudio.models.decoder` module doc (#2684) · c89ab0c6

moto authored Sep 16, 2022

Summary:
* Adopts `:autosummary:` in decoder module doc
* Hide the constructor signature of `CTCDecoder`, as the `ctc_decoder` function is the one client code is supposed to use.
* Introduce a `children` property on `CTCDecoderLMState`; otherwise it does not show up in the doc.

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html

<img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png">

https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder

<img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2684

Reviewed By: carolineechen

Differential Revision: D39574272

Pulled By: mthrok

fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773

c89ab0c6

Adopt `:autosummary:` in `torchaudio.io` module doc (#2681) · f50a9286

moto authored Sep 15, 2022

Summary:
This commit adopts the `:autosummary:` directive in the `torchaudio.io` module doc.
It adds a table of contents at the `torchaudio.io` level.

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html
<img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png">

https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader
<img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png">

Pull Request resolved: https://github.com/pytorch/audio/pull/2681

Reviewed By: carolineechen

Differential Revision: D39560459

Pulled By: mthrok

fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd

f50a9286