- 26 Oct, 2023 1 commit
-
-
moto authored
-
- 14 Aug, 2023 1 commit
-
-
moto authored
Summary: * Merge backend doc into torchaudio toplevel doc * Update backend, dispatcher, installation doc Pull Request resolved: https://github.com/pytorch/audio/pull/3555 Reviewed By: huangruizhe Differential Revision: D48326812 Pulled By: mthrok fbshipit-source-id: cc0d7326eacfebd341323b5d613ca1777255748b
-
- 07 Aug, 2023 1 commit
-
-
moto authored
Summary: Port the MMS FA model from tutorial to the library with post-processing module. Pull Request resolved: https://github.com/pytorch/audio/pull/3521 Reviewed By: huangruizhe Differential Revision: D48038285 Pulled By: mthrok fbshipit-source-id: 571cf0fceaaab4790983be2719f1a85805b814f5
-
- 01 Aug, 2023 1 commit
-
-
hwangjeff authored
Summary: Adds pre-trained VGGish inference pipeline ported from https://github.com/harritaylor/torchvggish and https://github.com/tensorflow/models/tree/master/research/audioset. Pull Request resolved: https://github.com/pytorch/audio/pull/3491 Reviewed By: mthrok Differential Revision: D47738130 Pulled By: hwangjeff fbshipit-source-id: 859c1ff1ec1b09dae4e26586169544571657cc67
-
- 28 Jul, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: This PR moves the `SquimObjective` and `SquimSubjective` models, together with their factory functions and pre-trained pipelines, out of prototype and into the core directory. They will be included in the next official release. Pull Request resolved: https://github.com/pytorch/audio/pull/3512 Reviewed By: mthrok Differential Revision: D47837434 Pulled By: nateanl fbshipit-source-id: d0639f29079f7e1afc30f236849e530c8cadffd8
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results on a V100 are attached below:

| decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling times | vocab size |
|--------------|-------|---------|----------------------|-----------|------------|------------|-------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sort by length) | 1.6 | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22 | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5 | 10 | 24 | char | 4 | 4233 |

Note:

1. The design parallelizes computation across the batch and vocab axes and loops over the frame axis. Compared with CPU implementations, it is therefore better suited to shorter sequences and larger vocabularies.
2. WER is the same as with the CPU implementations. However, decoding with an LM is not supported yet.

Resolves: https://github.com/pytorch/audio/issues/2957. Pull Request resolved: https://github.com/pytorch/audio/pull/3096 Reviewed By: nateanl Differential Revision: D44709397 Pulled By: mthrok fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
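For readers unfamiliar with the algorithm named above, the following is a minimal, CPU-only sketch of generic CTC prefix beam search in plain Python. It is not the CUDA implementation from this PR and does not use the torchaudio API; the names `log_add` and `ctc_prefix_beam_search` are hypothetical, and the input is assumed to be a list of frames, each a list of per-token log-probabilities with the blank token at index 0. The outer loop over frames is sequential, while the inner loops over prefixes and tokens are the kind of work the note above describes as running in parallel.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_prefix_beam_search(log_probs, beam_size, blank=0):
    """Illustrative CTC prefix beam search (no language model).

    Each beam entry maps a prefix (tuple of token ids) to a pair:
    (log-prob of paths ending in blank, log-prob of paths ending in non-blank).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:  # sequential over the frame axis
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            total = log_add(p_b, p_nb)
            for token, lp in enumerate(frame):
                if token == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (log_add(b, total + lp), nb)
                elif prefix and token == prefix[-1]:
                    # Repeated token without a blank in between collapses.
                    b, nb = next_beams[prefix]
                    next_beams[prefix] = (b, log_add(nb, p_nb + lp))
                    # Repeated token after a blank extends the prefix.
                    ext = prefix + (token,)
                    b, nb = next_beams[ext]
                    next_beams[ext] = (b, log_add(nb, p_b + lp))
                else:
                    # A different token always extends the prefix.
                    ext = prefix + (token,)
                    b, nb = next_beams[ext]
                    next_beams[ext] = (b, log_add(nb, total + lp))
        ranked = sorted(
            next_beams.items(), key=lambda kv: log_add(*kv[1]), reverse=True
        )
        beams = dict(ranked[:beam_size])  # keep the best prefixes, in rank order
    return [(list(p), log_add(b, nb)) for p, (b, nb) in beams.items()]
```

The function returns the surviving prefixes with their total log-probabilities, best first. Merging all alignments that collapse to the same prefix is what distinguishes this from picking the most likely token per frame.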
-
- 01 Apr, 2023 1 commit
-
-
moto authored
Summary: This commit adds a new feature AudioEffector, which can be used to apply various effects and codecs to waveforms in Tensor. Under the hood it uses StreamWriter and StreamReader to apply filters and encode/decode. This is going to replace the deprecated `apply_codec` and `apply_sox_effect_tensor` functions. It can also perform online, chunk-by-chunk filtering. Tutorial to follow. closes https://github.com/pytorch/audio/issues/3161 Pull Request resolved: https://github.com/pytorch/audio/pull/3163 Reviewed By: hwangjeff Differential Revision: D44576660 Pulled By: mthrok fbshipit-source-id: 2c5cc87082ab431315d29d56d6ac9efaf4cf7aeb
-
- 27 Mar, 2023 1 commit
-
-
hwangjeff authored
Summary: For `StreamWriter`, * Renames arg `config` to `codec_config`. * Renames struct `EncodingConfig` and dataclass `EncodeConfig` to `CodecConfig`. * Adds docstrings for arg `codec_config`. * Updates `chunk` to `frames` in `write_*_chunk` methods. Pull Request resolved: https://github.com/pytorch/audio/pull/3203 Reviewed By: mthrok Differential Revision: D44350153 Pulled By: hwangjeff fbshipit-source-id: 1b940b1366a43ec0565c362bfcbf62744088b343
-
- 21 Mar, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Add model architecture and factory functions for `SquimSubjective` which predicts subjective evaluation metric scores (e.g. MOS) for speech enhancement task. Pull Request resolved: https://github.com/pytorch/audio/pull/3189 Reviewed By: mthrok Differential Revision: D44267255 Pulled By: nateanl fbshipit-source-id: f8060398b14c625b38ea1bb2417f61aeaec3f1db
-
- 17 Mar, 2023 1 commit
-
-
moto authored
Summary: Adds config object `EncodingConfig` and modifies `StreamWriter` to allow for passing in additional encoder configuration parameters, e.g. bit rate and compression level. Pull Request resolved: https://github.com/pytorch/audio/pull/3179 Pull Request resolved: https://github.com/pytorch/audio/pull/3164 Reviewed By: mthrok Differential Revision: D43861413 Pulled By: hwangjeff fbshipit-source-id: c1682cb2f6e682ab6f1a506511d2be7c7b254161
-
- 08 Mar, 2023 1 commit
-
-
moto authored
Summary: This commit adds fields to OutputStream that show the result of filters, such as width and height after filtering.

Before

```
OutputStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray')
```

After

```
OutputVideoStream(
    source_index=0,
    filter_description='fps=3,scale=width=320:height=320,format=pix_fmts=gray',
    media_type='video',
    format='gray',
    width=320,
    height=320,
    frame_rate=3.0)
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3155 Reviewed By: nateanl Differential Revision: D43882399 Pulled By: mthrok fbshipit-source-id: 620676b1a06f293fdd56de8203a11120f228fa2d
-
- 24 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084 Reviewed By: mthrok Differential Revision: D43550150 Pulled By: nateanl fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690
-
- 09 Feb, 2023 1 commit
-
-
moto authored
Summary: - Add documentation - Tweak docstring - Fix import Pull Request resolved: https://github.com/pytorch/audio/pull/3051 Reviewed By: weiwangmeta, atalman, nateanl Differential Revision: D43166081 Pulled By: mthrok fbshipit-source-id: 7d77aa34a6318a64824626cff8372f8b9aebf6f9
-
- 01 Feb, 2023 1 commit
-
-
moto authored
Summary: Adding C++ documentation. (C++ APIs are categorized as prototype, though it's used by Python beta APIs.) https://output.circle-artifacts.com/output/job/69654229-a99e-4b15-9ce0-7bc6bcf01101/artifacts/0/docs/libtorchaudio.html <img width="1202" alt="Screenshot 2023-01-31 at 11 48 47 AM" src="https://user-images.githubusercontent.com/855818/215828167-d23032f8-9e40-4413-b5b1-5cbd12d705e9.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2994 Reviewed By: hwangjeff Differential Revision: D42876621 Pulled By: mthrok fbshipit-source-id: d8b8d610b87ec766501baa88b7506368a9905a6a
-
- 22 Jan, 2023 1 commit
-
-
moto authored
Summary: This commit makes `StreamReader` report the PTS (presentation time stamp) of the returned chunk as well.

Example

```python
from torchaudio.io import StreamReader

s = StreamReader(...)
s.add_video_stream(...)
for (video_chunk, ) in s.stream():
    # video_chunk is a Torch tensor type but has an extra PTS attribute
    print(video_chunk.pts)  # reports the PTS of the first frame of the video chunk.
```

For backward compatibility, we introduce `_ChunkTensor`, a composition of a Tensor and metadata that works like a normal tensor in PyTorch operations. The implementation of `_ChunkTensor` is based on [TrivialTensorViaComposition](https://github.com/albanD/subclass_zoo/blob/0eeb1d68fb59879029c610bc407f2997ae43ba0a/trivial_tensors.py#L83).

It was also suggested to attach the metadata directly to the Tensor object, but the possibility of a collision between torchaudio's metadata and new attributes introduced in PyTorch cannot be ignored, so we use the Tensor subclass implementation. If any unexpected issue arises from a metadata attribute name collision, client code can fetch the bare Tensor and continue.

Pull Request resolved: https://github.com/pytorch/audio/pull/2975 Reviewed By: hwangjeff Differential Revision: D42526945 Pulled By: mthrok fbshipit-source-id: b4e9422e914ff328421b975120460f3001268f35
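The composition idea described above can be illustrated without PyTorch. The sketch below is only an analogy, not the `_ChunkTensor` implementation: it wraps a plain Python list rather than a Tensor, and the class name `ChunkWithPts` and its `unwrap` method are hypothetical. It shows metadata living on the wrapper while ordinary operations are delegated to the wrapped payload, and how client code can fall back to the bare payload.

```python
class ChunkWithPts:
    """Hypothetical wrapper: a payload plus a presentation time stamp."""

    def __init__(self, payload, pts):
        self._elem = payload  # the wrapped data (a list here, a Tensor in the PR)
        self.pts = pts        # metadata stored on the wrapper, not on the payload

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "_elem":
            # Avoid infinite recursion if the wrapper is not fully initialised.
            raise AttributeError(name)
        return getattr(self._elem, name)  # delegate everything else to the payload

    def __len__(self):
        return len(self._elem)

    def __getitem__(self, index):
        return self._elem[index]

    def __iter__(self):
        return iter(self._elem)

    def unwrap(self):
        """Return the bare payload, dropping the metadata."""
        return self._elem


chunk = ChunkWithPts([1, 2, 3], pts=0.5)
print(chunk.pts)       # metadata from the wrapper
print(sum(chunk))      # behaves like the wrapped list in ordinary operations
print(chunk.unwrap())  # bare payload for code that wants no wrapper
```

Keeping the metadata on a separate wrapper object means a future attribute named `pts` on the payload type would not silently clash with it, which is the concern the commit message raises.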
-
- 13 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: XLSR (cross-lingual speech representation) is a set of cross-lingual self-supervised learning models for generating cross-lingual speech representations. It was first proposed in https://arxiv.org/pdf/2006.13979.pdf, where it was trained on 53 languages (the so-called XLSR-53). This PR adds support for more XLS-R models from https://arxiv.org/pdf/2111.09296.pdf that have more parameters (300M, 1B, 2B) and are trained on 128 languages. Pull Request resolved: https://github.com/pytorch/audio/pull/2959 Reviewed By: mthrok Differential Revision: D42397643 Pulled By: nateanl fbshipit-source-id: 23e8e51a7cde0a226db4f4028db7df8f02b986ce
-
- 10 Dec, 2022 1 commit
-
-
moto authored
Summary: Currently, the documentation page for `torchaudio.models` has separate sections for model definitions and factory functions. The relationships between models and factory functions are not immediately clear. This commit moves the list of factory functions to the list of models. After: - https://output.circle-artifacts.com/output/job/242a9521-7460-4043-895b-9995bf5093b5/artifacts/0/docs/generated/torchaudio.models.Wav2Vec2Model.html <img width="1171" alt="Screen Shot 2022-12-08 at 8 41 03 PM" src="https://user-images.githubusercontent.com/855818/206603743-74a6e368-c3cf-4b87-b854-518a95893f06.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2902 Reviewed By: carolineechen Differential Revision: D41897800 Pulled By: mthrok fbshipit-source-id: a3c01d28d80e755596a9bc37c951960eb84870b9
-
- 13 Oct, 2022 1 commit
-
-
moto authored
Summary: * Document `__call__` instead of `__init__` * List CTCHypothesis first as it is used in combination with CTCDecoder * Fix indentation of score method docstring Pull Request resolved: https://github.com/pytorch/audio/pull/2766 Reviewed By: carolineechen Differential Revision: D40349388 Pulled By: mthrok fbshipit-source-id: 5e512e6c2b29d3533eb62d09b289154ccd1abf4c
-
- 03 Oct, 2022 1 commit
-
-
moto authored
Summary: Adopt `:autosummary:` to various modules * torchaudio.compliance.kaldi * torchaudio.sox_effects * torchaudio.utils Pull Request resolved: https://github.com/pytorch/audio/pull/2664 Reviewed By: nateanl Differential Revision: D39841873 Pulled By: mthrok fbshipit-source-id: ff4fa6976324fca5f35b737b715f976e2a722bac
-
- 22 Sep, 2022 1 commit
-
-
moto authored
Summary: * Introduce the mini-index at `torchaudio.datasets` page. * Standardize the format of return type docstring. https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/datasets.html <img width="936" alt="Screen Shot 2022-09-21 at 6 56 52 PM" src="https://user-images.githubusercontent.com/855818/191475141-a97f2bea-705f-49bc-8c34-6ec869e76793.png"> https://output.circle-artifacts.com/output/job/989328b2-0270-4958-b577-19cf749af3fd/artifacts/0/docs/generated/torchaudio.datasets.CMUDict.html#torchaudio.datasets.CMUDict <img width="1069" alt="Screen Shot 2022-09-21 at 6 57 32 PM" src="https://user-images.githubusercontent.com/855818/191475293-e3302528-27ea-4212-9c12-fd6d900fdf3e.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2692 Reviewed By: carolineechen Differential Revision: D39687463 Pulled By: mthrok fbshipit-source-id: 4175fc15388817d2fe76206188618dd1576281df
-
- 21 Sep, 2022 2 commits
-
-
moto authored
Summary: * Introduce the mini-index at `torchaudio.pipelines` page. * Add introductions * Update pipeline tutorials https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/pipelines.html <img width="1163" alt="Screen Shot 2022-09-20 at 1 23 29 PM" src="https://user-images.githubusercontent.com/855818/191167049-98324e93-2e16-41db-8538-3b5b54eb8224.png"> <img width="1115" alt="Screen Shot 2022-09-20 at 1 23 49 PM" src="https://user-images.githubusercontent.com/855818/191167071-4770f594-2540-43a4-a01c-e983bf59220f.png"> https://output.circle-artifacts.com/output/job/ccc57d95-1930-45c9-b967-c8d477d35f29/artifacts/0/docs/generated/torchaudio.pipelines.RNNTBundle.html#torchaudio.pipelines.RNNTBundle <img width="1108" alt="Screen Shot 2022-09-20 at 1 24 18 PM" src="https://user-images.githubusercontent.com/855818/191167123-51b33a5f-c30c-46bc-b002-b05d2d0d27b7.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2689 Reviewed By: carolineechen Differential Revision: D39691253 Pulled By: mthrok fbshipit-source-id: ddf5fdadb0b64cf2867b6271ba53e8e8c0fa7e49
-
moto authored
Summary: * Introduce the mini-index at `torchaudio.models` page. https://output.circle-artifacts.com/output/job/25e59810-3866-4ece-b1b7-8a10c7a2286d/artifacts/0/docs/models.html <img width="1042" alt="Screen Shot 2022-09-20 at 1 20 50 PM" src="https://user-images.githubusercontent.com/855818/191166816-83314ad1-8b67-475b-aa10-d4cc59126295.png"> <img width="1048" alt="Screen Shot 2022-09-20 at 1 20 58 PM" src="https://user-images.githubusercontent.com/855818/191166829-1ceb65e0-9506-4328-9a2f-8b75b4e54404.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2690 Reviewed By: carolineechen Differential Revision: D39654948 Pulled By: mthrok fbshipit-source-id: 703d1526617596f647c85a7148f41ca55fffdbc8
-
- 16 Sep, 2022 3 commits
-
-
moto authored
Summary: * Introduce the mini-index at `torchaudio.transforms` page. * Add "Augmentations" subsection. * Also updated the overall introduction. https://output.circle-artifacts.com/output/job/1b65246a-403c-4d2c-b97d-d1b582d8b4e5/artifacts/0/docs/transforms.html <img width="721" alt="Screen Shot 2022-09-16 at 5 20 08 PM" src="https://user-images.githubusercontent.com/855818/190591795-97c169db-a95b-480a-8d3c-d80072efa045.png"> <img width="755" alt="Screen Shot 2022-09-16 at 5 20 28 PM" src="https://user-images.githubusercontent.com/855818/190591828-03026918-febd-4194-91aa-7d8f704e17cc.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2683 Reviewed By: carolineechen Differential Revision: D39574255 Pulled By: mthrok fbshipit-source-id: a4beed7cacbb5184bad96efa903a3a1123dab627
-
moto authored
Summary: * Adopts `:autosummary:` in decoder module doc * Hide the constructor signature of `CTCDecoder` as `ctc_decoder` function is the one client code is supposed to be using. * Introduce `children` property to `CTCDecoderLMState` otherwise it does not show up in the doc. https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/models.decoder.html <img width="748" alt="Screen Shot 2022-09-16 at 5 23 22 PM" src="https://user-images.githubusercontent.com/855818/190592409-0c2ec8a4-d2cf-4d76-a965-8a570faaeb1a.png"> https://output.circle-artifacts.com/output/job/7aac5eb9-7d2d-4f63-bcdf-83a6f40b4e5a/artifacts/0/docs/generated/torchaudio.models.decoder.CTCDecoder.html#torchaudio.models.decoder.CTCDecoder <img width="723" alt="Screen Shot 2022-09-16 at 5 23 53 PM" src="https://user-images.githubusercontent.com/855818/190592501-3fad1e07-ae3e-44f5-93be-f33181025390.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2684 Reviewed By: carolineechen Differential Revision: D39574272 Pulled By: mthrok fbshipit-source-id: d977660bd46f5cf98c535adbf2735be896b28773
-
moto authored
Summary: This commit adopts :autosummary: directive to `torchaudio.io` module. It adds table of contents on `torchaudio.io` level. https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/io.html <img width="1094" alt="Screen Shot 2022-09-16 at 7 33 32 AM" src="https://user-images.githubusercontent.com/855818/190520248-27e469f8-7689-4dc2-b591-7b3f08bb4dff.png"> https://output.circle-artifacts.com/output/job/282089d1-c120-4d22-809f-0e0ac0947c37/artifacts/0/docs/generated/torchaudio.io.StreamReader.html#torchaudio.io.StreamReader <img width="1108" alt="Screen Shot 2022-09-16 at 7 33 59 AM" src="https://user-images.githubusercontent.com/855818/190520292-d090fed0-2f18-4961-b9f3-9e4808fd437e.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2681 Reviewed By: carolineechen Differential Revision: D39560459 Pulled By: mthrok fbshipit-source-id: 3de5f22b8d8d0834dfd8bac8619fbfaa44c5f4dd
-