- 07 Jun, 2022 2 commits
-
-
Caroline Chen authored
Summary: ctc decoder has been moved to beta, remove prototype message from tutorial (this is done on the release branch in https://github.com/pytorch/audio/issues/2457) Pull Request resolved: https://github.com/pytorch/audio/pull/2459 Reviewed By: hwangjeff Differential Revision: D36978417 Pulled By: carolineechen fbshipit-source-id: e580c1e8475a1a0aa924d44deea3852adc332a86
-
moto authored
Summary: - Adopt `torchaudio.utils.download_asset` to simplify asset management. - Break down the first section about helper functions. - Use tempfile so that executing tutorial won't leave any artifacts on local file system. Example: https://output.circle-artifacts.com/output/job/b11a0087-8bf9-4999-a74f-b53798eaa77f/artifacts/0/docs/tutorials/audio_io_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2385 Reviewed By: hwangjeff Differential Revision: D36404399 Pulled By: mthrok fbshipit-source-id: 106af34e8ddd22a061aa12767b444b32aef07bad
-
- 03 Jun, 2022 3 commits
-
-
moto authored
Summary: - Adopt `torchaudio.utils.download_asset` to simplify asset management. - Break down the first section about helper functions. - Reduce the number of helper functions https://output.circle-artifacts.com/output/job/d7dd1b93-6dfe-46da-a080-109bfdc63881/artifacts/0/docs/tutorials/audio_data_augmentation_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2388 Reviewed By: carolineechen Differential Revision: D36404405 Pulled By: mthrok fbshipit-source-id: f460ed810519797fce6e2fa7baaee110bddd1d06
-
moto authored
Summary: - Replace mis-use of plot_specgram with plot_sweep, and remove plot_specgram - Move `benchmark_resample` to later section https://output.circle-artifacts.com/output/job/9f7af187-777d-4d75-840f-2630a36295b7/artifacts/0/docs/tutorials/audio_resampling_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2386 Reviewed By: carolineechen Differential Revision: D36404403 Pulled By: mthrok fbshipit-source-id: f9df8453e3f531bdc4549b0134e5dbba90653bf7
-
moto authored
Summary: - Adopt torchaudio.utils.download_asset to simplify asset management. - Break down the first section about helper functions. - Reduce the number of helper functions Pull Request resolved: https://github.com/pytorch/audio/pull/2391 Reviewed By: carolineechen, nateanl Differential Revision: D36885626 Pulled By: mthrok fbshipit-source-id: 1306f22ab70ab1e7f74ed7e43bf43150015448b6
-
- 02 Jun, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Use `download_asset` to download audio files. - Replace the `MVDR` module with the newly added `SoudenMVDR` and `RTFMVDR` modules. - Benchmark the performance of `F.rtf_evd` and `F.rtf_power` for RTF computation. - Visualize the spectrograms and masks. Pull Request resolved: https://github.com/pytorch/audio/pull/2398 Reviewed By: carolineechen Differential Revision: D36549402 Pulled By: nateanl fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5
-
- 01 Jun, 2022 1 commit
-
-
Caroline Chen authored
Summary: Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module. hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well?? Pull Request resolved: https://github.com/pytorch/audio/pull/2410 Reviewed By: mthrok Differential Revision: D36784521 Pulled By: carolineechen fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
-
- 21 May, 2022 1 commit
-
-
moto authored
Summary: This commit adds file-like object support to the Streaming API.

## Features

- File-like objects are expected to implement `read(self, n)`.
- Additionally, `seek(self, offset, whence)` is used if available.
- Without a `seek` method, some formats cannot be decoded properly.
- To work around this, one can use the existing `decoder` option to specify which decoder to use.
- The `decoder` and `decoder_option` arguments were added to the `add_basic_[audio|video]_stream` methods, similar to `add_[audio|video]_stream`.
- The order of the arguments is changed so that the arguments common to both audio and video come before the rest.
- The `dtype` and `format` arguments were also changed to make them consistent across the audio/video methods.

## Code structure

The approach is very similar to how file-like objects are supported in sox-based I/O. In the Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

## Refactoring involved

- Extracted to https://github.com/pytorch/audio/issues/2402
- Some implementation in the original TorchBind surface layer is converted to a Wrapper class so that it can be re-used from the PyBind11 bindings. The wrapper class serves to simplify the binding.
- The `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
- The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to their `std` equivalents at the binding layer. But since they work fine with PyBind11, the Streamer core methods now deal with them directly.

## TODO:

- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
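The file-like contract described in this commit (a required `read(self, n)`, an optional `seek(self, offset, whence)`, and dispatch on whether the source is a string) can be illustrated with a small sketch. This is not the torchaudio implementation; the class names `ReadOnlySource` and `SeekableSource` and the helper `describe_source` are hypothetical and exist only for illustration.

```python
import io


class ReadOnlySource:
    """Hypothetical file-like object exposing only `read(n)`."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return at most n bytes; an empty bytes object signals end of stream.
        return self._buf.read(n)


class SeekableSource(ReadOnlySource):
    """Hypothetical file-like object that also exposes `seek(offset, whence)`."""

    def seek(self, offset: int, whence: int = 0) -> int:
        return self._buf.seek(offset, whence)


def describe_source(src) -> str:
    """Mimic the dispatch described above: string path vs. object with `read`."""
    if isinstance(src, str):
        return "path"
    if hasattr(src, "read"):
        # `seek` is optional; without it some formats may not decode properly.
        return "file-like (seekable)" if hasattr(src, "seek") else "file-like (read-only)"
    raise TypeError("src must be a string or an object with a `read` method")


print(describe_source("audio.wav"))
print(describe_source(ReadOnlySource(b"abc")))
print(describe_source(SeekableSource(b"abc")))
```

A read-only source of this kind is the case where, per the commit message, passing an explicit `decoder` may be needed.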
-
- 13 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the Streaming API out of the prototype module.

* The related classes are renamed as follows:
  - `Streamer` -> `StreamReader`
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

  This change is a preemptive measure in case a `StreamWriter` API is added later.
* Replace the BUILD_FFMPEG build arg with USE_FFMPEG. We are not building FFmpeg, so USE_FFMPEG is more appropriate.

---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)

- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
  - Rename `Streamer` to `StreamReader`.
  - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
  - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
  - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
-
- 12 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 28 Apr, 2022 1 commit
-
-
moto authored
Summary: libmad integration should be enabled only from source-build Pull Request resolved: https://github.com/pytorch/audio/pull/2354 Reviewed By: nateanl Differential Revision: D36012035 Pulled By: mthrok fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation Follow ups - Add pretrained LM support for lex free decoding - Add example in tutorial - Replace flashlight C++ source code with flashlight text submodule - [optional] fairseq compatibility test Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 21 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
-
- 13 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab because its runtime does not have nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly PyTorch and TorchAudio builds as a comment that users can copy and run locally. Pull Request resolved: https://github.com/pytorch/audio/pull/2325 Reviewed By: xiaohui-zhang Differential Revision: D35597753 Pulled By: hwangjeff fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
-
- 25 Mar, 2022 1 commit
-
-
Caroline Chen authored
Summary: add function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, tests, and updated tutorial Pull Request resolved: https://github.com/pytorch/audio/pull/2275 Reviewed By: mthrok Differential Revision: D35115418 Pulled By: carolineechen fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0
-
- 24 Mar, 2022 2 commits
-
-
Caroline Chen authored
Summary: rendered: - [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288 Reviewed By: hwangjeff Differential Revision: D35099492 Pulled By: mthrok fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f
-
- 22 Mar, 2022 1 commit
-
-
Hagen Wierstorf authored
Summary: The calculation of the SNR in the data augmentation examples seems to be wrong to me.

If we start from the definition of the signal-to-noise ratio using the root mean square value, we get:

```
SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
```

This can be transformed to:

```
scale = 10^(SNR/20) rms(noise) / rms(speech)
```

The example uses `lambda x: x.norm(p=2)` rather than `rms`, but since the speech and noise signals have the same length, we have:

```
rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
```

For the existing code to be correct, the following would have to hold:

```
10^(SNR/20) = e^(SNR / 10)
```

which is not true. Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`. For the proposed SNR values of 20 dB, 10 dB, 3 dB, the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.

Pull Request resolved: https://github.com/pytorch/audio/pull/2285 Reviewed By: nateanl Differential Revision: D35047737 Pulled By: mthrok fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
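The corrected scale formula can be checked numerically with a short sketch. It uses plain Python lists instead of tensors so that it runs without torch; the helper names `l2_norm`, `snr_scale`, and `measured_snr_db` are illustrative and do not come from the tutorial.

```python
import math


def l2_norm(x):
    """Euclidean norm, standing in for `x.norm(p=2)` in the tutorial."""
    return math.sqrt(sum(v * v for v in x))


def snr_scale(speech, noise, snr_db):
    """Gain for `speech` so that scale * speech against `noise` has the requested SNR."""
    return 10 ** (snr_db / 20) * l2_norm(noise) / l2_norm(speech)


def measured_snr_db(speech, noise, scale):
    """SNR in dB of the scaled speech; equal-length signals, so norms replace rms."""
    scaled = [scale * v for v in speech]
    return 20 * math.log10(l2_norm(scaled) / l2_norm(noise))


# Arbitrary toy signals of equal length (made up for this check).
speech = [1.0, -1.0, 0.5, 0.25]
noise = [0.1, 0.2, -0.1, 0.05]

for snr_db in (20, 10, 3):
    scale = snr_scale(speech, noise, snr_db)
    print(snr_db, round(measured_snr_db(speech, noise, scale), 6))
```

With this scale the measured SNR matches the requested value, which would not be the case with the earlier `e^(SNR / 10)` factor.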
-
- 17 Mar, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281 Reviewed By: carolineechen Differential Revision: D34939494 Pulled By: mthrok fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d
-
- 10 Mar, 2022 1 commit
-
-
moto authored
Summary: Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202 Pull Request resolved: https://github.com/pytorch/audio/pull/2270 Reviewed By: hwangjeff Differential Revision: D34793460 Pulled By: mthrok fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723
-
- 26 Feb, 2022 1 commit
-
-
moto authored
Summary: This commit adds a tutorial for device ASR and updates the API for device streaming. The changes to the interface are:

1. Add `timeout` and `backoff` parameters to the `process_packet` and `stream` methods.
2. Make the `fill_buffer` method private.

When dealing with a device stream, there are situations where the device buffer is not ready and the system returns `EAGAIN`. In such a case, the previous implementation of the `process_packet` method raised an exception in the Python layer, but for device ASR this is inefficient. A better approach is to retry within the C++ layer in a blocking manner. The new `timeout` parameter serves this purpose.

Pull Request resolved: https://github.com/pytorch/audio/pull/2202 Reviewed By: nateanl Differential Revision: D34475829 Pulled By: mthrok fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
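The retry-until-timeout behaviour described here can be sketched in Python. The actual change retries inside the C++ layer, so this is only an illustration of the idea. The function `process_with_retry`, the `read_packet` callable, the use of `BlockingIOError` to stand in for `EAGAIN`, and the choice of seconds as the unit are all assumptions made for this sketch.

```python
import time


def process_with_retry(read_packet, timeout=None, backoff=0.01):
    """Call `read_packet` until it succeeds or `timeout` seconds have elapsed.

    `read_packet` is a hypothetical callable that raises BlockingIOError
    (standing in for EAGAIN) while the device buffer is not ready.
    `timeout=None` means retry indefinitely; `backoff` is the sleep between tries.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            return read_packet()
        except BlockingIOError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError("device buffer was not ready before the timeout")
            time.sleep(backoff)


# Toy device that is ready on the third attempt.
attempts = {"n": 0}


def flaky_device():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise BlockingIOError("EAGAIN")
    return "packet"


print(process_with_retry(flaky_device, timeout=1.0, backoff=0.001))
```

Blocking inside the call keeps the caller's loop simple: it either receives a packet or a single timeout error, rather than handling an exception per unready read.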
-
- 17 Feb, 2022 1 commit
-
-
moto authored
Summary: https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html 1. Add figure to explain the caching 2. Fix the initialization of stream iterator Pull Request resolved: https://github.com/pytorch/audio/pull/2226 Reviewed By: carolineechen Differential Revision: D34265971 Pulled By: mthrok fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4
-
- 15 Feb, 2022 1 commit
-
-
moto authored
Summary: Updating the context cacher so that fetched audio chunk is used for inference immediately. https://github.com/pytorch/audio/pull/2202#discussion_r802838174 Pull Request resolved: https://github.com/pytorch/audio/pull/2213 Reviewed By: hwangjeff Differential Revision: D34235230 Pulled By: mthrok fbshipit-source-id: 6e4aee7cca34ca81e40c0cb13497182f20f7f04e
-
- 09 Feb, 2022 1 commit
-
-
hwangjeff authored
Summary: Yesterday's release of librosa 0.9.0 made args keyword-only and changed default padding from "reflect" to "zero" for some functions. This PR adjusts callsites in our tutorials and tests accordingly. Pull Request resolved: https://github.com/pytorch/audio/pull/2208 Reviewed By: mthrok Differential Revision: D34099793 Pulled By: hwangjeff fbshipit-source-id: 4e2642cdda8aae6d0a928befaf1bbb3873d229bc
-
- 03 Feb, 2022 1 commit
-
-
moto authored
Summary: * tutorial for streaming API https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/streaming_api_tutorial.html * tutorial for online speech recognition with Emformer RNN-T https://541810-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2193 Reviewed By: hwangjeff Differential Revision: D33971312 Pulled By: mthrok fbshipit-source-id: f9b69114255f15eaf4463ca85b3efb0ba321a95f
-
- 02 Feb, 2022 1 commit
-
-
Caroline Chen authored
Summary: resulting tutorial: https://538358-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html - add visualization for timestep alignments - modify section organization for decoder construction Pull Request resolved: https://github.com/pytorch/audio/pull/2188 Reviewed By: mthrok Differential Revision: D33954937 Pulled By: carolineechen fbshipit-source-id: 8f397229d74c994b8793a30623e1de4c19ebd401
-
- 31 Jan, 2022 1 commit
-
-
moto authored
Summary: Changing the URL of tutorial assets to `download.pytorch.org` which is more appropriate for user facing materials. Pull Request resolved: https://github.com/pytorch/audio/pull/2182 Reviewed By: nateanl Differential Revision: D33887839 Pulled By: mthrok fbshipit-source-id: 30569672e8caf30aae5476036dfdadc8ebd436bf
-
- 27 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for CTC lexicon decoder without LM support by adding a non language model `ZeroLM` that returns score 0 for everything. Generalize the decoder class/API a bit to support this, adding it as an option for the kenlm decoder at the moment (will likely be separated out from kenlm when adding support for other kinds of LMs in the future) Pull Request resolved: https://github.com/pytorch/audio/pull/2174 Reviewed By: hwangjeff, nateanl Differential Revision: D33798674 Pulled By: carolineechen fbshipit-source-id: ef8265f1d046011b143597b3b7c691566b08dcde
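The idea of a language model that "returns score 0 for everything" can be shown with a minimal sketch. The interface below (`start`, `score`, `finish`) and the helper `hypothesis_score` are assumptions for illustration and are not the torchaudio or flashlight API.

```python
from typing import Hashable, Sequence, Tuple


class ZeroLM:
    """Hypothetical no-op language model: every transition scores 0."""

    def start(self) -> Hashable:
        # The initial LM state is an empty token history.
        return ()

    def score(self, state: tuple, token: int) -> Tuple[Hashable, float]:
        # Extend the history but contribute nothing to the score.
        return state + (token,), 0.0

    def finish(self, state: tuple) -> Tuple[Hashable, float]:
        return state, 0.0


def hypothesis_score(
    acoustic_scores: Sequence[float],
    tokens: Sequence[int],
    lm,
    lm_weight: float = 1.0,
) -> float:
    """Combine per-token acoustic scores with weighted LM scores."""
    state = lm.start()
    total = 0.0
    for am_score, token in zip(acoustic_scores, tokens):
        state, lm_score = lm.score(state, token)
        total += am_score + lm_weight * lm_score
    _, end_score = lm.finish(state)
    return total + lm_weight * end_score


# With ZeroLM the result depends on the acoustic scores only,
# regardless of the LM weight.
print(hypothesis_score([-1.0, -2.0], [3, 4], ZeroLM(), lm_weight=5.0))
```

Plugging a zero-scoring model into the same slot lets the decoder code path stay unchanged while disabling the LM contribution.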
-
- 26 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: following up on https://github.com/pytorch/audio/pull/2141#discussion_r779055465, adding brief beam search description and linking to resources Pull Request resolved: https://github.com/pytorch/audio/pull/2173 Reviewed By: nateanl Differential Revision: D33791731 Pulled By: carolineechen fbshipit-source-id: 603fdd177c9a3c8276a4692fb7bb385bd01b9bfb
-
- 20 Jan, 2022 1 commit
-
-
yonMaor authored
Summary: Closes https://github.com/pytorch/audio/issues/2162 Pull Request resolved: https://github.com/pytorch/audio/pull/2163 Reviewed By: nateanl Differential Revision: D33666354 Pulled By: mthrok fbshipit-source-id: 3e7a963b9ac85046317df8d5dab91af363e5668b
-
- 07 Jan, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add explanation and demonstration of different beam search decoder parameters. Additionally use a better sample audio file and load in with token list instead of tokens file. Pull Request resolved: https://github.com/pytorch/audio/pull/2141 Reviewed By: mthrok Differential Revision: D33463230 Pulled By: carolineechen fbshipit-source-id: d3dd6452b03d4fc2e095d778189c66f7161e4c68
-
- 29 Dec, 2021 1 commit
-
-
moto authored
Summary: ### Change list * Split the documentation of prototypes * Add a new API reference section dedicated to prototypes. * Hide the signature of KenLMLexiconDecoder constructor. (cc carolineechen ) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html#torchaudio.prototype.ctc_decoder.KenLMLexiconDecoder * Hide the signature of RNNT constructor. (cc hwangjeff ) * https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html#torchaudio.prototype.RNNT * Tweak CTC tutorial * Replace hyperlinks to API reference with backlinks * Add `progress=False` to download ### Follow-up The RNNT decoder and the CTC decoder each return their own `Hypothesis` class. When I tried to add the Hypothesis of the CTC decoder to the documentation, the build process complained that it's ambiguous. I think the Hypothesis classes can be put inside each decoder (if TorchScript supports it), or the names can be made different, but in that case the interface of each Hypothesis has to be generic enough. 
### Before https://pytorch.org/audio/main/prototype.html <img width="1390" alt="Screen Shot 2021-12-28 at 1 05 53 PM" src="https://user-images.githubusercontent.com/855818/147594425-6c7f8126-ab76-4edc-a616-a00901e7e9ef.png"> ### After https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.html <img width="1202" alt="Screen Shot 2021-12-28 at 8 37 35 PM" src="https://user-images.githubusercontent.com/855818/147619281-8152b1ae-e127-40b2-a944-dc11b114b629.png"> https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.rnnt.html <img width="1415" alt="Screen Shot 2021-12-28 at 8 38 27 PM" src="https://user-images.githubusercontent.com/855818/147619331-077b55b5-c5e9-47ab-bfe6-873e41c738c8.png"> https://489516-90321822-gh.circle-artifacts.com/0/docs/prototype.ctc_decoder.html <img width="1417" alt="Screen Shot 2021-12-28 at 8 39 04 PM" src="https://user-images.githubusercontent.com/855818/147619364-63df3457-a4b2-4223-973f-f4301bd45280.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2108 Reviewed By: hwangjeff, carolineechen, nateanl Differential Revision: D33340816 Pulled By: mthrok fbshipit-source-id: 870edfadbe41d6f8abaf78fdb7017b3980dfe187
-
- 28 Dec, 2021 2 commits
-
-
Caroline Chen authored
Summary: demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html follow-ups: - incorporate `nbest` - demonstrate customizability of different beam search parameters Pull Request resolved: https://github.com/pytorch/audio/pull/2106 Reviewed By: mthrok Differential Revision: D33340946 Pulled By: carolineechen fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7
-
moto authored
Summary: This commit updates the documentation configuration so that if an API (function or class) is used in tutorials, it automatically adds links to those tutorials. It also adds `py:func:` so that it's easy to jump from tutorials to the API reference. Note: the use of `py:func:` is not required for an API to be recognized by Sphinx-Gallery. * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#feature-extractions <img width="776" alt="Screen Shot 2021-12-24 at 12 41 43 PM" src="https://user-images.githubusercontent.com/855818/147367407-cd86f114-7177-426a-b5ee-a25af17ae476.png"> * https://482162-90321822-gh.circle-artifacts.com/0/docs/transforms.html#mvdr <img width="769" alt="Screen Shot 2021-12-24 at 12 42 31 PM" src="https://user-images.githubusercontent.com/855818/147367422-01fd245f-2f25-4875-a206-910e17ae0161.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2101 Reviewed By: hwangjeff Differential Revision: D33311283 Pulled By: mthrok fbshipit-source-id: e0c124d2a761e0f8d81c3d14c4ffc836ffffe288
-
- 23 Dec, 2021 1 commit
-
-
Joao Gomes authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2096 run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'` Reviewed By: mthrok Differential Revision: D33297351 fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
-
- 21 Dec, 2021 1 commit
-
-
moto authored
Summary: 1. Reorder Audio display so that audios are playable from browser in doc 2. Add link to function documentations https://470342-90321822-gh.circle-artifacts.com/0/docs/tutorials/audio_data_augmentation_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2082 Reviewed By: carolineechen Differential Revision: D33227725 Pulled By: mthrok fbshipit-source-id: c7ee360b6f9b84c8e0a9b72193b98487d03b57ab
-
- 11 Nov, 2021 1 commit
-
-
nateanl authored
-
- 10 Nov, 2021 1 commit
-
-
Krishna Kalyan authored
-
- 05 Nov, 2021 2 commits