- 07 Jun, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: The PR contains the CTC fine-tuning recipe of HuBERT Base model. The files include: - lightning module - training script - README and the result table - evaluation scripts Pull Request resolved: https://github.com/pytorch/audio/pull/2352 Reviewed By: hwangjeff Differential Revision: D36915712 Pulled By: nateanl fbshipit-source-id: 0249635ad5e81a8aa2d228c1d5fe84d78b62a15b
-
moto authored
Summary: - Adopt `torchaudio.utils.download_asset` to simplify asset management. - Break down the first section about helper functions. - Use tempfile so that executing tutorial won't leave any artifacts on local file system. Example: https://output.circle-artifacts.com/output/job/b11a0087-8bf9-4999-a74f-b53798eaa77f/artifacts/0/docs/tutorials/audio_io_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2385 Reviewed By: hwangjeff Differential Revision: D36404399 Pulled By: mthrok fbshipit-source-id: 106af34e8ddd22a061aa12767b444b32aef07bad
-
- 04 Jun, 2022 1 commit
-
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2437 Refactors LibriSpeech Lightning datamodule to accommodate different dataset implementations. Reviewed By: carolineechen, nateanl Differential Revision: D36731577 fbshipit-source-id: 4ba91044311fa3f99a928aef6ef411316955f6b5
-
- 03 Jun, 2022 3 commits
-
-
moto authored
Summary: - Adopt `torchaudio.utils.download_asset` to simplify asset management. - Break down the first section about helper functions. - Reduce the number of helper functions https://output.circle-artifacts.com/output/job/d7dd1b93-6dfe-46da-a080-109bfdc63881/artifacts/0/docs/tutorials/audio_data_augmentation_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2388 Reviewed By: carolineechen Differential Revision: D36404405 Pulled By: mthrok fbshipit-source-id: f460ed810519797fce6e2fa7baaee110bddd1d06
-
moto authored
Summary: - Replace mis-use of plot_specgram with plot_sweep, and remove plot_specgram - Move `benchmark_resample` to later section https://output.circle-artifacts.com/output/job/9f7af187-777d-4d75-840f-2630a36295b7/artifacts/0/docs/tutorials/audio_resampling_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/2386 Reviewed By: carolineechen Differential Revision: D36404403 Pulled By: mthrok fbshipit-source-id: f9df8453e3f531bdc4549b0134e5dbba90653bf7
-
moto authored
Summary: - Adopt torchaudio.utils.download_asset to simplify asset management. - Break down the first section about helper functions. - Reduce the number of helper functions Pull Request resolved: https://github.com/pytorch/audio/pull/2391 Reviewed By: carolineechen, nateanl Differential Revision: D36885626 Pulled By: mthrok fbshipit-source-id: 1306f22ab70ab1e7f74ed7e43bf43150015448b6
-
- 02 Jun, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: - Use `download_asset` to download audio files. - Replace `MVDR` module with the newly added `SoudenMVDR` and `RTFMVDR` modules. - Benchmark the performance of `F.rtf_evd` and `F.rtf_power` for RTF computation. - Visualize the spectrograms and masks. Pull Request resolved: https://github.com/pytorch/audio/pull/2398 Reviewed By: carolineechen Differential Revision: D36549402 Pulled By: nateanl fbshipit-source-id: dfd6754e6c33246e6991ccc51c4603b12502a1b5
-
- 01 Jun, 2022 1 commit
-
-
Caroline Chen authored
Summary: Move CTC beam search decoder out of prototype to new `torchaudio.models.decoder` module. hwangjeff mthrok any thoughts on the new module + naming, and if we should move rnnt beam search here as well?? Pull Request resolved: https://github.com/pytorch/audio/pull/2410 Reviewed By: mthrok Differential Revision: D36784521 Pulled By: carolineechen fbshipit-source-id: a2ec52f86bba66e03327a9af0c5df8bbefcd67ed
-
- 26 May, 2022 1 commit
-
-
nateanl authored
-
- 23 May, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Replace https://github.com/pytorch/audio/issues/2129 Pull Request resolved: https://github.com/pytorch/audio/pull/2198 Reviewed By: carolineechen Differential Revision: D36544163 Pulled By: nateanl fbshipit-source-id: 3f19ba5b0f2c2b9e93b0603c3b4491c1dbc40ef8
-
- 21 May, 2022 1 commit
-
-
moto authored
Summary: This commit adds file-like object support to Streaming API.

## Features
- File-like objects are expected to implement `read(self, n)`.
- Additionally `seek(self, offset, whence)` is used if available.
- Without `seek` method, some formats cannot be decoded properly.
- To work around this, one can use the existing `decoder` option to tell what decoder it should use.
- The set of `decoder` and `decoder_option` arguments was added to the `add_basic_[audio|video]_stream` method, similar to `add_[audio|video]_stream`.
- So as to have the arguments common to both audio and video in front of the rest of the arguments, the order of the arguments is changed.
- Also `dtype` and `format` arguments were changed to make them consistent across audio/video methods.

## Code structure
The approach is very similar to how file-like objects are supported in sox-based I/O. In Streaming API, if the input src is a string, it is passed to the implementation bound with TorchBind; if the src has a `read` attribute, it is passed to the same implementation bound via PyBind11.

## Refactoring involved
- Extracted to https://github.com/pytorch/audio/issues/2402
- Some implementation in the original TorchBind surface layer is converted to a Wrapper class so that it can be re-used from PyBind11 bindings. The wrapper class serves to simplify the binding.
- `add_basic_[audio|video]_stream` methods were removed from the C++ layer, as they were just constructing a string and passing it to the `add_[audio|video]_stream` method, which is simpler to do in Python.
- The original core Streamer implementation kept the use of types in the `c10` namespace to a minimum. All the `c10::optional` and `c10::Dict` were converted to the `std` equivalents at the binding layer. But since they work fine with PyBind11, Streamer core methods deal with them directly.

## TODO:
- [x] Check if it is possible to stream MP4 (yuv420p) from S3 and directly decode (with/without HW decoding).
Pull Request resolved: https://github.com/pytorch/audio/pull/2400 Reviewed By: carolineechen Differential Revision: D36520073 Pulled By: mthrok fbshipit-source-id: a11d981bbe99b1ff0cc356e46264ac8e76614bc6
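The file-like protocol described in this commit can be illustrated with a small sketch. It does not call torchaudio; `ReadOnlySource`, `SeekableSource`, and `classify_source` are hypothetical names invented here, and the dispatch function only imitates the string versus `read`-attribute check that the message describes.

```python
import io


class ReadOnlySource:
    """Hypothetical file-like object exposing only `read(self, n)`."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        # Return at most `n` bytes; an empty result signals end of data.
        return self._buffer.read(n)


class SeekableSource(ReadOnlySource):
    """Same as above, plus the optional `seek(self, offset, whence)`."""

    def seek(self, offset: int, whence: int = 0) -> int:
        return self._buffer.seek(offset, whence)


def classify_source(src) -> str:
    # Imitates the dispatch described in the commit message: strings take
    # one path, objects with a `read` attribute take the other.
    if isinstance(src, str):
        return "path"
    if hasattr(src, "read"):
        return "seekable file-like" if hasattr(src, "seek") else "file-like"
    raise TypeError(f"unsupported source type: {type(src).__name__}")
```

An object like `ReadOnlySource` is the case where, per the message, some formats may not decode properly and the `decoder` option may be needed as a hint.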
-
- 15 May, 2022 1 commit
-
-
John Reese authored
Summary: Applies new import merging and sorting from µsort v1.0. When merging imports, µsort will make a best-effort to move associated comments to match merged elements, but there are known limitations due to the dynamic nature of Python and developer tooling. These changes should not produce any dangerous runtime changes, but may require touch-ups to satisfy linters and other tooling. Note that µsort uses case-insensitive, lexicographical sorting, which results in a different ordering compared to isort. This provides a more consistent sorting order, matching the case-insensitive order used when sorting import statements by module name, and ensures that "frog", "FROG", and "Frog" always sort next to each other. For details on µsort's sorting and merging semantics, see the user guide: https://usort.readthedocs.io/en/stable/guide.html#sorting Reviewed By: lisroach Differential Revision: D36402214 fbshipit-source-id: b641bfa9d46242188524d4ae2c44998922a62b4c
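The ordering difference mentioned above can be seen with plain `sorted`. This is only an illustration of case-sensitive versus case-insensitive lexicographical ordering, not µsort's actual algorithm; the name list is made up.

```python
names = ["Zebra", "frog", "apple", "FROG", "Frog"]

# Case-sensitive ordering: every uppercase letter sorts before any
# lowercase letter, so the three spellings of "frog" are split apart.
case_sensitive = sorted(names)

# Case-insensitive ordering: compare on the lowercased name. Python's sort
# is stable, so equal keys keep their original relative order.
case_insensitive = sorted(names, key=str.lower)

print(case_sensitive)    # ['FROG', 'Frog', 'Zebra', 'apple', 'frog']
print(case_insensitive)  # ['apple', 'frog', 'FROG', 'Frog', 'Zebra']
```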
-
- 13 May, 2022 1 commit
-
-
moto authored
Summary: This commit moves the Streaming API out of prototype module.

* The related classes are renamed as follows:
  - `Streamer` -> `StreamReader`.
  - `SourceStream` -> `StreamReaderSourceStream`
  - `SourceAudioStream` -> `StreamReaderSourceAudioStream`
  - `SourceVideoStream` -> `StreamReaderSourceVideoStream`
  - `OutputStream` -> `StreamReaderOutputStream`

  This change is a preemptive measure for the possibility of adding a `StreamWriter` API.
* Replace BUILD_FFMPEG build arg with USE_FFMPEG. We are not building FFmpeg, so USE_FFMPEG is more appropriate.

---

After https://github.com/pytorch/audio/issues/2377

Remaining TODOs: (different PRs)
- [ ] Introduce `is_ffmpeg_binding_available` function.
- [ ] Refactor C++ code:
  - Rename `Streamer` to `StreamReader`.
  - Rename `streamer.[h|cpp]` to `stream_reader.[h|cpp]`.
  - Rename `prototype.cpp` to `stream_reader_binding.cpp`.
  - Introduce `stream_reader` directory.
- [x] Enable FFmpeg in smoke test (https://github.com/pytorch/audio/issues/2381)

Pull Request resolved: https://github.com/pytorch/audio/pull/2378 Reviewed By: carolineechen Differential Revision: D36359299 Pulled By: mthrok fbshipit-source-id: 6a57b702996af871e577fb7addbf3522081c1328
-
- 12 May, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: - When cropping the waveform and corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, sometimes the value can be negative, which results in an empty label. Such a training example hurts performance after zero-padding (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking if `label_start` is negative, and changing it to zero if so. - If `pad` is True, the `length` should be the length of each waveform instead of the max length. Fix it to make the model ignore the padding component in pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/2296 Reviewed By: mthrok Differential Revision: D36323217 Pulled By: nateanl fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
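A minimal sketch of the first fix, in plain Python rather than `torch`. Integer `//` floors toward negative infinity, which matches `rounding_mode="floor"`. The function name, the slicing step, and the numeric values (kernel size 25, stride 20, sample rate 16) are assumptions chosen for illustration, not values confirmed by the commit.

```python
def label_start_index(audio_start: int, kernel_size: int, stride: int, sample_rate: int) -> int:
    # Same formula as in the commit message; `//` floors like
    # torch.div(..., rounding_mode="floor").
    start = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # The fix: clamp a negative start index to zero.
    return max(start, 0)


# Why a negative start is harmful: with a negative start index, a slice of
# this shape is empty instead of beginning at the first label.
labels = list(range(10))
unclamped = (0 - 25 * 16) // (20 * 16)          # -2
empty_slice = labels[unclamped:unclamped + 5]    # labels[-2:3] -> []
fixed_start = label_start_index(0, 25, 20, 16)   # 0
fixed_slice = labels[fixed_start:fixed_start + 5]
```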
-
John Reese authored
Summary: Applies the black-fbsource codemod with the new build of pyfmt. paintitblack Reviewed By: lisroach Differential Revision: D36324783 fbshipit-source-id: 280c09e88257e5e569ab729691165d8dedd767bc
-
- 11 May, 2022 1 commit
-
-
hwangjeff authored
Summary: Modifies the example LibriSpeech Conformer RNN-T recipe as follows: - Moves data loading and transforms logic from lightning module to data module (improves generalizability and reusability of lightning module and data module). - Moves transforms logic from dataloader collator function to dataset (resolves dataloader multiprocessing issues on certain platforms). - Replaces lambda functions with `partial` equivalents (resolves pickling issues in certain runtime environments). - Modifies training script to allow for specifying path model checkpoint to restart training from. Pull Request resolved: https://github.com/pytorch/audio/pull/2366 Reviewed By: mthrok Differential Revision: D36305028 Pulled By: hwangjeff fbshipit-source-id: 0b768da5d5909136c55418bf0a3c2ddd0c5683ba
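The pickling issue behind the lambda-to-`partial` change can be reproduced with the standard library alone. This sketch uses `operator.mul` as a stand-in for a real transform function; `is_picklable` is a hypothetical helper written for this example.

```python
import operator
import pickle
from functools import partial


def is_picklable(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot serialize it by reference.
double_lambda = lambda x: operator.mul(2, x)

# A partial of an importable function pickles fine: pickle stores the
# reference to `operator.mul` together with the bound argument.
double_partial = partial(operator.mul, 2)

print(is_picklable(double_lambda))   # False
print(is_picklable(double_partial))  # True
```

Dataloader worker processes that are started by spawning need to pickle such callables, which is one plausible reason the recipe hit this problem in some runtime environments.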
-
- 28 Apr, 2022 1 commit
-
-
moto authored
Summary: libmad integration should be enabled only from source-build Pull Request resolved: https://github.com/pytorch/audio/pull/2354 Reviewed By: nateanl Differential Revision: D36012035 Pulled By: mthrok fbshipit-source-id: adeda8cbfd418f96245909cae6862b648a6915a7
-
- 26 Apr, 2022 1 commit
-
-
Caroline Chen authored
Summary: Add support for lexicon free decoding based on [fairseq's](https://github.com/pytorch/fairseq/blob/main/examples/speech_recognition/new/decoders/flashlight_decoder.py#L53) implementation. Reached numerical parity with fairseq's decoder in offline experimentation.

Follow ups
- Add pretrained LM support for lex free decoding
- Add example in tutorial
- Replace flashlight C++ source code with flashlight text submodule
- [optional] fairseq compatibility test

Pull Request resolved: https://github.com/pytorch/audio/pull/2342 Reviewed By: nateanl Differential Revision: D35856104 Pulled By: carolineechen fbshipit-source-id: b64286550984df906ebb747e82f6fb1f21948ac7
-
- 22 Apr, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: When using a customized `batch_sampler`, pytorch_lightning can't wrap the distributed sampler onto it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode. The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` will shuffle the lists based on `seed` and current `epoch`. The shuffling only happens at initialization and does not change unless the user resets it. The reason is that a reshuffled `BucketizeBatchSampler` may have a different length than before, so shuffling in ``__iter__`` may result in a mismatch between ``__len__`` and the real length. Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer. Then the value of ``__len__`` and the real length are the same. Pull Request resolved: https://github.com/pytorch/audio/pull/2299 Reviewed By: hwangjeff Differential Revision: D35781538 Pulled By: nateanl fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a
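The idea of distributing a list of batches across `ddp` ranks can be sketched as follows. This is not the `DistributedBatchSampler` from the PR; `shard_batches` is a hypothetical function, and dropping the trailing batches so that every rank gets the same count is an assumption of this sketch (a real sampler might pad instead).

```python
import random
from typing import List


def shard_batches(
    batches: List[List[int]],
    num_replicas: int,
    rank: int,
    shuffle: bool = False,
    seed: int = 0,
    epoch: int = 0,
) -> List[List[int]]:
    """Return the batches assigned to `rank` out of `num_replicas` ranks."""
    order = list(range(len(batches)))
    if shuffle:
        # Seeding with seed + epoch gives every rank the same permutation.
        random.Random(seed + epoch).shuffle(order)
    # Assumption: drop the tail so that all ranks see the same number of batches.
    usable = len(order) - len(order) % num_replicas
    return [batches[i] for i in order[rank:usable:num_replicas]]


batches = [[0, 1], [2, 3], [4], [5, 6, 7], [8]]
rank0 = shard_batches(batches, num_replicas=2, rank=0)
rank1 = shard_batches(batches, num_replicas=2, rank=1)
```

Because the shuffle happens once per call, the number of batches per rank is fixed after construction, which is the property the commit message relies on when it asks for the dataloader to be reloaded every epoch.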
-
- 21 Apr, 2022 1 commit
-
-
hwangjeff authored
Summary: PyTorch Lite, which is becoming a standard for mobile PyTorch usage, does not support containers containing custom classes. Consequently, because TorchAudio's RNN-T decoder currently returns and accepts lists of `Hypothesis` namedtuples, it is not compatible with PyTorch Lite. This PR resolves said incompatibility by changing the underlying implementation of `Hypothesis` to tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2339 Reviewed By: nateanl Differential Revision: D35806529 Pulled By: hwangjeff fbshipit-source-id: 9cbae5504722390511d35e7f9966af2519ccede5
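The change from a namedtuple to a plain tuple can be sketched as below. The two-field layout (tokens and score) is a simplified, hypothetical stand-in for the real `Hypothesis`, and the accessor functions are invented for this example.

```python
from typing import List, NamedTuple, Tuple


class HypothesisNT(NamedTuple):
    """Before: a custom class, accessed by field name."""
    tokens: List[int]
    score: float


# After: a plain tuple type alias with the same positional layout.
Hypothesis = Tuple[List[int], float]


def get_tokens(hypo: Hypothesis) -> List[int]:
    # Positional access replaces `hypo.tokens`.
    return hypo[0]


def get_score(hypo: Hypothesis) -> float:
    # Positional access replaces `hypo.score`.
    return hypo[1]


hypo: Hypothesis = ([1, 2, 3], -0.5)
```

Small accessor functions like these keep call sites readable once field names are no longer available.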
-
- 13 Apr, 2022 2 commits
-
-
hwangjeff authored
Summary: Adds Conformer RNN-T LibriSpeech training recipe to examples directory. Produces 30M-parameter model that achieves the following WER:

| | WER |
|:-------------------:|-------------:|
| test-clean | 0.0310 |
| test-other | 0.0805 |
| dev-clean | 0.0314 |
| dev-other | 0.0827 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2329 Reviewed By: xiaohui-zhang Differential Revision: D35578727 Pulled By: hwangjeff fbshipit-source-id: afa9146c5b647727b8605d104d928110a1d3976d
-
hwangjeff authored
Summary: Tutorial notebooks that leverage TorchAudio prototype features don't run as-is on Google Colab due to its runtime's not having nightly builds pre-installed. To make it easier for users to run said notebooks in Colab, this PR adds a code block that installs nightly Pytorch and TorchAudio builds as a comment that users can copy and run locally. Pull Request resolved: https://github.com/pytorch/audio/pull/2325 Reviewed By: xiaohui-zhang Differential Revision: D35597753 Pulled By: hwangjeff fbshipit-source-id: 59914e492ad72e31c0136a48cd88d697e8ea5f6c
-
- 05 Apr, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: The multi-processing works well on MFCC features. However, it sometimes makes the script hang when dumping HuBERT features. Changing it to a for-loop resolves the issue. Pull Request resolved: https://github.com/pytorch/audio/pull/2311 Reviewed By: mthrok Differential Revision: D35393813 Pulled By: nateanl fbshipit-source-id: afdc14557a1102b20ecd5fafba0964a913250a11
-
- 04 Apr, 2022 2 commits
-
-
Caroline Chen authored
Summary: update example ASR pipeline to use the recently added pretrained LM API for decoding Pull Request resolved: https://github.com/pytorch/audio/pull/2317 Reviewed By: mthrok Differential Revision: D35361354 Pulled By: carolineechen fbshipit-source-id: cac7cf55bd9f86417f319191c1405819fe2a7b46
-
Zhaoheng Ni authored
Summary: Some arguments in `ArgumentParser` are not used in the `lexicon_decoder`. Fix them to use the ones in the parser. Pull Request resolved: https://github.com/pytorch/audio/pull/2315 Reviewed By: carolineechen Differential Revision: D35357678 Pulled By: nateanl fbshipit-source-id: 4e70418cf03708b82bc158cafd9999a80ad08f92
-
- 01 Apr, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: When the checkpoint is on a GPU device and preprocessing is on CPU, the script throws an exception. Fix it by loading the model state dictionary onto CPU by default. Pull Request resolved: https://github.com/pytorch/audio/pull/2310 Reviewed By: mthrok Differential Revision: D35316903 Pulled By: nateanl fbshipit-source-id: d3e7183400ba133240aa6d205f5c671a421a9fed
-
- 25 Mar, 2022 1 commit
-
-
Caroline Chen authored
Summary: add function to download pretrained files for LibriSpeech 3-gram/4-gram KenLM, tests, and updated tutorial Pull Request resolved: https://github.com/pytorch/audio/pull/2275 Reviewed By: mthrok Differential Revision: D35115418 Pulled By: carolineechen fbshipit-source-id: 83ff22380fce9c753bb4a7b7e3d89aa66c2831c0
-
- 24 Mar, 2022 2 commits
-
-
Caroline Chen authored
Summary: rendered: - [tutorial](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html) - [docs](https://output.circle-artifacts.com/output/job/e7fb5a23-87cf-4dd5-b4a8-8b4f91e20eb4/artifacts/0/docs/prototype.ctc_decoder.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2278 Reviewed By: mthrok Differential Revision: D35097734 Pulled By: carolineechen fbshipit-source-id: 1e5d5fff0b7740757cca358cf3ea44c6488fcd5c
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2288 Reviewed By: hwangjeff Differential Revision: D35099492 Pulled By: mthrok fbshipit-source-id: 955c5e617469009ae2600d2764d601d794ee916f
-
- 22 Mar, 2022 1 commit
-
-
Hagen Wierstorf authored
Summary: The calculation of the SNR in the data augmentation examples seems to be wrong to me.

If we start from the definition of the signal-to-noise ratio using the root mean square value we get:
```
SNR = 20 log10 ( rms(scale * speech) / rms(noise) )
```
this can be transformed to
```
scale = 10^(SNR/20) rms(noise) / rms(speech)
```
In the example, `lambda x: x.norm(p=2)` is used instead of `rms`, but as the speech and noise signals have the same length, we have
```
rms(noise) / rms(speech) = noise.norm(p=2) / speech.norm(p=2)
```
For the scale used in the example to be correct, this would require
```
10^(SNR/20) = e^(SNR / 10)
```
which is not true. Hence I changed `e^(SNR / 10)` to `10^(SNR/20)`. For the proposed SNR values of 20 dB, 10 dB, 3 dB the value of the scale would change from 7.39, 2.72, 1.35 to 10.0, 3.16, 1.41.

Pull Request resolved: https://github.com/pytorch/audio/pull/2285 Reviewed By: nateanl Differential Revision: D35047737 Pulled By: mthrok fbshipit-source-id: ac24c8fd48ef06b4b611e35163084644330a3ef3
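The corrected formula can be checked numerically with a short standard-library sketch. The helper names (`rms`, `speech_scale`, `measured_snr_db`) and the toy signals are assumptions made for this example; the tutorial itself works on tensors.

```python
import math
from typing import Sequence


def rms(x: Sequence[float]) -> float:
    """Root mean square of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def speech_scale(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> float:
    # scale = 10^(SNR/20) * rms(noise) / rms(speech)
    return 10 ** (snr_db / 20) * rms(noise) / rms(speech)


def measured_snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    # SNR = 20 log10(rms(speech) / rms(noise))
    return 20 * math.log10(rms(speech) / rms(noise))


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, -0.5]

scale = speech_scale(speech, noise, snr_db=20)
scaled_speech = [scale * v for v in speech]
# Scaling the speech by `scale` yields the requested 20 dB against the noise.
achieved = measured_snr_db(scaled_speech, noise)
```

When speech and noise have equal RMS, the scale reduces to `10 ** (snr_db / 20)`, which gives the 10.0, 3.16, and 1.41 quoted in the message for 20 dB, 10 dB, and 3 dB.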
-
- 17 Mar, 2022 1 commit
-
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2281 Reviewed By: carolineechen Differential Revision: D34939494 Pulled By: mthrok fbshipit-source-id: e97100b95a8e3d3e28805d8fab43b66120c2254d
-
- 10 Mar, 2022 1 commit
-
-
moto authored
Summary: Follow-up on post-commit review from https://github.com/pytorch/audio/issues/2202 Pull Request resolved: https://github.com/pytorch/audio/pull/2270 Reviewed By: hwangjeff Differential Revision: D34793460 Pulled By: mthrok fbshipit-source-id: 039ddeca015fc77b89c571820b7ef2b0857f5723
-
- 08 Mar, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2143 Reviewed By: carolineechen Differential Revision: D34722238 Pulled By: nateanl fbshipit-source-id: 72809c9db91c94d8e853c80ed8522eeffe5ff136
-
- 26 Feb, 2022 1 commit
-
-
moto authored
Summary: This commit adds a tutorial for device ASR and updates the API for device streaming. The changes to the interface are 1. Add `timeout` and `backoff` parameters to `process_packet` and `stream` methods. 2. Move `fill_buffer` method to private. When dealing with a device stream, there are situations where the device buffer is not ready and the system returns `EAGAIN`. In such a case, the previous implementation of the `process_packet` method raised an exception in the Python layer, but for device ASR, this is inefficient. A better approach is to retry within the C++ layer in a blocking manner. The new `timeout` parameter serves this purpose. Pull Request resolved: https://github.com/pytorch/audio/pull/2202 Reviewed By: nateanl Differential Revision: D34475829 Pulled By: mthrok fbshipit-source-id: bb6d0b125d800f87d189db40815af06fbd4cab59
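The retry-with-timeout idea can be sketched in Python, although the commit implements it in the C++ layer. Everything here is hypothetical: `read_with_retry` is not a torchaudio function, `None` stands in for the `EAGAIN` condition, and the units (seconds) and exact semantics of `timeout` and `backoff` are assumptions of this sketch.

```python
import time
from typing import Callable, Optional


def read_with_retry(
    try_read: Callable[[], Optional[bytes]],
    timeout: Optional[float] = None,
    backoff: float = 0.01,
) -> bytes:
    """Call `try_read` until it returns data or `timeout` seconds elapse.

    `try_read` returns None when the device buffer is not ready yet.
    A `timeout` of None means retry indefinitely.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        packet = try_read()
        if packet is not None:
            return packet
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("device buffer was not ready before the timeout")
        # Wait briefly before the next attempt instead of busy-looping.
        time.sleep(backoff)
```

Retrying inside one blocking call avoids raising and catching an exception for every not-ready buffer, which is the inefficiency the message points out.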
-
- 24 Feb, 2022 1 commit
-
-
Caroline Chen authored
Summary: fix a style check failure from internal diff Pull Request resolved: https://github.com/pytorch/audio/pull/2258 Reviewed By: nateanl Differential Revision: D34459526 Pulled By: carolineechen fbshipit-source-id: d0e6782b5689c3bf63214a4ec6a75dd757678e0d
-
- 23 Feb, 2022 1 commit
-
-
Binh Tang authored
Summary: We proactively remove references to the deprecated DDP accelerator to prepare for the breaking changes following the release of PyTorch Lightning 1.6 (see T112240890). Differential Revision: D34295318 fbshipit-source-id: 7b2245ca9c7c2900f510722b33af8d8eeda49919
-
- 17 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: - Refactor the current `LibriSpeechRNNTModule`'s unit test. - Add unit tests for `TEDLIUM3RNNTModule` and `MuSTCRNNTModule` - Replace the lambda with partial in `TEDLIUM3RNNTModule` to pass the lightning unit test. Pull Request resolved: https://github.com/pytorch/audio/pull/2240 Reviewed By: mthrok Differential Revision: D34285195 Pulled By: nateanl fbshipit-source-id: 4f20749c85ddd25cbb0eafc1733c64212542338f
-
moto authored
Summary: https://554729-90321822-gh.circle-artifacts.com/0/docs/tutorials/online_asr_tutorial.html 1. Add figure to explain the caching 2. Fix the initialization of stream iterator Pull Request resolved: https://github.com/pytorch/audio/pull/2226 Reviewed By: carolineechen Differential Revision: D34265971 Pulled By: mthrok fbshipit-source-id: 243301e74c4040f4b8cd111b363e70da60e5dae4
-
- 16 Feb, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: This PR adds ``EMFORMER_RNNT_BASE_MUSTC`` support in `pipeline_demo.py`. The bundle is trained on MuST-C release 2.0 dataset. The model preserves the casing and punctuations in the transcript. Here is a screen recording of how it works in streaming and non-streaming modes: https://user-images.githubusercontent.com/8653221/154356521-fe84bdc1-fb0c-41bd-8729-9edbb3224a07.mov Pull Request resolved: https://github.com/pytorch/audio/pull/2248 Reviewed By: hwangjeff Differential Revision: D34282598 Pulled By: nateanl fbshipit-source-id: 42ed7e2623031dfebd176ef0c6bfd70da3c897d4
-
Zhaoheng Ni authored
Summary: - Use dictionary to select the `RNNTBundle` and the corresponding dataset. - Use the dictionary's keys as choices in ArgumentParser Pull Request resolved: https://github.com/pytorch/audio/pull/2239 Reviewed By: mthrok Differential Revision: D34267070 Pulled By: nateanl fbshipit-source-id: 99c7942d5c7c1518694e1ae02a55a7decd87c220
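The pattern described here, a dictionary whose keys double as `ArgumentParser` choices, can be sketched as below. The option name `--model-type` and the placeholder strings are assumptions; in the real script the values would be bundle and dataset objects rather than strings.

```python
import argparse
from typing import List, Tuple

# Placeholder strings stand in for the RNNTBundle and dataset objects.
MODEL_TYPE_TO_CONFIG = {
    "librispeech": ("librispeech bundle", "librispeech dataset"),
    "tedlium3": ("tedlium3 bundle", "tedlium3 dataset"),
    "mustc": ("mustc bundle", "mustc dataset"),
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # The dictionary keys are the only accepted values, so adding a new
    # entry to the dictionary automatically extends the CLI.
    parser.add_argument("--model-type", choices=list(MODEL_TYPE_TO_CONFIG), required=True)
    return parser


def select_config(argv: List[str]) -> Tuple[str, str]:
    args = build_parser().parse_args(argv)
    return MODEL_TYPE_TO_CONFIG[args.model_type]
```

For example, `select_config(["--model-type", "mustc"])` returns the pair registered under `"mustc"`, while an unknown value makes `argparse` exit with a usage error.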
-