- 27 Feb, 2023 5 commits
-
-
Zhaoheng Ni authored
Summary: Add pre-trained pipeline support for the `SquimObjective` model. The pre-trained model is trained on the DNS 2020 challenge dataset. Pull Request resolved: https://github.com/pytorch/audio/pull/3103 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43611794 Pulled By: nateanl fbshipit-source-id: 0ac76a27e7027a43ffccb158385ddb2409b8526d
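For readers who want to try the new pipeline, here is a minimal usage sketch. The bundle name `SQUIM_OBJECTIVE`, the 16 kHz input, and the three returned metrics (STOI, PESQ, SI-SDR) are assumptions inferred from the SquimObjective work, not confirmed by this entry:

```python
import torch
import torchaudio

# Assumed bundle name and output order; treat this as a sketch, not the
# documented API of the release this commit landed in.
bundle = torchaudio.pipelines.SQUIM_OBJECTIVE
model = bundle.get_model()

waveform = torch.randn(1, 16000)  # toy 1-second, 16 kHz input
with torch.inference_mode():
    stoi, pesq, si_sdr = model(waveform)
print(stoi, pesq, si_sdr)
```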
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3105 Refactor the construction of Audio/VideoOutputStream Reviewed By: nateanl Differential Revision: D43613013 fbshipit-source-id: 0e112cb1bab2658be68a368099ed00ef318ea4f1
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3106 Refactor Audio/VideoOutputStream. Reviewed By: nateanl Differential Revision: D43613008 fbshipit-source-id: 36c62fe00903066982573866d07de4e79b34240d
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3104 Continuation of the StreamWriter refactoring. This commit extracts the Encoder (+muxer) from OutputStream. Reviewed By: nateanl Differential Revision: D43610887 fbshipit-source-id: 30a9862b1aabd5af331ce3f33a5815df1decbad1
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3100 Refactor StreamWriter and move OutputStream to a dedicated source file, then split it into separate audio/video classes. Reviewed By: nateanl Differential Revision: D43587337 fbshipit-source-id: 0fdbd1f56a7200dc6849e95eb9678854f5d933b8
-
- 25 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3099 Reviewed By: mthrok Differential Revision: D43596866 Pulled By: nateanl fbshipit-source-id: 43a139bf8ebdf3261414e2855aefc3b53df298ac
-
- 24 Feb, 2023 5 commits
-
-
Vladislav Agafonov authored
Summary: Add `Wav2Vec2DataModule` to the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3081 Reviewed By: mthrok Differential Revision: D43579239 Pulled By: nateanl fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494
-
Vladislav Agafonov authored
Summary: Add the wav2vec2 loss function to the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3090 Reviewed By: mthrok Differential Revision: D43579220 Pulled By: nateanl fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237
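As a rough illustration of what such a loss computes (not the recipe's actual implementation), a wav2vec 2.0-style contrastive objective requires the context vector at each masked step to be more similar to its true quantized target than to sampled distractors:

```python
import torch
import torch.nn.functional as F

# Toy sketch of a contrastive objective; all shapes and the temperature value
# are illustrative, not taken from the recipe.
context = torch.randn(8, 256)         # model outputs at 8 masked steps
target = torch.randn(8, 256)          # true quantized latents
negatives = torch.randn(8, 10, 256)   # 10 sampled distractors per step
temperature = 0.1

candidates = torch.cat([target.unsqueeze(1), negatives], dim=1)          # (8, 11, 256)
logits = F.cosine_similarity(context.unsqueeze(1), candidates, dim=-1)   # (8, 11)
labels = torch.zeros(8, dtype=torch.long)                                # index 0 = true target
loss = F.cross_entropy(logits / temperature, labels)
```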
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3095 Reviewed By: nateanl Differential Revision: D43544998 Pulled By: mthrok fbshipit-source-id: 4359cdbbdbee53084016a84129cb3d65900b0457
-
moto authored
Summary: This commit is a cleanup and preparation for future development. We plan to pass more complicated objects between StreamReader and StreamWriter, and TorchBind is not expressive enough for defining the intermediate objects, so we use PyBind11 for binding StreamWriter. Pull Request resolved: https://github.com/pytorch/audio/pull/3091 Reviewed By: xiaohui-zhang Differential Revision: D43515714 Pulled By: mthrok fbshipit-source-id: 9097bb104bbf8c1536a5fab6f87447c08b10a7f2
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3084 Reviewed By: mthrok Differential Revision: D43550150 Pulled By: nateanl fbshipit-source-id: 5c5e3d9461e375be202493e3399ff38ce5cd7690
-
- 23 Feb, 2023 5 commits
-
-
moto authored
Summary: This commit is a cleanup and preparation for future development. We plan to pass more complicated objects between StreamReader and StreamWriter, and TorchBind is not expressive enough for defining the intermediate objects, so we want to use PyBind11 for binding StreamReader/Writer. PyBind11 converts a Python dict into std::map, while TorchBind converts it into c10::Dict. Because of this discrepancy, conversions from c10::Dict to std::map have to happen in multiple places, which makes the binding code thicker because it requires wrapper methods. Using std::map reduces the number of wrapper methods and conversions, because the same method can be bound for file-like objects and the others. Pull Request resolved: https://github.com/pytorch/audio/pull/3092 Reviewed By: nateanl Differential Revision: D43524808 Pulled By: mthrok fbshipit-source-id: f7467c66ccd37dbf4abc337bbb18ffaac21a0058
-
G. Sun authored
Summary: This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing. An example for LibriSpeech can be found in audio/examples/asr/librispeech_biasing. Maintainer's note (mthrok): It seems TrieNode would be better typed as a tuple, but changing the implementation from list to tuple without running the code could cause issues, so the code is left unchanged, though the annotation uses tuple. Pull Request resolved: https://github.com/pytorch/audio/pull/2890 Reviewed By: nateanl Differential Revision: D43171447 Pulled By: mthrok fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
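Since the biasing list is organized as a prefix tree, a minimal, hypothetical trie sketch may help picture the data structure; the actual TrieNode layout in the recipe differs (it is a list, annotated as a tuple):

```python
# Hypothetical prefix-tree sketch for a biasing word list; not the recipe's TrieNode.
def build_trie(words):
    root = {"children": {}, "is_end": False}
    for word in words:
        node = root
        for token in word:
            node = node["children"].setdefault(token, {"children": {}, "is_end": False})
        node["is_end"] = True
    return root

trie = build_trie(["cat", "car", "dog"])
print(sorted(trie["children"]["c"]["children"]["a"]["children"]))  # ['r', 't']
```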
-
mthrok authored
Summary: Remove Tensor input support from StreamReader. Follow-up of https://github.com/pytorch/audio/pull/3086 Pull Request resolved: https://github.com/pytorch/audio/pull/3093 Reviewed By: xiaohui-zhang Differential Revision: D43526066 Pulled By: mthrok fbshipit-source-id: 57ba4866c413649173e1c2c3b23ba7de3231b7bc
-
moto authored
Summary: The same functionality can be achieved by passing io.BytesIO to the constructor. Pull Request resolved: https://github.com/pytorch/audio/pull/3086 Reviewed By: nateanl Differential Revision: D43500360 Pulled By: mthrok fbshipit-source-id: 2c6f37d100f50553b283c75c04fe57c8f9c07dc9
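A small sketch of the suggested replacement, decoding from an in-memory buffer instead of a Tensor (the input file name is a placeholder):

```python
import io
from torchaudio.io import StreamReader

# Load the encoded bytes into memory, then let StreamReader decode from the buffer.
with open("audio.wav", "rb") as f:   # placeholder input file
    buffer = io.BytesIO(f.read())

reader = StreamReader(buffer)
reader.add_basic_audio_stream(frames_per_chunk=4096)
for (chunk,) in reader.stream():
    print(chunk.shape)
```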
-
moto authored
Summary: 1. Fix spacing. 2. Move it to after the successful import. 3. Add a link to the announcement issue. Pull Request resolved: https://github.com/pytorch/audio/pull/3089 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D43514075 Pulled By: mthrok fbshipit-source-id: 3b2a24c65c63dab8c12c9c6aa1942a8354b2c0f1
-
- 22 Feb, 2023 3 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3087 Reviewed By: xiaohui-zhang, mthrok Differential Revision: D43509865 Pulled By: nateanl fbshipit-source-id: 569cc2ee8edd9de0b7d255a1e1075ac812b26cc8
-
Zhaoheng Ni authored
Summary: Negative sampling should be applied to the unmasked features at masked indices; this PR fixes that logic in ConformerWav2Vec2PretrainModel. Pull Request resolved: https://github.com/pytorch/audio/pull/3085 Reviewed By: mthrok Differential Revision: D43488570 Pulled By: nateanl fbshipit-source-id: 3820400d50b74216bb98ca6a40dc6a7acca01564
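As a toy illustration of the intended behavior (not the model code), the pool of negatives is built from the original, unmasked features at the masked time steps:

```python
import torch

# Illustrative only: pool of negatives = original features at masked positions.
original = torch.randn(1, 10, 4)              # features before masking (batch, time, dim)
mask = torch.zeros(1, 10, dtype=torch.bool)
mask[:, 2:7] = True                           # masked time steps
pool = original[mask]                         # (num_masked, dim)
idx = torch.randint(0, pool.size(0), (3,))    # sample 3 negatives
negatives = pool[idx]
print(negatives.shape)                        # torch.Size([3, 4])
```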
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3042 Reviewed By: mthrok Differential Revision: D43405932 Pulled By: nateanl fbshipit-source-id: 88f6dabae35565b699230e9909b8f68f4a57f5c7
-
- 21 Feb, 2023 1 commit
-
-
Chin-Yun Yu authored
Summary: I encountered the following error when using the filter with gradients enabled.

```sh
Traceback (most recent call last):
  File "/home/ycy/working/audio/test_backward.py", line 20, in <module>
    loss.backward()
  File "/home/ycy/miniconda3/envs/nightly/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/ycy/miniconda3/envs/nightly/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Expected input_signal_windows.is_contiguous() && a_coeff_flipped.is_contiguous() && padded_output_waveform.is_contiguous() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
```

This can happen if the outputs from lfilter were used by other operations.

### How to reproduce

The following script reproduces the error on both the stable and nightly versions.

```python
import torch
import torch.nn.functional as F
from torchaudio.functional import lfilter

a = torch.rand(250, 26, requires_grad=True)
b = torch.ones(250, 26, requires_grad=True)
x = torch.rand(250, 1024, requires_grad=True)
w = torch.eye(1024).unsqueeze(1)

y = lfilter(x, a, b, False)
y = F.conv_transpose1d(
    y.t().unsqueeze(0),
    w,
    stride=256,
).squeeze()
print(y.shape)

target = torch.ones_like(y)
loss = torch.nn.functional.mse_loss(y, target)
loss.backward()
```

### Cause

The inner call of the differentiable IIR in the backward pass needs the input to be contiguous. Adding a `contiguous()` call solves the problem.

Pull Request resolved: https://github.com/pytorch/audio/pull/3080 Reviewed By: xiaohui-zhang Differential Revision: D43466612 Pulled By: mthrok fbshipit-source-id: 375e0a147988656da47ac8397f7de6eae512a655
-
- 17 Feb, 2023 3 commits
-
-
hwangjeff authored
Summary: Makes lengths input optional for `torchaudio.functional.speed`, `torchaudio.transforms.Speed`, and `torchaudio.transforms.SpeedPerturbation`. Pull Request resolved: https://github.com/pytorch/audio/pull/3072 Reviewed By: nateanl, mthrok Differential Revision: D43371406 Pulled By: hwangjeff fbshipit-source-id: ecb38bcc2bfff5c5a396a37eff238b22238e795a
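A brief sketch of the simplified call once `lengths` is optional (argument names are assumed to match the documented functional):

```python
import torch
import torchaudio.functional as F

waveform = torch.randn(1, 16000)  # toy 1-second clip at 16 kHz

# With lengths now optional, a single unpadded waveform can be sped up directly;
# the returned lengths are None in that case.
stretched, lengths = F.speed(waveform, orig_freq=16000, factor=1.1)
print(stretched.shape, lengths)
```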
-
atalman authored
Summary: Same as: https://github.com/pytorch/vision/pull/7263 Pull Request resolved: https://github.com/pytorch/audio/pull/3071 Reviewed By: weiwangmeta Differential Revision: D43377741 Pulled By: atalman fbshipit-source-id: 0dbe0aaa10b9a4bad713563e98642b1a65e9ac07
-
Daniel Walker authored
Summary: This PR adds a precondition check to the `CTCDecoder` that raises a helpful exception when called on a noncontiguous emissions tensor. Currently, noncontiguous tensors can be passed into the CTCDecoder, which in turn passes the tensors to the backing Flashlight C++ library and results in undefined behavior, since Flashlight requires the tensors to be laid out in contiguous memory. The following code demonstrates the problem:

```python
import torch
from torchaudio.models.decoder import ctc_decoder

tokens = ['a', '-', '|']
decoder = ctc_decoder(lexicon=None, tokens=tokens)

emissions = torch.rand(len(tokens), 2)  # N x T contiguous
emissions = emissions.t()               # T x N noncontiguous
batch = emissions.unsqueeze(0)

result = decoder(batch)                 # undefined behavior!!!
```

I stumbled on the issue accidentally when I noticed the decoder wasn't giving the expected results on my input, only to realize, finally, that the tensor I had passed in was noncontiguous. In my case, Flashlight was iterating over unrelated segments of memory where it had expected to find a contiguous tensor. A precondition check will hopefully save others from making the same mistake.

Pull Request resolved: https://github.com/pytorch/audio/pull/3074 Reviewed By: nateanl, xiaohui-zhang Differential Revision: D43376011 Pulled By: mthrok fbshipit-source-id: 7c95aa8016d8f9f2d65b5b816a859b28ea4629f5
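On releases that predate the precondition check, the straightforward workaround is to make the tensor contiguous before decoding; a minimal sketch using the same toy setup as above:

```python
import torch
from torchaudio.models.decoder import ctc_decoder

# Same toy decoder as in the report; .contiguous() sidesteps the undefined behavior.
decoder = ctc_decoder(lexicon=None, tokens=["a", "-", "|"])
emissions = torch.rand(3, 2).t()                 # T x N, noncontiguous after transpose
batch = emissions.contiguous().unsqueeze(0)      # contiguous copy, shape (1, T, N)
result = decoder(batch)
```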
-
- 16 Feb, 2023 5 commits
-
-
hwangjeff authored
Summary: With the introduction of the backend dispatcher, importing torchaudio fails when ffmpeg is not available. This PR adds guards to resolve these failures. Pull Request resolved: https://github.com/pytorch/audio/pull/3073 Reviewed By: NivekT, mthrok Differential Revision: D43372870 Pulled By: hwangjeff fbshipit-source-id: 7f6c2795430d7aeb742c2feb97984d5273f20aac
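The guard itself is not shown in this log; a generic pattern for this kind of fix looks roughly like the following (names are illustrative, not torchaudio's internals):

```python
# Illustrative import-guard pattern, not the actual torchaudio code.
try:
    from torchaudio.io import StreamReader  # needs the FFmpeg extension
    _FFMPEG_AVAILABLE = True
except Exception:
    StreamReader = None
    _FFMPEG_AVAILABLE = False


def open_stream(src):
    if not _FFMPEG_AVAILABLE:
        raise RuntimeError("FFmpeg-based I/O is not available in this installation.")
    return StreamReader(src)
```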
-
Zhaoheng Ni authored
Summary: The `BucketizeBatchSampler` may return a different `iter_list` on different nodes if `shuffle` is `True`, which causes DDP training to hang forever. `shuffle` in `DistributedSampler` only happens at initialization, which means it assigns the same subset to the replicas in every training epoch. This PR fixes these two issues. cc arlofaria Pull Request resolved: https://github.com/pytorch/audio/pull/3068 Reviewed By: mthrok Differential Revision: D43372110 Pulled By: nateanl fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e
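For context, the standard PyTorch pattern for keeping shuffling consistent across replicas while still varying it per epoch is to seed the sampler with the epoch number; a generic sketch (not the recipe's `BucketizeBatchSampler`) follows:

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Generic DDP sketch with explicit rank/world size so it runs standalone;
# in real training these come from the initialized process group.
dataset = TensorDataset(torch.arange(100))
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # without this, every epoch reuses the epoch-0 shuffle
    for (batch,) in loader:
        pass
```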
-
Zhaoheng Ni authored
Summary: In https://github.com/pytorch/audio/issues/2873, layer normalization is applied to waveforms for SSL models trained on large scale datasets. The word error rate is significantly reduced after the change. The PR updates the results for the affected models.

Without the change in https://github.com/pytorch/audio/issues/2873, here is the WER result table:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 10.59 | 15.62 | 9.58 | 16.33 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.80 | 6.01 | 2.82 | 6.34 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 2.36 | 4.43 | 2.41 | 4.96 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.85 | 3.46 | 2.09 | 3.89 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 2.21 | 3.40 | 2.26 | 4.05 |

After applying layer normalization, here is the updated result:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 6.77 | 10.03 | 6.87 | 10.51 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.19 | 4.55 | 2.32 | 4.64 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 1.78 | 3.51 | 2.03 | 3.68 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.77 | 3.32 | 2.03 | 3.68 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 1.73 | 2.72 | 1.90 | 3.16 |

Pull Request resolved: https://github.com/pytorch/audio/pull/3070 Reviewed By: mthrok Differential Revision: D43365313 Pulled By: nateanl fbshipit-source-id: 34a60ad2e5eb1299da64ef88ff0208ec8ec76e91
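The normalization in question is the usual per-utterance zero-mean / unit-variance scaling of the raw waveform before inference; a minimal sketch (assuming a 16 kHz mono input) is:

```python
import torch
import torch.nn.functional as F

waveform = torch.randn(1, 16000)  # toy mono input

# Normalize each waveform to zero mean and unit variance over time before
# feeding it to the large-scale SSL pipelines listed above.
normalized = F.layer_norm(waveform, waveform.shape[-1:])
```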
-
moto authored
Summary: Flashlight Text decoder is now available on PyPI, and KenLM support is being added at https://github.com/flashlight/text/pull/43. Once this work is merged, we can rely on the official distribution of the Flashlight Text package, so we are adding a deprecation warning. Once the decoder is fully available, one can install it with

```
pip install flashlight-text
pip install git+https://github.com/kpu/kenlm.git
```

Pull Request resolved: https://github.com/pytorch/audio/pull/3055 Reviewed By: hwangjeff, nateanl Differential Revision: D43239150 Pulled By: mthrok fbshipit-source-id: 728cb208b8403100cd4ccd80c6295d454756b414
-
hwangjeff authored
Summary: Adds an I/O backend dispatcher that routes I/O requests to the FFmpeg, SoX, or Soundfile backend, per library availability. It allows users to specify which media library to use, i.e. one of `["ffmpeg", "sox", "soundfile"]`, via a keyword argument, with FFmpeg being the default. The environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER` gates enablement of the dispatcher; specifically, if `TORCHAUDIO_USE_BACKEND_DISPATCHER` is explicitly set to `1`, importing TorchAudio makes the dispatcher accessible via `torchaudio.info`, `torchaudio.load`, and `torchaudio.save`. Pull Request resolved: https://github.com/pytorch/audio/pull/3015 Reviewed By: mthrok Differential Revision: D43258649 Pulled By: hwangjeff fbshipit-source-id: 8f12e4e56b9fa3f0814dd3fed3e1783ab23a53a1
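A usage sketch based on this description (the file name is a placeholder, and the exact keyword handling should be checked against the release docs):

```python
import os

# Opt in to the dispatcher before importing torchaudio.
os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"

import torchaudio

# Route this call explicitly to one of "ffmpeg", "sox", or "soundfile";
# omitting `backend` falls back to the default (FFmpeg).
waveform, sample_rate = torchaudio.load("audio.wav", backend="soundfile")
```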
-
- 15 Feb, 2023 5 commits
-
-
Cole Li authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3056 Task #2 from https://github.com/pytorch/audio/issues/2835 Reviewed By: mthrok Differential Revision: D42854156 fbshipit-source-id: e1b3bd992c91fedc55f30a814e16efd7c51e0c80
-
hwangjeff authored
Summary: Relaxes input dimension matching constraint on `convolve` to enable broadcasting for inputs. Pull Request resolved: https://github.com/pytorch/audio/pull/3061 Reviewed By: mthrok Differential Revision: D43298078 Pulled By: hwangjeff fbshipit-source-id: a6cc36674754523b88390fac0a05f06562921319
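A quick sketch of what broadcasting enables, applying one filter to a batch of signals (shapes chosen purely for illustration):

```python
import torch
import torchaudio.functional as F

x = torch.randn(3, 100)  # batch of three signals
h = torch.randn(1, 5)    # a single filter, broadcast over the batch dimension

y = F.convolve(x, h)     # "full" convolution by default
print(y.shape)           # torch.Size([3, 104])
```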
-
Jeff Hwang authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3058 Adds FFmpeg-based save function. Reviewed By: mthrok Differential Revision: D43264858 fbshipit-source-id: ae3f89012bc2520f3de11af65348ba8f77f0acff
-
hwangjeff authored
Summary: Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`. Pull Request resolved: https://github.com/pytorch/audio/pull/3062 Reviewed By: mthrok Differential Revision: D43298120 Pulled By: hwangjeff fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1
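A condensed sketch of the two operators the updated tutorial showcases (all tensors here are toy stand-ins, not the tutorial's assets):

```python
import torch
import torchaudio.functional as F

speech = torch.randn(1, 16000)   # toy clean speech
rir = torch.rand(1, 400)         # toy room impulse response
noise = torch.randn(1, 16000)    # toy noise

# Simulate reverberation, then mix in noise at a 10 dB signal-to-noise ratio.
reverbed = F.fftconvolve(speech, rir)[:, :16000]
noisy = F.add_noise(reverbed, noise, snr=torch.tensor([10.0]))
```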
-
moto authored
Summary: * Mention context manager in StreamWriter * Add FFmpeg as optional dependency Pull Request resolved: https://github.com/pytorch/audio/pull/3064 Reviewed By: hwangjeff Differential Revision: D43307818 Pulled By: mthrok fbshipit-source-id: 86339d973aba85e090f520e08af65b5d736e3d18
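The context-manager usage being documented looks roughly like this (the output path and stream parameters are placeholders):

```python
import torch
from torchaudio.io import StreamWriter

writer = StreamWriter("output.wav")  # placeholder destination
writer.add_audio_stream(sample_rate=16000, num_channels=1)

chunk = torch.zeros(16000, 1)        # one second of silence, shape (frames, channels)
with writer.open():                  # the destination is closed automatically on exit
    writer.write_audio_chunk(0, chunk)
```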
-
- 14 Feb, 2023 4 commits
-
-
Omkar Salpekar authored
Summary: Add triggers for RC branches and tags to all build workflows. This will ensure that the release-candidate builds will run with `CHANNEL=test`. Pull Request resolved: https://github.com/pytorch/audio/pull/3057 Reviewed By: atalman Differential Revision: D43279657 Pulled By: osalpekar fbshipit-source-id: 5abf3994b9b4a4897f53c540bd1db6c3d624b3e0
-
Zhaoheng Ni authored
Summary: - Rename the current `ssl` example to `self_supervised_learning` - Add a README demonstrating how to run the recipe with the hubert task Pull Request resolved: https://github.com/pytorch/audio/pull/3060 Reviewed By: mthrok Differential Revision: D43287868 Pulled By: nateanl fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3053 Reviewed By: nateanl Differential Revision: D43238766 Pulled By: mthrok fbshipit-source-id: 4f82878b1c97b0e6a35af75855849b86200e6061
-
Zhaoheng Ni authored
Summary: Replicate of https://github.com/pytorch/audio/issues/2644 Pull Request resolved: https://github.com/pytorch/audio/pull/2880 Reviewed By: mthrok Differential Revision: D41633911 Pulled By: nateanl fbshipit-source-id: 73cf145d75c389e996aafe96571ab86dc21f86e5
-
- 11 Feb, 2023 1 commit
-
-
moto authored
Summary: Per https://github.com/pytorch/audio/issues/3040 and https://github.com/pytorch/audio/issues/3041, it turned out that Google Colab now has FFmpeg with GPU decoder/encoder preinstalled, and installing FFmpeg manually corrupts the environment. This commit updates the tutorial by extracting the how-to-install part and moving it to the installation/build section. closes https://github.com/pytorch/audio/issues/3041 closes https://github.com/pytorch/audio/issues/3040 Pull Request resolved: https://github.com/pytorch/audio/pull/3050 Reviewed By: nateanl Differential Revision: D43166054 Pulled By: mthrok fbshipit-source-id: 32667f292a796344d5fcde86e8231e15ad904e58
-
- 10 Feb, 2023 1 commit
-
-
Wei Wang authored
Summary: So far, Linux and macOS were tested and work fine out of the box. This PR is created to verify this; Windows jobs and configs are disabled for now. Pull Request resolved: https://github.com/pytorch/audio/pull/3039 Reviewed By: osalpekar Differential Revision: D43174745 Pulled By: weiwangmeta fbshipit-source-id: 81766905256e03c5a01cb5448a350f5d409ca4b8
-
- 09 Feb, 2023 1 commit
-
-
moto authored
Summary: - Add documentation - Tweak docstring - Fix import Pull Request resolved: https://github.com/pytorch/audio/pull/3051 Reviewed By: weiwangmeta, atalman, nateanl Differential Revision: D43166081 Pulled By: mthrok fbshipit-source-id: 7d77aa34a6318a64824626cff8372f8b9aebf6f9
-