- 09 Dec, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: After https://github.com/pytorch/audio/issues/2873, the pre-trained Wav2Vec2 models with larger datasets can get better performances. The PR fixes the integration test of bundle `WAV2VEC2_ASR_LARGE_LV60K_10M` which predicts the word `CURIOUSITY` to `CURIOUSSITY` before but now to `CURIOUSITY` correctly. Pull Request resolved: https://github.com/pytorch/audio/pull/2910 Reviewed By: mthrok Differential Revision: D41881919 Pulled By: nateanl fbshipit-source-id: 236fd00b983a5205c731f3efa31033a6b8257cab
-
atalman authored
Summary: Toggle on/off ffmpeg test if needed By default it ON, hence should not affect any current tests. To toggle ON no change required. To toggle OFF use: ``` smoke_test.py --no-ffmpeg ``` To be used when calling from builder currently. Since we do not install ffmpeg currently. Pull Request resolved: https://github.com/pytorch/audio/pull/2901 Reviewed By: carolineechen, mthrok Differential Revision: D41874976 Pulled By: atalman fbshipit-source-id: c57b19f37c63a1f476f93a5211550e980e67d9c7
-
- 08 Dec, 2022 1 commit
-
-
Grigory Sizov authored
Summary: Part 1 of [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314) This PR ports the generator part of [HiFi GAN](https://arxiv.org/abs/2010.05646v2) from [the original implementation](https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/models.py#L75) Adds tests: - Smoke tests for architectures V1, V2, V3 - Check that output shapes are correct - Check that the model is torchscriptable and scripting doesn't change the output - Check that our code's output matches the original implementation. Here I clone the original repo inside `/tmp` and import necessary objects from inside the test function. On test teardown I restore `PATH`, but don't remove the cloned code, so that it can be reused on subsequent runs - let me know if removing it would be a better practice There are no quantization tests, because the model consists mainly of `Conv1d` and `ConvTransposed1d`, and they are [not supported by dynamic quantization](https://pytorch.org/docs/stable/quantization.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2860 Reviewed By: nateanl Differential Revision: D41433416 Pulled By: sgrigory fbshipit-source-id: f135c560df20f5138f01e3efdd182621edabb4f5
-
- 07 Dec, 2022 2 commits
-
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2889 Reviewed By: xiaohui-zhang Differential Revision: D41760084 Pulled By: hwangjeff fbshipit-source-id: d2f5253e1fae7e7aafa9fa6043c6a7045c5b33a0
-
hwangjeff authored
Summary: Introduces the MUSAN dataset (https://www.openslr.org/17/), which contains music, speech, and noise recordings. Pull Request resolved: https://github.com/pytorch/audio/pull/2888 Reviewed By: xiaohui-zhang Differential Revision: D41762164 Pulled By: hwangjeff fbshipit-source-id: 14d5baaa4d40f065dd5d99bf7f2e0a73aa6c31a9
-
- 06 Dec, 2022 1 commit
-
-
moto authored
Summary: This commit adds `frequency_impulse_response` function, which generates filter from desired frequency response. [Example](https://output.circle-artifacts.com/output/job/5233fda9-dadb-4710-9389-7e8ac20a062f/artifacts/0/docs/tutorials/filter_design_tutorial.html#frequency-sampling) Pull Request resolved: https://github.com/pytorch/audio/pull/2879 Reviewed By: hwangjeff Differential Revision: D41767787 Pulled By: mthrok fbshipit-source-id: 6d5e44c6390e8cf3028994a1b1de590ff3aaf6c2
-
- 04 Dec, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2885 In `_init_hubert_pretrain_model ` method which initialize the hubert pretrain models, `kaiming_normal_` should be applied on `ConvLayerBlock` instead of `LayerNorm` layer. This PR fixes it and adds more unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2886 Reviewed By: hwangjeff Differential Revision: D41713801 Pulled By: nateanl fbshipit-source-id: ed199baf7504d06bbf2d31c522ae708a75426a2d
-
- 02 Dec, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds pre-emphasis and de-emphasis functions. Pull Request resolved: https://github.com/pytorch/audio/pull/2871 Reviewed By: carolineechen Differential Revision: D41651097 Pulled By: hwangjeff fbshipit-source-id: 7a3cf6ce68b6ce1b9ae315ddd8bd8ed71acccdf1
-
- 30 Nov, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds functions and transforms for speed and speed perturbation (https://www.isca-speech.org/archive/interspeech_2015/ko15_interspeech.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2829 Reviewed By: xiaohui-zhang Differential Revision: D41285114 Pulled By: hwangjeff fbshipit-source-id: 114740507698e01f35d4beb2c568a2479e847506
-
- 29 Nov, 2022 3 commits
-
-
moto authored
Summary: This commit adds `sinc_impulse_response`, which generates windowed-sinc low-pass filters for given cutoff frequencies. Example usage: - [Filter Design Tutorial](https://output.circle-artifacts.com/output/job/c0085baa-5345-4aeb-bd44-448034caa9e1/artifacts/0/docs/tutorials/filter_design_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2875 Reviewed By: carolineechen Differential Revision: D41586631 Pulled By: mthrok fbshipit-source-id: a9991dbe5b137b0b4679228ec37072a1da7e50bb
-
moto authored
Summary: Currently, fftconvolve only accepts the tensors for the exact same leading dimensions. This commit loosens the restriction to allow shapes that are broadcast-able. This makes the fftconvolve operation more efficient for cases like signal filtering where one operand (waveform) is larger than the other (filter kernel) and the same filter kernels are applied across channels and batches. Pull Request resolved: https://github.com/pytorch/audio/pull/2874 Reviewed By: carolineechen Differential Revision: D41581588 Pulled By: mthrok fbshipit-source-id: c0117e11b979fb53236cc307a970a461b0e50134
-
Caroline Chen authored
Summary: modeled after [paper](https://arxiv.org/pdf/2110.07313.pdf) and internal flow f288347302 internal comparison tests: D40080919 Pull Request resolved: https://github.com/pytorch/audio/pull/2827 Reviewed By: nateanl Differential Revision: D41569046 Pulled By: carolineechen fbshipit-source-id: 43c5313074af05972d93da55b2029c746b75c380
-
- 28 Nov, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: - layer_norm in `EmformerEncoder` is set as default in emformer_hubert_model, change the type to be non-optional. - add `aux_num_out` to emformer_hubert_model to support fine-tuning model. - update unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2868 Reviewed By: carolineechen Differential Revision: D41451311 Pulled By: nateanl fbshipit-source-id: 5fa0f19255e4f01e001d62f8689e36f134030083
-
moto authored
Summary: Add `extend_pitch` function that can be used for augmenting fundamental frequencies with its harmonic overtones or inharmonic partials. it can be use for amplitude as well. For example usages, see https://output.circle-artifacts.com/output/job/4ad0c29a-d75a-4244-baad-f5499f11d94b/artifacts/0/docs/tutorials/synthesis_tutorial.html Part of https://github.com/pytorch/audio/issues/2835 Extracted from https://github.com/pytorch/audio/issues/2808 Pull Request resolved: https://github.com/pytorch/audio/pull/2863 Reviewed By: carolineechen Differential Revision: D41543880 Pulled By: mthrok fbshipit-source-id: 4f20e55770b0b3bee825ec07c73f9ec7cb181109
-
- 19 Nov, 2022 1 commit
-
-
moto authored
Summary: Missing from https://github.com/pytorch/audio/issues/2848 Pull Request resolved: https://github.com/pytorch/audio/pull/2864 Reviewed By: carolineechen Differential Revision: D41413381 Pulled By: mthrok fbshipit-source-id: 4377ed4a59504c6ade9ee6f42938a2bc3f04fb73
-
- 18 Nov, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2836 Reviewed By: carolineechen Differential Revision: D41208630 Pulled By: nateanl fbshipit-source-id: 625e1651f0b8a6e20876409739cf7084cb7c748b
-
- 17 Nov, 2022 2 commits
-
-
moto authored
Summary: Add adsr_envelope op, which generates ADSR envelope * Supports generation of the envelope on GPU * Supports optional Hold * Supports polynomial decay <image src='https://download.pytorch.org/torchaudio/doc-assets/adsr_examples.png'> Pull Request resolved: https://github.com/pytorch/audio/pull/2859 Reviewed By: nateanl Differential Revision: D41379601 Pulled By: mthrok fbshipit-source-id: 3717a6e0360d2a24913c2a836c57c5edec1d7b31
-
moto authored
Summary: This commit adds `oscillator_bank` op, which is the core of (differential) digital signal processing ops. The implementation itself is pretty simple, sum instantaneous frequencies, take sin and multiply with amplitudes. Following the magenta implementation, amplitudes for frequency range outside of [-Nyquist, Nyquist] \ are suppressed. The differentiability is tested within frequency range of [- Nyquist, Nyquist], and amplitude range of [-5, 5], which should be enough. For example usages: - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/oscillator_tutorial.html - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/synthesis_tutorial.html Part of https://github.com/pytorch/audio/issues/2835 Extracted from https://github.com/pytorch/audio/issues/2808 Pull Request resolved: https://github.com/pytorch/audio/pull/2848 Reviewed By: carolineechen Differential Revision: D41353075 Pulled By: mthrok fbshipit-source-id: 80e60772fb555760f2396f7df40458803c280225
-
- 15 Nov, 2022 1 commit
-
-
Grigory Sizov authored
Summary: Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822 - Added "base", "base+", and "large" bundles for WavLM - Expanded `wav2vec2_pipeline_test.py` to include the new bundles - Added the new bundles to docs in `pipelines.rst` Pull Request resolved: https://github.com/pytorch/audio/pull/2833 Reviewed By: nateanl Differential Revision: D41194796 Pulled By: sgrigory fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328
-
- 14 Nov, 2022 1 commit
-
-
Caroline Chen authored
Summary: follow up to https://github.com/pytorch/audio/issues/2823 - move bark spectrogram to prototype - decrease autograd test tolerance (passing on circle ci) - add diagram for bark fbanks cc jdariasl Pull Request resolved: https://github.com/pytorch/audio/pull/2843 Reviewed By: nateanl Differential Revision: D41199522 Pulled By: carolineechen fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213
-
- 10 Nov, 2022 2 commits
-
-
Julián D. Arias-Londoño authored
Summary: I have added BarkScale transform, which can transform a regular Spectrogram into a BarkSpectrograms similar to MelScale. ahmed-fau opened this requirement in December 2021 with the number (https://github.com/pytorch/audio/issues/2103). The new functionality includes three different well-known approximations of the Bark scale. Pull Request resolved: https://github.com/pytorch/audio/pull/2823 Reviewed By: nateanl Differential Revision: D41162100 Pulled By: carolineechen fbshipit-source-id: b2670c4972e49c9ef424da5d5982576f7a4df831
-
Caroline Chen authored
Summary: internal comparison tests: D40080919 follow up PR for pretrained models https://github.com/pytorch/audio/issues/2827 Pull Request resolved: https://github.com/pytorch/audio/pull/2826 Reviewed By: nateanl Differential Revision: D41160061 Pulled By: carolineechen fbshipit-source-id: f3c478b28c235af53d1d8e21b573c53684a63ac4
-
- 09 Nov, 2022 1 commit
-
-
Grigory Sizov authored
Summary: Closes T136364380 Added [WavLM Model](https://github.com/microsoft/UniSpeech/tree/main/WavLM): - Added `WavLMSelfAttention` class (from [original implementation](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py)) and adjusted existing Encoder and Transformer classes to be compatible with it - Added factory functions `wavlm_model`, `wavlm_base`, `wavlm_large` to `models/wav2vec2/model.py` - Added bundles for base and large models to pipelines. **TODO**: pre-trained model weights are not yet uploaded to `download.pytorch.org`, permissions not granted yet. ## Tests - Expanded HuggingFace integration tests to cover WavLM. For there tests, added JSON configs for base and large models from HF ([base](https://huggingface.co/microsoft/wavlm-base/blob/main/config.json), [large](https://huggingface.co/microsoft/wavlm-large/blob/main/config.json)) into test assets - Expanded TorchScript and quantization tests to cover WavLM ## Comments There are a few workarounds I had to introduce: - Quantization tests for WavLM were breaking down at [`torch.cat`](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR466) ~~until I excluded the arguments of `torch.cat` from quantization [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR368-R369). I haven't found a better way to fix it, let me know if there is one~~ The reason for this seems to be that quantization replaces `.bias` and `.weight` attributes of a `Linear` module with methods. Since we are using weights and biases directly, the code was break. The final solution suggested by nateanl was to define attention weights and biases directly in `WavLMSelfAttention`, skipping the `Linear` layers - ~~WavLM uses position embedding in the first layer of encoder, but not in the subsequent ones. So [UniSpeech](https://github.com/microsoft/UniSpeech/blob/2e9dde8bf815a5f5fd958e3435e5641f59f96928/WavLM/modules.py#L342) and [HF](https://github.com/huggingface/transformers/blob/b047472650cba259621549ac27b18fd2066ce18e/src/transformers/models/wavlm/modeling_wavlm.py#L441-L442) implementations only create this embedding module in the layers where it's used. However, we can't do this here because it breaks TorchScript. So as a solution I add a dummy `Identity` module to `WavLMSelfAttention` when the actual embedding is not needed: [here](https://github.com/pytorch/audio/pull/2822/files#diff-6f1486901c94320ec0610a460dc674638fab9d104a61564ff7b59353a8b8547cR361-R368).~~ Thanks nateanl for resolving this! - I had to add dummy `position_bias` and `key_padding_mask` arguments to `SelfAttention.forward` to make TorchScript tests pass. Since both `SelfAttention` and `WavLMSelfAttention` are called from `EncoderLayer`, they need to have compatible signatures. Having a variable number of arguments with `**kwargs` or checking object class doesn't seem to work with TorchScript, so I instead made both types of attention accept `position_bias` and `key_padding_mask` arguments. Nit: do we still need to specify `__all__` if there are no wildcard imports in `__init__.py`, e.g. in `torchaudio/models/__init__.py`? Pull Request resolved: https://github.com/pytorch/audio/pull/2822 Reviewed By: nateanl Differential Revision: D41121855 Pulled By: sgrigory fbshipit-source-id: 9f4f787e5810010de4e74cb704063a26c66767d7
-
- 08 Nov, 2022 2 commits
-
-
Caroline Chen authored
Summary: Add `fused_log_softmax` argument (default/current behavior = True) to rnnt loss. If setting it to `False`, call `log_softmax` on the logits prior to passing it in to the rnnt loss function. The following should produce the same output: ``` rnnt_loss(logits, targets, logit_lengths, target_lengths, fused_log_softmax=True) ``` ``` log_probs = torch.nn.functional.log_softmax(logits, dim=-1) rnnt_loss(log_probs, targets, logit_lengths, target_lengths, fused_log_softmax=False) ``` testing -- unit tests + get same results on the conformer rnnt recipe Pull Request resolved: https://github.com/pytorch/audio/pull/2798 Reviewed By: xiaohui-zhang Differential Revision: D41083523 Pulled By: carolineechen fbshipit-source-id: e15442ceed1f461bbf06b724aa0561ff8827ad61
-
hwangjeff authored
Summary: Adds `torch.nn.Module`-based implementations for convolution and FFT convolution. Pull Request resolved: https://github.com/pytorch/audio/pull/2811 Reviewed By: carolineechen Differential Revision: D40881937 Pulled By: hwangjeff fbshipit-source-id: bfe8969e6178ad4f58981efd4b2720ac006be8de
-
- 04 Nov, 2022 1 commit
-
-
moto authored
Summary: StreamWriter assumed that frame rate is always expressed as 1/something, which is a reasonable assumption. This commit fixes it by properly computing time_base from frame rate. Address https://github.com/pytorch/audio/issues/2830 Pull Request resolved: https://github.com/pytorch/audio/pull/2831 Reviewed By: carolineechen Differential Revision: D41036084 Pulled By: mthrok fbshipit-source-id: 805881d4cb221ab2c002563aefb986e30fb91609
-
- 31 Oct, 2022 1 commit
-
-
Joao Gomes authored
Summary: cc mthrok Implements precise seek and seek to any frame in torchaudio Pull Request resolved: https://github.com/pytorch/audio/pull/2737 Reviewed By: mthrok Differential Revision: D40546716 Pulled By: jdsgomes fbshipit-source-id: d37da7f55977337eb16a3c4df44ce8c3c102698e
-
- 28 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Introduces argument 'mode' for convolution functions, following SciPy's convention. Pull Request resolved: https://github.com/pytorch/audio/pull/2801 Reviewed By: nateanl Differential Revision: D40805405 Pulled By: hwangjeff fbshipit-source-id: 8f0006ffe9e3945b4b17f44c4cfa1adb265c20ef
-
- 26 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Initializer parameter `onesided` isn't relevant to `MelSpectrogram` — it should always be `True`. In fact, the module already assumes `onesided == True` in the filterbank it generates and fails in its forward pass when `onesided == False`. Accordingly, this PR makes param `onesided` optional and adds a deprecation warning that's fired when the param is provided. Pull Request resolved: https://github.com/pytorch/audio/pull/2797 Reviewed By: carolineechen, xiaohui-zhang Differential Revision: D40731238 Pulled By: hwangjeff fbshipit-source-id: 6eea8eb9d4a85a805162e03ad91682a1946f92cd
-
- 25 Oct, 2022 1 commit
-
-
moto authored
Summary: Addresses https://github.com/pytorch/audio/issues/2790. Previously AVPacket objects had duration==0. `av_interleaved_write_frame` function was inferring the duration of packets by comparing them against the next ones but It could not infer the duration of the last packet, as there is no subsequent frame, thus was omitting it from the final data. This commit fixes it by explicitly setting packet duration = 1 (one frame) only for video. (audio AVPacket contains multiple samples, so it's different. To ensure the correctness for audio, the tests were added.) Pull Request resolved: https://github.com/pytorch/audio/pull/2789 Reviewed By: xiaohui-zhang Differential Revision: D40627439 Pulled By: mthrok fbshipit-source-id: 4d0d827bff518c017b115445e03bdf0bf1e68320
-
- 19 Oct, 2022 2 commits
-
-
Caroline Chen authored
Summary: add ability to load only improvised or only scripted utterances. Pull Request resolved: https://github.com/pytorch/audio/pull/2778 Reviewed By: nateanl Differential Revision: D40511865 Pulled By: carolineechen fbshipit-source-id: e1fe3908ac2aa306ad30c242ddd25762b2268539
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2775 Reviewed By: carolineechen Differential Revision: D40481144 Pulled By: nateanl fbshipit-source-id: 5d0fb2478767704603a3ec28d74160e7892d4d0e
-
- 12 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: a couple of circleci unittests are failing during hubert xlarge torchscript test, which has been known to fail on Windows in the past (#65776). this PR disables this test on circleci cc atalman Pull Request resolved: https://github.com/pytorch/audio/pull/2758 Reviewed By: mthrok Differential Revision: D40290535 Pulled By: carolineechen fbshipit-source-id: 5c5fb43434a517b6c439a8cb8e853015d1550a57
-
- 11 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2738 Reviewed By: carolineechen Differential Revision: D40238099 Pulled By: nateanl fbshipit-source-id: c5cc94c2a348a6ef34c04b8dd26114ecb874d73e
-
- 10 Oct, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Besides the unit test, the PR also addresses these issues: - The original `LibriMix` dataset only supports "min" mode, which means the audio length is the minimum of all clean sources. It is default for source separation task. Users may also want to use "max" mode which allows for end-to-end separation and recognition. The PR adds ``mode`` argument to let users decide which dataset they want to use. - If the task is ``"enh_both"``, the target is the audios in ``mix_clean`` instead of separate clean sources. The PR fixes it to use ``mix_clean`` as target. Pull Request resolved: https://github.com/pytorch/audio/pull/2659 Reviewed By: carolineechen Differential Revision: D40229227 Pulled By: nateanl fbshipit-source-id: fc07e0d88a245e1367656d3767cf98168a799235
-
- 09 Oct, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2732 Reviewed By: nateanl Differential Revision: D40186996 Pulled By: nateanl fbshipit-source-id: a0ad325b7153c9e580dad2c515730dadbe8840c4
-
- 07 Oct, 2022 1 commit
-
-
hwangjeff authored
Summary: Modifies `info_audio` to compute and return number of frames if not found in stream info. This resolves the `num_frames == 0` issue for mp3 that's cited in https://github.com/pytorch/audio/issues/2524. Pull Request resolved: https://github.com/pytorch/audio/pull/2740 Reviewed By: nateanl Differential Revision: D40168639 Pulled By: nateanl fbshipit-source-id: bb45baa0f9cd56844315b04e40ab9835d825fc24
-
- 21 Sep, 2022 1 commit
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2694 This commit adds Tensor type as input to `StreamReader`. The Tensor is interpreted as byte string buffer. Reviewed By: hwangjeff Differential Revision: D39467630 fbshipit-source-id: 6369eed5e16fbb657568bf6bb80d703483d72f8e
-
- 14 Sep, 2022 1 commit
-
-
Caroline Chen authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2673 Reviewed By: mthrok Differential Revision: D39507612 Pulled By: carolineechen fbshipit-source-id: 3a9ee53f72cabd6e3085c76867017be4a6ed7f53
-
- 13 Sep, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2669 Reviewed By: carolineechen, mthrok Differential Revision: D39433560 Pulled By: nateanl fbshipit-source-id: 5b652b31c00badb37b27a32ac25b422a5bcc74cb
-