- 09 Dec, 2022 2 commits
-
-
Moto Hira authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2905 In StreamWriter, if the tensor format is different from the encoding format, a FilterGraph object is automatically inserted to convert the format. The FilterGraph object operates on AVFrames. The input AVFrame must be allocated by us, but the output AVFrame is filled by FilterGraph, so there is no need to allocate it. The output AVFrame is then used as input to the encoder regardless of whether a FilterGraph was inserted, so it has to be manually allocated by us when FilterGraph is not used. The current code flips this condition: it incorrectly allocates the AVFrame when FilterGraph is present and does not allocate it otherwise. This commit fixes that. Reviewed By: xiaohui-zhang Differential Revision: D41866198 fbshipit-source-id: 40799c147dc8166a979ecfb58ed8e502539a6aed
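A hedged usage sketch of the conversion path this fix touches, assuming the StreamWriter API of this release and that the `format`/`encoder_format` arguments are what trigger the automatic FilterGraph insertion:
```
import torch
from torchaudio.io import StreamWriter

# The input tensor format ("flt") differs from the encoder format ("s16"),
# so a FilterGraph is inserted automatically to convert between them.
writer = StreamWriter(dst="out.wav")
writer.add_audio_stream(sample_rate=16000, num_channels=1, format="flt", encoder_format="s16")
chunk = torch.rand(16000, 1)  # (time, channel) float32 samples
with writer.open():
    writer.write_audio_chunk(0, chunk)
```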
-
atalman authored
Summary: Toggle the ffmpeg test on/off as needed. By default it is ON, so current tests are not affected and no change is required to keep it on. To toggle it OFF use: ``` smoke_test.py --no-ffmpeg ``` This is intended for use when calling from builder, since we do not currently install ffmpeg there. Pull Request resolved: https://github.com/pytorch/audio/pull/2901 Reviewed By: carolineechen, mthrok Differential Revision: D41874976 Pulled By: atalman fbshipit-source-id: c57b19f37c63a1f476f93a5211550e980e67d9c7
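An illustrative sketch of how such a toggle is typically wired up with argparse; the actual smoke_test.py may be structured differently:
```
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(description="torchaudio smoke test")
    # ffmpeg-dependent tests run by default; pass --no-ffmpeg to skip them.
    parser.add_argument("--no-ffmpeg", dest="ffmpeg", action="store_false")
    args = parser.parse_args()

    # ... core smoke tests ...
    if args.ffmpeg:
        # ffmpeg-dependent checks would go here.
        pass

if __name__ == "__main__":
    main()
```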
-
- 08 Dec, 2022 4 commits
-
-
Grigory Sizov authored
Summary: Addressed mthrok's comments in https://github.com/pytorch/audio/pull/2833: - Moved model type from `_params` directly into the bundle definition. For now I defined the model type as "WavLM" for WavLM bundles and "Wav2Vec2" for everything else. We could also distinguish between different Wav2Vec2 flavours - Hubert, VoxPopuli etc. - but at the moment this won't imply any functional differences, so I didn't do it - Expanded the title underline to match the title length Pull Request resolved: https://github.com/pytorch/audio/pull/2895 Reviewed By: nateanl, mthrok Differential Revision: D41799875 Pulled By: sgrigory fbshipit-source-id: 0730d4f91ed60e900643bb74d6cccdd7aa5d7b39
-
Caroline Chen authored
Summary: cc mthrok Pull Request resolved: https://github.com/pytorch/audio/pull/2900 Reviewed By: mthrok Differential Revision: D41839924 Pulled By: carolineechen fbshipit-source-id: ba3ada7d04a86d99e08c9044de05a1c48b05d036
-
Grigory Sizov authored
Summary: Part 1 of [T138011314](https://www.internalfb.com/intern/tasks/?t=138011314) This PR ports the generator part of [HiFi GAN](https://arxiv.org/abs/2010.05646v2) from [the original implementation](https://github.com/jik876/hifi-gan/blob/4769534d45265d52a904b850da5a622601885777/models.py#L75) Adds tests: - Smoke tests for architectures V1, V2, V3 - Check that output shapes are correct - Check that the model is torchscriptable and that scripting doesn't change the output - Check that our code's output matches the original implementation. Here I clone the original repo inside `/tmp` and import the necessary objects from inside the test function (see the sketch below). On test teardown I restore `PATH`, but don't remove the cloned code, so that it can be reused on subsequent runs - let me know if removing it would be better practice There are no quantization tests, because the model consists mainly of `Conv1d` and `ConvTranspose1d`, which are [not supported by dynamic quantization](https://pytorch.org/docs/stable/quantization.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2860 Reviewed By: nateanl Differential Revision: D41433416 Pulled By: sgrigory fbshipit-source-id: f135c560df20f5138f01e3efdd182621edabb4f5
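A rough sketch of the parity-test setup described above; the paths and imports are illustrative, not the actual test code:
```
import subprocess
import sys

# Clone the reference implementation into /tmp and make it importable so that
# torchaudio's port can be compared against the original generator.
CLONE_DIR = "/tmp/hifi-gan"
subprocess.run(
    ["git", "clone", "https://github.com/jik876/hifi-gan", CLONE_DIR],
    check=False,  # the clone is left in place so subsequent runs can reuse it
)
sys.path.insert(0, CLONE_DIR)
# from models import Generator  # the original repo's generator, importable once cloned
```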
-
hwangjeff authored
Summary: Adds feature badges to preemphasis and deemphasis functions Pull Request resolved: https://github.com/pytorch/audio/pull/2892 Reviewed By: carolineechen Differential Revision: D41830782 Pulled By: hwangjeff fbshipit-source-id: 487ce9afa8dc8fe321aa9e02cc88bb1453985d39
-
- 07 Dec, 2022 3 commits
-
-
hwangjeff authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2889 Reviewed By: xiaohui-zhang Differential Revision: D41760084 Pulled By: hwangjeff fbshipit-source-id: d2f5253e1fae7e7aafa9fa6043c6a7045c5b33a0
-
hwangjeff authored
Summary: Introduces the MUSAN dataset (https://www.openslr.org/17/), which contains music, speech, and noise recordings. Pull Request resolved: https://github.com/pytorch/audio/pull/2888 Reviewed By: xiaohui-zhang Differential Revision: D41762164 Pulled By: hwangjeff fbshipit-source-id: 14d5baaa4d40f065dd5d99bf7f2e0a73aa6c31a9
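A minimal usage sketch, assuming the dataset follows the usual torchaudio dataset interface (root directory plus subset name); the return layout is an assumption:
```
import torchaudio

dataset = torchaudio.datasets.MUSAN(root="./musan", subset="noise")
waveform, sample_rate, filename = dataset[0]  # assumed (waveform, sample_rate, filename)
```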
-
Jithun Nair authored
Summary: Dependent on PR https://github.com/pytorch/pytorch/pull/89101 Pull Request resolved: https://github.com/pytorch/audio/pull/2853 Reviewed By: atalman, osalpekar Differential Revision: D41737634 Pulled By: malfet fbshipit-source-id: 715a97a2da8ef309cea78d971b47c07463495683
-
- 06 Dec, 2022 1 commit
-
-
moto authored
Summary: This commit adds the `frequency_impulse_response` function, which generates a filter from a desired frequency response. [Example](https://output.circle-artifacts.com/output/job/5233fda9-dadb-4710-9389-7e8ac20a062f/artifacts/0/docs/tutorials/filter_design_tutorial.html#frequency-sampling) Pull Request resolved: https://github.com/pytorch/audio/pull/2879 Reviewed By: hwangjeff Differential Revision: D41767787 Pulled By: mthrok fbshipit-source-id: 6d5e44c6390e8cf3028994a1b1de590ff3aaf6c2
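A minimal sketch under the assumption that the prototype op takes the desired magnitude response sampled from 0 Hz to Nyquist and returns time-domain filter coefficients:
```
import torch
from torchaudio.prototype import functional as F

desired_response = torch.linspace(1.0, 0.0, 256)          # a simple low-pass shape
kernel = F.frequency_impulse_response(desired_response)   # time-domain filter taps
```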
-
- 04 Dec, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2885 In the `_init_hubert_pretrain_model` method, which initializes the HuBERT pretrain models, `kaiming_normal_` should be applied to `ConvLayerBlock` instead of the `LayerNorm` layer. This PR fixes it and adds more unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2886 Reviewed By: hwangjeff Differential Revision: D41713801 Pulled By: nateanl fbshipit-source-id: ed199baf7504d06bbf2d31c522ae708a75426a2d
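An illustrative sketch of the corrected behavior (hypothetical helper, not the actual torchaudio code): Kaiming initialization targets the convolution weights of the feature extractor rather than the LayerNorm parameters:
```
import torch.nn as nn

def init_feature_extractor(module: nn.Module) -> None:
    # Apply Kaiming init to convolution weights only, leaving LayerNorm untouched.
    for m in module.modules():
        if isinstance(m, nn.Conv1d):
            nn.init.kaiming_normal_(m.weight)
```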
-
- 02 Dec, 2022 1 commit
-
-
hwangjeff authored
Summary: Adds pre-emphasis and de-emphasis functions. Pull Request resolved: https://github.com/pytorch/audio/pull/2871 Reviewed By: carolineechen Differential Revision: D41651097 Pulled By: hwangjeff fbshipit-source-id: 7a3cf6ce68b6ce1b9ae315ddd8bd8ed71acccdf1
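A minimal sketch; pre-emphasis computes y[n] = x[n] - coeff * x[n - 1] and de-emphasis inverts it. The exact namespace (functional vs. prototype) and signatures are assumptions here:
```
import torch
import torchaudio.functional as F

waveform = torch.randn(1, 16000)
emphasized = F.preemphasis(waveform, coeff=0.97)
restored = F.deemphasis(emphasized, coeff=0.97)
```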
-
- 30 Nov, 2022 2 commits
-
-
hwangjeff authored
Summary: Adds functions and transforms for speed and speed perturbation (https://www.isca-speech.org/archive/interspeech_2015/ko15_interspeech.html). Pull Request resolved: https://github.com/pytorch/audio/pull/2829 Reviewed By: xiaohui-zhang Differential Revision: D41285114 Pulled By: hwangjeff fbshipit-source-id: 114740507698e01f35d4beb2c568a2479e847506
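A minimal sketch of the new transforms, assuming their documented interfaces (Speed applies a fixed factor, SpeedPerturbation samples one of the given factors per call); the namespace and return layout are assumptions:
```
import torch
import torchaudio.transforms as T

waveform = torch.randn(1, 16000)
speed = T.Speed(orig_freq=16000, factor=1.1)
perturb = T.SpeedPerturbation(orig_freq=16000, factors=[0.9, 1.0, 1.1])
sped_up, sped_up_lengths = speed(waveform)        # lengths returned alongside the output
perturbed, perturbed_lengths = perturb(waveform)
```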
-
Andreas Floros authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2873 The original fairseq implementation has an extra layer normalization preprocessing step for large/xlarge models. https://github.com/facebookresearch/fairseq/blob/fcca32258c8e8bcc9f9890bf4714fa2f96b6b3e1/fairseq/data/audio/hubert_dataset.py#L355-L357 This commit modifies the pre-trained model bundles to include this preprocessing for the impacted pre-trained models listed below. For the sake of keeping the interface identical to the other models, and since the additional preprocessing is rather simple, the returned pre-trained model instance is modified to include the preprocessing, instead of adding a separate preprocessing method. - WAV2VEC2_LARGE_LV60K - WAV2VEC2_ASR_LARGE_LV60K_10M - WAV2VEC2_ASR_LARGE_LV60K_100H - WAV2VEC2_ASR_LARGE_LV60K_960H - WAV2VEC2_XLSR53 - HUBERT_LARGE - HUBERT_XLARGE - HUBERT_ASR_LARGE - HUBERT_ASR_XLARGE - WAVLM_LARGE Reviewed By: nateanl Differential Revision: D41520183 fbshipit-source-id: 83d72fe692e8b9fc25df144deb4ca946fcd09615
-
- 29 Nov, 2022 5 commits
-
-
moto authored
Summary: This commit adds `sinc_impulse_response`, which generates windowed-sinc low-pass filters for given cutoff frequencies. Example usage: - [Filter Design Tutorial](https://output.circle-artifacts.com/output/job/c0085baa-5345-4aeb-bd44-448034caa9e1/artifacts/0/docs/tutorials/filter_design_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2875 Reviewed By: carolineechen Differential Revision: D41586631 Pulled By: mthrok fbshipit-source-id: a9991dbe5b137b0b4679228ec37072a1da7e50bb
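A minimal sketch, assuming cutoff frequencies are specified as fractions of the Nyquist frequency and one windowed-sinc low-pass kernel is returned per cutoff:
```
import torch
from torchaudio.prototype import functional as F

cutoffs = torch.tensor([0.1, 0.25, 0.5])                     # relative to Nyquist
kernels = F.sinc_impulse_response(cutoffs, window_size=513)  # one kernel per cutoff
```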
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2878 Reviewed By: carolineechen Differential Revision: D41587081 Pulled By: mthrok fbshipit-source-id: da7f3647083a3566ce94070ce2bd30bf99e1db76
-
moto authored
Summary: This commit adds the tutorial for additive synthesis, using torchaudio's prototype DSP ops. [Review here](https://output.circle-artifacts.com/output/job/3dc83322-832a-4272-9c13-df752c97b660/artifacts/0/docs/tutorials/additive_synthesis_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2877 Reviewed By: carolineechen Differential Revision: D41585425 Pulled By: mthrok fbshipit-source-id: b81283b90e4779c8054fd030a1d8c3d39d676bbd
-
moto authored
Summary: Currently, fftconvolve only accepts tensors with exactly the same leading dimensions. This commit loosens the restriction to allow shapes that are broadcastable. This makes the fftconvolve operation more efficient for cases like signal filtering, where one operand (the waveform) is larger than the other (the filter kernel) and the same filter kernels are applied across channels and batches. Pull Request resolved: https://github.com/pytorch/audio/pull/2874 Reviewed By: carolineechen Differential Revision: D41581588 Pulled By: mthrok fbshipit-source-id: c0117e11b979fb53236cc307a970a461b0e50134
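A minimal sketch of the relaxed shape requirement: a single filter kernel broadcast across the batch and channel dimensions of the waveform:
```
import torch
import torchaudio.functional as F

waveform = torch.randn(4, 2, 16000)  # (batch, channel, time)
kernel = torch.randn(1, 1, 512)      # one kernel shared across batch and channel
filtered = F.fftconvolve(waveform, kernel, mode="full")
```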
-
Caroline Chen authored
Summary: modeled after [paper](https://arxiv.org/pdf/2110.07313.pdf) and internal flow f288347302 internal comparison tests: D40080919 Pull Request resolved: https://github.com/pytorch/audio/pull/2827 Reviewed By: nateanl Differential Revision: D41569046 Pulled By: carolineechen fbshipit-source-id: 43c5313074af05972d93da55b2029c746b75c380
-
- 28 Nov, 2022 3 commits
-
-
Zhaoheng Ni authored
Summary: - layer_norm in `EmformerEncoder` is set as the default in emformer_hubert_model; change the type to be non-optional. - add `aux_num_out` to emformer_hubert_model to support the fine-tuning model. - update unit tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2868 Reviewed By: carolineechen Differential Revision: D41451311 Pulled By: nateanl fbshipit-source-id: 5fa0f19255e4f01e001d62f8689e36f134030083
-
moto authored
Summary: This commit adds a tutorial for oscillator_bank and adsr_envelope, which will be a basis for DDSP. - [Review here](https://output.circle-artifacts.com/output/job/cf1d3001-88e5-418b-8cf8-ae22b4445dba/artifacts/0/docs/tutorials/oscillator_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2862 Reviewed By: carolineechen Differential Revision: D41559503 Pulled By: mthrok fbshipit-source-id: 3f1689186db7d246de14f228fc2f91bf37db98cd
-
moto authored
Summary: Add the `extend_pitch` function, which can be used for augmenting fundamental frequencies with their harmonic overtones or inharmonic partials. It can be used for amplitudes as well. For example usages, see https://output.circle-artifacts.com/output/job/4ad0c29a-d75a-4244-baad-f5499f11d94b/artifacts/0/docs/tutorials/synthesis_tutorial.html Part of https://github.com/pytorch/audio/issues/2835 Extracted from https://github.com/pytorch/audio/issues/2808 Pull Request resolved: https://github.com/pytorch/audio/pull/2863 Reviewed By: carolineechen Differential Revision: D41543880 Pulled By: mthrok fbshipit-source-id: 4f20e55770b0b3bee825ec07c73f9ec7cb181109
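A minimal sketch, assuming the op accepts either a number of harmonics or an explicit list of multipliers, with the generated pitches stacked along the trailing dimension:
```
import torch
from torchaudio.prototype import functional as F

f0 = torch.full((100, 1), 220.0)                 # fundamental frequency per frame
harmonics = F.extend_pitch(f0, 8)                # 220 Hz times 1..8
partials = F.extend_pitch(f0, [1.0, 2.1, 3.4])   # inharmonic multipliers
```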
-
- 19 Nov, 2022 1 commit
-
-
moto authored
Summary: Missing from https://github.com/pytorch/audio/issues/2848 Pull Request resolved: https://github.com/pytorch/audio/pull/2864 Reviewed By: carolineechen Differential Revision: D41413381 Pulled By: mthrok fbshipit-source-id: 4377ed4a59504c6ade9ee6f42938a2bc3f04fb73
-
- 18 Nov, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2836 Reviewed By: carolineechen Differential Revision: D41208630 Pulled By: nateanl fbshipit-source-id: 625e1651f0b8a6e20876409739cf7084cb7c748b
-
moto authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/2865 Reviewed By: carolineechen Differential Revision: D41403756 Pulled By: mthrok fbshipit-source-id: d193caa90e786f08f28e4cc2df4b4fb77aa8f592
-
- 17 Nov, 2022 4 commits
-
-
hwangjeff authored
Summary: Adds API usage logging to MelSpectrogram and Spectrogram. Pull Request resolved: https://github.com/pytorch/audio/pull/2861 Reviewed By: carolineechen Differential Revision: D41384080 Pulled By: hwangjeff fbshipit-source-id: caf4b0fa6e4cc3954384bfdd08a183b90d07d974
-
moto authored
Summary: Add the adsr_envelope op, which generates ADSR envelopes * Supports generation of the envelope on GPU * Supports an optional Hold phase * Supports polynomial decay <img src='https://download.pytorch.org/torchaudio/doc-assets/adsr_examples.png'> Pull Request resolved: https://github.com/pytorch/audio/pull/2859 Reviewed By: nateanl Differential Revision: D41379601 Pulled By: mthrok fbshipit-source-id: 3717a6e0360d2a24913c2a836c57c5edec1d7b31
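A minimal sketch, assuming the attack/hold/decay/release arguments are given as fractions of the total number of frames and `n_decay` selects the polynomial decay order; the exact signature is an assumption:
```
from torchaudio.prototype import functional as F

envelope = F.adsr_envelope(
    num_frames=1000,
    attack=0.1, hold=0.1, decay=0.2, sustain=0.6, release=0.2,
    n_decay=2,  # polynomial decay order
)
```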
-
vasiliy authored
Summary: This code was added by https://github.com/pytorch/audio/commit/4d0095a528412cfec2a549204fc01d9ebb15df7a Seems that the original code had a typo? Pull Request resolved: https://github.com/pytorch/audio/pull/2858 Test Plan: ``` // the import of `mustc` now succeeds, previously crashed python examples/asr/emformer_rnnt/global_stats.py --model-type librispeech --dataset-path /home/vasiliy/local/librispeech/ ``` Reviewed By: carolineechen Differential Revision: D41355663 Pulled By: nateanl fbshipit-source-id: 92507e529d41b984b9dd400ad24a55d130372b7d
-
moto authored
Summary: This commit adds the `oscillator_bank` op, which is the core of (differentiable) digital signal processing ops. The implementation itself is pretty simple: sum instantaneous frequencies into phase, take the sine, and multiply with amplitudes. Following the magenta implementation, amplitudes for frequencies outside of [-Nyquist, Nyquist] are suppressed. The differentiability is tested within the frequency range of [-Nyquist, Nyquist] and the amplitude range of [-5, 5], which should be enough. For example usages: - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/oscillator_tutorial.html - https://output.circle-artifacts.com/output/job/129f3e21-41ce-406b-bc6b-833efb3c3141/artifacts/0/docs/tutorials/synthesis_tutorial.html Part of https://github.com/pytorch/audio/issues/2835 Extracted from https://github.com/pytorch/audio/issues/2808 Pull Request resolved: https://github.com/pytorch/audio/pull/2848 Reviewed By: carolineechen Differential Revision: D41353075 Pulled By: mthrok fbshipit-source-id: 80e60772fb555760f2396f7df40458803c280225
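A minimal sketch following the description above: per-frame frequencies are integrated into phase, the sine is taken, scaled by per-frame amplitudes, and the bank is summed over oscillators; the shapes and argument names are assumptions:
```
import torch
from torchaudio.prototype import functional as F

sample_rate = 8000
num_frames, num_osc = sample_rate, 3  # one second, three harmonics of 440 Hz
freqs = torch.full((num_frames, num_osc), 440.0) * torch.tensor([1.0, 2.0, 3.0])
amps = torch.full((num_frames, num_osc), 0.3)
waveform = F.oscillator_bank(freqs, amps, sample_rate=sample_rate)
```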
-
- 16 Nov, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2847 In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by casting `mask_embedding` to the dtype of `x`, enabling mixed precision training. Pull Request resolved: https://github.com/pytorch/audio/pull/2854 Reviewed By: carolineechen Differential Revision: D41343486 Pulled By: nateanl fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
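A toy module illustrating the fix (not the actual torchaudio code): the learned mask embedding is cast to the dtype of the incoming features so fp16/autocast activations do not clash with an fp32 parameter:
```
import torch
import torch.nn as nn

class MaskedFeatures(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.mask_embedding = nn.Parameter(torch.zeros(dim).uniform_())

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Cast the embedding to x's dtype so the assignment works under autocast/fp16.
        x = x.clone()
        x[mask] = self.mask_embedding.to(dtype=x.dtype)
        return x
```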
-
Zhaoheng Ni authored
Summary: - `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653: absolute paths became relative paths. This PR fixes the usage in the HuBERT fine-tuning recipe to get the correct audio paths. - model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`. - The input dimension of the CTC linear layer varies depending on the model architecture; update it in the lightning module. cc simpleoier Pull Request resolved: https://github.com/pytorch/audio/pull/2851 Reviewed By: carolineechen Differential Revision: D41327998 Pulled By: nateanl fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
-
- 15 Nov, 2022 3 commits
-
-
Grigory Sizov authored
Summary: Closes T136364380, follow-up to https://github.com/pytorch/audio/issues/2822 - Added "base", "base+", and "large" bundles for WavLM - Expanded `wav2vec2_pipeline_test.py` to include the new bundles - Added the new bundles to docs in `pipelines.rst` Pull Request resolved: https://github.com/pytorch/audio/pull/2833 Reviewed By: nateanl Differential Revision: D41194796 Pulled By: sgrigory fbshipit-source-id: bf8e96c05b6a81ac5c5a014c46adeeac12685328
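A minimal sketch of using one of the new bundles, assuming the standard wav2vec2-style pipeline interface:
```
import torch
import torchaudio

bundle = torchaudio.pipelines.WAVLM_BASE  # also WAVLM_BASE_PLUS, WAVLM_LARGE
model = bundle.get_model()
waveform = torch.randn(1, bundle.sample_rate)  # one second of audio
with torch.inference_mode():
    features, lengths = model.extract_features(waveform)
```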
-
Grigory Sizov authored
Summary: Closes T137506059 Replaces functional multi-head attention in `WavLMSelfAttention` with the module `torch.nn.MultiheadAttention`. The reason is that the latter uses a native CPU/CUDA implementation ([BetterTransformer](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/)) under certain conditions and can achieve significant speedup. It also simplifies the code in `WavLMSelfAttention`. Note: the definition of the `bias` parameter in `WavLMSelfAttention.forward` has changed slightly, because `torch.nn.MultiheadAttention` has no parameter controlling the presence of bias for the projections of `k`, `v`, and `q` independently. In WavLM we only use `bias=True`, so this won't have any effect on users of WavLM or on tests. Pull Request resolved: https://github.com/pytorch/audio/pull/2842 Reviewed By: nateanl Differential Revision: D41186166 Pulled By: sgrigory fbshipit-source-id: e791c68106ad89f96c1abf046de699cb8ec7b595
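A generic sketch of the module now used internally; with batch-first inputs and `need_weights=False`, `nn.MultiheadAttention` can dispatch to the fused "BetterTransformer" fast path under certain conditions:
```
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
x = torch.randn(2, 100, 768)  # (batch, time, feature)
out, _ = attn(x, x, x, need_weights=False)
```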
-
moto authored
Summary: * Add the new official torchaudio logo to the documentation/README. * Add a page for downloading the logo. https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/index.html <img width="1068" alt="Screen Shot 2022-11-14 at 10 30 27 AM" src="https://user-images.githubusercontent.com/855818/201738349-9e248f15-dce2-4931-9066-aa898a53d6ad.png"> https://output.circle-artifacts.com/output/job/e9eb1292-7c10-4fef-adc3-ad568802aa59/artifacts/0/docs/logo.html <img width="617" alt="Screen Shot 2022-11-14 at 10 30 47 AM" src="https://user-images.githubusercontent.com/855818/201738420-ad0fda2f-f310-4802-851c-bbdf6c84c045.png"> Pull Request resolved: https://github.com/pytorch/audio/pull/2802 Reviewed By: carolineechen Differential Revision: D41295277 Pulled By: mthrok fbshipit-source-id: 6615d00799c9611f875e8485459d800e350b3486
-
- 14 Nov, 2022 2 commits
-
-
moto authored
Summary: Removing the LTS mention and packages from the README as LTS is discontinued. Pull Request resolved: https://github.com/pytorch/audio/pull/2844 Reviewed By: hwangjeff, xiaohui-zhang Differential Revision: D41200886 Pulled By: mthrok fbshipit-source-id: 0da0afe68df51826075ce945cf0cf1de901e1c8f
-
Caroline Chen authored
Summary: follow up to https://github.com/pytorch/audio/issues/2823 - move bark spectrogram to prototype - decrease autograd test tolerance (passing on circle ci) - add diagram for bark fbanks cc jdariasl Pull Request resolved: https://github.com/pytorch/audio/pull/2843 Reviewed By: nateanl Differential Revision: D41199522 Pulled By: carolineechen fbshipit-source-id: 8e6c2e20fb7b14f39477683b3c6ed8356359a213
-
- 13 Nov, 2022 1 commit
-
-
Zhaoheng Ni authored
Summary: address https://github.com/pytorch/audio/issues/2845 Pull Request resolved: https://github.com/pytorch/audio/pull/2846 Reviewed By: carolineechen Differential Revision: D41251624 Pulled By: nateanl fbshipit-source-id: 1a363d2314d6a452f35c109b9730da64ada5a2fd
-
- 11 Nov, 2022 1 commit
-
-
DanilBaibak authored
Summary: Added missing build workflows for MacOS and Linux: - [x] Linux conda - [x] MacOS conda This does not change the existing builds/uploads in CircleCI, and should not break any existing jobs/workflows. This just adds back workflows for the MacOS and Linux conda builds with Nova. We will create a workflow (most likely in test-infra) that compares the binaries to ensure there is parity between them before we start uploading with Nova. Pull Request resolved: https://github.com/pytorch/audio/pull/2800 Reviewed By: osalpekar Differential Revision: D41181467 Pulled By: DanilBaibak fbshipit-source-id: a5c5d4dcfdd778b4045203f6016c20fb42daa01b
-
- 10 Nov, 2022 2 commits
-
-
moto authored
Summary: Currently `discard_before_pts=-1` is used to indicate that no AVFrame should be skipped. It was reported that some corrupted videos can have a constant negative pts value. The behavior is technically undefined for such corrupted data, but all AVFrames should still be decoded as long as `seek` is not used. This commit changes the decoder so that it processes AVFrames when `discard_before_pts == -1`, regardless of the AVFrame::pts value. Pull Request resolved: https://github.com/pytorch/audio/pull/2841 Reviewed By: hwangjeff Differential Revision: D41174442 Pulled By: mthrok fbshipit-source-id: e9d2fab4b0e2bc47146eda8e1dd377a74c087590
-
Omkar Salpekar authored
Summary: Adding Nova Reusable Workflow for M1 Wheels Build. Once this has been running well for a while, we can replace the old `build-m1-binaries.yml` workflow. Pull Request resolved: https://github.com/pytorch/audio/pull/2839 Reviewed By: DanilBaibak Differential Revision: D41195316 Pulled By: osalpekar fbshipit-source-id: f3754043f384b1645e5fcfaebf465f6839f72461
-