"src/diffusers/models/controlnets/controlnet_union.py" did not exist on "30e5e81d58eb9c3979c07e6626bae89c1df8c0e1"
- 26 May, 2023 1 commit
-
-
Lakshmi Krishnan authored
Summary: This commit fixes the following issues affecting streaming decoding quality:

1. The `init_b` hypothesis is only regenerated from the blank token if no initial hypotheses are provided.
2. The decoder can now receive the top-K hypotheses to continue decoding from, instead of using just the top hypothesis at each decoding step. This dramatically improves decoding quality, especially for speech with long pauses and disfluencies.
3. Some minor errors regarding shape checking for length are fixed.

This also means that the resulting output is the entire transcript up to that time step, instead of just the incremental change in the transcript.

Pull Request resolved: https://github.com/pytorch/audio/pull/3295
Reviewed By: nateanl
Differential Revision: D46216113
Pulled By: hwangjeff
fbshipit-source-id: 8f7efae28dcca4a052f434ca55a2795c9e5ec0b0
-
- 25 May, 2023 1 commit
-
-
Pingchuan Ma authored
Summary: This PR adds an AV-ASR recipe containing sample implementations of training and evaluation pipelines for RNN-T based automatic, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models on LRS3. The recipe includes both streaming and non-streaming modes. CC stavros99 xiaohui-zhang YumengTao mthrok nateanl hwangjeff

Pull Request resolved: https://github.com/pytorch/audio/pull/3278
Reviewed By: nateanl
Differential Revision: D46121550
Pulled By: mpc001
fbshipit-source-id: bb44b97ae25e87df2a73a707008be46af4ad0fc6
-
- 23 May, 2023 1 commit
-
-
Xiaohui Zhang authored
Summary: Move the forced aligner tutorial to torchaudio, with some formatting changes.

Pull Request resolved: https://github.com/pytorch/audio/pull/3356
Reviewed By: mthrok
Differential Revision: D46060238
fbshipit-source-id: d90e7db5669a58d1e9ef5c2ec3c6d175b4e394ec
-
- 21 May, 2023 2 commits
-
-
Moto Hira authored
Differential Revision: D45960556
Original commit changeset: 93f2271f7130
Original Phabricator Diff: D45960556
fbshipit-source-id: d22883fbcf9c5f2bb5d49274bcc194bdffaca72a
-
Xiaohui Zhang authored
Summary: Move the forced aligner tutorial to torchaudio, with some formatting changes.

Pull Request resolved: https://github.com/pytorch/audio/pull/3351
Reviewed By: vineelpratap, nateanl
Differential Revision: D45960556
fbshipit-source-id: 93f2271f71307404e6a7732385cf7d646dc8ceaa
-
- 16 May, 2023 1 commit
-
-
moto authored
Summary: This commit upgrades the version of FFmpeg compiled against the TorchAudio binary distribution to 5.0.4. FFmpeg 5.0 was released in January 2022, and many package managers now provide a version of FFmpeg 5; conda-forge lists 5.1 for all the platforms TorchAudio supports: https://anaconda.org/conda-forge/ffmpeg

Pull Request resolved: https://github.com/pytorch/audio/pull/3298
Reviewed By: hwangjeff
Differential Revision: D45865599
Pulled By: mthrok
fbshipit-source-id: d95638eb80daaf477a710a992f4ead9b9009bb9b
-
- 10 May, 2023 2 commits
-
-
moto authored
Summary: https://output.circle-artifacts.com/output/job/fbfa6d9a-5014-42ac-8e77-c1e9565747e8/artifacts/0/docs/tutorials/effector_tutorial.html Pull Request resolved: https://github.com/pytorch/audio/pull/3226 Reviewed By: nateanl Differential Revision: D45402724 Pulled By: mthrok fbshipit-source-id: bc9d1bc071f6f5062b9cc35d743b4a3016306262
-
moto authored
Summary: This commit is preparation for landing the dispatcher switch in https://github.com/pytorch/audio/issues/3241. Making the FFmpeg backend the default causes some issues in the tutorials, so this commit disables it. The IO tutorial will be updated after https://github.com/pytorch/audio/issues/3241 lands to accommodate the change. Since the migration-related changes need to be mentioned in the IO tutorial, the IO documentation is also updated to cover the migration work so that readers can easily be redirected.

Pull Request resolved: https://github.com/pytorch/audio/pull/3285
Reviewed By: nateanl
Differential Revision: D45671237
Pulled By: mthrok
fbshipit-source-id: cb541f6bd93cd9920019b8ec83210ea69d34f133
-
- 05 May, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: Add scatter plots for STOI, PESQ, Si-SDR, and MOS scores to demonstrate the performance of `SquimObjective` and `SquimSubjective` models and how close they are to the ground truths. Pull Request resolved: https://github.com/pytorch/audio/pull/3313 Reviewed By: hwangjeff Differential Revision: D45620311 Pulled By: nateanl fbshipit-source-id: cb58ffd3744df4749b9385876da8de0cffd93557
-
- 29 Apr, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The PR adds a tutorial that demonstrates how to use pre-trained `TorchAudio-SQUIM` pipelines to estimate objective and subjective metric scores (PESQ, STOI, Si-SDR, MOS). Pull Request resolved: https://github.com/pytorch/audio/pull/3279 Reviewed By: hwangjeff Differential Revision: D45415404 Pulled By: nateanl fbshipit-source-id: abcaeadcca0eabc2dca53b607eac6257a701c903
-
- 28 Apr, 2023 1 commit
-
-
Yuekai Zhang authored
Summary: This PR implements a CUDA-based CTC prefix beam search decoder. Several benchmark results using a V100 are attached below:

| decoder type | model | dataset | decoding time (secs) | beam size | batch size | model unit | subsampling factor | vocab size |
|--------------|-------|---------|----------------------|-----------|------------|------------|--------------------|------------|
| cuctc | conformer nemo | dev clean | 7.68s | 8 | 32 | bpe | 4 | 1000 |
| cuctc | conformer nemo | dev clean (sorted by length) | 1.6s | 8 | 32 | bpe | 4 | 1000 |
| cuctc | wav2vec2.0 torchaudio | dev clean | 22s | 10 | 1 | char | 2 | 29 |
| cuctc | conformer espnet | aishell1 test | 5s | 10 | 24 | char | 4 | 4233 |

Notes:
1. The design parallelizes computation across the batch and vocab axes and loops over the frame axis, so compared with CPU implementations it is more favorable for smaller sequence lengths and larger vocab sizes.
2. WER is the same as the CPU implementations; however, it cannot decode with an LM yet.

Resolves: https://github.com/pytorch/audio/issues/2957.
Pull Request resolved: https://github.com/pytorch/audio/pull/3096
Reviewed By: nateanl
Differential Revision: D44709397
Pulled By: mthrok
fbshipit-source-id: 3078c54a2b44dc00eb4a81b4c657487eeff8c155
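A hedged usage sketch, assuming the decoder is exposed as `torchaudio.models.decoder.cuda_ctc_decoder` as in later torchaudio releases; the hypothesis attribute names and the blank-at-index-0 convention are assumptions to verify against the installed docs.

```python
import torch
from torchaudio.models.decoder import cuda_ctc_decoder

tokens = ["-", "a", "b", "c"]          # assumed: blank token at index 0
decoder = cuda_ctc_decoder(tokens, nbest=1, beam_size=10)

# The decoder runs on GPU: float32 log-probs and int32 lengths, both on CUDA.
log_probs = torch.randn(4, 100, len(tokens)).log_softmax(-1).cuda()
lengths = torch.full((4,), 100, dtype=torch.int32).cuda()

results = decoder(log_probs, lengths)   # List[List[CUCTCHypothesis]]
best = results[0][0]
print(best.tokens, best.score)
```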
-
- 18 Apr, 2023 1 commit
-
-
nateanl authored
Summary: The PR adds the training recipe of DNN beamforming for multi-channel speech enhancement. Pull Request resolved: https://github.com/pytorch/audio/pull/3036 Reviewed By: hwangjeff Differential Revision: D45061841 Pulled By: nateanl fbshipit-source-id: 48ede5dd579efe200669dbc83e9cb4dea809e4b4
-
- 31 Mar, 2023 1 commit
-
-
Nouran Ali authored
Summary: Pull Request resolved: https://github.com/pytorch/audio/pull/3222 Reviewed By: nateanl Differential Revision: D44539424 Pulled By: mthrok fbshipit-source-id: 8fbcb5f9918c9930c939bcd448493fa5cf604545
-
- 29 Mar, 2023 1 commit
-
-
moto authored
Summary: There is a part of the StreamWriter tutorial that warns about corrupted AAC audio output, but this is no longer relevant, so this commit deletes it.

Pull Request resolved: https://github.com/pytorch/audio/pull/3214
Reviewed By: nateanl
Differential Revision: D44504030
Pulled By: mthrok
fbshipit-source-id: 4d26d582e9fb87d4e6fa674c05fe3192bc223eef
-
- 28 Mar, 2023 1 commit
-
-
nateanl authored
Summary: Fix https://github.com/pytorch/audio/issues/3211 Pull Request resolved: https://github.com/pytorch/audio/pull/3212 Reviewed By: mthrok Differential Revision: D44472523 Pulled By: nateanl fbshipit-source-id: eb519b0045e7518ad13863a53271745a80d89a21
-
- 16 Mar, 2023 1 commit
-
-
jiyuntu-eero authored
Summary: Fix https://github.com/pytorch/audio/issues/3166. In the `get_trellis` method, the index of the blank symbol is hard-coded to 0; it should use `blank_id` instead.

Pull Request resolved: https://github.com/pytorch/audio/pull/3172
Reviewed By: mthrok
Differential Revision: D44090889
Pulled By: nateanl
fbshipit-source-id: d119f4ded895d31aeefd59f8d975224870100264
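For context, here is a minimal sketch of a forced-alignment trellis that threads `blank_id` through instead of assuming index 0; it is an illustrative simplification, not the tutorial's exact code.

```python
import torch

def get_trellis(emission, tokens, blank_id=0):
    # emission: (num_frames, num_labels) log-probabilities; tokens: target label ids.
    num_frames, num_tokens = emission.size(0), len(tokens)
    trellis = torch.full((num_frames, num_tokens), -float("inf"))
    trellis[0, 0] = emission[0, tokens[0]]
    for t in range(1, num_frames):
        # Stay on the current token by emitting blank: index blank_id, not a hard-coded 0.
        stay = trellis[t - 1] + emission[t, blank_id]
        # Advance to the next token by emitting it.
        move = torch.cat(
            [torch.tensor([-float("inf")]), trellis[t - 1, :-1] + emission[t, tokens[1:]]]
        )
        trellis[t] = torch.maximum(stay, move)
    return trellis

emission = torch.randn(50, 29).log_softmax(-1)   # dummy CTC emissions, blank at index 28
trellis = get_trellis(emission, tokens=[7, 4, 11, 11, 14], blank_id=28)
```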
-
- 07 Mar, 2023 1 commit
-
-
Maciej Torhan authored
Summary: In the wav2letter example, `momentum` is passed to the `Adam` and `AdamW` initializers, but these optimizers do not accept that parameter. To fix it, `beta_1` and `beta_2` arguments are added and used in place of `momentum`. An `eps` argument is also added, mirroring the `Adadelta` initializer.

Pull Request resolved: https://github.com/pytorch/audio/pull/3145
Reviewed By: mthrok
Differential Revision: D43847713
Pulled By: nateanl
fbshipit-source-id: 94f7c48232fabf520cfce81471694cb545d160c6
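As a reminder of the `torch.optim` API (hyperparameter values here are illustrative, not the recipe's defaults):

```python
import torch

model = torch.nn.Linear(80, 29)  # stand-in for the wav2letter acoustic model

# Adam/AdamW take betas=(beta_1, beta_2) and eps; they have no `momentum` argument.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-5
)
```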
-
- 02 Mar, 2023 1 commit
-
-
moto authored
Summary: Fix the build_doc job https://app.circleci.com/pipelines/github/pytorch/audio/15217/workflows/ce50b317-a59e-4741-b8d2-59129420deb8

- build.ffmpeg.html might not exist when the IPython notebook is processed; change to the main doc URL.
- Fix bash cell syntax in the HW tutorial.
- Fix C++ doc.
- Fix duplicated target name in the StreamWriter tutorial.

Pull Request resolved: https://github.com/pytorch/audio/pull/3125
Reviewed By: xiaohui-zhang
Differential Revision: D43724078
Pulled By: mthrok
fbshipit-source-id: ea7d46ec5e377cf2fbd7c3798df57da73750ac5c
-
- 24 Feb, 2023 2 commits
-
-
Vladislav Agafonov authored
Summary: Add `Wav2Vec2DataModule` to the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3081 Reviewed By: mthrok Differential Revision: D43579239 Pulled By: nateanl fbshipit-source-id: 3e935eb9a18ef0259a58940ae466cbdc3baf8494
-
Vladislav Agafonov authored
Summary: Add wav2vec2 loss function in the self_supervised_learning training recipe to support Wav2Vec2 pre-training. Pull Request resolved: https://github.com/pytorch/audio/pull/3090 Reviewed By: mthrok Differential Revision: D43579220 Pulled By: nateanl fbshipit-source-id: 4b52792b518ddc5b01c9660c90ceb3c4ad1f0237
-
- 23 Feb, 2023 1 commit
-
-
G. Sun authored
Summary: This commit adds the implementation of the tree-constrained pointer generator (TCPGen) for contextual biasing. An example for LibriSpeech can be found in audio/examples/asr/librispeech_biasing. Maintainer's note (mthrok): It seems that TrieNode would be better typed as a tuple, but changing the implementation from list to tuple could introduce issues that are hard to verify without running the code, so the implementation is left unchanged even though the annotation uses tuple.

Pull Request resolved: https://github.com/pytorch/audio/pull/2890
Reviewed By: nateanl
Differential Revision: D43171447
Pulled By: mthrok
fbshipit-source-id: 372bb077d997d720401dbf2dbfa131e6a958e37e
-
- 16 Feb, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: The `BucketizeBatchSampler` may return a different `iter_list` on different nodes if `shuffle` is `True`, which causes DDP training to hang forever. In addition, `shuffle` in `DistributedSampler` only happens at initialization, which means it assigns the same subset to replicas in all training epochs. The PR fixes both issues. cc arlofaria

Pull Request resolved: https://github.com/pytorch/audio/pull/3068
Reviewed By: mthrok
Differential Revision: D43372110
Pulled By: nateanl
fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e
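The fix follows the same pattern as the stock `DistributedSampler` (shown below as a generic illustration, not torchaudio's actual `BucketizeBatchSampler` code): every rank must draw an identically seeded permutation, and the permutation must be refreshed each epoch.

```python
import torch
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(1000))
# num_replicas/rank given explicitly so the sketch runs without init_process_group.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=True, seed=0)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)

for epoch in range(3):
    # Identical seed + epoch on every rank gives an identical shuffle order,
    # yet a fresh permutation each epoch.
    sampler.set_epoch(epoch)
    for (batch,) in loader:
        pass
```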
-
Zhaoheng Ni authored
Summary: In https://github.com/pytorch/audio/issues/2873, layer normalization is applied to waveforms for SSL models trained on large scale datasets. The word error rate is significantly reduced after the change. The PR updates the results for the affected models.

Without the change in https://github.com/pytorch/audio/issues/2873, here is the WER result table:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 10.59 | 15.62 | 9.58 | 16.33 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.80 | 6.01 | 2.82 | 6.34 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 2.36 | 4.43 | 2.41 | 4.96 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.85 | 3.46 | 2.09 | 3.89 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 2.21 | 3.40 | 2.26 | 4.05 |

After applying layer normalization, here is the updated result:

| Model | dev-clean | dev-other | test-clean | test-other |
|:------|----------:|----------:|-----------:|-----------:|
| [WAV2VEC2_ASR_LARGE_LV60K_10M](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_10M) | 6.77 | 10.03 | 6.87 | 10.51 |
| [WAV2VEC2_ASR_LARGE_LV60K_100H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_100H) | 2.19 | 4.55 | 2.32 | 4.64 |
| [WAV2VEC2_ASR_LARGE_LV60K_960H](https://pytorch.org/audio/main/generated/torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H.html#torchaudio.pipelines.WAV2VEC2_ASR_LARGE_LV60K_960H) | 1.78 | 3.51 | 2.03 | 3.68 |
| [HUBERT_ASR_LARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_LARGE.html#torchaudio.pipelines.HUBERT_ASR_LARGE) | 1.77 | 3.32 | 2.03 | 3.68 |
| [HUBERT_ASR_XLARGE](https://pytorch.org/audio/main/generated/torchaudio.pipelines.HUBERT_ASR_XLARGE.html#torchaudio.pipelines.HUBERT_ASR_XLARGE) | 1.73 | 2.72 | 1.90 | 3.16 |

Pull Request resolved: https://github.com/pytorch/audio/pull/3070
Reviewed By: mthrok
Differential Revision: D43365313
Pulled By: nateanl
fbshipit-source-id: 34a60ad2e5eb1299da64ef88ff0208ec8ec76e91
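The normalization in question is per-utterance zero-mean/unit-variance scaling of the raw waveform, in the style of fairseq's `normalize=True`; the snippet below is an illustrative assumption about the preprocessing, not the pipelines' exact code.

```python
import torch
import torch.nn.functional as F

waveform = torch.randn(16000)  # dummy 1-second, 16 kHz mono waveform

# Normalize each utterance to zero mean and unit variance before the SSL encoder.
normalized = F.layer_norm(waveform, waveform.shape)
```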
-
- 15 Feb, 2023 1 commit
-
-
hwangjeff authored
Summary: Updates tutorial "Audio Data Augmentation" to use two of the newly introduced data augmentation operators in beta: `torchaudio.functional.fftconvolve` and `torchaudio.functional.add_noise`. Pull Request resolved: https://github.com/pytorch/audio/pull/3062 Reviewed By: mthrok Differential Revision: D43298120 Pulled By: hwangjeff fbshipit-source-id: 09ca736a5c67242568515d600b7d31eab32c2df1
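A short usage sketch of the two operators (all tensors below are random placeholders):

```python
import torch
import torchaudio.functional as F

speech = torch.randn(1, 16000)   # placeholder clean speech, 1 second at 16 kHz
rir = torch.randn(1, 1600)       # placeholder room impulse response
noise = torch.randn(1, 16000)    # placeholder background noise

# Simulate reverberation by convolving the speech with the RIR.
reverbed = F.fftconvolve(speech, rir)[:, : speech.shape[-1]]

# Mix in the noise at a 10 dB signal-to-noise ratio.
snr = torch.tensor([10.0])
noisy = F.add_noise(reverbed, noise, snr)
```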
-
- 14 Feb, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary:
- Rename the current `ssl` example to `self_supervised_learning`
- Add a README to demonstrate how to run the recipe with the HuBERT task

Pull Request resolved: https://github.com/pytorch/audio/pull/3060
Reviewed By: mthrok
Differential Revision: D43287868
Pulled By: nateanl
fbshipit-source-id: 10352682485ef147ca32f4c4c9f9cde995444aa0
-
- 30 Jan, 2023 1 commit
-
-
Yan Li authored
Summary: Currently a few errors occur when this tutorial is run on a CUDA device, for the following reasons:
- The source audio waveform is not properly moved to the GPU. The `to()` method is not in-place for Tensors, so we need to assign the return value of the method call back to the variable (otherwise the Tensor would still be on the CPU).
- When performing further analysis and displaying the output audio, we need to move tensors back from the GPU to the CPU, because some of the functions we call require the Tensor to be on the CPU (e.g. `stft()` and `bss_eval_sources()`).

Pull Request resolved: https://github.com/pytorch/audio/pull/3017
Reviewed By: mthrok
Differential Revision: D42828526
Pulled By: nateanl
fbshipit-source-id: c28bc855e79e3363a011f4a35a69aae1764e7762
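Both points in miniature (the separation model is replaced by a placeholder operation):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

waveform = torch.randn(2, 16000)
# Tensor.to() is not in-place: re-assign the result, or the tensor stays on the CPU.
waveform = waveform.to(device)

separated = waveform * 0.5            # stand-in for running the separation model
# Move results back to the CPU before CPU-only analysis such as stft()/bss_eval_sources().
separated = separated.cpu()
spec = torch.stft(separated, n_fft=400, window=torch.hann_window(400), return_complex=True)
```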
-
- 19 Jan, 2023 2 commits
-
-
Zhaoheng Ni authored
Summary: TorchAudio currently has one training recipe, for HuBERT + LibriSpeech pre-training. It may not suit users who want to use a customized dataset or a new training objective (such as the contrastive loss in Wav2Vec2). The PR addresses this by providing a modularized training recipe for audio self-supervised learning: users can inject a customized model module, loss function, optimizer, LR scheduler, and datamodule to train an SSL model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2876
Reviewed By: hwangjeff
Differential Revision: D42617414
Pulled By: nateanl
fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
-
hwangjeff authored
Summary: In the Conformer RNN-T LibriSpeech recipe, there's no need to perform manual optimization. This PR modifies the recipe to use automatic optimization instead. Pull Request resolved: https://github.com/pytorch/audio/pull/2981 Reviewed By: mthrok Differential Revision: D42507228 Pulled By: hwangjeff fbshipit-source-id: 9712add951eba356e39f7e8c8dc3bf584ba48309
-
- 17 Jan, 2023 1 commit
-
-
Zhaoheng Ni authored
Summary: The mel spectrograms in the TTS tutorial are displayed upside down. The PR fixes it by using `origin="lower"` in `imshow`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2989
Reviewed By: mthrok
Differential Revision: D42538349
Pulled By: nateanl
fbshipit-source-id: 4388103a49bdfabf1705c1f979d44ecedd5c910a
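For reference, the relevant matplotlib call looks like this (random placeholder data):

```python
import matplotlib.pyplot as plt
import torch

mel_spec = torch.rand(80, 200)   # placeholder (n_mels, n_frames) mel spectrogram
fig, ax = plt.subplots()
# origin="lower" puts the low-frequency bins at the bottom instead of flipping the image.
ax.imshow(mel_spec, origin="lower", aspect="auto")
```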
-
- 16 Jan, 2023 1 commit
-
-
Robin Scheibler authored
Summary: The `examples/source_separation` scripts use inconsistent keywords to indicate the WSJ0_2mix dataset. This PR does the following:
1. Use `wsj0mix` consistently as the keyword indicating the WSJ0_2mix dataset.
2. Correct `args.data_dir` to `args.root_dir` in eval.py.
3. Modify the parameters of `pytorch_lightning.Trainer` according to the latest version (use `accelerator="gpu"` and `devices=args.num_devices` instead of just `gpus=args.num_devices`).

Pull Request resolved: https://github.com/pytorch/audio/pull/2987
Reviewed By: xiaohui-zhang
Differential Revision: D42536992
Pulled By: nateanl
fbshipit-source-id: 10a80263ad7054b1629d8fa023676b607e633d76
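The Lightning API change from item 3 in isolation (the device count is illustrative):

```python
import pytorch_lightning as pl

# Old API:
#   trainer = pl.Trainer(gpus=4)
# Current API:
trainer = pl.Trainer(accelerator="gpu", devices=4)
```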
-
- 13 Jan, 2023 1 commit
-
-
moto authored
Summary: Per the suggestion by nateanl, add a visualization of the features fed to the ASR model.

![Screen Shot 2023-01-12 at 8 19 59 PM](https://user-images.githubusercontent.com/855818/212215190-23be7553-4c04-40d9-944e-3ee2ff69c49b.png)

Pull Request resolved: https://github.com/pytorch/audio/pull/2974
Reviewed By: nateanl
Differential Revision: D42484088
Pulled By: mthrok
fbshipit-source-id: 2c839492869416554eac04aa06cd12078db21bd7
-
- 30 Dec, 2022 1 commit
-
-
moto authored
Summary: Artifact: [subtractive_synthesis_tutorial](https://output.circle-artifacts.com/output/job/4c1ce33f-834d-48e0-ba89-2e91acdcb572/artifacts/0/docs/tutorials/subtractive_synthesis_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2934 Reviewed By: carolineechen Differential Revision: D42284945 Pulled By: mthrok fbshipit-source-id: d255b8e8e2a601a19bc879f9e1c38edbeebaf9b3
-
- 17 Dec, 2022 1 commit
-
-
moto authored
Summary: Adds filter design tutorial, which demonstrates `sinc_impulse_response` and `frequency_impulse_response`. Example: - [filter_design_tutorial](https://output.circle-artifacts.com/output/job/bd22c615-9215-4b17-a52c-b171a47f646c/artifacts/0/docs/tutorials/filter_design_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2894 Reviewed By: xiaohui-zhang Differential Revision: D42117658 Pulled By: mthrok fbshipit-source-id: f7dd04980e8557bb6f0e0ec26ac2c7f53314ea16
-
- 16 Dec, 2022 1 commit
-
-
Caroline Chen authored
Summary: Resolves https://github.com/pytorch/audio/issues/2891. Rename the `resampling_method` options to more accurately describe what is happening. Previously the options were `sinc_interpolation` and `kaiser_window`, which can be confusing since both actually use sinc interpolation and differ only in the window function used. As a result, `sinc_interpolation` is renamed to `sinc_interp_hann` and `kaiser_window` to `sinc_interp_kaiser`. Using an old option will throw a warning, and those options will be deprecated in 2 releases. The numerical behavior is unchanged.

Pull Request resolved: https://github.com/pytorch/audio/pull/2922
Reviewed By: mthrok
Differential Revision: D42083619
Pulled By: carolineechen
fbshipit-source-id: 9a9a7ea2d2daeadc02d53dddfd26afe249459e70
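For example, with the new option name (sample rates are arbitrary):

```python
import torch
import torchaudio

resampler = torchaudio.transforms.Resample(
    orig_freq=48_000, new_freq=16_000, resampling_method="sinc_interp_kaiser"
)
waveform = torch.randn(1, 48_000)
resampled = resampler(waveform)   # same numerics as the old "kaiser_window" option
```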
-
- 29 Nov, 2022 1 commit
-
-
moto authored
Summary: This commit adds the tutorial for additive synthesis, using torchaudio's prototype DSP ops. [Review here](https://output.circle-artifacts.com/output/job/3dc83322-832a-4272-9c13-df752c97b660/artifacts/0/docs/tutorials/additive_synthesis_tutorial.html) Pull Request resolved: https://github.com/pytorch/audio/pull/2877 Reviewed By: carolineechen Differential Revision: D41585425 Pulled By: mthrok fbshipit-source-id: b81283b90e4779c8054fd030a1d8c3d39d676bbd
-
- 28 Nov, 2022 1 commit
-
-
moto authored
Summary: This commit adds a tutorial for `oscillator_bank` and `adsr_envelope`, which will be a basis for DDSP.
- [Review here](https://output.circle-artifacts.com/output/job/cf1d3001-88e5-418b-8cf8-ae22b4445dba/artifacts/0/docs/tutorials/oscillator_tutorial.html)

Pull Request resolved: https://github.com/pytorch/audio/pull/2862
Reviewed By: carolineechen
Differential Revision: D41559503
Pulled By: mthrok
fbshipit-source-id: 3f1689186db7d246de14f228fc2f91bf37db98cd
-
- 17 Nov, 2022 1 commit
-
-
vasiliy authored
Summary: This code was added by https://github.com/pytorch/audio/commit/4d0095a528412cfec2a549204fc01d9ebb15df7a. Seems that the original code had a typo?

Pull Request resolved: https://github.com/pytorch/audio/pull/2858
Test Plan:
```
// the import of `mustc` now succeeds, previously crashed
python examples/asr/emformer_rnnt/global_stats.py --model-type librispeech --dataset-path /home/vasiliy/local/librispeech/
```
Reviewed By: carolineechen
Differential Revision: D41355663
Pulled By: nateanl
fbshipit-source-id: 92507e529d41b984b9dd400ad24a55d130372b7d
-
- 16 Nov, 2022 2 commits
-
-
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2847. In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR addresses the issue by casting `mask_embedding` to the dtype of `x`, enabling mixed precision training.

Pull Request resolved: https://github.com/pytorch/audio/pull/2854
Reviewed By: carolineechen
Differential Revision: D41343486
Pulled By: nateanl
fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
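A minimal standalone sketch of the failure mode and the cast (toy tensors, not the actual Wav2Vec2 module code):

```python
import torch

x = torch.randn(2, 50, 768, dtype=torch.float16)        # fp16 activations under autocast
mask_embedding = torch.nn.Parameter(torch.randn(768))    # parameter stored in fp32
mask_indices = torch.zeros(2, 50, dtype=torch.bool)
mask_indices[:, :10] = True

# Without .to(x.dtype), the assignment fails because index_put requires
# the source and destination dtypes to match.
x[mask_indices] = mask_embedding.to(x.dtype)
```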
-
Zhaoheng Ni authored
Summary:
- `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653: absolute paths became relative paths. This PR fixes the usage in the HuBERT fine-tuning recipe to get correct audio paths.
- The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`.
- The input dimension of the CTC linear layer varies depending on the model architecture; update it in the lightning module.

cc simpleoier
Pull Request resolved: https://github.com/pytorch/audio/pull/2851
Reviewed By: carolineechen
Differential Revision: D41327998
Pulled By: nateanl
fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
-
- 17 Oct, 2022 1 commit
-
-
moto authored
Summary:
* Refactor the benchmark script
* Rename the `time` variable to avoid (potential) conflict with the time module
* Fix the `beta` parameter in the benchmark (it was not used previously)
* Use the `timeit` module for the benchmark
* Add a plot
* Move the comment on the result to the end
* Add a link to an explanation of aliasing

https://output.circle-artifacts.com/output/job/20b57d2f-3614-4161-a18e-e0c1a537739c/artifacts/0/docs/tutorials/audio_resampling_tutorial.html

Pull Request resolved: https://github.com/pytorch/audio/pull/2773
Reviewed By: carolineechen
Differential Revision: D40421337
Pulled By: mthrok
fbshipit-source-id: b402f84d4517695daeca75fb84ad876ef9354b3a
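A hedged sketch of how a `timeit`-based measurement might look for `functional.resample` (parameter values are arbitrary, not the tutorial's):

```python
import timeit

import torch
import torchaudio.functional as F

waveform = torch.randn(1, 48_000)

def run():
    F.resample(waveform, orig_freq=48_000, new_freq=16_000, lowpass_filter_width=64)

# Repeat the measurement and report the best run to reduce timing noise.
times = timeit.repeat(run, number=10, repeat=5)
print(f"best of 5: {min(times) / 10 * 1e3:.2f} ms per call")
```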
-