- 02 Sep, 2024 1 commit
mayp777 authored
- 16 Nov, 2022 2 commits
Zhaoheng Ni authored
Summary: Addresses https://github.com/pytorch/audio/issues/2847. In mixed precision training, the dtype of `mask_embedding` is **not** converted to fp16 automatically. This PR fixes the issue by casting `mask_embedding` to the dtype of `x`, enabling mixed precision training (see the sketch below).

Pull Request resolved: https://github.com/pytorch/audio/pull/2854
Reviewed By: carolineechen
Differential Revision: D41343486
Pulled By: nateanl
fbshipit-source-id: 4a5cbb429ff8ba5d3c439a3d5acb5094f66bf705
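A minimal sketch of the kind of cast described above, assuming a module that writes a learned embedding into masked frames; the class name and shapes are illustrative, not the actual torchaudio internals:

```python
import torch
from torch import nn

class MaskedEncoderSketch(nn.Module):
    """Illustrative only: a learned embedding is written into masked frames."""

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # The parameter is created in fp32; autocast does not convert it in place.
        self.mask_embedding = nn.Parameter(torch.empty(embed_dim).uniform_())

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Under autocast, `x` may be fp16 while the parameter stays fp32, and the
        # indexed assignment requires matching dtypes, so cast to `x.dtype` first.
        x[mask] = self.mask_embedding.to(x.dtype)
        return x
```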
Zhaoheng Ni authored
Summary:
- `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653 so that it returns relative paths instead of absolute paths. This PR fixes the usage in the HuBERT fine-tuning recipe so that the correct audio paths are resolved (see the path-resolution sketch after this commit message).
- The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`.
- The input dimension of the CTC linear layer varies with the model architecture; update it in the lightning module.

cc simpleoier

Pull Request resolved: https://github.com/pytorch/audio/pull/2851
Reviewed By: carolineechen
Differential Revision: D41327998
Pulled By: nateanl
fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
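A hypothetical illustration of the path handling, assuming `root` is the dataset root and `fileid_path` is the relative path now returned by `_get_fileids_paths`; the function name and paths are illustrative, not the recipe's actual code:

```python
import os

def resolve_audio_path(root: str, fileid_path: str) -> str:
    # After the dataset change, `fileid_path` is relative to `root`,
    # so the recipe must join the two instead of using it as-is.
    return os.path.join(root, fileid_path)

# Hypothetical usage:
# resolve_audio_path("/datasets/librilight_limited", "relative/path/to/audio.flac")
```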
- 12 Oct, 2022 1 commit
Zhaoheng Ni authored
Summary: Follow-up to https://github.com/pytorch/audio/issues/2716.

- Preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. Enable training the k-means model on a subset of the features.
- Pre-training
  - Normalize the loss by the total number of masked frames across all GPUs (see the sketch after this commit message).
  - Use mixed precision training; fp16 is not well supported in pytorch_lightning.
  - Log accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.
- ASR fine-tuning
  - Normalize the loss by the total number of batches across all GPUs, as in the conformer recipe of TorchAudio.
  - Use mixed precision training.
  - Append "|" to the end of each transcription to capture silence/word termination, as in the fairseq recipe.
  - Update the WER results on the LibriSpeech dev and test sets.

|            | WER% (Viterbi) | WER% (KenLM) |
|:----------:|---------------:|-------------:|
| dev-clean  | 10.9           | 4.2          |
| dev-other  | 17.5           | 9.4          |
| test-clean | 10.9           | 4.4          |
| test-other | 17.8           | 9.5          |

Pull Request resolved: https://github.com/pytorch/audio/pull/2744
Reviewed By: carolineechen
Differential Revision: D40282322
Pulled By: nateanl
fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
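A rough sketch of the cross-GPU loss normalization and the trainer flags described above, under the assumption that DDP averages gradients across ranks; the function name and the surrounding configuration are illustrative, not the recipe's actual code:

```python
import torch
import torch.distributed as dist
from pytorch_lightning import Trainer

def normalize_loss_over_gpus(loss_sum: torch.Tensor, num_masked_frames: torch.Tensor) -> torch.Tensor:
    """Divide the summed loss by the total number of masked frames on all GPUs."""
    if dist.is_available() and dist.is_initialized():
        # Share the total masked-frame count across ranks.
        dist.all_reduce(num_masked_frames, op=dist.ReduceOp.SUM)
        # DDP averages gradients over ranks, so scale the local loss back up
        # to keep the effective objective equal to total_loss / total_frames.
        loss_sum = loss_sum * dist.get_world_size()
    return loss_sum / num_masked_frames

# Trainer flags matching the description above; everything else is assumed.
trainer = Trainer(
    precision=16,            # mixed precision training
    gradient_clip_val=10.0,  # clip gradient norm at 10.0
)
```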
- 28 Jul, 2022 1 commit
Zhaoheng Ni authored
Summary:
- The optimizer in the fine-tuning recipe should also be `AdamW`; see https://github.com/pytorch/audio/pull/2412 and the sketch below.
- Fix the import of `DistributedBatchSampler` in the HuBERT dataset.
- Fix `dataset_path` in the fine-tuning module.

Pull Request resolved: https://github.com/pytorch/audio/pull/2588
Reviewed By: carolineechen
Differential Revision: D38243423
Pulled By: nateanl
fbshipit-source-id: badc88ce9eddfd71270201a65ae89433fae2733f
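A minimal sketch of switching the optimizer to `AdamW` in a LightningModule; the learning rate and betas are placeholders, not the recipe's hyperparameters:

```python
import torch
from pytorch_lightning import LightningModule

class FineTuneModuleSketch(LightningModule):
    # Only the optimizer choice matters here; hyperparameters are illustrative.
    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-5, betas=(0.9, 0.98))
```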
- 07 Jun, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR contains the CTC fine-tuning recipe for the HuBERT Base model (a minimal CTC sketch follows below). The files include:
- the lightning module
- the training script
- the README and the result table
- the evaluation scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2352
Reviewed By: hwangjeff
Differential Revision: D36915712
Pulled By: nateanl
fbshipit-source-id: 0249635ad5e81a8aa2d228c1d5fe84d78b62a15b
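A minimal sketch of the CTC objective used in such a recipe; the encoder output is faked with random features, and the 768-dimensional encoder width and 29-symbol character set (28 characters plus blank) are assumptions for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)
batch, frames, enc_dim, vocab = 2, 100, 768, 29

encoder_out = torch.randn(batch, frames, enc_dim)    # stand-in for HuBERT features
aux = nn.Linear(enc_dim, vocab)                      # per-frame character logits
log_probs = aux(encoder_out).log_softmax(dim=-1)     # (batch, frames, vocab)

targets = torch.randint(1, vocab, (batch, 20))       # fake character targets
input_lengths = torch.full((batch,), frames)
target_lengths = torch.full((batch,), 20)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
# nn.CTCLoss expects log-probabilities shaped (frames, batch, vocab).
loss = ctc(log_probs.transpose(0, 1), targets, input_lengths, target_lengths)
print(loss.item())
```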
- 26 May, 2022 1 commit
nateanl authored
- 23 May, 2022 1 commit
Zhaoheng Ni authored
Summary: Replaces https://github.com/pytorch/audio/issues/2129.

Pull Request resolved: https://github.com/pytorch/audio/pull/2198
Reviewed By: carolineechen
Differential Revision: D36544163
Pulled By: nateanl
fbshipit-source-id: 3f19ba5b0f2c2b9e93b0603c3b4491c1dbc40ef8