- 19 Jan, 2023 1 commit
Zhaoheng Ni authored
Summary: TorchAudio currently has one training recipe for HuBERT + LibriSpeech pre-training. It may not suit users who want to train on a customized dataset or with a new training objective (such as the contrastive loss in Wav2Vec2). This PR addresses the issue by providing a modularized training recipe for audio self-supervised learning: users can inject a customized model module, loss function, optimizer, LR scheduler, and datamodule to train an SSL model.

Pull Request resolved: https://github.com/pytorch/audio/pull/2876
Reviewed By: hwangjeff
Differential Revision: D42617414
Pulled By: nateanl
fbshipit-source-id: 6413df45a9d106ed1d5ff830bf628c54368c5792
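A minimal sketch of what such injection could look like in a pytorch_lightning module; all class and argument names below are illustrative assumptions, not the recipe's actual API:

```python
# Illustrative sketch only: shows how a modularized SSL recipe could accept
# injected components. Names and signatures are assumptions, not the PR's API.
import pytorch_lightning as pl


class SSLModule(pl.LightningModule):
    def __init__(self, model, loss_fn, optimizer_factory, lr_scheduler_factory):
        super().__init__()
        self.model = model                          # e.g., a HuBERT or Wav2Vec2 backbone
        self.loss_fn = loss_fn                      # e.g., masked prediction or contrastive loss
        self.optimizer_factory = optimizer_factory
        self.lr_scheduler_factory = lr_scheduler_factory

    def training_step(self, batch, batch_idx):
        output = self.model(*batch)                 # forward pass on a batch from the datamodule
        loss = self.loss_fn(output, batch)          # customized training objective
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = self.optimizer_factory(self.model.parameters())
        scheduler = self.lr_scheduler_factory(optimizer)
        return [optimizer], [scheduler]


# The datamodule is likewise injected at fit time, e.g.:
# pl.Trainer(...).fit(SSLModule(model, loss_fn, opt_fn, sched_fn), datamodule=my_datamodule)
```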
- 16 Nov, 2022 1 commit
Zhaoheng Ni authored
Summary:
- `_get_fileids_paths` in the `LibriLightLimited` dataset was changed in https://github.com/pytorch/audio/issues/2653: the returned absolute paths became relative paths. This PR fixes the usage in the HuBERT fine-tuning recipe so it resolves the correct audio paths.
- The model options should be `hubert_pretrain_large` and `hubert_pretrain_xlarge` instead of `hubert_large` and `hubert_xlarge`.
- The input dimension of the CTC linear layer varies with the model architecture; update it in the lightning module (sketched below).

cc simpleoier

Pull Request resolved: https://github.com/pytorch/audio/pull/2851
Reviewed By: carolineechen
Differential Revision: D41327998
Pulled By: nateanl
fbshipit-source-id: f92248ee84ec860b4e4dbef880c5794b338e1e2d
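A hypothetical sketch of the third point, sizing the CTC head from the encoder output dimension; the encoder dimensions follow the published HuBERT Large/X-Large configurations, while the vocabulary size is an assumption for illustration:

```python
# Hypothetical sketch: the CTC linear layer's input dimension must match the
# encoder output dimension of the selected architecture.
import torch

ENCODER_DIM = {
    "hubert_pretrain_large": 1024,   # HuBERT Large encoder embedding dim
    "hubert_pretrain_xlarge": 1280,  # HuBERT X-Large encoder embedding dim
}
num_output_tokens = 29               # assumed character vocabulary size (incl. blank)

model_name = "hubert_pretrain_large"
ctc_head = torch.nn.Linear(ENCODER_DIM[model_name], num_output_tokens)
```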
- 12 Oct, 2022 1 commit
Zhaoheng Ni authored
Summary: Follow-up to https://github.com/pytorch/audio/issues/2716.

- For preprocessing
  - The HuBERT features take a lot of memory, which may not fit on some machines. Enable using a subset of the features for training the k-means model.
- For pre-training
  - Normalize the loss by the total number of masked frames across all GPUs (sketched below).
  - Use mixed precision training (fp16 is not well supported in pytorch_lightning).
  - Log the accuracies of masked/unmasked frames during training.
  - Clip the gradients with norm `10.0`.
- For ASR fine-tuning
  - Normalize the loss by the total number of batches across all GPUs, as in TorchAudio's Conformer recipe.
  - Use mixed precision training.
  - Add "|" after the end of each transcription to capture the silence/word termination, as in the fairseq recipe.
- Update the WER results on the LibriSpeech dev and test sets:

|            | WER% (Viterbi) | WER% (KenLM) |
|:----------:|---------------:|-------------:|
| dev-clean  |           10.9 |          4.2 |
| dev-other  |           17.5 |          9.4 |
| test-clean |           10.9 |          4.4 |
| test-other |           17.8 |          9.5 |

Pull Request resolved: https://github.com/pytorch/audio/pull/2744
Reviewed By: carolineechen
Differential Revision: D40282322
Pulled By: nateanl
fbshipit-source-id: 4723584c912e70e8970149fe09de005385eaab90
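A rough sketch of the pre-training loss normalization using standard `torch.distributed` calls; this is not the recipe's exact code, and the helper name is made up:

```python
# Sketch, assuming the per-GPU loss is a sum over masked frames: all-reduce the
# masked-frame count so every rank divides by the same global denominator.
import torch
import torch.distributed as dist


def normalize_masked_loss(loss_sum: torch.Tensor, num_masked: torch.Tensor) -> torch.Tensor:
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(num_masked, op=dist.ReduceOp.SUM)  # total masked frames across GPUs
    return loss_sum / num_masked.clamp(min=1)              # avoid division by zero
```

Gradient clipping at norm 10.0 can be requested through pytorch_lightning's `Trainer(gradient_clip_val=10.0)`.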
- 07 Jun, 2022 1 commit
Zhaoheng Ni authored
Summary: This PR contains the CTC fine-tuning recipe for the HuBERT Base model. The files include:
- the lightning module
- the training script
- the README and result table
- evaluation scripts

Pull Request resolved: https://github.com/pytorch/audio/pull/2352
Reviewed By: hwangjeff
Differential Revision: D36915712
Pulled By: nateanl
fbshipit-source-id: 0249635ad5e81a8aa2d228c1d5fe84d78b62a15b
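For reference, a minimal, self-contained example of the CTC objective that such a fine-tuning recipe optimizes; this is standard `torch.nn.CTCLoss` usage, and all shapes and values below are made up:

```python
# Toy CTC loss computation; shapes follow torch.nn.CTCLoss's (T, N, C) convention.
import torch

time_steps, batch_size, num_tokens = 100, 2, 29
log_probs = torch.randn(time_steps, batch_size, num_tokens).log_softmax(dim=-1)
targets = torch.randint(1, num_tokens, (batch_size, 20))      # token ids; 0 is reserved for blank
input_lengths = torch.full((batch_size,), time_steps)
target_lengths = torch.full((batch_size,), 20)

ctc_loss = torch.nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
print(loss)
```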
- 12 May, 2022 1 commit
Zhaoheng Ni authored
Summary:
- When cropping the waveform and the corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label; after zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform). This PR fixes the bug by checking whether `label_start` is negative and changing it to zero if so (sketched below).
- If `pad` is True, `length` should be the length of each waveform instead of the max length. Fix it so the model ignores the padded portion during pre-training.

Pull Request resolved: https://github.com/pytorch/audio/pull/2296
Reviewed By: mthrok
Differential Revision: D36323217
Pulled By: nateanl
fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
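A sketch of the clamping fix described in the first bullet, using the formula from the summary; the helper name and scalar arguments are illustrative:

```python
# Illustrative helper: align the label start index with the cropped audio start
# and clamp negative values to zero so the label is never empty.
import torch


def get_label_start(audio_start: int, kernel_size: float, stride: float, sample_rate: int) -> int:
    label_start = torch.div(
        torch.tensor(audio_start - kernel_size * sample_rate),
        stride * sample_rate,
        rounding_mode="floor",
    )
    # Near the beginning of the waveform the formula can go negative, which
    # would yield an empty (all-zero) label after padding; clamp it to zero.
    return int(label_start.clamp(min=0))
```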
- 22 Apr, 2022 1 commit
Zhaoheng Ni authored
Summary: When using a customized `batch_sampler`, pytorch_lightning can't wrap a distributed sampler around it. Hence we provide a `DistributedBatchSampler` that supports `BucketizeBatchSampler` in `ddp` mode. The `DistributedBatchSampler` assumes `BucketizeBatchSampler.iter_list` is a list of lists, where each sub-list contains a batch of indices. Setting `shuffle` to `True` shuffles the lists based on `seed` and the current `epoch`. The shuffle only happens at initialization and won't change unless the user resets it; the reason is that a shuffled `BucketizeBatchSampler` may have a different length than before, and shuffling in ``__iter__`` could cause a mismatch between ``__len__`` and the real length. Hence users need to set `reload_dataloaders_every_n_epochs=1` in pytorch_lightning's Trainer so that the value of ``__len__`` and the real length stay the same.

Pull Request resolved: https://github.com/pytorch/audio/pull/2299
Reviewed By: hwangjeff
Differential Revision: D35781538
Pulled By: nateanl
fbshipit-source-id: 6e8396615497f1aeddab1ee5678830c0445c2b2a
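A stripped-down sketch of the idea, not the torchaudio implementation: shard pre-built batches of indices across DDP ranks, shuffling once at construction keyed on `seed` and `epoch`:

```python
# Sketch only: a batch sampler that distributes a list of index batches
# (e.g., BucketizeBatchSampler.iter_list) across DDP ranks.
import random


class DistributedBatchSamplerSketch:
    def __init__(self, batches, num_replicas, rank, shuffle=True, seed=0, epoch=0):
        batches = list(batches)                       # list of lists of dataset indices
        if shuffle:
            # Shuffle once at construction, keyed on seed + epoch, so that
            # __len__ stays consistent with what __iter__ later yields.
            random.Random(seed + epoch).shuffle(batches)
        self.batches = batches[rank::num_replicas]    # each rank keeps every num_replicas-th batch

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)
```

Because the shuffle is tied to construction, rebuilding the dataloader each epoch (e.g., with `reload_dataloaders_every_n_epochs=1`) is what produces a new shuffle order.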
- 22 Jan, 2022 1 commit
Zhaoheng Ni authored
Summary:
- Rename `BucketizeSampler` to `BucketizeBatchSampler`.
- Fix bugs in `BucketizeBatchSampler`.
- Adjust HuBERTDataset based on the latest `BucketizeBatchSampler`.

Pull Request resolved: https://github.com/pytorch/audio/pull/2150
Reviewed By: mthrok
Differential Revision: D33689963
Pulled By: nateanl
fbshipit-source-id: 203764e9af5b7577ba08ebaa30ba5da3b67fb7e7
- 06 Jan, 2022 1 commit
Elijah Rippeth authored
Summary: This PR:
- Replaces `data_source` with `lengths`.
- Adds a `shuffle` argument to decide whether to shuffle the samples in the buckets.
- Adds `max_len` and `min_len` to filter out samples whose lengths are greater than `max_len` or less than `min_len` (see the sketch below).

cc nateanl

Pull Request resolved: https://github.com/pytorch/audio/pull/2147
Reviewed By: carolineechen
Differential Revision: D33454369
Pulled By: nateanl
fbshipit-source-id: 3835169ec7f808f8dd9650e7f183f79091efe886
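A small illustration of the length filtering added here; the bucketing and batching logic of the real sampler is omitted, and the function name is made up:

```python
# Keep only the sample indices whose lengths fall within [min_len, max_len].
def filter_indices_by_length(lengths, min_len=0, max_len=float("inf")):
    return [i for i, length in enumerate(lengths) if min_len <= length <= max_len]


# Example: with min_len=100 and max_len=1000, indices 1 and 3 survive.
print(filter_indices_by_length([10, 120, 5000, 240], min_len=100, max_len=1000))  # [1, 3]
```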
- 23 Dec, 2021 1 commit
Joao Gomes authored
Summary:
Pull Request resolved: https://github.com/pytorch/audio/pull/2096

run: `arc lint --apply-patches --paths-cmd 'hg files -I "./**/*.py"'`

Reviewed By: mthrok
Differential Revision: D33297351
fbshipit-source-id: 7bf5956edf0717c5ca90219f72414ff4eeaf5aa8
- 10 Dec, 2021 1 commit
nateanl authored
Summary: This PR adds a PyTorch Lightning based training script for the HuBERT Base model. It consists of two iterations of pre-training and one iteration of ASR fine-tuning on the LibriSpeech dataset.

Pull Request resolved: https://github.com/pytorch/audio/pull/2000
Reviewed By: carolineechen
Differential Revision: D33021467
Pulled By: nateanl
fbshipit-source-id: 77fe5a751943b56b63d5f1fb4e6ef35946e081db