    Fix DDP training in HuBERT recipes (#3068) · 2c9b3e59
    Zhaoheng Ni authored
    Summary:
    The `BucketizeBatchSampler` may return different iteration lists on different nodes when `shuffle` is `True`, which causes DDP training to hang forever.
    Additionally, `shuffle` in `DistributedSampler` happens only at initialization, so it assigns the same subset to each replica in every training epoch. This PR fixes both issues.
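    The usual fix for both symptoms is to seed the shuffle with `seed + epoch` and reshuffle in `__iter__` each epoch: every rank then produces an identical order (so DDP ranks stay in lockstep), and calling `set_epoch` gives a fresh permutation each epoch. Below is a minimal sketch of that pattern, not the actual torchaudio implementation; the class and parameter names are hypothetical.

    ```python
    import random


    class EpochShuffledSampler:
        """Hypothetical sketch of the epoch-seeded shuffle pattern
        used by samplers such as DistributedSampler."""

        def __init__(self, num_samples: int, seed: int = 0):
            self.num_samples = num_samples
            self.seed = seed
            self.epoch = 0

        def set_epoch(self, epoch: int) -> None:
            # Called by the training loop at the start of each epoch.
            self.epoch = epoch

        def __iter__(self):
            # Same (seed + epoch) on all ranks -> identical order everywhere,
            # so replicas never diverge; a new epoch -> a fresh shuffle.
            rng = random.Random(self.seed + self.epoch)
            order = list(range(self.num_samples))
            rng.shuffle(order)
            return iter(order)

        def __len__(self) -> int:
            return self.num_samples
    ```

    Two replicas constructed with the same seed and epoch yield the same order, while advancing the epoch changes the shuffle deterministically on every rank at once.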
    
    cc arlofaria
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/3068
    
    Reviewed By: mthrok
    
    Differential Revision: D43372110
    
    Pulled By: nateanl
    
    fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e