    Fix DDP training in HuBERT recipes (#3068) · 2c9b3e59
    Zhaoheng Ni authored
    Summary:
    The `BucketizeBatchSampler` may return different iteration lists on different nodes when `shuffle` is `True`, which causes DDP training to hang forever.
    Additionally, `shuffle` in `DistributedSampler` happens only at initialization, so it assigns the same subset to each replica in every training epoch. This PR fixes both issues.
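    The usual fix for both symptoms is to seed the shuffle with `seed + epoch` and reshuffle in `__iter__` each epoch: every rank then produces an identical order (so DDP ranks stay in lockstep), and calling `set_epoch` gives a fresh permutation each epoch. Below is a minimal sketch of that pattern, not the actual torchaudio implementation; the class and parameter names are hypothetical.

    ```python
    import random


    class EpochShuffledSampler:
        """Hypothetical sketch of the epoch-seeded shuffle pattern
        used by samplers such as DistributedSampler."""

        def __init__(self, num_samples: int, seed: int = 0):
            self.num_samples = num_samples
            self.seed = seed
            self.epoch = 0

        def set_epoch(self, epoch: int) -> None:
            # Called by the training loop at the start of each epoch.
            self.epoch = epoch

        def __iter__(self):
            # Same (seed + epoch) on all ranks -> identical order everywhere,
            # so replicas never diverge; a new epoch -> a fresh shuffle.
            rng = random.Random(self.seed + self.epoch)
            order = list(range(self.num_samples))
            rng.shuffle(order)
            return iter(order)

        def __len__(self) -> int:
            return self.num_samples
    ```

    Two replicas constructed with the same seed and epoch yield the same order, while advancing the epoch changes the shuffle deterministically on every rank at once.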
    
    cc arlofaria
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/3068
    
    Reviewed By: mthrok
    
    Differential Revision: D43372110
    
    Pulled By: nateanl
    
    fbshipit-source-id: a162728406ae995e05d2a07cfc2444fb76cf345e