    Fix CollateFn in HuBERT pre-training recipe (#2296) · 09639680
    Zhaoheng Ni authored
    Summary:
    - When cropping the waveform and its corresponding label, we use the formula `torch.div(audio_start - kernel_size * sample_rate, stride * sample_rate, rounding_mode="floor")` to align the audio start and label start indices. However, the value can sometimes be negative, which results in an empty label. After zero-padding, such a training example hurts performance (i.e., the labels are all zero for the input waveform).
    This PR fixes the bug by checking whether `label_start` is negative and setting it to zero if so.
    - If `pad` is True, `length` should be the length of each waveform rather than the max length. This PR fixes it so that the model ignores the padded portion in pre-training.
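The two fixes above can be illustrated with a minimal pure-Python sketch. This is not the code from `hubert_dataset.py`: the helper names `label_start_index` and `pad_batch` are hypothetical, the default values (`kernel_size=25`, `stride=20`, `sample_rate=16`) are assumptions chosen for illustration, and Python's `//` operator stands in for `torch.div(..., rounding_mode="floor")`.

```python
def label_start_index(audio_start, kernel_size=25, stride=20, sample_rate=16):
    """Align a label start index with an audio start index (illustrative values)."""
    # `//` floors toward negative infinity, like rounding_mode="floor".
    raw = (audio_start - kernel_size * sample_rate) // (stride * sample_rate)
    # The fix: a negative start index is clamped to zero.
    return max(raw, 0)


def pad_batch(waveforms):
    """Zero-pad waveforms (lists of floats) and return per-example lengths."""
    # The fix: report each waveform's own length, not the max length,
    # so downstream code can mask out the padded portion.
    lengths = [len(w) for w in waveforms]
    max_len = max(lengths)
    padded = [w + [0.0] * (max_len - len(w)) for w in waveforms]
    return padded, lengths


# Why a negative start is harmful: slicing from a negative index can
# silently produce an empty label sequence.
labels = list(range(10))
unclamped = (0 - 25 * 16) // (20 * 16)      # negative when audio_start is small
empty = labels[unclamped:unclamped + 5]     # slice ends before it starts
fixed = labels[label_start_index(0):label_start_index(0) + 5]

padded, lengths = pad_batch([[1.0, 2.0, 3.0], [4.0]])
```

With these assumed values, an `audio_start` of 0 gives an unclamped index of -2, so the unclamped slice is empty, while the clamped slice returns the first five labels.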
    
    Pull Request resolved: https://github.com/pytorch/audio/pull/2296
    
    Reviewed By: mthrok
    
    Differential Revision: D36323217
    
    Pulled By: nateanl
    
    fbshipit-source-id: 1ffa71e39bbc0e8dee55c3b829911bc2e785b423
hubert_dataset.py 17.3 KB