Add feature_grad_mult argument to HuBERTPretrainModel (#2335)
Summary: In Wav2Vec2 and HuBERT model training, the convolutional feature extraction layers use `group_norm` for normalization in the `Base` model, while the `Large` and `XLarge` models use `layer_norm`. With `group_norm`, the gradients of the feature extraction layers are unstable during pre-training, so they need to be scaled down by a factor of 0.1. This PR adds a `feature_grad_mult` argument to `HuBERTPretrainModel` to control the gradient scale of the feature extractor layers. We also expose the argument in the factory functions (`hubert_pretrain_base`, `hubert_pretrain_large`, and `hubert_pretrain_xlarge`), because in fine-tuning the feature extractor's parameters are frozen, and setting the multiplier to 0.0 avoids back-propagating gradients through those layers.

Pull Request resolved: https://github.com/pytorch/audio/pull/2335

Reviewed By: xiaohui-zhang, mthrok

Differential Revision: D35646928

Pulled By: nateanl

fbshipit-source-id: 6a9563e227aac6e3127b634357946d860f26c994
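For context, a minimal sketch of how such gradient scaling is typically implemented: a custom autograd function that is the identity in the forward pass and multiplies the incoming gradient by the scale factor in the backward pass (the `GradMultiply` pattern used in fairseq-style implementations; the actual internal names in torchaudio may differ).

```python
import torch


class GradMultiply(torch.autograd.Function):
    """Identity in the forward pass; scales the gradient by a
    constant factor in the backward pass."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # Scale the gradient flowing into the feature extractor;
        # `scale` is a constant, so it receives no gradient.
        return grad_output * ctx.scale, None


# feature_grad_mult = 0.1 stabilizes Base pre-training;
# feature_grad_mult = 0.0 stops gradients entirely (e.g. fine-tuning
# with a frozen feature extractor).
features = torch.randn(4, 100, 512, requires_grad=True)
scaled = GradMultiply.apply(features, 0.1)
```

With the factory-function argument added by this PR, pre-training code can simply pass the multiplier, e.g. `torchaudio.models.hubert_pretrain_base(feature_grad_mult=0.1)`.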