    TTS fine-tuning for SpeechT5 (#21824) · ac2bc50a
    Matthijs Hollemans authored
    
    
    * wrong argument name
    
    * append eos_token_id
    
    * all tokenizers need mask and ctc_blank tokens
    
    * remove reduction factor from feature extractor
    
    * add proper TTS loss
    
    * did shifting the wrong way around
    
    * mask out padded portions
    
    * remove logits again (don't really need it)
    
    * fix unit tests
    
    * fixup
    
    * pad also returns the decoder attention mask, since that's useful to have
    
    * clean up feature extractor logic
    
    * pad can handle TTS task too
    
    * remove stop_labels from loss calculation
    
    * simplify logic
    
    * fixup
    
    * do -100 masking properly
    
    * small STFT optimization (calculate mel filterbanks only once)
    
    * replace torchaudio fbanks with audio_utils
    
    * remove torchaudio dependency
    
    * simplify & speed up the STFT
    
    * don't serialize window and mel filters
    
    * output cross attentions when generating speech
    
    * add guided attention loss
    
    * fix failing test
    
    * Update src/transformers/models/speecht5/feature_extraction_speecht5.py
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Update src/transformers/models/speecht5/modeling_speecht5.py
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * change type annotation of attention_mask to LongTensor
    
    * extract loss into class
    
    * remove unused frame_signal_scale argument
    
    * use config object in loss class
    
    * fix type annotations in doc comments
    
    * change optional to just bool
    
    * implement missing tokenizer method
    
    * add deprecation warning
    
    * Update src/transformers/models/speecht5/feature_extraction_speecht5.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/models/speecht5/feature_extraction_speecht5.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * add deprecation warning for stop_labels
    
    ---------
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
test_modeling_speecht5.py 63.1 KB