    Support generating with fallback for short form audio in Whisper (#30984) · 89575b56
    Kamil Akesbi authored
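
A minimal usage sketch of what this change enables: temperature fallback now also applies to short-form (<30s) inputs. It assumes the standard WhisperProcessor / WhisperForConditionalGeneration API; the checkpoint, threshold values, and dummy audio below are illustrative placeholders, not values prescribed by this PR.

```python
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Placeholder for a <30s mono clip sampled at 16 kHz.
audio = np.zeros(16000 * 5, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Passing a tuple of temperatures enables generation with fallback: decoding is
# retried at the next temperature when the quality checks fail. With this PR the
# same fallback path is taken for short-form audio, not only for long-form chunks.
generated_ids = model.generate(
    inputs.input_features,
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    logprob_threshold=-1.0,
    compression_ratio_threshold=1.35,
)
print(processor.batch_decode(generated_ids, skip_special_tokens=True))
```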
    
    
    * remove is_shortform
    
    * adapt _retrieve_max_frames_and_seek for short_form
    
    * return bos token in short and long form
    
    * add decoder_input_ids to short form audios
    
* add eos token for short form
    
    * handle short form token_timestamps
    
    * no need to return scores
    
    * add is_shortform conditions
    
    * handle when max_new_tokens is None - short form
    
    * handle assistant decoding
    
    * fix
    
    * handle return_dict_in_generate
    
    * handle split_by_batch for encoder_attentions attribute
    
    * handle num_beams>1
    
    * handle num_return_sequences>1 in generate_with_fallback
    
    * handle num_return_sequences>1 with return_dict_in_generate=True
    
* raise error if max_new_tokens + decoder_input_ids > max_target_pos (see the sketch after this list)
    
    * fix
    
    * apply review suggestions
    
    * fix
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * fix
    
    * logits for both short form and long form
    
    * handle if logits_processor is None
    
    * test
    
    * apply review changes to num_return_sequences
    
    * add _expand_variables_for_generation
    
    * remove short form commented section
    
    * update comments
    
    * uncomment num_beams line in generate_with_fallback
    
    * update assistant decoding
    
    * handle return_segment with short form generation
    
    * up
    
    * fix output format is_shortform
    
    * overwrite beam_sample test
    
    * update _set_return_timestamps
    
    * apply review suggestions
    
    * apply review suggestions
    
    * remove seek_outputs_short_form
    
    * fix _stack_split_outputs
    
    * fix stack dim in _stack_split_outputs
    
    * update tests
    
    * fix past_key_values + beam tests
    
    * fix
    
    * clean _expand_variables_for_generation
    
    * make style
    
    * fix slow tests
    
    * make style
    
    * max_length condition
    
    * make style
    
    * add slow tests for shortform fallback
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * apply review changes
    
    * Update src/transformers/models/whisper/generation_whisper.py
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * up
    
    * fix slow tests
    
    * apply review suggestions
    
    * update test
    
    * make style
    
    * small fix
    
    * fix
    
    * fix test_new_cache_format
    
    * fix past_key_values
    
    * fix
    
    * make style
    
    * fix slow tests
    
    * fix
    
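As referenced in the `max_new_tokens` bullet above, a hedged sketch of the kind of length check described there; the helper name, default value, and error wording are illustrative rather than the exact code added to `generation_whisper.py` (Whisper's `max_target_positions` defaults to 448).

```python
def check_decoder_length(decoder_input_ids, max_new_tokens, max_target_positions=448):
    """Raise if the prompt plus the requested new tokens would exceed the decoder limit."""
    total = decoder_input_ids.shape[-1] + max_new_tokens
    if total > max_target_positions:
        raise ValueError(
            f"decoder_input_ids length ({decoder_input_ids.shape[-1]}) plus max_new_tokens "
            f"({max_new_tokens}) is {total}, which exceeds max_target_positions "
            f"({max_target_positions})."
        )
```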
    ---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>