    add word-level timestamps to Whisper (#23205) · cd927a47
    Matthijs Hollemans authored
    * let's go!
    
    * initial implementation of token-level timestamps
    
    * only return a single timestamp per token
    
    * remove token probabilities
    
    * fix return type
    
    * fix doc comment
    
    * strip special tokens
    
    * rename
    
    * revert to not stripping special tokens
    
    * only support models that have alignment_heads
    
    * add integration test
    
    * consistently name it token-level timestamps
    
    * small DTW tweak
    
    * initial support for ASR pipeline
    
    * fix pipeline doc comments
    
    * resolve token timestamps in pipeline with chunking
    
    * change warning when no final timestamp is found
    
    * return word-level timestamps
    
    * fixup
    
    * fix bug that skipped final word in each chunk
    
    * fix failing unit tests
    
    * merge punctuations into the words
    
    * also return word tokens
    
    * also return token indices
    
    * add (failing) unit test for combine_tokens_into_words
    
    * make combine_tokens_into_words private
    
    * restore OpenAI's punctuation rules
    
    * add pipeline tests
    
    * make requested changes
    
    * PR review changes
    
    * fix failing pipeline test
    
    * small stuff from PR
    
    * only return words and their timestamps, not segments
    
    * move alignment_heads into generation config
    
    * forgot to set alignment_heads in pipeline tests
    
    * tiny comment fix
    
    * grr
test_pipelines_automatic_speech_recognition.py 56.5 KB