    Whisper Timestamp processor and prediction (#20620) · bb300ac6
    Arthur authored
    
    
    * add draft logit processor
    
    * add template functions
    
    * update timestamp processor parameters
    
    * draft script
    
    * simplify code
    
    * cleanup
    
    * fixup and clean
    
    * update pipeline
    
    * style
    
    * clean up previous idea
    
    * add tokenization utils
    
    * update tokenizer and asr output
    
    * fit whisper type
    
    * style and update test
    
    * clean test
    
    * style test
    
    * update tests
    
    * update error test
    
    * update code (not based on review yet)
    
    * update tokenization
    
    * update asr pipeline
    
    * update code
    
    * cleanup and update test
    
    * fmt
    
    * remove text verification
    
    * cleanup
    
    * cleanup
    
    * add model test
    
    * update tests
    
    * update code add docstring
    
    * update code and add docstring
    
    * fix pipeline tests
    
    * Small update.
    
    * Fixup.
    
    * Tmp.
    
    * More support.
    
    * Making `forced_decoder_ids` non mandatory for users to set.
    
    * update and fix first bug
    
    * properly process sequence right after merge if last
    
    * todo
    
    * allow list inputs + compute begin index better
    
    * start adding tests
    
    * add the 3 edge cases
    
    * style
    
    * format sequences
    
    * fixup
    
    * update
    
    * update
    
    * style
    
    * test passes, edge cases should be good
    
    * update last value
    
    * remove Trie
    
    * update tests and expected values
    
    * handle bigger chunk_length
    
    * clean tests a bit
    
    * refactor chunk iter and clean pipeline
    
    * update tests
    
    * style
    
    * refactor chunk iter and clean pipeline
    
    * update
    
    * resolve comments
    
    * Apply suggestions from code review
    Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
    
    * take stride right into account
    
    * update test expected values
    
    * Update code based on review
    Co-authored-by: sgugger <sylvain.gugger@gmail.com>
    Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
test_tokenization_whisper.py 12.9 KB