    Add TF whisper (#19378) · e3f028f3
    amyeroberts authored
    
    
    * simplify loop
    
    * add feature extractor
    
    * add model
    
    * start conversion
    
    * add dropout
    
    * initial commit of test files
    
    * conversion for all models
    
    * update processor for correct padding
    
    * update feature extraction
    
    * update integration test logits match
    
    * fmt: off for the logits
    
    * on the fly mel bank
    
    * small nit
    
    * update test
    
    * update tokenizer
    
    * nit feature extraction
    
    * update
    
    * update tokenizer test
    
    * add logit processor and update tokenizer to get suppress tokens
    
    * style
    
    * clean convert
    
    * revert to original modeling tf utils
    
    * Update
    
    * update
    
    * nit
    
    * clean convert file
    
    * update tests and nits
    
    * quality
    
    * slow generation test
    
    * ffn_dim to allow customization
    
    * update readme
    
    * add to toctree
    
    * start fixing integration tests
    
    * update tests and code
    
    * fix feature extractor
    
    * fix config tests common
    
    * update code to fix tests
    
    * fix feature extractor
    
    * nit feature extraction
    
    * update test for new feature extractor
    
    * style
    
    * add abstract
    
    * large logits with custom decoder input ids
    
    * wrap around is torch available
    
    * fix feature extractor
    
    * correct logits for whisper small.en
    
    * nit
    
    * fix encoder_attention_mask
    
    * some fixes
    
    * remove unnecessary inputs
    
    * nits
    
    * add normalizer file
    
    * update test tokenization
    
    * fix attention mask not defined
    
    * fix generate
    
    * remove useless encoder attention mask
    
    * update test modeling whisper
    
    * update config to add second non suppress tokens
    
    * nits on feature extractor
    
    * nit for test tokenizers
    
    * update tests
    
    * update tests
    
    * update tokenization test
    
    * fixup
    
    * invalidated hf token. Clean convert openai to whisper
    
    * fix logit tests
    
    * fixup
    
    * Add model to README
    
    * Fix doc tests
    
    * clean merge
    
    * revert toc_tree changes
    
    * remove useless LogitProcessor
    
    * Update whisper .mdx
    
    * update config file doc
    
    * update configuration docstring
    
    * update test tokenization
    
    * update test tokenization
    
    * update tokenization whisper
    Added copied from where needed
    
    * update feature extraction
    
    * nit test name
    
    * style
    
    * quality
    
    * remove get suppress tokens and update non_speech tokens global variables
    
    * Update src/transformers/models/whisper/feature_extraction_whisper.py
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * clean modeling whisper and test
    Removed the attention mask arguments that are deprecated
    
    * fix large test
    
    * Add multilingual audio test, and translate test
    
    * style
    
    * fix large multilingual test
    
    * nits
    
    * add copied from for attention layer
    
    * remove attention masks in doc
    
    * add english normalizer
    
    * Update docs/source/en/model_doc/whisper.mdx
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * update tokenization test
    
    * remove copied from in whisper attention : no bias in k_proj only
    
    * wrap around dependencies in english normalizer
    
    * style
    
    * correct import generation logits
    
    * for now, wrap feature extractor with torch
    
    * remove torch dependencies for feature extraction and style
    
    * Update src/transformers/models/whisper/convert_openai_whisper_to_tfms.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/configuration_whisper.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update docs/source/en/model_doc/whisper.mdx
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * fixup
    
    * nit
    
    * update logits
    
    * style
    
    * nit
    
    * nits and fix final tests
    
    * add `is_more_itertools_available` to utils
    
    * quality
    
    * add begin suppress tokens, suppress tokens to generate args and config
    
    * clean suppressTokensLogitProcessor in generation logits
    
    * Nit naming
    
    * add suppressTokensAtBegin
    
    * update tests, suppress tokens to None or correct values
    
    * nit and style
    
    * update RAG to fit test and generate_logit
    
    * add copy pasted statement on english normalizer
    
    * add arguments to config_common_kwargs
    
    * Update src/transformers/generation_utils.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Update src/transformers/generation_logits_process.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * revert changes based on reviews
    
    * update doc and nits
    
    * Update src/transformers/models/whisper/configuration_whisper.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * more nits
    
    * last nits
    
    * update test configuration common
    
    * add BART name in decoder attention mask documentation
    
    * Update src/transformers/models/whisper/modeling_whisper.py
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    
    * style
    
    * nit
    
    * nit
    
    * add english.json file to git
    
    * nits on documentation
    
    * nit
    
    * nits
    
    * last styling
    
    * add main toctree file
    
    * remove sentence piece dependency
    
    * clean init file
    
    * fix tokenizer that has no dependencies on sentencepiece
    
    * update whisper init file, nit
    
    * remove english.json file
    
    * add get decoder prompt id
    
    * All weights loading
    
    * Remove hanging pdb
    
    * Fixup and tidy up
    
    * Use same copied from as PT model
    
    * Remove whitespace changes
    
    * Remove torch references
    
    * Tie embeddings
    
    * Remove logits processor input to generate
    
    * Update logit values
    
    * revert changes and add forced logit processor
    
    * nit
    
    * clean normalizer
    
    * remove protected
    
    * Add logit processors and update generation code & tests
    
    * Some tidy up
    
    * Update docstring
    
    * update
    
    * update based on review
    
    * Update src/transformers/models/whisper/configuration_whisper.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update src/transformers/models/whisper/configuration_whisper.py
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * Update to reflect changes on the PT model branch
    
    * Tidy up
    
    * Remove extra whitespace
    
    * Fix test - make input ids small enough we can append
    
    * Include upstream changes on main
    
    * PR comments - add batch tests, remove comments & defaults
    
    * Fix model output imports
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/generation_tf_logits_process.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update tests/models/whisper/test_modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update docstring example
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    
    * Remove changes to adjust_logits_during_generation function
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * Tidy up imports that don't require TF
    
    * Update tests - skip and no more skip
    
    * Update tests/generation/test_generation_tf_logits_process.py
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    
    * Update src/transformers/models/whisper/modeling_tf_whisper.py
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    
    * Add training flags
    
    * Add (skipped) XLA generation tests
    
    * Add embedding correctness test
    
    * Add constant ids for generation tests
    
    * Make logits finding a bit tidier
    
    * Remove unused args
    
    * xla generation enabled
    
    * Don't skip XLA tests anymore
    
    * Fix tests - add position ids to expected signature and update rag generation
    
    * Undo method reorder
    
    * Remove added whitespace
    
    * Remove copy-paste gradient checkpoint ref
    
    * Remove
    
    * Trigger CI - (issue with refs when pulling)
    Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: NielsRogge <niels.rogge1@gmail.com>
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
    Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
    Co-authored-by: Joao Gante <joao@huggingface.co>