    [CLAP] Add CLAP to the library (#21370) · c236a621
    Arthur authored
    
    
    * add model like clip
    
    * update
    
    * text model ok
    
    * clap text works
    
    * some refactor
    
    - `CLAPVision` to `CLAPAudio`
    - refactor kwargs of audio modules
    
    * more refactor
    
    * more refactor
    
    * more refactor
    
    * correct fusion
    
    * more refactor
    
    * new modules
    
    * add basic processor
    
    * fixup
    
    * remove whisper copied from
    
    * audio logits match
    
    * add doc
    
    * correct filters mel and add maxlength
    
    * style
    
    * few fixes
    
    * forward passes
    
    * fixup
    
    * fixup
    
    * some clean up
    
    * remove mels from the dictionary
    
    * pad after the repeat
    
    * update padding when smaller
    
    * fix padding
    
    * style
    
    * use swin patch merging
    
    * use copied from swin
    
    * processor with any tokenizer
    
    * more copied from
    
    * some clean up
    
    * more refactor
    
    * fix mel when rand_trunc
    
    * style
    
    * remove unused imports
    
    * update processing
    
    * remove image processing tests
    
    * add testing file
    
    * fix modeling issues
    
    * replace with `is_longer`
    
    * clap in serialization
    
    * more refactor
    
    * `make fixup`
    
    * make fixup
    
    * fix feature extractor
    
    * update test feature extractor
    
    * `make fixup`
    
    * clean up config
    
    * more clean up
    
    * more cleanup
    
    * update tests
    
    * refactor tests and inits
    
    * remove CLAP vision config
    
    * remove CLAP from image processing auto and dummy vision objects
    
    * update inits
    
    * style
    
    * re order classes in modeling clap
    
    * Use roberta tokenizer as the other weights are not open sourced
    
    * small cleanup
    
    * remove tokenization CLAP
    
    * processor tokenizer is roberta
    
    * update feature extraction doc
    
    * remove vclap from model zero shot
    
    * update f_min and f_max to frequency_xx
    
    * some changes
    
    - fix modeling keys
    - add `is_longer` in the forward pass
    - make fixup
    
    * make fixup
    
    * consistent behavior between rand_crop and fusion
    
    * add numpy resize and bilinear and documentation
    
    * move resizing to image utils
    
    * clean feature extraction
    
    * import resize from correct file
    
    * resize in image transforms
    
    * update
    
    * style
    
    * style
    
    * nit
    
    * remove unused arguments from the feature extractor
    
    * style
    
    * few fixes + make fixup
    
    * oops
    
    * fix more tests
    
    * add zero shot audio classification pipeline
    
    * update zeroshot classification pipeline
    
    * fixup
    
    * fix copies
    
    * all CI tests pass
    
    * make fixup + fix docs
    
    * fix docs
    
    * fix docs
    
    * update tests pipeline
    
    * update zero shot pipeline
    
    * update feature extraction clap
    
    * update tokenization auto
    
    * use nested simplify
    
    * update pipeline tests
    
    * Apply suggestions from code review
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * split in two lines
    
    * fixes
    
    * refactor
    
    * clean up
    
    * add integration tests
    
    * update config docstring
    
    * style
    
    * update processor
    
    * fix processor test
    
    * fix feat extractor tests
    
    * update docs
    
    * Apply suggestions from code review
    Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
    
    * fix readmes
    
    * fix tips
    
    * Update src/transformers/models/auto/configuration_auto.py
    
    * update doc and remove todo -> properly explained
    
    * fix idx and typo
    
    * typo
    
    * cleanup config
    
    * cleanup tests, styles and doc
    
    * ignore docstyle on image transform
    
    * add conversion script
    
    * remove the `clap` index in favor of `CLAP`
    
    * update __init
    
    * nits
    
    * Update src/transformers/pipelines/__init__.py
    
    * fix bug
    
    * clarify config
    
    * fix copy
    
    * fix init
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * fix model output
    
    * fix comment
    
    * make fixup
    
    * make fixup
    
    * rename to `Clap`
    
    * replace to `Clap`
    
    * replace to `Clap`
    
    * repo consistency
    
    * again repo-consistency
    
    * make fixup
    
    * Apply suggestions from code review
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * add config
    
    * changes
    
    * update conversion
    
    * Apply suggestions from code review
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    
    * remove unused function
    
    * update based on code reviews
    
    * style
    
    * more comments
    
    * cleanup
    
    * clean up
    
    * style
    
    * apply suggestions
    
    * Empty commit
    
    * pipeline will be added in a different PR
    
    * update calls to audio utils functions
    
    * update pipeline init
    
    * style
    
    * style
    
    * styling again
    
    * use pad
    
    * fix repo-consistency
    
    * update utils and add doc for audio utils
    
    * clean up resize by using torch. update inits accordingly
    
    * style
    
    * CLAP's tokenizer is RoBERTa
    
    * add audio utils to internal toctree
    
    * update toctree
    
    * style
    
    * update documentation and normalize naming across audio utils and feature extraction clap
    
    * style
    
    * clean up
    
    * update doc and typos
    
    * fix doctest
    
    * update modeling code, get rid of a lot of reshaping
    
    * style on added doc audio utils
    
    * update modeling clap
    
    * style
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    
    * docstring variables with CLAP
    
    * rename key
    
    * update modeling CLAP
    
    * update audio utils docstring
    
    * update processing clap
    
    * fix readmes
    
    * fix toctree
    
    * update configuration clap
    
    * fix init
    
    * make fixup
    
    * fix
    
    * fix
    
    * update naming
    
    * update
    
    * update checkpoint path
    
    * Apply suggestions from code review
    
    * Major refactoring
    
    * Update src/transformers/models/clap/configuration_clap.py
    
    * merge
    
    ---------
    Co-authored-by: younesbelkada <younesbelkada@gmail.com>
    Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
README_zh-hans.md 76.4 KB