  1. 21 Aug, 2023 2 commits
    • Add Pop2Piano (#21785) · 450a181d
      Susnato Dhar authored
      
      
      * init commit
      
      * config updated also some modeling
      
      * Processor and Model config combined
      
      * extraction pipeline(upto before spectogram & mel_conditioner) added but not properly tested
      
      * model loading successful!
      
      * feature extractor done!
      
      * FE can now be called from HF
      
      * postprocessing added in fe file
      
      * same as prev commit
      
      * Pop2PianoConfig doc done
      
      * cfg docs slightly changed
      
      * fe docs done
      
      * batched
      
      * batched working!
      
      * temp
      
      * v1
      
      * checking
      
      * trying to go with generate
      
      * with generate and model tests passed
      
      * before rebasing
      
      * .
      
      * tests done docs done remaining others & nits
      
      * nits
      
      * LogMelSpectogram shifted to FeatureExtractor
      
      * is_tf rmeoved from pop2piano/init
      
      * import solved
      
      * tokenization tests added
      
      * minor fixed regarding modeling_pop2piano
      
      * tokenizer changed to only return midi_object and other changes
      
      * Updated paper abstract(Camera-ready version) (#2)
      
      * more comments and nits
      
      * ruff changes
      
      * code quality fix
      
      * sg comments
      
      * t5 change added and rebased
      
      * comments except batching
      
      * batching done
      
      * comments
      
      * small doc fix
      
      * example removed from modeling
      
      * ckpt
      
      * forward it compatible with fe and generation done
      
      * comments
      
      * comments
      
      * code-quality fix(maybe)
      
      * ckpts changed
      
      * doc file changed from mdx to md
      
      * test fixes
      
      * tokenizer test fix
      
      * changes
      
      * nits done main changes remaining
      
      * code modified
      
      * Pop2PianoProcessor added with tests
      
      * other comments
      
      * added Pop2PianoProcessor to dummy_objects
      
      * added require_onnx to modeling file
      
      * changes
      
      * update .md file
      
      * remove extra line in index.md
      
      * back to the main index
      
      * added pop2piano to index
      
      * Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
      
      * changes
      
      * added return types to 2 tokenizer methods
      
      * the PR build test might work now
      
      * added backends
      
      * PR build fix
      
      * vocab added
      
      * comments
      
      * refactored vocab into 1 file
      
      * added conversion script
      
      * comments
      
      * essentia version changed in .md
      
      * comments
      
      * more tokenizer tests added
      
      * minor fix
      
      * tests extended for outputs acc check
      
      * small fix
      
      ---------
      Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
    • correct TTS pipeline docstrings snippet (#25587) · 2c1bcbf5
      Yoach Lacombe authored
      * correct TTS pipeline docstrings snippet
      
      * add text_to_audio.py pipelines to documentation tests
  2. 17 Jul, 2023 1 commit
    • Add bark (#24086) · f42a35e6
      Yoach Lacombe authored
      
      
      * first raw version of the bark integration
      
      * working code on small models with single run
      
      * add converting script from suno weights 2 hf
      
      * many changes
      
      * correct past_kv output
      
      * working implementation for inference
      
      * update the converting script according to the architecture changes
      
      * add a working end-to-end inference code
      
      * remove some comments and make small changes
      
      * remove unecessary comment
      
      * add docstrings and ensure no unecessary intermediary output during audio generation
      
      * remove done TODOs
      
      * make style + add config docstrings
      
      * modification for batch inference support on the whole model
      
      * add details to .generation_audio method
      
      * add copyright
      
      * convert EncodecModel from original library to transformers implementation
      
      * add two class in order to facilitate model and sub-models loading from the hub
      
      * add support of loading the whole model
      
      * add BarkProcessor
      
      * correct modeling according to processor output
      
      * Add proper __init__ and auto support
      
      * Add up-to-date copyright/license message
      
      * add relative import instead of absolute
      
      * cleaner head_dim computation
      
      * small comment removal or changes
      
      * more verbose LayerNorm init method
      
      * specify eps for clearer comprehension
      
      * more verbose variable naming in the MLP module
      
      * remove unecessary BarkBlock parameter
      
      * clearer code in the forward pass of the BarkBlock
      
      * remove _initialize_modules method for cleaner code
      
      * Remove unnecessary methods from sub-models
      
      * move code to remove unnecessary function
      
      * rename a variable for clarity and change an assert
      
      * move code and change variable name for clarity
      
      * remove unnecessary asserts
      
      * correct small bug
      
      * correct a comment
      
      * change variable names for clarity
      
      * remove asserts
      
      * change import from absolute to relative
      
      * correct small error due to comma missing + correct import
      
      * Add attribute Bark config
      
      * add first version of tests
      
      * update attention_map
      
      * add tie_weights and resize_token_embeddings for fineModel
      
      * correct getting attention_mask in generate_text_semantic
      
      * remove Bark inference trick
      
      * leave more choices in barkProcessor
      
      * remove _no_split_modules
      
      * fixe error in forward of block and introduce clearer notations
      
      * correct converting script with last changes
      
      * make style + add draft bark.mdx
      
      * correct BarkModelTest::test_generate_text_semantic
      
      * add Bark in main README
      
      * add dummy_pt_objects for Bark
      
      * add missing models in the main init
      
      * correct test_decoder_model_past_with_large_inputs
      
      * disable torchscript test
      
      * change docstring of BarkProcessor
      
      * Add test_processor_bark
      
      * make style
      
      * correct copyrights
      
      * add bark.mdx + make style, quality and consistency
      
      * Apply suggestions from code review
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Remove unnecessary test method
      
      * simply logic of a test
      
      * Only check first ids for slow audio generation
      
      * split full end-to-end generation tests
      
      * remove unneccessary comment
      
      * change submodel names for clearer naming
      
      * remove ModuleDict from modeling_bark
      
      * combine two if statements
      
      * ensure that an edge misued won't happen
      
      * modify variable name
      
      * move code snippet to the right place (coarse instead of semantic)
      
      * change BarkSemanticModule -> BarkSemanticModel
      
      * align BarkProcessor with transformers paradigm
      
      * correct BarkProcessor tests with last commit changes
      
      * change _validate_voice_preset to an instance method instead of a class method
      
      * tie_weights already called with post_init
      
      * add codec_model config to configuration
      
      * update bark modeling tests with recent BarkProcessor changes
      
      * remove SubModelPretrainedModel + change speakers embeddings prompt type in BarkModel
      
      * change absolute imports to relative
      
      * remove TODO
      
      * change docstrings
      
      * add examples to docs and docstrings
      
      * make style
      
      * uses BatchFeature in BarkProcessor insteads of dict
      
      * continue improving docstrings and docs + make style
      
      * correct docstrings examples
      
      * more comprehensible speaker_embeddings load/Save
      
      * rename speaker_embeddings_dict -> speaker_embeddings
      
      * correct bark.mdx + add bark to documentation_tests
      
      * correct docstrings configuration_bark
      
      * integrate last nit suggestions
      
      * integrate BarkGeneration configs
      
      * make style
      
      * remove bark tests from documentation_tests.txt because timeout - tested manually
      
      * add proper generation config initialization
      
      * small bark.mdx documentation changes
      
      * rename bark.mdx -> bark.md
      
      * add torch.no_grad behind BarkModel.generate_audio()
      
      * replace assert by ValueError in convert_suno_to_hf.py
      
      * integrate a series of short comments from reviewer
      
      * move SemanticLogitsProcessors and remove .detach() from Bark docs and docstrings
      
      * actually remove SemanticLogitsProcessor from modeling_bark.oy
      
      * BarkProcessor returns a single output instead of tuple + correct docstrings
      
      * make style + correct bug
      
      * add initializer_range to BarkConfig + correct slow modeling tests
      
      * add .clone() to history_prompt.coarse_prompt to avoid modifying input array
      
      * Making sure no extra "`" are present
      
      * remove extra characters in modeling_bark.py
      
      * Correct output if history_prompt is None
      
      * remove TODOs
      
      * remove ravel comment
      
      * completing generation_configuration_bark.py docstrings
      
      * change docstrings - number of audio codebooks instead of Encodec codebooks
      
      * change 'bias' docstrings in configuration_bark.py
      
      * format code
      
      * rename BarkModel.generate_audio -> BarkModel.generate_speech
      
      * modify AutoConfig instead of EncodecConfig in BarkConfig
      
      * correct AutoConfig wrong init
      
      * refactor BarkModel and sub-models generate_coarse, generate_fine, generate_text_semantic
      
      * remove SemanticLogitsProcessor and replace it with SuppressTokensLogitsProcessor
      
      * move nb_codebook related config arguments to BarkFineConfig
      
      * rename bark.mdx -> bark.md
      
      * correcting BarkModelConfig from_pretrained + remove keys_to_ignore
      
      * correct bark.md with correct hub path
      
      * correct code bug in bark.md
      
      * correct list tokens_to_suppress
      
      * modify Processor to load nested speaker embeddings in a safer way
      
      * correct batch sampling in BarkFineModel.generate_fine
      
      * Apply suggestions from code review
      
      Small docstrings correction and code improvements
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * give more details about num_layers in docstrings
      
      * correct indentation mistake
      
      * correct submodelconfig order of docstring variables
      
      * put audio models in alphabetical order in utils/check_repo.my
      
      * remove useless line from test_modeling_bark.py
      
      * makes BarkCoarseModelTest inherits from (ModelTesterMixin, GenerationTesterMixin, unittest.TestCase) instead of BarkSemanticModelTest
      
      * make a Tester class for each sub-model instead of inheriting
      
      * add test_resize_embeddings=True for Bark sub-models
      
      * add Copied from transformers.models.gpt_neo.modeling_gpt_neo.GPTNeoSelfAttention._split_heads
      
      * remove 'Copied fom Bark' comment
      
      * remove unneccessary comment
      
      * change np.min -> min in modeling_bark.py
      
      * refactored all custom layers to have Bark prefix
      
      * add attention_mask as an argument of generate_text_semantic
      
      * refactor sub-models start docstrings to have more precise config class definition
      
      * move _tied_weights_keys overriding
      
      * add docstrings to generate_xxx in modeling_bark.py
      
      * add loading whole BarkModel to convert_suno_to_hf
      
      * refactor attribute and variable names
      
      * make style convert_suno
      
      * update bark checkpoints
      
      * remove never entered if statement
      
      * move bark_modeling docstrings after BarkPretrainedModel class definition
      
      * refactor modeling_bark.py: kv -> key_values
      
      * small nits - code refactoring and removing unecessary lines from _init_weights
      
      * nits - replace inplace method by variable assigning
      
      * remove *optional* when necessary
      
      * remove some lines in generate_speech
      
      * add default value for optional parameter
      
      * Refactor preprocess_histories_before_coarse -> preprocess_histories
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * correct usage after refactoring
      
      * refactor Bark's generate_xxx -> generate and modify docstrings and tests accordingly
      
      * update docstrings python in configuration_bark.py
      
      * add bark files in utils/documentation_test.txt
      
      * correct docstrings python snippet
      
      * add the ability to use parameters in the form of e.g coarse_temperature
      
      * add semantic_max_new_tokens in python snippet in docstrings for quicker generation
      
      * Reformate sub-models kwargs in BakModel.generate
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * correct kwargs in BarkModel.generate
      
      * correct attention_mask kwarg in BarkModel.generate
      
      * add tests for sub-models args in BarkModel.generate and correct BarkFineModel.test_generate_fp16
      
      * enrich BarkModel.generate docstrings with a description of how to use the kwargs
      
      ---------
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  3. 13 Jul, 2023 1 commit
  4. 12 Jul, 2023 1 commit
  5. 04 Jul, 2023 1 commit
  6. 29 Jun, 2023 1 commit
    • Add Musicgen (#24109) · 1c1c9075
      Sanchit Gandhi authored
      
      
      * Add Audiocraft
      
      * add cross attention
      
      * style
      
      * add for lm
      
      * convert and verify
      
      * introduce t5
      
      * split configs
      
      * load t5 + lm
      
      * clean conversion
      
      * copy from t5
      
      * style
      
      * start pattern provider
      
      * make generation work
      
      * style
      
      * fix pos embs
      
      * propagate shape changes
      
      * propagate shape changes
      
      * style
      
      * delay pattern: pad tokens at end
      
      * audiocraft -> musicgen
      
      * fix inits
      
      * add mdx
      
      * style
      
      * fix pad token in processor
      
      * override generate and add todos
      
      * add init to test
      
      * undo pattern delay mask after gen
      
      * remove cfg logits processor
      
      * remove cfg logits processor
      
      * remove logits processor in favour of mask
      
      * clean pos embs
      
      * make fix copies
      
      * update readmes
      
      * clean pos emb
      
      * refactor encoder/decoder
      
      * make fix copies
      
      * update conversion
      
      * fix config imports
      
      * update config docs
      
      * make style
      
      * send pattern mask to device
      
      * pattern mask with delay
      
      * recover prompted audio tokens
      
      * fix docstrings
      
      * laydown test file
      
      * pattern edge case
      
      * remove t5 ref
      
      * add processing class
      
      * config refactor
      
      * better pattern comment
      
      * check if mask is not present
      
      * check if mask is not present
      
      * refactor to auto class
      
      * remove encoder configs
      
      * fix processor
      
      * processor import
      
      * start updating conversion
      
      * start updating tests
      
      * make style
      
      * convert t5, encodec, lm
      
      * convert as composite
      
      * also convert processor
      
      * run generate
      
      * classifier free gen
      
      * comments and clean up
      
      * make style
      
      * docs for logit proc
      
      * docstring for uncond gen
      
      * start lm tests
      
      * work tests
      
      * let the lm generate
      
      * refactor: reshape inside forward
      
      * undo greedy loop changes
      
      * from_enc_dec -> from_sub_model
      
      * fix input id shapes in docstrings
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * undo generate changes
      
      * from sub model config
      
      * Update src/transformers/models/musicgen/modeling_musicgen.py
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * make generate work again
      
      * generate uncond -> get uncond inputs
      
      * remove prefix allowed tokens fn
      
      * better error message
      
      * logit proc checks
      
      * Apply suggestions from code review
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * make decoder only tests work
      
      * composite fast tests
      
      * make style
      
      * uncond generation
      
      * feat extr padding
      
      * make audio prompt work
      
      * fix inputs docstrings
      
      * unconditional inputs: dict -> model output
      
      * clean up tests
      
      * more clean up tests
      
      * make style
      
      * t5 encoder -> auto text encoder
      
      * remove comments
      
      * deal with frames
      
      * fix auto text
      
      * slow tests
      
      * nice mdx
      
      * remove can generate
      
      * todo - hub id
      
      * convert m/l
      
      * make fix copies
      
      * only import generation with torch
      
      * ignore decoder from tests
      
      * don't wrap uncond inputs
      
      * make style
      
      * cleaner uncond inputs
      
      * add example to musicgen forward
      
      * fix docs
      
      * ignore MusicGen Model/ForConditionalGeneration in auto mapping
      
      * add doc section to toctree
      
      * add to doc tests
      
      * add processor tests
      
      * fix push to hub in conversion
      
      * tips for decoder only loading
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix conversion for s / m / l checkpoints
      
      * import stopping criteria from module
      
      * remove from pipeline tests
      
      * fix uncond docstring
      
      * decode audio method
      
      * fix docs
      
      * org: sanchit-gandhi -> facebook
      
      * fix max pos embeddings
      
      * remove auto doc (not compatible with shapes)
      
      * bump max pos emb
      
      * make style
      
      * fix doc
      
      * fix config doc
      
      * fix config doc
      
      * ignore musicgen config from docstring
      
      * make style
      
      * fix config
      
      * fix config for doctest
      
      * consistent from_sub_models
      
      * don't automap decoder
      
      * fix mdx save audio file
      
      * fix mdx save audio file
      
      * processor batch decode for audio
      
      * remove keys to ignore
      
      * update doc md
      
      * update generation config
      
      * allow changes for default generation config
      
      * update tests
      
      * make style
      
      * fix docstring for uncond
      
      * fix processor test
      
      * fix processor test
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
  7. 20 Jun, 2023 1 commit
  8. 16 Jun, 2023 1 commit
  9. 14 Jun, 2023 1 commit
    • [WIP] add EnCodec model (#23655) · 0c3fdccf
      Matthijs Hollemans authored
      
      
      * boilerplate stuff
      
      * messing around with the feature extractor
      
      * fix feature extractor
      
      * unit tests for feature extractor
      
      * rename speech to audio
      
      * quick-and-dirty import of Meta's code
      
      * import weights (sort of)
      
      * cleaning up
      
      * more cleaning up
      
      * move encoder/decoder args into config
      
      * cleanup model
      
      * rename EnCodec -> Encodec
      
      * RVQ parameters in config
      
      * add slow test
      
      * add lstm init and test_init
      
      * Add save & load
      
      * finish EncodecModel
      
      * remove decoder_input_values as they are ont used anywhere (not removed from doc yet)
      
      * fix test feature extraction model name
      
      * Add better slow test
      
      * Fix tests
      
      * some fixup and cleaning
      
      * Improve further
      
      * cleaning up quantizer
      
      * fix up conversion script
      
      * test don't pass, _encode_fram does not work
      
      * update tests with output per encode and decode
      
      * more cleanup
      
      * rename _codebook
      
      * remove old config cruft
      
      * ratios & hop_length
      
      * use ModuleList instead of Sequential
      
      * clean up resnet block
      
      * update types
      
      * update tests
      
      * fixup
      
      * quick cleanup
      
      * fix padding
      
      * more styl,ing
      
      * add patrick feedback
      
      * fix copies
      
      * fixup
      
      * fix lstm
      
      * fix shape issues
      
      * fixup
      
      * rename conv layers
      
      * fixup
      
      * fix decoding
      
      * small conv refactoring
      
      * remove norm_params
      
      * simplify conv layers
      
      * rename conv layers
      
      * stuff
      
      * Clean up
      
      * Add padding logic
      
      use padding mask
      
      small conv refactoring
      
      remove norm_params
      
      simplify conv layers
      
      rename conv layers
      
      stuff
      
      add batched test
      
      update
      
      Clean up
      
      merge and update for padding
      
      fix padding
      
      fixup
      
      * clean up more
      
      * clean up more
      
      * More clean ups
      
      * cleanup convolutions
      
      * typo
      
      * fix typos
      
      * fixup
      
      * build PR doc?
      
      * start refactoring docstring
      
      * fix don't pad when no strid and chunk
      
      * update docstring
      
      * update docstring
      
      * nits
      
      * update going to lunch
      
      * update config and model
      
      * fix broken testse (becaue of the config changes)
      
      * fix scale computation
      
      * fixu[
      
      * only return dict if speciefied or if config returns it
      
      * remove todos
      
      * update defaults in config
      
      * update conversion script
      
      * fix doctest
      
      * more docstring + fixup
      
      * nits on batched_tests
      
      * more nits
      
      * Apply suggestions from code review
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      
      * update basxed on review
      
      * fix update
      
      * updaet tests
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fixup
      
      * add overlap and chunl_length_s
      
      * cleanup feature extraction
      
      * teste edge cases truncation and padding
      
      * correct processor values
      
      * update config encodec, nits
      
      * fix tests
      
      * fixup
      
      * fix 24Hz test
      
      * elle tests are green
      
      * fix fixup
      
      * Apply suggestions from code review
      
      * revert readme changes
      
      * fixup
      
      * add example
      
      * use facebook checkpoints
      
      * fix typo
      
      * no pipeline tests
      
      * use slef.pad everywhere we can
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * update based on review
      
      * update
      
      * update mdx
      
      * fix bug and tests
      
      * fixup
      
      * fix doctest
      
      * remove comment
      
      * more nits
      
      * add more coverage for `test_truncation_and_padding`
      
      * fixup
      
      * add last test
      
      * fix text
      
      * nits
      
      * Update tests/models/encodec/test_modeling_encodec.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * take care of the last comments
      
      * typo
      
      * fix test
      
      * nits
      
      * fixup
      
      * Update src/transformers/models/encodec/feature_extraction_encodec.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      ---------
      Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: arthur.zucker@gmail.com <arthur.zucker@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  10. 31 May, 2023 1 commit
    • Add TensorFlow implementation of EfficientFormer (#22620) · 88f50a1e
      Denisa Roberts authored
      * Add tf code for efficientformer
      
      * Fix return dict bug - return last hidden state after last stage
      
      * Fix corresponding return dict bug
      
      * Override test tol
      
      * Change default values of training to False
      
      * Set training to default False X3
      
      * Rm axis from ln
      
      * Set init in dense projection
      
      * Rm debug stuff
      
      * Make style; all tests pass.
      
      * Modify year to 2023
      
      * Fix attention biases codes
      
      * Update the shape list logic
      
      * Add a batch norm eps config
      
      * Remove extract comments in test files
      
      * Add conditional attn and hidden states return for serving output
      
      * Change channel dim checking logic
      
      * Add exception for withteacher model in training mode
      
      * Revert layer count for now
      
      * Add layer count for conditional layer naming
      
      * Transpose for conv happens only in main layer
      
      * Make tests smaller
      
      * Make style
      
      * Update doc
      
      * Rm from_pt
      
      * Change to actual expect image class label
      
      * Remove stray print in tests
      
      * Update image processor test
      
      * Remove the old serving output logic
      
      * Make style
      
      * Make style
      
      * Complete test
  11. 13 Apr, 2023 3 commits
  12. 04 Apr, 2023 1 commit
  13. 28 Mar, 2023 1 commit
  14. 22 Mar, 2023 1 commit
  15. 21 Mar, 2023 2 commits
  16. 02 Mar, 2023 1 commit
    • Use PyAV instead of Decord in examples (#21572) · 3412f597
      amyeroberts authored
      * Use PyAV instead of Decord
      
      * Get frame indices
      
      * Fix number of frames
      
      * Update src/transformers/models/videomae/image_processing_videomae.py
      
      * Fix up
      
      * Fix copies
      
      * Update timesformer doctests
      
      * Update docstrings
  17. 01 Mar, 2023 1 commit
  18. 22 Feb, 2023 1 commit
  19. 20 Feb, 2023 1 commit
    • add GPTSAN model (reopen) (#21291) · f56174ac
      tanreinama authored
      * add GPTSAN-Japanese
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN (update for review)
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * fix typo in comment text
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * add GPTSAN
      
      * fix document and comments
      
      * fix class name GPTSAN->GPTSan
      
      * fix import and test for tokenizer
  20. 16 Feb, 2023 1 commit
    • [CLAP] Add CLAP to the library (#21370) · c236a621
      Arthur authored
      
      
      * add model like clip
      
      * update
      
      * text model ok
      
      * clap text works
      
      * some refactor
      
      - `CLAPVision` to `CLAPAudio`
      - refactor kwargs of audio modules
      
      * more refactor
      
      * more refactor
      
      * more refactor
      
      * correct fusion
      
      * more refactor
      
      * new modules
      
      * add basic processor
      
      * fixup
      
      * remove whisper copioed from
      
      * audio logits match
      
      * add doc
      
      * correct filters mel and add maxlength
      
      * style
      
      * few fixes
      
      * forward passes
      
      * fixup
      
      * fixup
      
      * some clean up
      
      * remove mels from the dictionary
      
      * pad after the repeat
      
      * update padding when smaller
      
      * fix padding
      
      * style
      
      * use swin patch merging
      
      * use copied from swin
      
      * processor with any tokenizer
      
      * more copied from
      
      * some clean up
      
      * more refactor
      
      * fix mel when rand_trunc
      
      * style
      
      * remove unused imports
      
      * update processing
      
      * remove image processing tests
      
      * add testing file
      
      * fix modeling issues
      
      * replace with `is_longer`
      
      * clap in serialization
      
      * more refactor
      
      * `make fixup`
      
      * make fixup
      
      * fix feature extractor
      
      * update test feature extractor
      
      * `make fixup`
      
      * clean up config
      
      * more clean up
      
      * more cleanup
      
      * update tests
      
      * refactor tests and inits
      
      * remove CLAP vision config
      
      * remove CLAP from image processing auto and dummy vision objects
      
      * update inits
      
      * style
      
      * re order classes in modeling clap
      
      * Use roberta tokenizer as the other weights are not open sourced
      
      * small cleanup
      
      * remove tokenization CLAP
      
      * processor tokenizer is roberta
      
      * update feature extraction doc
      
      * remove vclap from model zero shot
      
      * update f_min and f_max to frequency_xx
      
      * some changes
      
      - fix modeling keys
      - add `is_longer` in the forward pass
      - make fixup
      
      * make fixup
      
      * consistent behavior between rand_crop and fusion
      
      * add numpy resize and bilinear and documentation
      
      * move resizing to image utils
      
      * clean feature extraction
      
      * import resize from correct file
      
      * resize in image transforms
      
      * update
      
      * style
      
      * style
      
      * nit
      
      * remove unused arguments from the feature extractor
      
      * style
      
      * few fixes + make fixup
      
      * oops
      
      * fix more tests
      
      * add zero shot audio classification pipeline
      
      * update zeroshot classification pipeline
      
      * fixup
      
      * fix copies
      
      * all CI tests pass
      
      * make fixup + fix docs
      
      * fix docs
      
      * fix docs
      
      * update tests pipeline
      
      * update zero shot pipeline
      
      * update feature extraction clap
      
      * update tokenization auto
      
      * use nested simplify
      
      * update pipeline tests
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * split in two lines
      
      * fixes
      
      * refactor
      
      * clean up
      
      * add integration tests
      
      * update config docstring
      
      * style
      
      * update processor
      
      * fix processor test
      
      * fix feat extractor tests
      
      * update docs
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix readmes
      
      * fix tips
      
      * Update src/transformers/models/auto/configuration_auto.py
      
      * update doc and remove todo -> properly explained
      
      * fix idx and typo
      
      * typo
      
      * cleanup config
      
      * cleanup tests, styles and doc
      
      * ignore docstyle on image transform
      
      * add conversion script
      
      * remove the `clap` indx in favor of `CLAP`
      
      * update __init
      
      * nits
      
      * Update src/transformers/pipelines/__init__.py
      
      * fix bug
      
      * clarify config
      
      * fix copy
      
      * fix init
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix model output
      
      * fix comment
      
      * make fixup
      
      * make fixup
      
      * rename to `Clap`
      
      * replace to `Clap`
      
      * replace to `Clap`
      
      * repo consistency
      
      * again repo-consistency
      
      * make fixup
      
      * Apply suggestions from code review
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * add config
      
      * changes
      
      * update conversion
      
      * Apply suggestions from code review
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * remove unused function
      
      * update based on code reviews
      
      * style
      
      * more comments
      
      * cleanup
      
      * clean up
      
      * style
      
      * apply suggestions
      
      * Empty commit
      
      * pipeline will be added in a different PR
      
      * update calls to audio utils functions
      
      * update pipeline init
      
      * style
      
      * style
      
      * styling again
      
      * use pad
      
      * fix repo-consistency
      
      * update utils and add doc for audio utils
      
      * clean up resize by using torch. update inits accordingly
      
      * style
      
      * CLAP's tokenizer is RoBERTa
      
      * add audio utils to internal toctree
      
      * update toctree
      
      * style
      
      * update documentation and normalize naming across audio utils and feature extraction clap
      
      * style
      
      * clean up
      
      * update doc and typos
      
      * fix doctest
      
      * update modeling code, got rid of a lot of reshaping
      
      * style on added doc audio utils
      
      * update modeling clap
      
      * style
      
      * Apply suggestions from code review
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * docstring variables with CLAP
      
      * rename key
      
      * update modeling CLAP
      
      * update audio utils docstring
      
      * update processing clap
      
      * fix readmes
      
      * fix toctree
      
      * update configuration clap
      
      * fix init
      
      * make fixup
      
      * fix
      
      * fix
      
      * update naming
      
      * update
      
      * update checkpoint path
      
      * Apply suggestions from code review
      
      * Major refactoring
      
      * Update src/transformers/models/clap/configuration_clap.py
      
      * merge
      
      ---------
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
      Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
  21. 15 Feb, 2023 1 commit
    • Susnato Dhar's avatar
      Add Ernie-M Model to huggingface (#21349) · 0c9c8472
      Susnato Dhar authored
      * config and tokenization(fast too) changed and ErnieEncoder added
      
      * Slow Tokenization Added
      
      * Tokenizer(slow) is now working and Fast Tokenizer removed
      
      * Added Config code
      
      * Added Base Model and utils
      
      * ErnieMModel is now working
      
      * All added except tests
      
      * All tests passed except ErnieUIEM
      
      * All tests passed
      
      * all fixes done
      
      * all fixes done
      
      * fixed MAP
      
      * fixed check_code_quality
      
      * fixed Build PR Documentation issue
      
      * Added changes(comments) and also updated to the latest upstream/main
      
      * Added fixup
      
      * Added # Copied comments
      
      * Added fixup
      
      * Added more comments and some nits
      
      * Added fixup
      
      * Fixed README_hd.md
      
      * Added more fixes
      
      * ErnieMTokenizer (being sentencepiece) protected and other docs edited
      
      * Added code_quality fix
      
      * Fixed for
      
      * Added more fix
      
      * modified AZ
      
      * ernie-m tokenization test added!
      
      * attention mask part fixed(with 0->self.config.pad_token_id)
      
      * applied make fixup
  22. 10 Feb, 2023 3 commits
  23. 09 Feb, 2023 1 commit
    • NielsRogge's avatar
      Add BLIP-2 (#21441) · d7f1e7c0
      NielsRogge authored
      
      
      * First draft
      
      * More improvements
      
      * More improvements
      
      * Improve conversion script
      
      * Convert all weights
      
      * Make forward pass work
      
      * Make logits match
      
      * More improvements
      
      * More improvements
      
      * More improvements
      
      * Use get_input_embeddings
      
      * Improve some more
      
      * Improve model tests
      
      * Improve model tests
      
      * More improvements
      
      * Fix processor
      
      * Update files
      
      * Update prepare_inputs_for_generation
      
      * More improvements
      
      * Fix copies
      
      * More fixes
      
      * Make fixup
      
      * More improvements
      
      * Add support for seq2seq language model
      
      * More improvements
      
      * Fix test
      
      * More improvements
      
      * Improve conversion script
      
      * Remove some todo's
      
      * Fix README's
      
      * Improve conversion script
      
      * Fix generation
      
      * Fix style and remove Blip2Model
      
      * Fix model outputs
      
      * More improvements
      
      * Set eos_token_id in config
      
      * Fix quality
      
      * Small improvements
      
      * Add processor tests
      
      * More improvements
      
      * Apply suggestions
      
      * Apply suggestions
      
      * Add integration test
      
      * Update image URL
      
      * Add integration test
      
      * Fix model_type
      
      * Update style
      
      * Improve docs
      
      * Add doc tests
      
      * Fix copies
      
      * Remove tests which are passing
      
      * Improve some more
      
      * Add tests for seq2seq language models
      
      * Minor fix
      
      * Convert more checkpoints
      
      * finalize CI
      
      * Fix blip and blip2 processors
      
      * add `accelerate` support for `blip2`
      
      * clean up
      
      * make style
      
      * Update conversion script
      
      * Update conversion script some more
      
      * Update organization
      
      * revert toc file
      
      * add blip-2 to toc file
      
      * Some more improvements
      
      * Fix docstring
      
      * Improve docs
      
      ---------
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: younesbelkada <younesbelkada@gmail.com>
  24. 03 Feb, 2023 1 commit
    • Matthijs Hollemans's avatar
      [WIP] add SpeechT5 model (#18922) · e4bacf66
      Matthijs Hollemans authored
      * make SpeechT5 model by copying Wav2Vec2
      
      * add paper to docs
      
      * whoops added docs in wrong file
      
      * remove SpeechT5Tokenizer + put CTC back in the name
      
      * remove deprecated class
      
      * remove unused docstring
      
      * delete SpeechT5FeatureExtractor, use Wav2Vec2FeatureExtractor instead
      
      * remove classes we don't need right now
      
      * initial stab at speech encoder prenet
      
      * add more speech encoder prenet stuff
      
      * improve SpeechEncoderPrenet
      
      * add encoder (not finished yet)
      
      * add relative position bias to self-attention
      
      * add encoder CTC layers
      
      * fix formatting
      
      * add decoder from BART, doesn't work yet
      
      * make it work with generate loop
      
      * wrap the encoder into a speech encoder class
      
      * wrap the decoder in a text decoder class
      
      * changed my mind
      
      * changed my mind again ;-)
      
      * load decoder weights, make it work
      
      * add weights for text decoder postnet
      
      * add SpeechT5ForCTC model that uses only the encoder
      
      * clean up EncoderLayer and DecoderLayer
      
      * implement _init_weights in SpeechT5PreTrainedModel
      
      * cleanup config + Encoder and Decoder
      
      * add head + cross attention masks
      
      * improve doc comments
      
      * fixup
      
      * more cleanup
      
      * more fixup
      
      * TextDecoderPrenet works now, thanks Kendall
      
      * add CTC loss
      
      * add placeholders for other pre/postnets
      
      * add type annotation
      
      * fix freeze_feature_encoder
      
      * set padding tokens to 0 in decoder attention mask
      
      * encoder attention mask downsampling
      
      * remove features_pen calculation
      
      * disable the padding tokens thing again
      
      * fixup
      
      * more fixup
      
      * code review fixes
      
      * rename encoder/decoder wrapper classes
      
      * allow checkpoints to be loaded into SpeechT5Model
      
      * put encoder into wrapper for CTC model
      
      * clean up conversion script
      
      * add encoder for TTS model
      
      * add speech decoder prenet
      
      * add speech decoder post-net
      
      * attempt to reconstruct the generation loop
      
      * add speech generation loop
      
      * clean up generate_speech
      
      * small tweaks
      
      * fix forward pass
      
      * enable always dropout on speech decoder prenet
      
      * sort declaration
      
      * rename models
      
      * fixup
      
      * fix copies
      
      * more fixup
      
      * make consistency checker happy
      
      * add Seq2SeqSpectrogramOutput class
      
      * doc comments
      
      * quick note about loss and labels
      
      * add HiFi-GAN implementation (from Speech2Speech PR)
      
      * rename file
      
      * add vocoder to TTS model
      
      * improve vocoder
      
      * working on tokenizer
      
      * more better tokenizer
      
      * add CTC tokenizer
      
      * fix decode and batch_code in CTC tokenizer
      
      * fix processor
      
      * two processors and feature extractors
      
      * use SpeechT5WaveformFeatureExtractor instead of Wav2Vec2
      
      * cleanup
      
      * more cleanup
      
      * even more fixup
      
      * notebooks
      
      * fix log-mel spectrograms
      
      * support reduction factor
      
      * fixup
      
      * shift spectrograms to right to create decoder inputs
      
      * return correct labels
      
      * add labels for stop token prediction
      
      * fix doc comments
      
      * fixup
      
      * remove SpeechT5ForPreTraining
      
      * more fixup
      
      * update copyright headers
      
      * add usage examples
      
      * add SpeechT5ProcessorForCTC
      
      * fixup
      
      * push unofficial checkpoints to hub
      
      * initial version of tokenizer unit tests
      
      * add slow test
      
      * fix failing tests
      
      * tests for CTC tokenizer
      
      * finish CTC tokenizer tests
      
      * processor tests
      
      * initial test for feature extractors
      
      * tests for spectrogram feature extractor
      
      * fixup
      
      * more fixup
      
      * add decorators
      
      * require speech for tests
      
      * modeling tests
      
      * more tests for ASR model
      
      * fix imports
      
      * add fake tests for the other models
      
      * fixup
      
      * remove jupyter notebooks
      
      * add missing SpeechT5Model tests
      
      * add missing tests for SpeechT5ForCTC
      
      * add missing tests for SpeechT5ForTextToSpeech
      
      * sort tests by name
      
      * fix Hi-Fi GAN tests
      
      * fixup
      
      * add speech-to-speech model
      
      * refactor duplicate speech generation code
      
      * add processor for SpeechToSpeech model
      
      * add usage example
      
      * add tests for speech-to-speech model
      
      * fixup
      
      * enable gradient checkpointing for SpeechT5FeatureEncoder
      
      * code review
      
      * push_to_hub now takes repo_id
      
      * improve doc comments for HiFi-GAN config
      
      * add missing test
      
      * add integration tests
      
      * make number of layers in speech decoder prenet configurable
      
      * rename variable
      
      * rename variables
      
      * add auto classes for TTS and S2S
      
      * REMOVE CTC!!!
      
      * S2S processor does not support save/load_pretrained
      
      * fixup
      
      * these models are now in an auto mapping
      
      * fix doc links
      
      * rename HiFiGAN to HifiGan, remove separate config file
      
      * REMOVE auto classes
      
      * there can be only one
      
      * fixup
      
      * replace assert
      
      * reformat
      
      * feature extractor can process input and target at same time
      
      * update checkpoint names
      
      * fix commit hash
  25. 31 Jan, 2023 1 commit
    • NielsRogge's avatar
      Add DETA (#20983) · 5451f889
      NielsRogge authored
      * First draft
      
      * Add initial draft of conversion script
      
      * Convert all weights
      
      * Fix config
      
      * Add image processor
      
      * Fix DetaImageProcessor
      
      * Run make fix copies
      
      * Remove timm dependency
      
      * Fix dummy objects
      
      * Improve loss function
      
      * Remove conv_encoder attribute
      
      * Update conversion scripts
      
      * Improve postprocessing + docs
      
      * Fix copied from statements
      
      * Add tests
      
      * Improve postprocessing
      
      * Improve postprocessing
      
      * Update READMEs
      
      * More improvements
      
      * Fix rebase
      
      * Add is_torchvision_available
      
      * Add torchvision dependency
      
      * Fix typo and README
      
      * Fix bug
      
      * Add copied from
      
      * Fix style
      
      * Apply suggestions
      
      * Fix thanks to @ydshieh
      
      * Fix another dependency check
      
      * Simplify image processor
      
      * Add scipy
      
      * Improve code
      
      * Add threshold argument
      
      * Fix bug
      
      * Set default threshold
      
      * Improve integration test
      
      * Add another integration test
      
      * Update setup.py
      
      * Address review
      
      * Improve deformable attention function
      
      * Improve copied from
      
      * Use relative imports
      
      * Address review
      
      * Replace assertions
      
      * Address review
      
      * Update dummies
      
      * Remove dummies
      
      * Address comments, update READMEs
      
      * Remove custom kernel code
      
      * Add image processor tests
      
      * Add requires_backends
      
      * Add minor comment
      
      * Update scripts
      
      * Update organization name
      
      * Fix defaults, add doc tests
      
      * Add id2label for object 365
      
      * Fix tests
      
      * Update task guide
  26. 26 Jan, 2023 1 commit
  27. 25 Jan, 2023 1 commit
  28. 19 Jan, 2023 1 commit
    • Jitesh Jain's avatar
      Add OneFormer Model (#20577) · 5b949623
      Jitesh Jain authored
      * Add Oneformer Model
      
      * Add OneFormer Tests
      
      * Add UNIVERSAL_SEGMENTATION_MAPPING
      
      * Fix config
      
      * 🐛 Fix error encountered while writing tests
      
      * 🔨 Fix instance segmentation post processing
      
      * Format Files and Add Documentation
      
      * Add Documentation mdx file
      
      * Run make fixup
      
      * Run make fix-copies
      
      * Remove unnecessary code
      
      * Format modeling_oneformer.py
      
      * Add OneFormer to ImageSegmentationPipeline
      
      * Format files
      
      * Add Demo link to Readme
      
      * Fix formatting errors
      
      * Fix test failures
      
      * Update Table in index.mdx
      
      * Fix version
      
      * Fix style
      
      * Remove OneFormer from TF
      
      * Fix Imports
      
      * Fix dummy objects
      
      * Fix tests
      
      * Add newline
      
      * Remove OneFormerFeatureExtractor
      
      * Remove CUDA Kernels
      
      * Use AutoBackbone for Swin
      
      * Fix description
      
      * Use Image Processor
      
      * Fix copies
      
      * Fix formatting
      
      * Fix import order
      
      * Fix flake8 errors
      
      * Fix doc errors
      
      * Add Hindi Readme entry
      
      * Update supported backbones
      
      * Update supported backbones
      
      * Undo Changes
      
      * Fix type of config
      
      * Fix isort
      
      * Fix auto.mdx
      
      * Fix swin config
      
      * Replace DinatBackbone with AutoBackbone
      
      * Use SwinBackbone
      
      * Use SwinBackbone
      
      * Fix conversion script
      
      * Fix arguments
      
      * Add argument description
      
      * Fix style
      
      * Add OneFormerProcessor
      
      * Fix OneFormerProcessor Tests
      
      * Fix mapping
      
      * Fix imports
      
      * Fix inits
      
      * Fix style
      
      * Fix comment
      
      * Fix docstring
      
      * Move OneFormer to MultiModal
      
      * Fix Copies
      
      * Remove size divisor
      
      * Fix check_repo.py
      
      * Fix copies
      
      * Add Processor for Testing Pipeline
      
      * Fix padding for tokens
      
      * Fix variables
      
      * Fix formatting with correct black version
      
      * Add Image Processor Test
      
      * Apply suggestions
      
      * Revert common modeling
      
      * Add check for task
      
      * Fix conversion script
      
      * Fix initialization order
      
      * Fix tests
      
      * Undo Pipeline Changes
      
      * Fix layers in MLP
      
      * Fix copies
      
      * Update image paths
      
      * Fix copies
      
      * Apply suggestions
  29. 16 Jan, 2023 1 commit
    • NielsRogge's avatar
      Add UperNet (#20648) · 4ed89d48
      NielsRogge authored
      
      
      * First draft
      
      * More improvements
      
      * Add convnext backbone
      
      * Add conversion script
      
      * Add more improvements
      
      * Comment out to_dict
      
      * Add to_dict method
      
      * Add default config
      
      * Fix config
      
      * Fix backbone
      
      * Fix backbone some more
      
      * Add docs, auto mapping, tests
      
      * Fix some tests
      
      * Fix more tests
      
      * Fix more tests
      
      * Add conversion script
      
      * Improve conversion script
      
      * Add support for getting reshaped undownsampled hidden states
      
      * Fix forward pass
      
      * Add print statements
      
      * Comment out set_shift_and_window_size
      
      * More improvements
      
      * Correct downsampling layers conversion
      
      * Fix style
      
      * First draft
      
      * Fix conversion script
      
      * Remove config attribute
      
      * Fix more tests
      
      * Update READMEs
      
      * Update ConvNextBackbone
      
      * Fix ConvNext tests
      
      * Align ConvNext with Swin
      
      * Remove files
      
      * Fix index
      
      * Improve docs
      
      * Add output_attentions to model forward
      
      * Add backbone mixin, improve tests
      
      * More improvements
      
      * Update init_weights
      
      * Fix interpolation of logits
      
      * Add UperNetImageProcessor
      
      * Improve image processor
      
      * Fix image processor
      
      * Remove print statements
      
      * Remove script
      
      * Update import
      
      * Add image processor tests
      
      * Remove print statements
      
      * Fix test
      
      * Add integration test
      
      * Add convnext integration test
      
      * Update docstring
      
      * Fix README
      
      * Simplify config
      
      * Apply suggestions
      
      * Improve docs
      
      * Rename class
      
      * Fix test_initialization
      
      * Fix import
      
      * Address review
      
      * Fix config
      
      * Convert all checkpoints
      
      * Fix default backbone
      
      * Use same processor as segformer
      
      * Apply suggestions
      
      * Fix init_weights, update conversion scripts
      
      * Improve config
      
      * Use Auto API instead of creating a new image processor
      
      * Fix docs
      
      * Add doctests
      
      * Remove ResNetConfig dependency
      
      * Add always_partition argument
      
      * Fix rebase
      
      * Improve docs
      
      * Convert checkpoints
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MBP.localdomain>
  30. 08 Jan, 2023 1 commit
  31. 05 Jan, 2023 1 commit
  32. 03 Jan, 2023 1 commit
    • NielsRogge's avatar
      Add GIT (GenerativeImage2Text) (#20295) · 9c6f7485
      NielsRogge authored
      
      
      * First draft
      
      * Make model instantiation work
      
      * Fix copied from statement
      
      * More fixes
      
      * Add correct output head
      
      * Improve configuration
      
      * Add conversion script
      
      * Improve conversion script
      
      * Remove token_type_ids
      
      * Fix conversion of projection layers
      
      * Convert all weights
      
      * Use cats image
      
      * Make logits match
      
      * Generate caption on cats image
      
      * Add GITProcessor
      
      * Update conversion script
      
      * Add support for more checkpoints
      
      * Fix conversion script
      
      * Add initial tests
      
      * Remove cross-attention
      
      * More improvements
      
      * Remove is_decoder
      
      * Improve model tests
      
      * Improve tests
      
      * Improve model outputs
      
      * Fix model outputs equivalence
      
      * Fix more tests
      
      * Remove unused code
      
      * Use generate to generate text, no use of cache for now
      
      * Use generate more appropriately
      
      * Fix config tests
      
      * Fix style
      
      * Add support for use_cache
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      
      * Fix style
      
      * Fix GIT vision encoder
      
      * Update README
      
      * Fix integration test
      
      * Set bos and eos token ids
      
      * Improve docs
      
      * Improve code
      
      * Add support for provided attention_mask
      
      * Add copied from statement
      
      * Fix gradient checkpointing test
      
      * Set model_input_names
      
      * Investigate model_input_names
      
      * Remove script
      
      * Fix model inputs
      
      * Fix docstring
      
      * Rename GIT to Git
      
      * Support more models
      
      * Add support for textvqa model
      
      * Add video support
      
      * Extend conversion script for video
      
      * Add support for large variant
      
      * Add support for more models
      
      * Fix config archive map
      
      * Update integration test
      
      * Fix README
      
      * Fix CLIP mean and std
      
      * Update processor
      
      * Fix use_cache for video, thanks @gante
      
      * Remove print statements
      
      * Remove assertion
      
      * Add processor tests
      
      * Fix model_input_names
      
      * Use Auto API for processor
      
      * Fix processor tests
      
      * Fix integration test
      
      * Fix pipeline test
      
      * Make tests faster
      
      * Update conversion script
      
      * Update conversion script
      
      * Convert more checkpoints
      
      * Update conversion script
      
      * Fix typo
      
      * Update docstrings
      
      * Improve code snippets
      
      * Fix doc tests
      
      * Add more code examples
      
      * Fix doc tests
      
      * Add integration tests
      
      * Fix unused variable
      
      * revert
      
      * Add GIT to Japanese README
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
      Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  33. 21 Dec, 2022 2 commits