1. 12 Sep, 2023 1 commit
  2. 25 Aug, 2023 1 commit
    • Arthur's avatar
      [`CodeLlama`] Add support for `CodeLlama` (#25740) · 015f8e11
      Arthur authored
      
      
      * add all
      
      * Revert "Delete .github directory"
      
      This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
      
      * make conversion script backward compatible
      
      * fixup
      
      * more styling
      
      * copy to llama changes
      
      * fix repo consistency
      
      * nits
      
      * document correct classes
      
      * updates
      
      * more fixes
      
      * nits
      
      * update auto mappings
      
      * add readmes
      
      * smallupdates
      
      * llama-code replace with llama_code
      
      * make fixup
      
      * updates to the testsing suite
      
      * fix fast nits
      
      * more small fixes
      
      * fix decode
      
      * fix template processing
      
      * properly reset the normalizer
      
      * nits processor
      
      * tokenization tests pass
      
      * styling
      
      * last tests
      
      * additional nits
      
      * one test is left
      
      * nits
      
      Co-authored-by faabian <faabian@users.noreply.github.com>
      
      * update failing test
      
      * fixup
      
      * remove decode infilling users should handle it on their onw after generation, padding can be a problem
      
      * update
      
      * make test slow and more meaningfull
      
      * fixup
      
      * doc update
      
      * fixup
      
      * Apply suggestions from code review
      
      * add kwargs doc
      
      * tokenizer requires `requires_backend`
      
      * type requires_backends
      
      * CodeLlama instead of LlamaCode
      
      * more name cahnges
      
      * nits
      
      * make doctests happy
      
      * small pipeline nits
      
      * last nit
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * update
      
      * add codellama to toctree
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      015f8e11
  3. 18 Aug, 2023 1 commit
    • Stas Bekman's avatar
      new model: IDEFICS via HuggingFaceM4 (#24796) · 6c811a32
      Stas Bekman authored
      
      
      * rename
      
      * restore
      
      * mappings
      
      * unedited tests+docs
      
      * docs
      
      * fixes
      
      * fix auto-sync breakage
      
      * cleanup
      
      * wip
      
      * wip
      
      * add fetch_images
      
      * remove einops dependency
      
      * update
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * re-add
      
      * add batching
      
      * rework
      
      * fix
      
      * improve
      
      * add Leo as I am extending his work
      
      * cleanup
      
      * fix
      
      * cleanup
      
      * slow-test
      
      * fix
      
      * fix
      
      * fixes
      
      * deal with warning
      
      * rename modified llama classes
      
      * rework fetch_images
      
      * alternative implementation
      
      * cleanup
      
      * strict version
      
      * cleanup
      
      * [`IDEFICS`]聽Fix idefics ci (#25056)
      
      * Fix IDEFICS CI
      
      * fix test file
      
      * fixup
      
      * some changes to make tests pass
      
      * fix
      
      * fixup
      
      * Update src/transformers/models/idefics/configuration_idefics.py
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * remove compat checks
      
      * style
      
      * explain that Idefics is not for training from scratch
      
      * require pt>=2.0
      
      * fix idefics vision config (#25092)
      
      * fix idefics vision config
      
      * fixup
      
      * clean
      
      * Update src/transformers/models/idefics/configuration_idefics.py
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * cleanup
      
      * style
      
      * cleanup
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * upcase
      
      * sequence of images
      
      * handle the case with no images
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: default avatarVictor SANH <victorsanh@gmail.com>
      
      * support pure lm take 2
      
      * support tokenizer options
      
      * parameterize num_channels
      
      * fix upcase
      
      * s|IdeficsForCausalLM|IdeficsForVisionText2Text|g
      
      * manual to one line
      
      * addressing review
      
      * unbreak
      
      * remove clip dependency
      
      * fix test
      
      * consistency
      
      * PIL import
      
      * Idefics prefix
      
      * Idefics prefix
      
      * hack to make tests work
      
      * style
      
      * fix
      
      * fix
      
      * revert
      
      * try/finally
      
      * cleanup
      
      * clean up
      
      * move
      
      * [`IDEFICS`] Fix idefics config refactor (#25149)
      
      * refactor config
      
      * nuke init weights
      
      * more refactor
      
      * oops
      
      * remove visual question answering pipeline support
      
      * Update src/transformers/models/idefics/clip.py
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      
      * cleanup
      
      * mv clip.py vision.py
      
      * tidyup
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      Co-authored-by: default avatarStas Bekman <stas@stason.org>
      
      * fix
      
      * license
      
      * condition on pt
      
      * fix
      
      * style
      
      * fix
      
      * rm torchvision dependency, allow custom transforms
      
      * address review
      
      * rework device arg
      
      * add_eos_token
      
      * s/transforms/transform/
      
      * fix top level imports
      
      * fix return value
      
      * cleanup
      
      * cleanup
      
      * fix
      
      * style
      
      * license
      
      * license
      
      * Update src/transformers/models/idefics/image_processing_idefics.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * add a wrapper to freeze vision layears
      
      * tidyup
      
      * use the correct std/mean settings
      
      * parameterize values from config
      
      * add tests/models/idefics/test_image_processing_idefics.py
      
      * add test_processor_idefics.py
      
      * cleanup
      
      * cleanups
      
      * fix
      
      * fix
      
      * move to the right group
      
      * style
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * add perceiver config
      
      * reset
      
      * missing arg docs
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLeo Tronchon <leo.tronchon@gmail.com>
      
      * address review comments
      
      * inject automatic end of utterance tokens (#25218)
      
      * inject automatic end of utterance tokens
      
      * fix
      
      * fix
      
      * fix
      
      * rework to not use the config
      
      * not end_of_utterance_token at the end
      
      * Update src/transformers/models/idefics/processing_idefics.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * address review
      
      * Apply suggestions from code review
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: default avatarNicolas Patry <patry.nicolas@protonmail.com>
      
      * [`Idefics`] add image_embeddings option in generate-related methods (#25442)
      
      * add image_embeddings option in generate-related methods
      
      * style
      
      * rename image_embeddings and allow perceiver embeddings precomputation
      
      * compute embeddings within generate
      
      * make is_encoder_decoder= True the default in config
      
      * nested if else fix
      
      * better triple check
      
      * switch if elif order for pixel values / img embeds
      
      * update model_kwargs perceiver only at the end
      
      * use _prepare_model_inputs instead of encoder_decoder logic
      
      * fix comment typo
      
      * fix config default for is_encoder_decoder
      
      * style
      
      * add typehints
      
      * precompute in forward
      
      * doc builder
      
      * style
      
      * pop instead of get image hidden states
      
      * Trigger CI
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix * + indentation + style
      
      * simplify a bit the use_resampler logic using comments
      
      * update diocstrings
      
      * Trigger CI
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix rebase changes
      
      * unbreak #25237 - to be fixed in follow up PRs
      
      * is_composition = False
      
      * no longer needed
      
      ---------
      Co-authored-by: default avatarleot13 <leo.tronchon@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarVictor SANH <victorsanh@gmail.com>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarNicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      6c811a32
  4. 13 Aug, 2023 1 commit
  5. 09 Aug, 2023 1 commit
  6. 25 Jul, 2023 2 commits
    • Sebastian Husch Lee's avatar
      [`T5`, `MT5`, `UMT5`] Add [T5, MT5, UMT5]ForSequenceClassification (#24726) · 8f36ab3e
      Sebastian Husch Lee authored
      * Initial addition of t5forsequenceclassification
      
      * Adding imports and adding tests
      
      * Formatting
      
      * Running make fix-copies
      
      * Adding mt5forseq
      
      * Formatting
      
      * run make fix-copies
      
      * Adding to docs
      
      * Add model_parallel
      
      * Fix bug
      
      * Fix
      
      * Remove TODO
      
      * Fixing tests for T5ForSequenceClassification
      
      * Undo changes to dependency_versions_table.py
      
      * Change classification head to work with T5Config directly
      
      * Change seq length to let tests pass
      
      * PR comments for formatting
      
      * Formatting
      
      * Initial addition of UMT5ForSequenceClassification
      
      * Adding to inits and formatting
      
      * run make fix-copies
      
      * Add doc for UMT5ForSeqClass
      
      * Update UMT5 config
      
      * Fix docs
      
      * Skip torch fx test for SequenceClassification
      
      * Formatting
      
      * Add skip to UMT5 tests as well
      
      * Fix umt5 tests
      
      * Running make fix-copies
      
      * PR comments
      
      * Fix for change to sentence_representation
      
      * Rename seq_len to hidden_size since that's what it is
      
      * Use base_model to follow format of the rest of the library
      
      * Update docs
      
      * Extract the decoder_input_ids changes and make one liner
      
      * Make one-liner
      8f36ab3e
    • Arthur's avatar
      [`MPT`] Add MosaicML's `MPT` model to transformers (#24629) · dcb183f4
      Arthur authored
      
      
      * draft add new model like
      
      * some cleaning of the config
      
      * nits
      
      * add nested configs
      
      * nits
      
      * update
      
      * update
      
      * added layer norms + triton kernels
      
      * consider only LPLayerNorm for now.
      
      * update
      
      * all keys match.
      
      * Update
      
      * fixing nits here and there
      
      * working forward pass.
      
      * removed einops dependency
      
      * nits
      
      * format
      
      * add alibi
      
      * byebye head mask
      
      * refactor attention
      
      * nits.
      
      * format
      
      * fix nits.
      
      * nuke ande updates
      
      * nuke tokenizer test
      
      * don't reshape query with kv heads
      
      * added a bit of documentation.
      
      * remove unneeded things
      
      * nuke more stuff
      
      * nit
      
      * logits match - same generations
      
      * rm unneeded methods
      
      * 1 remaining failing CI test
      
      * nit
      
      * fix nits
      
      * fix docs
      
      * fix docs
      
      * rm tokenizer
      
      * fixup
      
      * fixup
      
      * fixup and fix tests
      
      * fixed configuration object.
      
      * use correct activation
      
      * few minor fixes
      
      * clarify docs a bit
      
      * logits match 脿 1e-12
      
      * skip and unskip a test
      
      * added some slow tests.
      
      * fix readme
      
      * add more details
      
      * Update docs/source/en/model_doc/mpt.md
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix configuration issues
      
      * more fixes in config
      
      * added more models
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove unneeded position ids
      
      * fix some  comments
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * revert suggestion
      
      * mpt alibi + added batched generation
      
      * Update src/transformers/models/mpt/__init__.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * remove init config
      
      * Update src/transformers/models/mpt/configuration_mpt.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix nit
      
      * add another slow test
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fits in one line
      
      * some refactor because make fixup doesn't pass
      
      * add ft notebook
      
      * update md
      
      * correct doc path
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      dcb183f4
  7. 24 Jul, 2023 1 commit
    • Rinat's avatar
      Pvt model (#24720) · a03d13c8
      Rinat authored
      * pull and push updates
      
      * add docs
      
      * fix modeling
      
      * Add and run test
      
      * make copies
      
      * add task
      
      * fix tests and fix small issues
      
      * Checks on a Pull Request
      
      * fix docs
      
      * add desc pvt.md
      a03d13c8
  8. 18 Jul, 2023 1 commit
    • NielsRogge's avatar
      Add DINOv2 (#24016) · 3ec10e6c
      NielsRogge authored
      * First draft
      
      * More improvements
      
      * Convert patch embedding layer
      
      * Convert all weights
      
      * Make conversion work
      
      * Improve conversion script
      
      * Fix style
      
      * Make all tests pass
      
      * Add image processor to auto mapping
      
      * Add swiglu ffn
      
      * Add image processor to conversion script
      
      * Fix conversion of giant model
      
      * Fix documentation
      
      * Fix style
      
      * Fix tests
      
      * Address comments
      
      * Address more comments
      
      * Remove unused arguments
      
      * Remove more arguments
      
      * Rename parameters
      
      * Include mask token
      
      * Address comments
      
      * Add docstring
      
      * Transfer checkpoints
      
      * Empty commit
      3ec10e6c
  9. 11 Jul, 2023 2 commits
  10. 10 Jul, 2023 1 commit
  11. 05 Jul, 2023 1 commit
  12. 03 Jul, 2023 1 commit
    • Arthur's avatar
      [`Umt5`] Add google's umt5 to `transformers` (#24477) · 799df10a
      Arthur authored
      
      
      * add tokenization template
      
      * update conversion script
      
      * update modeling code
      
      * update
      
      * update convert checkpoint
      
      * update modeling
      
      * revert changes on convert script
      
      * new conversion script for new format
      
      * correct position bias
      
      * cleaning a bit
      
      * Credit co authors
      Co-authored-by: default avataragemagician <ahmed.elnaggar@tum.de>
      
      Co-authored-by: stefan-it
      <>
      
      * styling
      
      * Add docq
      
      * fix copies
      
      * add co author
      
      * Other Author
      
      * Merge branch 'main' of https://github.com/huggingface/transformers
      
       into add-umt5
      
      * add testing
      
      * nit
      
      * Update docs/source/en/model_doc/umt5.mdx
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      
      * fix t5
      
      * actual fix?
      
      * revert wrong changes
      
      * remove
      
      * update test
      
      * more fixes
      
      * revert some changes
      
      * add SPIECE_UNDERLINE
      
      * add a commone xample
      
      * upfate
      
      * fix copies
      
      * revert changes on t5 conversion script
      
      * revert bytefallback changes since there was no addition yet
      
      * fixup
      
      * fixup
      
      * ingore umt5 cutom testing folder
      
      * fix readmes
      
      * revertT5 changes
      
      * same outputs
      
      * fixup
      
      * update example
      
      * Apply suggestions from code review
      
      * style
      
      * draft addition of all new files
      
      * current update
      
      * fix attention and stuff
      
      * finish refactoring
      
      * auto config
      
      * fixup
      
      * more nits
      
      * add umt5 to init
      
      * use md format
      
      * Update README.md
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * revert changes on mt5
      
      * revert mt4 changes
      
      * update test
      
      * more fixes
      
      * add to mapping
      
      * fix-copies
      
      * fix copies
      
      * foix retain grad
      
      * fix some tests
      
      * nits
      
      * done
      
      * Update src/transformers/models/umt5/modeling_umt5.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update docs/source/en/model_doc/umt5.md
      
      * Update src/transformers/models/umt5/__init__.py
      
      * Update docs/source/en/model_doc/umt5.md
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      
      * Update src/transformers/models/umt5/modeling_umt5.py
      
      * update conversion script + use google checkpoints
      
      * nits
      
      * update test and modelling
      
      * stash slow convert
      
      * update fixupd
      
      * don't change slow
      
      ---------
      
      Co-authored-by: stefan-it <>
      Co-authored-by: default avatarStefan Schweter <stefan@schweter.it>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      799df10a
  13. 29 Jun, 2023 3 commits
    • amyeroberts's avatar
      Removal of deprecated vision methods and specify deprecation versions (#24570) · b324557a
      amyeroberts authored
      * Removal of deprecated methods and specify versions
      
      * Fix tests
      b324557a
    • Sanchit Gandhi's avatar
      Add Musicgen (#24109) · 1c1c9075
      Sanchit Gandhi authored
      
      
      * Add Audiocraft
      
      * add cross attention
      
      * style
      
      * add for lm
      
      * convert and verify
      
      * introduce t5
      
      * split configs
      
      * load t5 + lm
      
      * clean conversion
      
      * copy from t5
      
      * style
      
      * start pattern provider
      
      * make generation work
      
      * style
      
      * fix pos embs
      
      * propagate shape changes
      
      * propagate shape changes
      
      * style
      
      * delay pattern: pad tokens at end
      
      * audiocraft -> musicgen
      
      * fix inits
      
      * add mdx
      
      * style
      
      * fix pad token in processor
      
      * override generate and add todos
      
      * add init to test
      
      * undo pattern delay mask after gen
      
      * remove cfg logits processor
      
      * remove cfg logits processor
      
      * remove logits processor in favour of mask
      
      * clean pos embs
      
      * make fix copies
      
      * update readmes
      
      * clean pos emb
      
      * refactor encoder/decoder
      
      * make fix copies
      
      * update conversion
      
      * fix config imports
      
      * update config docs
      
      * make style
      
      * send pattern mask to device
      
      * pattern mask with delay
      
      * recover prompted audio tokens
      
      * fix docstrings
      
      * laydown test file
      
      * pattern edge case
      
      * remove t5 ref
      
      * add processing class
      
      * config refactor
      
      * better pattern comment
      
      * check if mask is not present
      
      * check if mask is not present
      
      * refactor to auto class
      
      * remove encoder configs
      
      * fix processor
      
      * processor import
      
      * start updating conversion
      
      * start updating tests
      
      * make style
      
      * convert t5, encodec, lm
      
      * convert as composite
      
      * also convert processor
      
      * run generate
      
      * classifier free gen
      
      * comments and clean up
      
      * make style
      
      * docs for logit proc
      
      * docstring for uncond gen
      
      * start lm tests
      
      * work tests
      
      * let the lm generate
      
      * refactor: reshape inside forward
      
      * undo greedy loop changes
      
      * from_enc_dec -> from_sub_model
      
      * fix input id shapes in docstrings
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * undo generate changes
      
      * from sub model config
      
      * Update src/transformers/models/musicgen/modeling_musicgen.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * make generate work again
      
      * generate uncond -> get uncond inputs
      
      * remove prefix allowed tokens fn
      
      * better error message
      
      * logit proc checks
      
      * Apply suggestions from code review
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      
      * make decoder only tests work
      
      * composite fast tests
      
      * make style
      
      * uncond generation
      
      * feat extr padding
      
      * make audio prompt work
      
      * fix inputs docstrings
      
      * unconditional inputs: dict -> model output
      
      * clean up tests
      
      * more clean up tests
      
      * make style
      
      * t5 encoder -> auto text encoder
      
      * remove comments
      
      * deal with frames
      
      * fix auto text
      
      * slow tests
      
      * nice mdx
      
      * remove can generate
      
      * todo - hub id
      
      * convert m/l
      
      * make fix copies
      
      * only import generation with torch
      
      * ignore decoder from tests
      
      * don't wrap uncond inputs
      
      * make style
      
      * cleaner uncond inputs
      
      * add example to musicgen forward
      
      * fix docs
      
      * ignore MusicGen Model/ForConditionalGeneration in auto mapping
      
      * add doc section to toctree
      
      * add to doc tests
      
      * add processor tests
      
      * fix push to hub in conversion
      
      * tips for decoder only loading
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix conversion for s / m / l checkpoints
      
      * import stopping criteria from module
      
      * remove from pipeline tests
      
      * fix uncond docstring
      
      * decode audio method
      
      * fix docs
      
      * org: sanchit-gandhi -> facebook
      
      * fix max pos embeddings
      
      * remove auto doc (not compatible with shapes)
      
      * bump max pos emb
      
      * make style
      
      * fix doc
      
      * fix config doc
      
      * fix config doc
      
      * ignore musicgen config from docstring
      
      * make style
      
      * fix config
      
      * fix config for doctest
      
      * consistent from_sub_models
      
      * don't automap decoder
      
      * fix mdx save audio file
      
      * fix mdx save audio file
      
      * processor batch decode for audio
      
      * remove keys to ignore
      
      * update doc md
      
      * update generation config
      
      * allow changes for default generation config
      
      * update tests
      
      * make style
      
      * fix docstring for uncond
      
      * fix processor test
      
      * fix processor test
      
      ---------
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      1c1c9075
    • amyeroberts's avatar
      Update old existing feature extractor references (#24552) · ae454f41
      amyeroberts authored
      * Update old existing feature extractor references
      
      * Typo
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Apply suggestions from code review
      
      * Address comments from review - update 'feature extractor'
      Co-authored by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
      ae454f41
  14. 28 Jun, 2023 1 commit
  15. 27 Jun, 2023 1 commit
  16. 26 Jun, 2023 1 commit
  17. 20 Jun, 2023 1 commit
  18. 19 Jun, 2023 1 commit
  19. 14 Jun, 2023 1 commit
  20. 09 Jun, 2023 1 commit
  21. 02 Jun, 2023 2 commits
  22. 24 May, 2023 1 commit
  23. 12 May, 2023 1 commit
  24. 10 May, 2023 1 commit
  25. 09 May, 2023 1 commit
    • Sylvain Gugger's avatar
      Add RWKV-4 (#22797) · b4d4d6fe
      Sylvain Gugger authored
      
      
      * First draft of RWKV-4
      
      * Add support for generate
      
      * Style post-rebase
      
      * Properly use state
      
      * Write doc
      
      * Fix doc
      
      * More math
      
      * Add model to README, dummies and clean config
      
      * Fix init
      
      * multiple fixes:
      
      - fix common tests
      - fix configuraion default values
      - add CI test for checking state computation
      - fix some CI tests
      
      * correct tokenizer
      
      * some tweaks
      
      - fix config docstring
      - fix failing tests
      
      * fix CI tests
      
      - add output_attention / output_hidden_states
      - override test_initialization
      - fix failing CIs
      
      * fix conversion script
      
      - fix sharded case
      - add new arguments
      
      * add slow tests + more fixes on conversion script
      
      * add another test
      
      * final fixes
      
      * change single name variable
      
      * add mock attention mask for pipeline to work
      
      * correct eos token id
      
      * fix nits
      
      * add checkpoints
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add `tie_word_embeddings` in docstring
      
      * change tensor name
      
      * fix final nits
      
      * Trigger CI
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b4d4d6fe
  26. 04 May, 2023 2 commits
  27. 03 May, 2023 1 commit
  28. 02 May, 2023 1 commit
  29. 01 May, 2023 1 commit
  30. 28 Apr, 2023 1 commit
  31. 27 Apr, 2023 2 commits
  32. 24 Apr, 2023 1 commit
  33. 23 Apr, 2023 1 commit