1. 04 Dec, 2023 1 commit
  2. 17 Nov, 2023 1 commit
  3. 10 Nov, 2023 1 commit
    • Susnato Dhar's avatar
      Add Phi-1 and Phi-1_5 (#26170) · e1c3ac25
      Susnato Dhar authored
      * only dir not even init
      
      * init
      
      * tokenizer removed and reference of codegen added
      
      * modeling file updated a lot remaining app_rotary_emb
      
      * conversion script done
      
      * conversion script fixed, a lot of factoring done and most tests pass
      
      * added token_clf and extractive_QA_head
      
      * integration tests pass
      
      * flash attn tests pass!
      
      * config done
      
      * more docs in modeling file
      
      * some style fix
      
      * style and others
      
      * doc test error fix
      
      * more doc fix
      
      * some attention fixes
      
      * most fixes
      
      * style and other fixes
      
      * docs fix and config
      
      * doc fix
      
      * some comments
      
      * conversion script updated
      
      * conversion script updated
      
      * Revert "conversion script updated"
      
      This reverts commit e92378c54084ec0747041b113083d1746ecb6c7f.
      
      * final comments
      
      * add Phi to language_modeling.md
      
      * edit phi.md file
      
      * rebase and fix
      
      * removed phi-1.5 example
      
      * changed model_type from 'phi'->'mixformer-sequential'
      
      * small change
      
      * small change
      
      * revert \small change
      
      * changed mixformer-sequential->phi
      
      * small change
      
      * added phi-1.5 example instead of phi-1
      
      * doc test might pass now
      
      * rebase and small change
      
      * added the dropout layer
      
      * more fixes
      
      * modified .md file
      
      * very very small doc change
      e1c3ac25
  4. 01 Nov, 2023 1 commit
  5. 18 Oct, 2023 1 commit
    • Pablo Montalvo's avatar
      Add fuyu model (#26911) · caa0ff0b
      Pablo Montalvo authored
      
      
      * initial commit
      
      * add processor, add fuyu naming
      
      * add draft processor
      
      * fix processor
      
      * remove dropout to fix loading of weights
      
      * add image processing fixes from Pedro
      
      * fix
      
      * fix processor
      
      * add basic processing fuyu test
      
      * add documentation and TODO
      
      * address comments, add tests, add doc
      
      * replace assert with torch asserts
      
      * add Mixins and fix tests
      
      * clean imports
      
      * add model tester, clean imports
      
      * fix embedding test
      
      * add updated tests from pre-release model
      
      * Processor: return input_ids used for inference
      
      * separate processing and model tests
      
      * relax test tolerance for embeddings
      
      * add test for logit comparison
      
      * make sure fuyu image processor is imported in the init
      
      * fix formattingh
      
      * more formatting issues
      
      * and more
      
      * fixups
      
      * remove some stuff
      
      * nits
      
      * update init
      
      * remove the fuyu file
      
      * Update integration test with release model
      
      * Update conversion script.
      
      The projection is not used, as confirmed by the authors.
      
      * improve geenration
      
      * Remove duplicate function
      
      * Trickle down patches to model call
      
      * processing fuyu updates
      
      * remove things
      
      * fix prepare_inputs_for_generation to fix generate()
      
      * remove model_input
      
      * update
      
      * add generation tests
      
      * nits
      
      * draft leverage automodel and autoconfig
      
      * nits
      
      * fix dtype patch
      
      * address comments, update READMEs and doc, include tests
      
      * add working processing test, remove refs to subsequences
      
      * add tests, remove Sequence classification
      
      * processing
      
      * update
      
      * update the conversion script
      
      * more processing cleanup
      
      * safe import
      
      * take out ModelTesterMixin for early release
      
      * more cl;eanup
      
      * more cleanup
      
      * more cleanup
      
      * and more
      
      * register a buffer
      
      * nits
      
      * add postprocessing of generate output
      
      * nits
      
      * updates
      
      * add one working test
      
      * fix test
      
      * make fixup works
      
      * fixup
      
      * Arthur's updates
      
      * nits
      
      * update
      
      * update
      
      * fix processor
      
      * update tests
      
      * passe more fixups
      
      * fix
      
      * nits
      
      * don't import torch
      
      * skip fuyu config for now
      
      * fixup done
      
      * fixup
      
      * update
      
      * oups
      
      * nits
      
      * Use input embeddings
      
      * no buffer
      
      * update
      
      * styling processing fuyu
      
      * fix test
      
      * update licence
      
      * protect torch import
      
      * fixup and update not doctested
      
      * kwargs should be passed
      
      * udpates
      
      * update the impofixuprts in the test
      
      * protect import
      
      * protecting imports
      
      * protect imports in type checking
      
      * add testing decorators
      
      * protect top level import structure
      
      * fix typo
      
      * fix check init
      
      * move requires_backend to functions
      
      * Imports
      
      * Protect types
      
      ---------
      Co-authored-by: default avatarPedro Cuenca <pedro@huggingface.co>
      Co-authored-by: default avatarArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avatarLysandre <lysandre@huggingface.co>
      caa0ff0b
  6. 27 Sep, 2023 1 commit
  7. 12 Sep, 2023 1 commit
  8. 25 Aug, 2023 1 commit
    • Arthur's avatar
      [`CodeLlama`] Add support for `CodeLlama` (#25740) · 015f8e11
      Arthur authored
      
      
      * add all
      
      * Revert "Delete .github directory"
      
      This reverts commit 9b0ff7b052e2b20b629a26fb13606b78a42944d1.
      
      * make conversion script backward compatible
      
      * fixup
      
      * more styling
      
      * copy to llama changes
      
      * fix repo consistency
      
      * nits
      
      * document correct classes
      
      * updates
      
      * more fixes
      
      * nits
      
      * update auto mappings
      
      * add readmes
      
      * smallupdates
      
      * llama-code replace with llama_code
      
      * make fixup
      
      * updates to the testsing suite
      
      * fix fast nits
      
      * more small fixes
      
      * fix decode
      
      * fix template processing
      
      * properly reset the normalizer
      
      * nits processor
      
      * tokenization tests pass
      
      * styling
      
      * last tests
      
      * additional nits
      
      * one test is left
      
      * nits
      
      Co-authored-by faabian <faabian@users.noreply.github.com>
      
      * update failing test
      
      * fixup
      
      * remove decode infilling users should handle it on their onw after generation, padding can be a problem
      
      * update
      
      * make test slow and more meaningfull
      
      * fixup
      
      * doc update
      
      * fixup
      
      * Apply suggestions from code review
      
      * add kwargs doc
      
      * tokenizer requires `requires_backend`
      
      * type requires_backends
      
      * CodeLlama instead of LlamaCode
      
      * more name cahnges
      
      * nits
      
      * make doctests happy
      
      * small pipeline nits
      
      * last nit
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * update
      
      * add codellama to toctree
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      015f8e11
  9. 18 Aug, 2023 1 commit
    • Stas Bekman's avatar
      new model: IDEFICS via HuggingFaceM4 (#24796) · 6c811a32
      Stas Bekman authored
      
      
      * rename
      
      * restore
      
      * mappings
      
      * unedited tests+docs
      
      * docs
      
      * fixes
      
      * fix auto-sync breakage
      
      * cleanup
      
      * wip
      
      * wip
      
      * add fetch_images
      
      * remove einops dependency
      
      * update
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * re-add
      
      * add batching
      
      * rework
      
      * fix
      
      * improve
      
      * add Leo as I am extending his work
      
      * cleanup
      
      * fix
      
      * cleanup
      
      * slow-test
      
      * fix
      
      * fix
      
      * fixes
      
      * deal with warning
      
      * rename modified llama classes
      
      * rework fetch_images
      
      * alternative implementation
      
      * cleanup
      
      * strict version
      
      * cleanup
      
      * [`IDEFICS`] Fix idefics ci (#25056)
      
      * Fix IDEFICS CI
      
      * fix test file
      
      * fixup
      
      * some changes to make tests pass
      
      * fix
      
      * fixup
      
      * Update src/transformers/models/idefics/configuration_idefics.py
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * remove compat checks
      
      * style
      
      * explain that Idefics is not for training from scratch
      
      * require pt>=2.0
      
      * fix idefics vision config (#25092)
      
      * fix idefics vision config
      
      * fixup
      
      * clean
      
      * Update src/transformers/models/idefics/configuration_idefics.py
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * cleanup
      
      * style
      
      * cleanup
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * upcase
      
      * sequence of images
      
      * handle the case with no images
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: default avatarVictor SANH <victorsanh@gmail.com>
      
      * support pure lm take 2
      
      * support tokenizer options
      
      * parameterize num_channels
      
      * fix upcase
      
      * s|IdeficsForCausalLM|IdeficsForVisionText2Text|g
      
      * manual to one line
      
      * addressing review
      
      * unbreak
      
      * remove clip dependency
      
      * fix test
      
      * consistency
      
      * PIL import
      
      * Idefics prefix
      
      * Idefics prefix
      
      * hack to make tests work
      
      * style
      
      * fix
      
      * fix
      
      * revert
      
      * try/finally
      
      * cleanup
      
      * clean up
      
      * move
      
      * [`IDEFICS`] Fix idefics config refactor (#25149)
      
      * refactor config
      
      * nuke init weights
      
      * more refactor
      
      * oops
      
      * remove visual question answering pipeline support
      
      * Update src/transformers/models/idefics/clip.py
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      
      * cleanup
      
      * mv clip.py vision.py
      
      * tidyup
      
      ---------
      Co-authored-by: default avatarStas Bekman <stas00@users.noreply.github.com>
      Co-authored-by: default avatarStas Bekman <stas@stason.org>
      
      * fix
      
      * license
      
      * condition on pt
      
      * fix
      
      * style
      
      * fix
      
      * rm torchvision dependency, allow custom transforms
      
      * address review
      
      * rework device arg
      
      * add_eos_token
      
      * s/transforms/transform/
      
      * fix top level imports
      
      * fix return value
      
      * cleanup
      
      * cleanup
      
      * fix
      
      * style
      
      * license
      
      * license
      
      * Update src/transformers/models/idefics/image_processing_idefics.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * add a wrapper to freeze vision layears
      
      * tidyup
      
      * use the correct std/mean settings
      
      * parameterize values from config
      
      * add tests/models/idefics/test_image_processing_idefics.py
      
      * add test_processor_idefics.py
      
      * cleanup
      
      * cleanups
      
      * fix
      
      * fix
      
      * move to the right group
      
      * style
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * add perceiver config
      
      * reset
      
      * missing arg docs
      
      * Apply suggestions from code review
      Co-authored-by: default avatarLeo Tronchon <leo.tronchon@gmail.com>
      
      * address review comments
      
      * inject automatic end of utterance tokens (#25218)
      
      * inject automatic end of utterance tokens
      
      * fix
      
      * fix
      
      * fix
      
      * rework to not use the config
      
      * not end_of_utterance_token at the end
      
      * Update src/transformers/models/idefics/processing_idefics.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * address review
      
      * Apply suggestions from code review
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      
      * Update src/transformers/image_processing_utils.py
      Co-authored-by: default avatarNicolas Patry <patry.nicolas@protonmail.com>
      
      * [`Idefics`] add image_embeddings option in generate-related methods (#25442)
      
      * add image_embeddings option in generate-related methods
      
      * style
      
      * rename image_embeddings and allow perceiver embeddings precomputation
      
      * compute embeddings within generate
      
      * make is_encoder_decoder= True the default in config
      
      * nested if else fix
      
      * better triple check
      
      * switch if elif order for pixel values / img embeds
      
      * update model_kwargs perceiver only at the end
      
      * use _prepare_model_inputs instead of encoder_decoder logic
      
      * fix comment typo
      
      * fix config default for is_encoder_decoder
      
      * style
      
      * add typehints
      
      * precompute in forward
      
      * doc builder
      
      * style
      
      * pop instead of get image hidden states
      
      * Trigger CI
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/idefics/modeling_idefics.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix * + indentation + style
      
      * simplify a bit the use_resampler logic using comments
      
      * update diocstrings
      
      * Trigger CI
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fix rebase changes
      
      * unbreak #25237 - to be fixed in follow up PRs
      
      * is_composition = False
      
      * no longer needed
      
      ---------
      Co-authored-by: default avatarleot13 <leo.tronchon@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarVictor SANH <victorsanh@gmail.com>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarNicolas Patry <patry.nicolas@protonmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      6c811a32
  10. 25 Jul, 2023 1 commit
  11. 11 Jul, 2023 1 commit
    • Matt's avatar
      Falcon port (#24523) · b3ab3fac
      Matt authored
      
      
      * Initial commit
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Cleanup config docstring
      
      * Update src/transformers/models/falcon/configuration_falcon.py
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * Convert to relative imports
      
      * Remove torch < 1.8 warning
      
      * Restructure cos_sin header
      
      * qkv -> query, key, value
      
      * Refactor attention calculation
      
      * Add a couple of config variables to account for the different checkpoints
      
      * Successful merging of the code paths!
      
      * Fix misplaced line in the non-parallel attention path
      
      * Update config and tests
      
      * Add a pad_token_id when testing
      
      * Support output_attentions when alibi is None
      
      * make fixup
      
      * Skip KV cache shape test
      
      * No more _keys_to_ignore_on_load_missing
      
      * Simplify self attention a bit
      
      * Simplify self attention a bit
      
      * make fixup
      
      * stash commit
      
      * Some more attention mask updates
      
      * Should pass all tests except assisted generation!
      
      * Add big model generation test
      
      * make fixup
      
      * Add temporary workaround for test
      
      * Test overrides for assisted generation
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update tests/models/falcon/test_modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Test overrides for assisted generation
      
      * Add generation demo
      
      * Update copyright
      
      * Make the docstring model actually small
      
      * Add module-level docstring
      
      * Remove all assertions
      
      * Add copied from bloom
      
      * Reformat the QKV layer
      
      * Add copied from bloom
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Remove unused line and reformat
      
      * No single letter variables
      
      * Cleanup return names
      
      * Add copied from line
      
      * Remove the deprecated arguments blocks
      
      * Change the embeddings test to an alibi on/off test
      
      * Remove position_ids from FalconForQA
      
      * Remove old check for token type IDs
      
      * Fix the alibi path when multi_query is False
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/falcon/modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update tests/models/falcon/test_modeling_falcon.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update config naming
      
      * Fix typo for new_decoder_architecture
      
      * Add some comments
      
      * Fix docstring
      
      * Fix docstring
      
      * Create range in the right dtype from the start
      
      * Review comment cleanup
      
      * n_head_kv -> num_kv_heads
      
      * self.alibi -> self.use_alibi
      
      * self.num_kv -> self.num_kv_heads
      
      * Reorder config args
      
      * Made alibi arguments Optional
      
      * Add all model docstrings
      
      * Add extra checkpoints
      
      * Add author info for Falcon
      
      * Stop removing token_type_ids because our checkpoints shouldn't return it anymore
      
      * Add one hopeful comment for the future
      
      * Fix typo
      
      * Update tests, fix cache issue for generation
      
      * Use -1e9 instead of -inf to avoid float overflow
      
      * Recompute the rotary embeddings much less often
      
      * Re-enable disabled tests
      
      * One final fix to attention mask calculation, and update tests
      
      * Cleanup targeting falcon-40b equivalency
      
      * Post-rebase docs update
      
      * Update docstrings, especially in the config
      
      * More descriptive variable names, and comments where we can't rename them
      
      ---------
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b3ab3fac
  12. 29 Jun, 2023 1 commit
    • Sanchit Gandhi's avatar
      Add Musicgen (#24109) · 1c1c9075
      Sanchit Gandhi authored
      
      
      * Add Audiocraft
      
      * add cross attention
      
      * style
      
      * add for lm
      
      * convert and verify
      
      * introduce t5
      
      * split configs
      
      * load t5 + lm
      
      * clean conversion
      
      * copy from t5
      
      * style
      
      * start pattern provider
      
      * make generation work
      
      * style
      
      * fix pos embs
      
      * propagate shape changes
      
      * propagate shape changes
      
      * style
      
      * delay pattern: pad tokens at end
      
      * audiocraft -> musicgen
      
      * fix inits
      
      * add mdx
      
      * style
      
      * fix pad token in processor
      
      * override generate and add todos
      
      * add init to test
      
      * undo pattern delay mask after gen
      
      * remove cfg logits processor
      
      * remove cfg logits processor
      
      * remove logits processor in favour of mask
      
      * clean pos embs
      
      * make fix copies
      
      * update readmes
      
      * clean pos emb
      
      * refactor encoder/decoder
      
      * make fix copies
      
      * update conversion
      
      * fix config imports
      
      * update config docs
      
      * make style
      
      * send pattern mask to device
      
      * pattern mask with delay
      
      * recover prompted audio tokens
      
      * fix docstrings
      
      * laydown test file
      
      * pattern edge case
      
      * remove t5 ref
      
      * add processing class
      
      * config refactor
      
      * better pattern comment
      
      * check if mask is not present
      
      * check if mask is not present
      
      * refactor to auto class
      
      * remove encoder configs
      
      * fix processor
      
      * processor import
      
      * start updating conversion
      
      * start updating tests
      
      * make style
      
      * convert t5, encodec, lm
      
      * convert as composite
      
      * also convert processor
      
      * run generate
      
      * classifier free gen
      
      * comments and clean up
      
      * make style
      
      * docs for logit proc
      
      * docstring for uncond gen
      
      * start lm tests
      
      * work tests
      
      * let the lm generate
      
      * refactor: reshape inside forward
      
      * undo greedy loop changes
      
      * from_enc_dec -> from_sub_model
      
      * fix input id shapes in docstrings
      
      * Apply suggestions from code review
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * undo generate changes
      
      * from sub model config
      
      * Update src/transformers/models/musicgen/modeling_musicgen.py
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      
      * make generate work again
      
      * generate uncond -> get uncond inputs
      
      * remove prefix allowed tokens fn
      
      * better error message
      
      * logit proc checks
      
      * Apply suggestions from code review
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      
      * make decoder only tests work
      
      * composite fast tests
      
      * make style
      
      * uncond generation
      
      * feat extr padding
      
      * make audio prompt work
      
      * fix inputs docstrings
      
      * unconditional inputs: dict -> model output
      
      * clean up tests
      
      * more clean up tests
      
      * make style
      
      * t5 encoder -> auto text encoder
      
      * remove comments
      
      * deal with frames
      
      * fix auto text
      
      * slow tests
      
      * nice mdx
      
      * remove can generate
      
      * todo - hub id
      
      * convert m/l
      
      * make fix copies
      
      * only import generation with torch
      
      * ignore decoder from tests
      
      * don't wrap uncond inputs
      
      * make style
      
      * cleaner uncond inputs
      
      * add example to musicgen forward
      
      * fix docs
      
      * ignore MusicGen Model/ForConditionalGeneration in auto mapping
      
      * add doc section to toctree
      
      * add to doc tests
      
      * add processor tests
      
      * fix push to hub in conversion
      
      * tips for decoder only loading
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * fix conversion for s / m / l checkpoints
      
      * import stopping criteria from module
      
      * remove from pipeline tests
      
      * fix uncond docstring
      
      * decode audio method
      
      * fix docs
      
      * org: sanchit-gandhi -> facebook
      
      * fix max pos embeddings
      
      * remove auto doc (not compatible with shapes)
      
      * bump max pos emb
      
      * make style
      
      * fix doc
      
      * fix config doc
      
      * fix config doc
      
      * ignore musicgen config from docstring
      
      * make style
      
      * fix config
      
      * fix config for doctest
      
      * consistent from_sub_models
      
      * don't automap decoder
      
      * fix mdx save audio file
      
      * fix mdx save audio file
      
      * processor batch decode for audio
      
      * remove keys to ignore
      
      * update doc md
      
      * update generation config
      
      * allow changes for default generation config
      
      * update tests
      
      * make style
      
      * fix docstring for uncond
      
      * fix processor test
      
      * fix processor test
      
      ---------
      Co-authored-by: default avatarPatrick von Platen <patrick.v.platen@gmail.com>
      Co-authored-by: default avatarJoao Gante <joaofranciscocardosogante@gmail.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      1c1c9075
  13. 20 Jun, 2023 1 commit
  14. 02 Jun, 2023 1 commit
  15. 09 May, 2023 1 commit
    • Sylvain Gugger's avatar
      Add RWKV-4 (#22797) · b4d4d6fe
      Sylvain Gugger authored
      
      
      * First draft of RWKV-4
      
      * Add support for generate
      
      * Style post-rebase
      
      * Properly use state
      
      * Write doc
      
      * Fix doc
      
      * More math
      
      * Add model to README, dummies and clean config
      
      * Fix init
      
      * multiple fixes:
      
      - fix common tests
      - fix configuraion default values
      - add CI test for checking state computation
      - fix some CI tests
      
      * correct tokenizer
      
      * some tweaks
      
      - fix config docstring
      - fix failing tests
      
      * fix CI tests
      
      - add output_attention / output_hidden_states
      - override test_initialization
      - fix failing CIs
      
      * fix conversion script
      
      - fix sharded case
      - add new arguments
      
      * add slow tests + more fixes on conversion script
      
      * add another test
      
      * final fixes
      
      * change single name variable
      
      * add mock attention mask for pipeline to work
      
      * correct eos token id
      
      * fix nits
      
      * add checkpoints
      
      * Apply suggestions from code review
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add `tie_word_embeddings` in docstring
      
      * change tensor name
      
      * fix final nits
      
      * Trigger CI
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      b4d4d6fe
  16. 28 Apr, 2023 1 commit
  17. 12 Apr, 2023 1 commit
    • pioliverse's avatar
      add model resources for CPMAnt (new) (#20906) · 523ca4e0
      pioliverse authored
      
      
      * resolve conflicts
      
      * rebase and make style
      
      * test
      
      * test
      
      * test
      
      * rebase and make style
      
      * rebase and make style
      
      * tests
      
      * tests
      
      * rewrite some functions
      
      * rebase and make style
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * add models and tests
      
      * solve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * fix some bugs & docstring
      
      * save resolution
      
      * make style
      
      * delete redefinition code
      
      * reformat function
      
      * reformat
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * tests
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * resolve conflicts
      
      * fix load_tf_weights_in_cpmant
      
      * reformat some unrelated files
      
      * upgrade quality
      
      * resolve conflicts
      
      * make style
      
      * fix bugs and refactor
      
      * modify docstrings and make style
      
      * unify import format in __init__.py
      
      * fix import-altclp bug
      
      * fix copies to update index.md
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * fix unused config parameters
      
      * update README_ja.md
      
      * dummy commit for unit test
      
      * fix attention mask
      
      * add CPMAntTokenizer&-Fast to auto-mapping
      
      * drop redundant changes in README_ko
      
      * fix  defaults in docstring
      
      * fix use_cache and some docstring
      
      * add missing args in tokenizer
      
      * modify tester inheritance
      
      * add is_jieba_available
      
      * fix some bugs
      
      * make style and fix-copies
      
      * add doctests
      
      * skip integration tests
      
      * add is_jieba_available
      
      * fix bugs in common tests
      
      * adjust docstrings and make style
      
      * add argument docstring
      
      * adjust code to some specifications
      
      * make style and fix-copies
      
      * add fast tokenization test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * dummy commit for unit test
      
      * normalize some comments and names
      
      * Bert->CPMAnt
      
      * camel names and drop redundant codes
      
      * make style and fix-coies
      
      * add CpmTokenizerFast _import_structure
      
      * drop cpmanttokenizerfast in model_doc
      
      * fix some problems
      
      * fix CPMAnt tokenization for common test
      
      * make style and fixup
      
      * fix copies and fixup
      
      * fix bugs in tokenization test
      
      * dummy commit for connection failure in unittest
      
      * fix copies
      
      * drop trailing comma
      
      * fix decorator in tests
      
      * dummy commit for connection failure in unittest
      
      ---------
      Co-authored-by: default avatarGong Baitao <gongbaitao11@gmail.com>
      523ca4e0
  18. 10 Apr, 2023 1 commit
    • Joel Lamy-Poirier's avatar
      Add GPTBigCode model (Optimized GPT2 with MQA from Santacoder & BigCode) (#22575) · e0921c6b
      Joel Lamy-Poirier authored
      
      
      * Add model with cli tool
      
      * Remove unwanted stuff
      
      * Add new code
      
      * Remove inference runner
      
      * Style
      
      * Fix checks
      
      * Test updates
      
      * make fixup
      
      * fix docs
      
      * fix doc
      
      * fix test
      
      * hopefully fix pipeline tests
      
      * refactor
      
      * fix CIs
      
      * add comment
      
      * rename to `GPTBigCodeForCausalLM`
      
      * correct readme
      
      * make fixup + docs
      
      * make fixup
      
      * fixes
      
      * fixes
      
      * Remove pruning
      
      * Remove import
      
      * Doc updates
      
      * More pruning removal
      
      * Combine copies
      
      * Single MQA implementation, remove kv cache pre-allocation and padding
      
      * Update doc
      
      * Revert refactor to match gpt2 style
      
      * Merge back key and value caches, fix some type hints
      
      * Update doc
      
      * Fix position ids pith padding (PR 21080)
      
      * Add conversion script temporarily
      
      * Update conversion script
      
      * Remove checkpoint conversion
      
      * New model
      
      * Fix MQA test
      
      * Fix copies
      
      * try fix tests
      
      * FIX TEST!!
      
      * remove  `DoubleHeadsModel`
      
      * add MQA tests
      
      * add slow tests
      
      * clean up
      
      * add CPU checker
      
      * final fixes
      
      * fixes
      
      - fix GPU issue
      - fixed slow tests
      - skip disk offload
      
      * fix final issue
      
      * Simplify and comment baddbmm fix
      
      * Remove unnecessary code
      
      * Transpose tweaks
      
      * Use beta=1 on cpu, improve tests
      
      ---------
      Co-authored-by: default avataryounesbelkada <younesbelkada@gmail.com>
      e0921c6b
  19. 24 Mar, 2023 1 commit
    • Mitch Naylor's avatar
      Add Mega: Moving Average Equipped Gated Attention (#21766) · 57f25f4b
      Mitch Naylor authored
      
      
      * add mega file structure and plain pytorch version of mega source code
      
      * added config class with old naming conventions
      
      * filled in mega documentation
      
      * added config class and embeddings with optional token types
      
      * updated notes
      
      * starting the conversion process, deleted intermediate and added use_cache back to config
      
      * renamed config attributes in modeling_mega.py
      
      * checkpointing before refactoring incremental decoding functions
      
      * removed stateful incremental key/values for EMA and self-attention
      
      * refactored MovingAverageGatedAttention to remove stateful k/v history and use unified attention mask
      
      * MovingAverageGatedAttention works with incremental decoding + past values, added sequence length enforcement
      
      * more comments in MovingAverageGatedAttention + checkpointing before GatedCrossAttention
      
      * bug fix in attention mask handling in MovingAverageGatedAttention
      
      * removed incremental state from GatedCrossAttention and removed IncrementalState class
      
      * finished gated cross attention and got MegaLayer working
      
      * fixed causal masking in mega decoder
      
      * fixed how padding and causal masks are passed through MegaLayer with and without k/v caching
      
      * finished MegaModel; tested with encoder, decoder-only, and cross-attention type inputs; started work on downstream classes; removed mentions of position_ids
      
      * added optional dense hidden layer for masked and causal LM classes
      
      * docstring updates in MultiHeadEMA and GatedCrossAttention, removed unnecessary inputs in cross-attention
      
      * removed before_attn_fn in Mega class and updated docstrings and comments up to there
      
      * bug fix in MovingAverageGatedAttention masking
      
      * working conversion of MLM checkpoint in scratchpad script -- perfect matches
      
      * moved arg for hidden dense layer in LM head to config; discovered issue where from_pretrained is renaming gamma and beta parameters
      
      * renamed gamma and beta parameters to avoid HF renaming when loading from checkpoint
      
      * finished checkpoint conversion script
      
      * cleanup old class in mega config script
      
      * removed 'copied from' statements and passing integration tests
      
      * added num_attention_heads=1 to config for integration compatibility, decoder tests working, generation tests failing
      
      * fixed tuple output of megamodel
      
      * all common tests passing after fixing issues in decoder, gradient retention, and initialization
      
      * added mega-specific tests, ready for more documentation and style checks
      
      * updated docstrings; checkpoint before style fixes
      
      * style and quality checks, fixed initialization problem in float_tensor, ready for PR
      
      * added mega to toctree
      
      * removed unnecessary arg in megaconfig
      
      * removed unused arg and fixed code samples with leftover roberta models
      
      * Apply suggestions from code review
      
      Applied all suggestions except the one renaming a class, as I'll need to update that througout
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * fixed issue where .view breaks batch dimension, conversion script fixed with absolute imports, updated readme with Mega->MEGA
      
      * removed asserts in Mega code, renamed sequencenorm, gatedcrossattention, and NFFN, replaced get_activation_fn with ACTFN, and added sequencenorm to layer norms
      
      * reformatted .forward() docstrings to match style and removed unused mask input in cross-attention
      
      * removed all reset_parameters() methods and rolled into MegaPreTrainedModel._init_weights()
      
      * renamed all single-letter variables and improved readability in tensor size comments, Mega->MEGA in 2 documentation files
      
      * variable names in NFFN
      
      * manual Mega->MEGA changes in docs
      
      * Mega->MEGA in config auto
      
      * style and quality fixes
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * renamed parameters and variables with confusing names, added copied from statements, moved fft conv to its own method, other cleanup from PR comments
      
      * commit before dealing with merge conflicts
      
      * made new attention activation functions available in ACT2FN and added generation test from OPT
      
      * style and quality in activations and tests
      
      * documentation fixes, renaming variables in dropout and rotary positions, used built-in causal masking, encoders->layers in MegaModel, moved comments into docstrings
      
      * style and quality fixes after latest updates, before rotary position ids
      
      * causal mask in MegaBlock docstring + added missing device passing
      
      * Apply suggestions from code review
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update README.md
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      * added Mega prefixes where missing, reverted MegaSequenceNorm to if-else, other module renaming requested in PR
      
      * style and quality fixes + readme updates pointing to main
      
      ---------
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avatarSylvain Gugger <35901082+sgugger@users.noreply.github.com>
      57f25f4b
  20. 16 Mar, 2023 1 commit
    • Jason Phang's avatar
      LLaMA Implementation (#21955) · 0041be5b
      Jason Phang authored
      
      
      * LLaMA
      
      * sharding and docs
      
      * tweak
      
      * black
      
      * inits
      
      * ruff
      
      * LLAMA_PRETRAINED_CONFIG_ARCHIVE_MAP
      
      * init
      
      * no checkpoint
      
      * docs
      
      * ruff
      
      * type_vocab_size
      
      * tokenizer fixes
      
      * tokenizer fixes
      
      * Update tokenization_llama.py
      
      * Update tokenization_llama.py
      
      * Update configuration_llama.py
      
      * Update modeling_llama.py
      
      * tokenizer add_bos by default
      
      * licenses
      
      * remove decoder
      
      * norms and mlp
      
      * rope overhaul
      
      * tweaks
      
      * black
      
      * mention OPT implementation
      
      * off-by-one naming
      
      * typo
      
      * fix
      
      * tokenization fix and slicing bug
      
      * padding config
      
      * cleanup
      
      * black
      
      * update tests
      
      * undo typo
      
      * fix vocab caching logic
      
      * ruff
      
      * docbuilder
      
      * attn fix from BlackSamorez
      
      * initial feedback
      
      * typo
      
      * docs
      
      * llama case
      
      * llama case
      
      * load checkpoint docs
      
      * comment about tokenizer
      
      * tokenizer defaults
      
      * clear past_key_values if use_cache=False
      
      * last tweaks
      
      * last tweaks
      
      * last tweaks
      
      * last tweaks
      
      ---------
      Co-authored-by: default avatarStella Biderman <stellabiderman@gmail.com>
      0041be5b
  21. 06 Mar, 2023 1 commit
  22. 10 Feb, 2023 1 commit
    • Jannis Vamvas's avatar
      Add X-MOD (#20939) · b0d539cc
      Jannis Vamvas authored
      
      
      * Add X-MOD to Readme
      
      * Add documentation for X-MOD
      
      * Implement X-MOD
      
      * Fix formatting of X-MOD docs
      
      * Change signature of X-MOD forward methods to use lang_ids
      
      * Minor changes
      
      * Rebase with main and run make fix-copies
      
      * Make suggested changes to docstrings
      
      * Improve code readability
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      
      * Fix code style
      
      * Conversion script: Remove asserts and type annotations
      
      * Remove _TOKENIZER_FOR_DOC
      
      * XMOD -> Xmod
      
      * Update copyright note
      
      * Fix doctests
      
      * Fix docstring
      
      * Add integration test for FillMaskPipeline
      
      * Revert "Add integration test for FillMaskPipeline"
      
      This reverts commit 4381eb3b1d0f5d85785f89caba83928e6efa6d1f.
      
      * Add end-to-end integration test for mask fill
      
      * make style
      
      * Rebase with main and make fix-copies
      
      ---------
      Co-authored-by: default avatarYounes Belkada <49240599+younesbelkada@users.noreply.github.com>
      b0d539cc
  23. 02 Feb, 2023 1 commit
  24. 27 Jan, 2023 1 commit
    • Maria Khalusova's avatar
      Automated compatible models list for task guides (#21338) · 73a2ff69
      Maria Khalusova authored
      * initial commit. added tip placeholders and a script
      
      * removed unused imports, fixed paths
      
      * fixed generated links
      
      * make style
      
      * split language modeling doc into two: causal language modeling and masked language modeling
      
      * added check_task_guides.py to make fix-copies
      
      * review feedback addressed
      73a2ff69
  25. 21 Nov, 2022 1 commit
    • Steven Liu's avatar
      Add inference section to task guides (#18781) · d896029e
      Steven Liu authored
      * 📝 start adding inference section to task guides
      
      *  make style
      
      * 📝 add multiple choice
      
      * add rest of inference sections
      
      * make style
      
      * add compute_metric, push_to_hub, pipeline
      
      * make style
      
      * add updated sequence and token classification
      
      * make style
      
      * make edits in token classification
      
      * add audio classification
      
      * make style
      
      * add asr
      
      * make style
      
      * add image classification
      
      * make style
      
      * add summarization
      
      * make style
      
      * add translation
      
      * make style
      
      * add multiple choice
      
      * add language modeling
      
      * add qa
      
      * make style
      
      * review and edits
      
      * apply reviews
      
      * make style
      
      * fix call to processor
      
      * apply audio reviews
      
      * update to better asr model
      
      * make style
      d896029e
  26. 07 Sep, 2022 1 commit
  27. 06 Jul, 2022 1 commit
  28. 28 Jun, 2022 1 commit
  29. 04 Apr, 2022 1 commit
  30. 25 Mar, 2022 1 commit
  31. 22 Mar, 2022 1 commit
  32. 18 Mar, 2022 1 commit
  33. 15 Mar, 2022 1 commit
  34. 23 Feb, 2022 1 commit