1. 11 Dec, 2023 3 commits
  2. 09 Dec, 2023 3 commits
  3. 08 Dec, 2023 18 commits
  4. 07 Dec, 2023 16 commits
    • Rockerz's avatar
      Translate `model_doc` files from `clip` to `cpm` to JP (#27774) · 0ea42ef0
      Rockerz authored
      
      
      * Add models
      
      * Add more models
      
      * Update docs/source/ja/model_doc/convnextv2.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update docs/source/ja/model_doc/convbert.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update docs/source/ja/model_doc/codegen.md
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      
      * Update translation errors and author names
      
      * link update
      
      ---------
      Co-authored-by: default avatarSteven Liu <59462357+stevhliu@users.noreply.github.com>
      0ea42ef0
    • Dina Suehiro Jones's avatar
      Updates the distributed CPU training documentation to add instructions for... · 79b79ae2
      Dina Suehiro Jones authored
      Updates the distributed CPU training documentation to add instructions for running on a Kubernetes cluster (#27780)
      
      * Updates the Distributed CPU documentation to add a Kubernetes example
      
      * Small edits
      
      * Fixing link
      
      * Adding missing new lines
      
      * Minor edits
      
      * Update to include Dockerfile snippet
      
      * Add comment about tuning env var
      
      * Updates based on review comments
      79b79ae2
    • Steven Liu's avatar
      [docs] Custom semantic segmentation dataset (#27859) · f7595760
      Steven Liu authored
      * custom dataset
      
      * fix link
      
      * feedback
      f7595760
    • Joao Gante's avatar
    • Matt's avatar
      Fix TF loading PT safetensors when weights are tied (#27490) · 47500b1d
      Matt authored
      
      
      * Un-skip tests
      
      * Add aliasing support to tf_to_pt_weight_rename
      
      * Refactor tf-to-pt weight rename for simplicity
      
      * Patch mobilebert
      
      * Let us pray that the transfo-xl one works
      
      * Add XGLM rename
      
      * Expand the test to see if we can get more models to break
      
      * Expand the test to see if we can get more models to break
      
      * Fix MPNet (it was actually an unrelated bug)
      
      * Fix MPNet (it was actually an unrelated bug)
      
      * Add speech2text fix
      
      * Update src/transformers/modeling_tf_pytorch_utils.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update to always return a tuple from tf_to_pt_weight_rename
      
      * reformat
      
      * Add a couple of missing tuples
      
      * Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway
      
      * Revert changes to modeling_tf_mpnet.py
      
      * Skip MPNet test and add explanation
      
      * Add weight link for BART
      
      * Add TODO to clean this up a bit
      
      ---------
      Co-authored-by: default avataramyeroberts <22614925+amyeroberts@users.noreply.github.com>
      47500b1d
    • Yih-Dar's avatar
      9f1f11a2
    • fxmarty's avatar
      Fix device of masks in tests (#27887) · c99f2547
      fxmarty authored
      fix device of mask in tests
      c99f2547
    • Hz, Ji's avatar
    • Sourab Mangrulkar's avatar
      update `create_model_card` to properly save peft details when using Trainer with PEFT (#27754) · 5324bf9c
      Sourab Mangrulkar authored
      
      
      * update `create_model_card` to properly save peft details when using Trainer with PEFT
      
      * nit
      
      * Apply suggestions from code review
      Co-authored-by: default avatarBenjamin Bossan <BenjaminBossan@users.noreply.github.com>
      
      ---------
      Co-authored-by: default avatarBenjamin Bossan <BenjaminBossan@users.noreply.github.com>
      5324bf9c
    • Yih-Dar's avatar
      Allow `# Ignore copy` (#27328) · 52746922
      Yih-Dar authored
      
      
      * fix
      
      ---------
      Co-authored-by: default avatarydshieh <ydshieh@users.noreply.github.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      52746922
    • Younes Belkada's avatar
      [`Llava`] Add Llava to transformers (#27662) · 44b5506d
      Younes Belkada authored
      * add model like
      
      * logits match
      
      * minor fixes
      
      * fixes
      
      * up
      
      * up
      
      * add todo
      
      * llava processor
      
      * keep the processor simple
      
      * add conversion script
      
      * fixup
      
      * fix copies
      
      * up
      
      * add to index
      
      * fix config + logits
      
      * fix
      
      * refactor
      
      * more refactor
      
      * more refactor
      
      * fix copies
      
      * add authors
      
      * v1 tests
      
      * add `LlavaProcessor` in init
      
      * remove unneeded import
      
      * up
      
      * up
      
      * docs
      
      * up
      
      * fix CI
      
      * fix CI
      
      * add attention  mask in test
      
      * make fixup
      
      * remove the vision model
      
      * that' s the dirty way to do it
      
      * nits
      
      * nits
      
      * updates
      
      * add more tests
      
      * add input tests
      
      * fixup
      
      * more styling
      
      * nits
      
      * updates amd cleanup
      
      * fixup the generation expected results
      
      * fix the testing script
      
      * some cleanup and simplification which does not work yet but almost there!
      
      * make correct dispatch operations
      
      * vectorize works for batch of images and text
      
      * last todos
      
      * nits
      
      * update test and modeling code
      
      * remove useless function for now
      
      * fix few issues
      
      * fix generation
      
      * some nits
      
      * add bakllava
      
      * nits
      
      * remove duplicated code
      
      * finis merge
      
      * cleanup
      
      * missed this line
      
      * fill the todos
      
      * add left padding offset
      
      * add left and rignt padding logic
      
      * bool to properly index
      
      * make sure
      
      * more cleanups
      
      * batch is fixed 😉
      
      
      
      * add correct device for tensor creation
      
      * fix some dtype missmatch
      
      * ruff
      
      * update conversion script
      
      * Update src/transformers/__init__.py
      
      * fa 2 support + fix conversion script
      
      * more
      
      * correct reshaping
      
      * fix test dict
      
      * fix copies by ignoring
      
      * fix nit
      
      * skip clip vision model
      
      * fixup
      
      * fixup
      
      * LlavaForVisionText2Text -> LlavaForCausalLM
      
      * update
      
      * fix
      
      * raise correct errors
      
      * fix
      
      * docs
      
      * nuke for now
      
      * nits here and there
      
      * fixup
      
      * fix remaining tests
      
      * update LlavaForConditionalGeneration instead of CausalLM
      
      * fixups
      
      * pipeline support
      
      * slow and piepline tests
      
      * supports batch
      
      * nits
      
      * cleanup
      
      * fix first integration tests
      
      * add pad token where needed
      
      * correct etsts
      
      * fixups
      
      * update pipeline testr
      
      * fix quality
      
      * nits
      
      * revert unneeded change
      
      * nit
      
      * use BatchFeature
      
      * from ...feature_extraction_utils import BatchFeature
      
      * nits
      
      * nits
      
      * properly update
      
      * more f*** nits
      
      * fix copies
      
      * comment
      
      * keep slow test slow
      
      * Update src/transformers/models/llava/processing_llava.py
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * add piepline example
      
      * add pixel values in docstrign
      
      * update pr doctest
      
      * fix
      
      * fix slow tests
      
      * remove hack
      
      * fixup
      
      * small note
      
      * forward contrib credits from PR25789
      
      * forward contrib credits from original implementation and work
      
      * add arthur
      
      * Update src/transformers/models/llava/processing_llava.py
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      
      * update docstring
      
      * nit
      
      * move to not doctested because of timeout issues
      
      * fixup
      
      * add description
      
      * more
      
      * fix-copies
      
      * fix docs
      
      * add beam search
      
      * add more comments
      
      * add typehints on processor
      
      * add speedup plot
      
      * update slow tests and docs
      
      * push test
      
      * push batched test
      
      * fix batched generation with different number of images
      
      * remove benchmark due to a bug
      
      * fix test
      
      * fix copies
      
      * add gcolab demo
      
      ---------
      Co-authored-by: default avatarArthur Zucker <arthur.zucker@gmail.com>
      Co-authored-by: default avatarArthur <48595927+ArthurZucker@users.noreply.github.com>
      Co-authored-by: default avatarshauray8 <shauray8@users.noreply.github.com>
      Co-authored-by: default avatarhaotian-liu <haotian-liu@users.noreply.github.com>
      Co-authored-by: default avatarLysandre Debut <hi@lysand.re>
      44b5506d
    • Phuc Van Phan's avatar
    • Susnato Dhar's avatar
      [`FA-2`] Add Flash Attention to `Phi` (#27661) · f84d85ba
      Susnato Dhar authored
      * add FA and modify doc file
      
      * test_flash_attn_2_generate_padding_right test overwritten
      
      * comment
      
      * modify persimmon modeling file
      
      * added speedup graph
      
      * more changes
      f84d85ba
    • Nolwenn Bernard's avatar
      [i18n-fr] Translate autoclass tutorial to French (#27659) · 06f56168
      Nolwenn Bernard authored
      * Translation of autoclass tutorial
      
      * Update totree to keep only tutorial section
      
      * Translate title toctree
      
      * Fix typos
      
      * Update review comments
      06f56168
    • jiqing-feng's avatar
      Fix bug of _prepare_4d_attention_mask (#27847) · 4d806dba
      jiqing-feng authored
      * use _prepare_4d_attention_mask
      
      * fix comment
      4d806dba
    • Alex McKinney's avatar
      Add Llama Flax Implementation (#24587) · 75336c17
      Alex McKinney authored
      * Copies `modeling_flax_gpt_neo.py` to start
      
      * MLP Block. WIP Attention and Block
      
      * Adds Flax implementation of `LlamaMLP`
      Validated with in-file test.
      Some slight numeric differences, but assuming it isn't an issue
      
      * Adds `FlaxLlamaRMSNorm` layer
      `flax.linen` includes `RMSNorm` layer but not necessarily in all
      versions. Hence, we add in-file.
      
      * Adds FlaxLlamaAttention
      Copied from GPT-J as it has efficient caching implementation as well as
      rotary embeddings.
      Notice numerically different, but not by a huge amount. Needs
      investigating
      
      * Adds `FlaxLlamaDecoderLayer`
      numerically inaccurate, debugging..
      
      * debugging rotary mismatch
      gptj uses interleaved whilst llama uses contiguous
      i think they match now but still final result is wrong.
      maybe drop back to just debugging attention layer?
      
      * fixes bug with decoder layer
      still somewhat numerically inaccurate, but close enough for now
      
      * adds markers for what to implement next
      the structure here diverges a lot from the PT version.
      not a big fan of it, but just get something working for now
      
      * implements `FlaxLlamaBlockCollection`]
      tolerance must be higher than expected, kinda disconcerting
      
      * Adds `FlaxLlamaModule`
      equivalent PyTorch model is `LlamaModel`
      yay! a language model🤗
      
      * adds `FlaxLlamaForCausalLMModule`
      equivalent to `LlamaForCausalLM`
      still missing returning dict or tuple, will add later
      
      * start porting pretrained wrappers
      realised it probably needs return dict as a prereq
      
      * cleanup, quality, style
      
      * readds `return_dict` and model output named tuples
      
      * (tentatively) pretrained wrappers work 🔥
      
      * fixes numerical mismatch in `FlaxLlamaRMSNorm`
      seems `jax.lax.rsqrt` does not match `torch.sqrt`.
      manually computing `1 / jax.numpy.sqrt` results in matching values.
      
      * [WIP] debugging numerics
      
      * numerical match
      I think issue was accidental change of backend. forcing CPU fixes test.
      We expect some mismatch on GPU.
      
      * adds in model and integration tests for Flax Llama
      summary of failing:
      - mul invalid combination of dimensions
      - one numerical mismatch
      - bf16 conversion (maybe my local backend issue)
      - params are not FrozenDict
      
      * adds missing TYPE_CHECKING import and `make fixup`
      
      * adds back missing docstrings
      needs review on quality of docstrings, not sure what is required.
      Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO
      
      * commenting out equivalence test as can just use common
      
      * debugging
      
      * Fixes bug where mask and pos_ids were swapped in pretrained models
      This results in all tests passing now 🔥
      
      
      
      * cleanup of modeling file
      
      * cleanup of test file
      
      * Resolving simpler review comments
      
      * addresses more minor review comments
      
      * fixing introduced pytest errors from review
      
      * wip additional slow tests
      
      * wip tests
      need to grab a GPU machine to get real logits for comparison
      otherwise, slow tests should be okay
      
      * `make quality`, `make style`
      
      * adds slow integration tests
      - checking logits
      - checking hidden states
      - checking generation outputs
      
      * `make fix-copies`
      
      * fix mangled function following `make fix-copies`
      
      * adds missing type checking imports
      
      * fixes missing parameter checkpoint warning
      
      * more finegrained 'Copied from' tags
      avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`
      
      * swaps import guards
      ??? how did these get swapped initially?
      
      * removing `inv_freq` again as pytorch version has now removed
      
      * attempting to get CI to pass
      
      * adds doc entries for llama flax models
      
      * fixes typo in __init__.py imports
      
      * adds back special equivalence tests
      these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version
      
      * overrides tests with dummy to see if CI passes
      need to fill in these tests later
      
      * adds my contribution to docs
      
      * `make style; make quality`
      
      * replaces random masking with fixed to work with flax version
      
      * `make quality; make style`
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * updates `x`->`tensor` in `rotate_half`
      
      * addresses smaller review comments
      
      * Update docs/source/en/model_doc/llama.md
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * adds integration test class
      
      * adds `dtype` to rotary embedding to cast outputs
      
      * adds type to flax llama rotary layer
      
      * `make style`
      
      * `make fix-copies`
      
      * Apply suggestions from code review
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * applies suggestions from review
      
      * Update modeling_flax_llama.py
      
      * `make fix-copies`
      
      * Update tests/models/llama/test_modeling_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * Update src/transformers/models/llama/modeling_flax_llama.py
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      
      * fixes shape mismatch in FlaxLlamaMLP
      
      * applies some suggestions from reviews
      
      * casts attn output logits to f32 regardless of dtype
      
      * adds attn bias using `LlamaConfig.attention_bias`
      
      * adds Copied From comments to Flax Llama test
      
      * mistral and persimmon test change -copy from llama
      
      * updates docs index
      
      * removes Copied from in tests
      
      it was preventing `make fix-copies` from succeeding
      
      * quality and style
      
      * ignores FlaxLlama input docstring
      
      * adds revision to `_CHECKPOINT_FOR_DOC`
      
      * repo consistency and quality
      
      * removes unused import
      
      * removes copied from from Phi test
      
      now diverges from llama tests following FlaxLlama changes
      
      * adds `_REAL_CHECKPOINT_FOR_DOC`
      
      * removes refs from pr tests
      
      * reformat to make ruff happy
      
      ---------
      Co-authored-by: default avatarSanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      75336c17