1. 07 Aug, 2024 1 commit
    • Cache: new Cache format in decoder-only models (#31421) · a30c865f
      Raushan Turganbay authored
      
      
      * draft bart with new cache
      
      * add cache for decoder-only models
      
      * revert utils
      
      * modify docstring
      
      * revert bart
      
      * minor fixes
      
      * fix copies (not related)
      
      * revert tests
      
      * remove enc-dec related code
      
      * remove bloom
      
      * remove opt (enc-dec)
      
      * update docstring
      
      * git, codegen, gpt_neo, gpt_neox, gptj
      
      * clean up
      
      * copied from statements
      
      * revert
      
      * tmp
      
      * update warning msg
      
      * forgot git
      
      * add more flags
      
      * run-slow git,codegen,gpt_neo,gpt_neox,gptj
      
      * add cache flag to VLMs
      
      * remove files
      
      * style
      
      * video LLMs also need a flag
      
      * style
      
      * llava will go in another PR
      
      * style
      
      * [run-slow] codegen, falcon, git, gpt_neo, gpt_neox, gptj, idefics
      
      * Update src/transformers/models/gpt_neo/modeling_gpt_neo.py
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * copy from
      
      * deprecate until v4.45 and warn if not training
      
      * nit
      
      * fix test
      
      * test static cache
      
      * add more tests and fix models
      
      * fix copies
      
      * return sliding window mask
      
      * run slow tests & fix + codestyle
      
      * one more falcon fix for alibi
      
      ---------
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      a30c865f
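      The commit above moves decoder-only models (GIT, CodeGen, GPT-Neo, GPT-NeoX, GPT-J, Falcon, and a few VLMs) from the legacy tuple-of-tuples `past_key_values` to the Cache classes, with the tuple format kept working and deprecated until v4.45. A minimal sketch of how the new interface is typically exercised, assuming a small GPT-NeoX-style checkpoint as a placeholder; `DynamicCache` and its legacy-conversion helpers live in `transformers.cache_utils`:

          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer
          from transformers.cache_utils import DynamicCache

          model_id = "EleutherAI/pythia-70m"  # placeholder GPT-NeoX-style checkpoint
          tokenizer = AutoTokenizer.from_pretrained(model_id)
          model = AutoModelForCausalLM.from_pretrained(model_id)

          inputs = tokenizer("Hello", return_tensors="pt")

          # Pass an explicit Cache object instead of the legacy tuple of (key, value) tensors.
          cache = DynamicCache()
          outputs = model(**inputs, past_key_values=cache, use_cache=True)

          # During the deprecation window the two formats stay interconvertible.
          legacy_tuples = cache.to_legacy_cache()
          cache_again = DynamicCache.from_legacy_cache(legacy_tuples)

      The commit message also mentions a StaticCache test and a sliding-window mask path; those use the same Cache interface and are not shown here.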
  2. 06 Aug, 2024 17 commits
  3. 05 Aug, 2024 10 commits
  4. 03 Aug, 2024 2 commits
    • MixtralFlashAttention2: put "plus 1" inside parentheses when calculating... · 621fb3c0
      Xueshen Liu authored
      MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500)
      
      * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (no auto position_ids generation could be unsafe)
      
      * fix typo [:-1] to [:, -1]
      
      * to meet formatting requirement
      
      * to meet formatting requirement
      
      * remove white space
      
      * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
      
      * propagate to starcoder2, phi3, mixtral and qwen2
      
      * update qwen2_moe
      621fb3c0
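      The fix above concerns the sequence length handed to the rotary embedding in MixtralFlashAttention2 (and, per the last commits, the same pattern in Starcoder2, Phi-3, Qwen2 and Qwen2-MoE): with the "+ 1" outside max(), the whole expression dereferences position_ids and cannot fall back when it is None. A hedged sketch of the reasoning, using a standalone helper name rather than the library's actual code layout:

          from typing import Optional
          import torch

          def rotary_seq_len(kv_seq_len: int, position_ids: Optional[torch.Tensor]) -> int:
              # Old shape of the code: max(kv_seq_len, position_ids[:, -1].max().item()) + 1
              # -> raises when position_ids is None, and a naive guard leaves a stray "+ 1".
              if position_ids is None:
                  return kv_seq_len
              # "[:, -1]" (last position of each batch row) is the typo fix noted above.
              return max(kv_seq_len, position_ids[:, -1].max().item() + 1)

          # e.g. rotary_seq_len(8, torch.arange(8).unsqueeze(0)) == 8
          #      rotary_seq_len(8, None) == 8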
    • fix: (issue #32124) Exception raised when running... · 7c31d05b
      Shaopeng Fu authored
      fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157)
      
      fix: Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`.
      7c31d05b
  5. 02 Aug, 2024 3 commits
  6. 01 Aug, 2024 7 commits