1. 05 Aug, 2024 10 commits
  2. 03 Aug, 2024 2 commits
    • Xueshen Liu authored · 621fb3c0
      MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. (#31500)
      
      * Mixtral: remove unnecessary plus 1 when calculating rotary_seq_len, allowing position_ids=None (auto-generating position_ids could be unsafe)
      
      * fix typo [:-1] to [:, -1]
      
      * to meet formatting requirement
      
      * to meet formatting requirement
      
      * remove white space
      
      * MixtralFlashAttention2: put "+ 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input. Fix format/style issue.
      
      * propagate to starcoder2, phi3, mixtral and qwen2
      
      * update qwen2_moe
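      A minimal sketch of the before/after this entry describes, assuming the shape of the flash-attention path in the affected models; names and signatures here are illustrative, not the file's exact code:

      ```python
      import torch

      def rotary_seq_len_before(kv_seq_len: int, position_ids: torch.Tensor) -> int:
          # Old form: "+ 1" outside max() over-allocates by one whenever
          # kv_seq_len already covers the largest position, and it cannot
          # accept position_ids=None at all.
          return max(kv_seq_len, position_ids[:, -1].max().item()) + 1

      def rotary_seq_len_after(kv_seq_len: int, position_ids: torch.Tensor | None) -> int:
          # New form: "+ 1" moves inside the parentheses (positions are
          # 0-based, so the last position id plus one is the needed length),
          # and position_ids=None falls back to kv_seq_len instead of
          # forcing potentially unsafe auto-generation.
          if position_ids is None:
              return kv_seq_len
          # position_ids[:, -1] (not [:-1]) takes the last position of each row.
          return max(kv_seq_len, position_ids[:, -1].max().item() + 1)
      ```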
    • Shaopeng Fu authored · 7c31d05b
      fix: (issue #32124) Exception raised when running `transformers/examples/flax/language-modeling/t5_tokenizer_model.py`. (#32157)
      
  3. 02 Aug, 2024 3 commits
  4. 01 Aug, 2024 13 commits
  5. 31 Jul, 2024 9 commits
  6. 30 Jul, 2024 3 commits
    • Joshua Lochner authored · 6e2d04e4
      Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process (#32191)
      * Remove user-defined tokens which can be obtained through merges
      
      * Remove debug line
      
      * formatting
      
      * Refactor spm slow -> fast converter
      
      * revert unnecessary refactor
      
      * set comprehension
      
      * remove test files
      
      * Use `vocab_scores`
      
      * Always replace spiece underline with space in decode
      
      * we no longer need token filtering
      
      * Add save fast load slow unit test
      
      * Remove tokenizers version check
      
      * Remove duplicate code
      
      * Make `<start_of_turn>` and `<end_of_turn>` special tokens
      
      * Bias merge priority with length if score is the same
      
      * Add unit test for merge priority
      
      * CI
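      One way to read the "Bias merge priority with length if score is the same" step: when building the fast tokenizer's BPE merges from `vocab_scores`, ties on score are broken in favor of the longer merged token. A hedged sketch with hypothetical names (`sort_merges`, `vocab_scores` as a dict), not the converter's actual code:

      ```python
      def sort_merges(merges: list[tuple[str, str]],
                      vocab_scores: dict[str, float]) -> list[tuple[str, str]]:
          """Order merges by descending score; on equal scores, prefer the
          longer merged token so it keeps priority."""
          def priority(merge: tuple[str, str]) -> tuple[float, int]:
              merged = merge[0] + merge[1]
              score = vocab_scores.get(merged, float("-inf"))
              return (-score, -len(merged))  # ascending sort on negated keys
          return sorted(merges, key=priority)

      # Two merges with identical scores: the longer result wins priority.
      print(sort_merges([("a", "b"), ("ab", "c")], {"ab": -1.0, "abc": -1.0}))
      # [('ab', 'c'), ('a', 'b')]
      ```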
    • Joao Gante authored · 026a173a
      Repo checks: skip docstring checks if not in the diff (#32328)
      * tmp
      
      * skip files not in the diff
      
      * use git.Repo instead of an external subprocess
      
      * add tiny change to confirm that the diff is working on pushed changes
      
      * add make quality task
      
      * more professional main commit reference
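      The mechanics behind "use git.Repo instead of an external subprocess": GitPython can compute the branch diff directly, and files outside that diff skip the docstring check. A minimal sketch assuming GitPython and a local `main` branch; the helper names are hypothetical:

      ```python
      from git import Repo  # GitPython

      def files_changed_vs_main(repo_path: str = ".") -> set[str]:
          """Paths touched between the merge base with main and HEAD."""
          repo = Repo(repo_path)
          base = repo.merge_base(repo.commit("main"), repo.head.commit)[0]
          return {d.b_path or d.a_path for d in base.diff(repo.head.commit)}

      def should_check_docstrings(filename: str, changed: set[str]) -> bool:
          # Skip the docstring check for files the branch never touched.
          return filename in changed
      ```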
    • fkrasnov2 authored · 516af4bb
      fixes #32329: The Torch code is correct. To get an average of 10% of the total, we take 50% of the remainder after we have already masked 80% with [MASK] in the previous step. (#32335)
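      The arithmetic being defended: 80% of the selected positions become [MASK]; of the remaining 20%, taking 50% yields the target 10% random replacements, and the last 10% stay unchanged. A condensed sketch of that 80/10/10 pattern (mirroring the style of the library's MLM data collator, not its exact code):

      ```python
      import torch

      def apply_mlm_noise(inputs: torch.Tensor, masked: torch.Tensor,
                          mask_token_id: int, vocab_size: int) -> torch.Tensor:
          """inputs: long tensor of token ids; masked: bool tensor of
          positions selected for prediction."""
          # 80% of masked positions -> [MASK]
          replaced = torch.bernoulli(torch.full(inputs.shape, 0.8)).bool() & masked
          inputs[replaced] = mask_token_id
          # 50% of the remaining 20% -> random token (10% of the total)
          randomized = (
              torch.bernoulli(torch.full(inputs.shape, 0.5)).bool() & masked & ~replaced
          )
          inputs[randomized] = torch.randint(vocab_size, inputs.shape)[randomized]
          # the final 10% stay as-is
          return inputs
      ```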