1. 21 Sep, 2025 4 commits
    • Add BabiLong (#3287) · ccfa4ad1
      Janna authored
      * create babilong tasks
      
      * lint
      
      * add clarification
      
      * fix typo
      
      * add babilong description
    • feat: Add mmlu-redux and its Spanish translation as generative task definitions (#2705) · fec9dde7
      Luis Cosio authored
      
      
      * Added benchmark
      
      * Added more testing
      
      * Added task definition for mmlu_redux and mmlu_redux_spanish
      
      * Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs
      
      * Add remaining MMLU Redux YAMLs and updated tasks README
      
      * Add MMLU Redux English and Spanish tasks with YAML fixes and READMEs
      
      * Add MMLU Redux changes from pr-2705
      
      * Resolve pre-commit hook and pytest overlapping group issues by adding mmlu_redux_spanish task entries and unique subgroup names
      
      * Enhance retry logic to prevent 429 error when using Hugging Face API for tests, apply pre-commit fixes
      
      * Revert Python test changes and comment out one task group to avoid Hugging Face rate limit and task failure
      
      ---------
      Co-authored-by: CT-6282 <ricardo.godric@hotmail.com>
    • add xpu support HFLM (#3211) · 368275f3
      kaixuanliu authored
      
      Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
    • Fix LongBench Evaluation (#3273) · 7f698a5a
      Timur Aysin authored
      
      
      * fix: set 'do_sample=False' and use double quotes in 'doc_to_text'
      
      * feat: update versions and README for longbench
      
      * pacify pre-commit
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  2. 12 Sep, 2025 1 commit
  3. 08 Sep, 2025 3 commits
  4. 02 Sep, 2025 4 commits
  5. 27 Aug, 2025 3 commits
  6. 26 Aug, 2025 1 commit
    • Support for AIME dataset (#3248) · 5ac7cdf8
      Janna authored
      * add AIME tasks
      
      * standardize the repeats
      
      * fix task naming
      
      * aime25 only has test set
      
      * edit readme
      
      * add utils
      
      * standardize
      
      * fix case sensitivity
      
      * repeat once
      
      * lint
      
      * more linting
      
      * lint huggingface.py
  7. 25 Aug, 2025 4 commits
  8. 23 Aug, 2025 1 commit
  9. 22 Aug, 2025 1 commit
  10. 21 Aug, 2025 9 commits
  11. 13 Aug, 2025 1 commit
  12. 08 Aug, 2025 1 commit
  13. 04 Aug, 2025 5 commits
  14. 02 Aug, 2025 1 commit
  15. 24 Jul, 2025 1 commit