1. 14 Oct, 2025 2 commits
    • Longbench v2 (#3338) · 655718d0
      Janna authored
      * initial commit
      
      * change to acc
      
      * fix long-dialogue tasks
      
      * fix versioning
      
      * more fixes
      
      * fix naming
      
      * fix naming
      
      * more renaming
      
      * maybe a dataset fix
      
      * fix dataset and use new dataset schema
      
      * add README
      
      * fix prompt and dataset naming
      
      * lint
      
      * remove utils.py
      
      * lint
      
      * more linting
      
      * fix typo
      
      * fix naming
      
      * add longbenchv2
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
    • bump to python 3.10 (#3337) · 8efef8f1
      Janna authored
      * remove math dependency from python 3.9
      
      * bump to python 3.10
      
      * add 3.12
  2. 03 Oct, 2025 1 commit
  3. 02 Oct, 2025 3 commits
  4. 22 Sep, 2025 1 commit
    • Add eqbench tasks in Spanish and Catalan (#3168) · de496b80
      priverabsc authored
      * Add eqbench tasks in Spanish and Catalan
      
      * Incremented catalan_bench and spanish_bench versions. Added a 'multilingual' folder inside 'eq_bench' and moved the eqbench_ca and eqbench_es .yaml files to that folder. Updated the tasks README with eqbench_es and eqbench_ca, stating in each description both the Hugging Face link and the translation method.
      
      * Fixed tasks table.
      
      * remove test_task.sh and results folder
      
      * Add utils.py to multilingual folder
  5. 21 Sep, 2025 6 commits
  6. 12 Sep, 2025 1 commit
  7. 08 Sep, 2025 3 commits
  8. 02 Sep, 2025 4 commits
  9. 27 Aug, 2025 3 commits
  10. 26 Aug, 2025 1 commit
    • Support for AIME dataset (#3248) · 5ac7cdf8
      Janna authored
      * add AIME tasks
      
      * standardize the repeats
      
      * fix task naming
      
      * aime25 only has test set
      
      * edit readme
      
      * add utils
      
      * standardize
      
      * fix case sensitivity
      
      * repeat once
      
      * lint
      
      * more linting
      
      * lint huggingface.py
  11. 25 Aug, 2025 4 commits
  12. 23 Aug, 2025 1 commit
  13. 22 Aug, 2025 1 commit
  14. 21 Aug, 2025 9 commits