1. 22 Sep, 2025 1 commit
    • Add eqbench tasks in Spanish and Catalan (#3168) · de496b80
      priverabsc authored
      * Add eqbench tasks in Spanish and Catalan
      
      * Incremented the catalan_bench and spanish_bench versions. Added a 'multilingual' folder inside 'eq_bench' and moved the eqbench_ca and eqbench_es .yaml files into it. Updated the tasks README with eqbench_es and eqbench_ca, stating in each description both the Hugging Face link and the translation method.
      
      * Fixed tasks table.
      
      * remove test_task.sh and results folder
      
      * Add utils.py to multilingual folder
  2. 21 Sep, 2025 5 commits
  3. 08 Sep, 2025 1 commit
  4. 02 Sep, 2025 4 commits
  5. 27 Aug, 2025 3 commits
  6. 26 Aug, 2025 1 commit
    • Support for AIME dataset (#3248) · 5ac7cdf8
      Janna authored
      * add AIME tasks
      
      * standardize the repeats
      
      * fix task naming
      
      * aime25 only has test set
      
      * edit readme
      
      * add utils
      
      * standardize
      
      * fix case sensitivity
      
      * repeat once
      
      * lint
      
      * more linting
      
      * lint huggingface.py
  7. 25 Aug, 2025 3 commits
  8. 23 Aug, 2025 1 commit
  9. 22 Aug, 2025 1 commit
  10. 21 Aug, 2025 6 commits
  11. 08 Aug, 2025 1 commit
  12. 04 Aug, 2025 4 commits
  13. 23 Jul, 2025 2 commits
  14. 22 Jul, 2025 2 commits
  15. 19 Jul, 2025 3 commits
  16. 18 Jul, 2025 1 commit
  17. 16 Jul, 2025 1 commit
    • `bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (#3140) · c2be7211
      philipdoldo authored
      
      
      * Removed the "Let's think step by step." text from the start of the target entry in each sample, so that the phrase is no longer repeated twice in the few-shot prompts and the behavior matches the original bbh repository. This applies to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml. In boolean_expressions.yaml, in my opinion, there is an error: the 'Remember that (i) ...' text does not appear after the final "A: Let's think step by step." in the prompt. Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot examples), but I think it should have been part of the prompt, just as "A: Let's think step by step." is included in the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I am only pointing it out.
      
      * feat: remove extra space from answers; add changelog
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
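The duplication described in the `bbh_cot_fewshot` commit can be illustrated with a minimal sketch. It assumes a prompt template that already ends each answer line with the phrase; the helper names (`strip_cot_prefix`, `build_fewshot_block`) and the sample question and target are hypothetical and are not taken from the task files themselves.

```python
# Hypothetical illustration; not the code used in the commit.
PREFIX = "Let's think step by step."


def strip_cot_prefix(target: str, prefix: str = PREFIX) -> str:
    """Drop a leading chain-of-thought phrase (and surrounding spaces) if present."""
    stripped = target.lstrip()
    if stripped.startswith(prefix):
        return stripped[len(prefix):].lstrip()
    return target


def build_fewshot_block(question: str, target: str) -> str:
    # Assumed template: the answer line already starts with the phrase.
    return f"Q: {question}\nA: {PREFIX} {target}"


# Made-up sample whose target still carries the phrase.
sample_target = "Let's think step by step. Tomorrow is 01/01. So the answer is 01/01."

before = build_fewshot_block("Today is 12/31. What is tomorrow?", sample_target)
after = build_fewshot_block(
    "Today is 12/31. What is tomorrow?", strip_cot_prefix(sample_target)
)

print(before.count(PREFIX))  # phrase occurrences when the target keeps it
print(after.count(PREFIX))   # phrase occurrences after stripping the target
```

Stripping the phrase from the target rather than from the template keeps the final, unanswered "A:" line of the prompt unchanged, which matches the reasoning given in the commit message.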