1. 23 Aug, 2025 1 commit
  2. 22 Aug, 2025 1 commit
  3. 21 Aug, 2025 6 commits
  4. 08 Aug, 2025 1 commit
  5. 04 Aug, 2025 4 commits
  6. 23 Jul, 2025 2 commits
  7. 22 Jul, 2025 2 commits
  8. 19 Jul, 2025 3 commits
  9. 18 Jul, 2025 1 commit
  10. 16 Jul, 2025 1 commit
    • `bbh_cot_fewshot`: Removed repeated "Let's think step by step." text from bbh cot prompts (#3140) · c2be7211
      philipdoldo authored
      
      
      * Removed the 'Let's think step by step.' text from the start of the target entry in each sample, so that the phrase is not repeated twice in the few-shot prompts and the behavior matches the original bbh repository. This applies to 26 of the 27 subtasks; the only exception is boolean_expressions.yaml.
      
        In boolean_expressions.yaml, I believe there is an error: the prompt does not include the 'Remember that (i) ...' text after the final 'A: Let's think step by step.' Models like EleutherAI/gpt-neo-125m seem to always begin their answers with this string anyway (copying the few-shot examples), but it really should have been part of the prompt, just as 'A: Let's think step by step.' is included in the prompt for all of the cot tasks. The original bbh repo has the same issue, so I think it is fine to keep it this way for consistency; I am just pointing it out.
      
      * feat: remove extra space from answers; add changelog
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
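The duplication described in the commit above can be illustrated with a minimal sketch. It assumes a prompt template that already ends each answer cue with the phrase; the helper names `strip_cot_prefix` and `build_fewshot_block` are hypothetical and are not taken from the harness or the bbh repository.

```python
COT_PREFIX = "Let's think step by step."


def strip_cot_prefix(target: str) -> str:
    """Drop a leading chain-of-thought phrase from a few-shot target, if present."""
    stripped = target.lstrip()
    if stripped.startswith(COT_PREFIX):
        return stripped[len(COT_PREFIX):].lstrip()
    # Targets that do not start with the phrase are returned unchanged.
    return target


def build_fewshot_block(question: str, target: str) -> str:
    """Hypothetical template: the answer cue already carries the phrase once."""
    return f"Q: {question}\nA: {COT_PREFIX}\n{strip_cot_prefix(target)}"


block = build_fewshot_block(
    "not ( True ) and ( True ) is",
    "Let's think step by step.\nnot True is False. So the answer is False.",
)
print(block)
```

With the prefix stripped from the target, the phrase appears once per few-shot example instead of twice.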
  11. 14 Jul, 2025 1 commit
  12. 10 Jul, 2025 1 commit
  13. 03 Jul, 2025 2 commits
    • Humaneval - fix regression (#3102) · 8c1016cb
      Baber Abbasi authored
      * use double quotes
    • Truthfulqa multi harness (#3062) · e0dc33ae
      Blanca Calvo authored
      
      
      * truthfulqa-multi task
      
      * truthfulqa-multi with chat few-shot
      
      * few shot chat implementation
      
      * changed until so it outputs lists
      
      * changed dataset location
      
      * added MT task
      
      * Create README.md
      
      * do not include MT
      
      * changes for PR
      
      * tag change
      
      * removed yaml extension
      
      * adding task to the table
      
      * fix task configs
      
      * add import exception
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  14. 30 Jun, 2025 1 commit
    • FixBug: Align the Humaneval with official results for Llama-3.1-70B-Instruct (#3092) · a7ca0435
      jinze authored
      * Fix: Align the Humaneval dataset with official results
      
      Details: (1) Modified the "doc_to_text" and "gen_prefix" in the "humaneval_instruct.yaml" file to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals".
      
      (2) Changed r.rfind("```") to r.find("```") so that it locates the first "```" rather than the last one.
      
      Results: Partially reproduced the official results. LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6), and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5).
      
      Ref: PR#2650
      
      * add changelog and version
      
      * add changelog
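The difference between `rfind` and `find` in item (2) of the commit above can be seen in a small sketch. It assumes that the generation continues from a code fence already opened by the prompt, so the first fence in the response closes the solution; the sample response and the helper names are made up for illustration and are not the harness's own code.

```python
FENCE = "`" * 3  # three backticks, built this way so the example nests cleanly

# Hypothetical model output: the solution, a closing fence, then extra chatter
# that contains a second fenced block.
response = (
    "def add(a, b):\n"
    "    return a + b\n"
    f"{FENCE}\n"
    "Example usage:\n"
    f"{FENCE}python\n"
    "print(add(1, 2))\n"
    f"{FENCE}\n"
)


def cut_at_first_fence(r: str) -> str:
    """Keep everything before the first fence; return r unchanged if none is found."""
    idx = r.find(FENCE)
    return r[:idx] if idx != -1 else r


def cut_at_last_fence(r: str) -> str:
    """Keep everything before the last fence; return r unchanged if none is found."""
    idx = r.rfind(FENCE)
    return r[:idx] if idx != -1 else r


print(cut_at_first_fence(response))
```

Cutting at the last fence keeps the trailing prose and the second block inside the extracted "code", while cutting at the first fence keeps only the function body.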
  15. 25 Jun, 2025 1 commit
  16. 20 Jun, 2025 1 commit
  17. 19 Jun, 2025 2 commits
  18. 16 Jun, 2025 2 commits
  19. 12 Jun, 2025 1 commit
  20. 08 Jun, 2025 1 commit
    • [longbench] fix metric calculation (#2983) · 147e9d61
      Baber Abbasi authored
      * use all answers
      
      * use middle truncation
      
      * maybe fix classification score
      
      * strip classification preds
      
      * [vllm] remove stop tokens post-hoc
      
      * strip all preds
      
      * pacify pre-commit
      
      * start on truncation utility
      
      * add to readme
      
      * add a footgun doc
      
      * fix newline in yaml templates
      
      * do not strip code_sim preds!
      
      * fix pre-commit config
      
      * fix instruction warning
      
      * add note to longbench readme
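The "use middle truncation" item in the commit above refers to a common long-context strategy: keep the start and end of an over-long input and drop the middle. The sketch below shows the general idea on a list of token ids; the function name `truncate_middle` and the even head/tail split are assumptions, not the harness's actual truncation utility.

```python
from typing import List


def truncate_middle(tokens: List[int], max_len: int) -> List[int]:
    """Keep the first and last tokens and drop the middle so at most max_len remain."""
    if max_len <= 0:
        return []
    if len(tokens) <= max_len:
        return list(tokens)
    head = max_len // 2      # tokens kept from the start
    tail = max_len - head    # tokens kept from the end (at least 1 here)
    return tokens[:head] + tokens[-tail:]


print(truncate_middle(list(range(10)), 4))
```

Keeping both ends tends to preserve the task instruction at the start and the question at the end, which left-only or right-only truncation would cut.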
  21. 03 Jun, 2025 2 commits
    • remove prints (#3041) · 9f152e0b
      Baber Abbasi authored
    • add Mbpp instruct (#2995) · 60e85da5
      Baber Abbasi authored
      * feat: add mbpp_instruct
      
      * fix: update generation_kwargs to use an empty until list
      
      * fix: correct predictions formatting in pass_at_1 function
      
      * fix: improve code block extraction by checking first without opening backticks
      
      * fix mbpp `pass_at_1`
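One way to read the "checking first without opening backticks" item in the commit above is sketched below. It assumes the prompt may already have opened a code fence, in which case the response holds an unmatched closing fence; the function `extract_code` and its odd/even fence heuristic are hypothetical and may differ from what the task actually does.

```python
import re

FENCE = "`" * 3  # three backticks
# A complete fenced block with an optional language tag after the opening fence.
BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Extract code from a response that may lack the opening backticks."""
    # Check first for a response without opening backticks: an odd number of
    # fences means the first one closes a block that the prompt opened.
    if response.count(FENCE) % 2 == 1:
        return response.split(FENCE, 1)[0].strip("\n")
    # Otherwise fall back to the first complete fenced block, if any.
    match = BLOCK.search(response)
    return match.group(1).strip("\n") if match else response.strip("\n")


continued = "def f():\n    return 1\n" + FENCE
fenced = "Here:\n" + FENCE + "python\ndef f():\n    return 1\n" + FENCE
print(extract_code(continued) == extract_code(fenced))
```

Both response shapes yield the same function source, so a downstream `pass_at_1` check would see identical code in either case.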
  22. 26 May, 2025 1 commit
  23. 21 May, 2025 1 commit
  24. 19 May, 2025 1 commit