1. 10 Jul, 2025 1 commit
    • fixup · 66736bc1
      Baber authored
  2. 08 Jul, 2025 2 commits
  3. 07 Jul, 2025 3 commits
    • nit · 5efa7937
      Baber authored
    • nit · 646dec9e
      Baber authored
    • add docs · 0967905f
      Baber authored
  4. 05 Jul, 2025 1 commit
  5. 04 Jul, 2025 4 commits
  6. 03 Jul, 2025 1 commit
  7. 01 Jul, 2025 1 commit
  8. 30 Jun, 2025 7 commits
  9. 25 Jun, 2025 1 commit
  10. 03 Jun, 2025 1 commit
  11. 21 May, 2025 1 commit
  12. 19 May, 2025 1 commit
  13. 15 May, 2025 1 commit
  14. 16 Apr, 2025 1 commit
    • Longbench bugfix (#2895) · 930d8378
      Baber Abbasi authored
      * add warning for default until
      
      * fix stop tokens; add vcsum
      
      * bugfix: fix doc_to_target to string
      
      * fix lsht, trec
      
      * add task to readme
      
      * add debugging logs for multiple input/output
  15. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
       Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
  16. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for longcontext (and other synthetic tasks)
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  17. 14 Mar, 2025 1 commit
  18. 11 Mar, 2025 1 commit
    • New healthcare benchmark: careqa (#2714) · 7c9fbcf8
      PabloAgustin authored
      
      
      * New healthcare benchmark: careqa
      
      * LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
      
      * Add fixes, READMES, and remove task_list.txt
      
      * pre-commit passed, add formatting updates; add nanmean agg_metric
      
      * Fix import error.
      
      * Wrapped imports in try excepts
      
      * Wrapped imports in try excepts; also metrics to catch bert_score import error
      
      * Try except to catch ImportErrors as well
      
      * use np.nan
      
      * pre-commit
      
      ---------
      Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
      Co-authored-by: Baber <baber@hey.com>
  19. 04 Mar, 2025 1 commit
  20. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  21. 14 Feb, 2025 2 commits
  22. 06 Feb, 2025 1 commit
  23. 28 Jan, 2025 1 commit
  24. 19 Jan, 2025 1 commit
  25. 17 Jan, 2025 1 commit
  26. 15 Jan, 2025 2 commits
    • assistant prefill (#2615) · 703fbffd
      Baber Abbasi authored
      * add assistant prefix
      
      * add arc_challenge from llama
      
      * nit
      
      * nit
      
      * nit
      
      * add assistant prefix
      
      * add mmlu_llama
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
      
      * fix regex bug
      
      * add assistant_prefix to vllm
      
      * add `Question:`
      
      * add mmlu_pro
      
      * add fewshot assistant_prefix
      
      * use `assistant_prefill`
      
      * typehints
      
      * nits
      
      * nits
      
      * add to docs
      
      * add readme
    • Add HumanEval (#1992) · 4c11206b
      Hojin Lee authored
      
      
      * add custom filter
      
      * fix type casting of references
      
      * add humaneval
      
      * fix a bug in humaneval
      
      * add greedy version of humaneval
      
      * update tasks README
      
      * test humaneval
      
      * return multiple metrics
      
      * nit
      
      * add confirmation to run code tasks
      
      * nit
      
      * nit
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>