1. 23 Jul, 2025 2 commits
  2. 22 Jul, 2025 4 commits
  3. 21 Jul, 2025 4 commits
  4. 19 Jul, 2025 1 commit
  5. 18 Jul, 2025 1 commit
  6. 10 Jul, 2025 2 commits
  7. 08 Jul, 2025 2 commits
  8. 07 Jul, 2025 1 commit
    • nit · 5efa7937
      Baber authored
  9. 04 Jul, 2025 2 commits
  10. 03 Jul, 2025 1 commit
  11. 01 Jul, 2025 1 commit
  12. 30 Jun, 2025 6 commits
  13. 25 Jun, 2025 1 commit
  14. 03 Jun, 2025 1 commit
  15. 21 May, 2025 1 commit
  16. 19 May, 2025 1 commit
  17. 15 May, 2025 1 commit
  18. 16 Apr, 2025 1 commit
    • Longbench bugfix (#2895) · 930d8378
      Baber Abbasi authored
      * add warning for default `until`
      
      * fix stop tokens; add vcsum
      
      * bugfix: fix doc_to_target to string
      
      * fix lsht, trec
      
      * add task to readme
      
      * add debugging logs for multiple input/output
  19. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
      Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
  20. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for long-context (and other synthetic) tasks
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  21. 14 Mar, 2025 1 commit
  22. 04 Mar, 2025 1 commit
  23. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  24. 14 Feb, 2025 1 commit
  25. 06 Feb, 2025 1 commit