1. 24 Sep, 2025 1 commit
  2. 25 Jul, 2025 1 commit
    • fix · e72ec96c
      Baber authored
  3. 24 Jul, 2025 1 commit
    • fix · d762e2aa
      Baber authored
  4. 23 Jul, 2025 3 commits
  5. 22 Jul, 2025 4 commits
  6. 21 Jul, 2025 4 commits
  7. 19 Jul, 2025 1 commit
  8. 18 Jul, 2025 1 commit
  9. 10 Jul, 2025 2 commits
  10. 08 Jul, 2025 2 commits
  11. 07 Jul, 2025 1 commit
    • nit · 5efa7937
      Baber authored
  12. 04 Jul, 2025 2 commits
  13. 03 Jul, 2025 1 commit
  14. 01 Jul, 2025 1 commit
  15. 30 Jun, 2025 6 commits
  16. 25 Jun, 2025 1 commit
  17. 03 Jun, 2025 1 commit
  18. 21 May, 2025 1 commit
  19. 19 May, 2025 1 commit
  20. 15 May, 2025 1 commit
  21. 16 Apr, 2025 1 commit
    • Longbench bugfix (#2895) · 930d8378
      Baber Abbasi authored
      * add warning for default `until`
      
      * fix stop tokens; add vcsum
      
      * bugfix: fix doc_to_target to string
      
      * fix lsht, trec
      
      * add task to readme
      
      * add debugging logs for multiple input/output
  22. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
       Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
  23. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for long-context (and other synthetic tasks)
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  24. 14 Mar, 2025 1 commit