1. 08 Oct, 2025 1 commit
  2. 04 Oct, 2025 1 commit
    • Fewshot refactor (#3227) · 003e5852
      Baber Abbasi authored
      
      
      * overhaul `ContextSampler`
      
      * refactor masakhapos
      
      * move multi_target to `exact_match`
      
      * remove doc_to_choice from `boolq-seq2seq`
      
      * remove doc_to_choice in generation process_results
      
      * Remove unused `doc_to_choice` and fix superglue whitespaces
      
      * require multiple_inputs and multiple_targets to be explicitly set in taskconfig
      
      * fix copa; better logging in task init
      
      * fix doc_to_target to return int rather than str (deprecated)
      
      * fix processing regression; recursively parse lists from template
      
      * remove redundant jinja parsing logic
      
      * remove promptsource
      
      * for multiple_inputs use `doc_to_text: list[str]`
      
      * Refactor `ContextSampler` `fewshot_context`
      
      * fix multiple_input context
      
      * fix `target_delimiter` with `gen_prefix`
      
      * `doc_to_text` is list for multiple_inputs
      
      * Refactor `count_bytes` and `count_words` methods to `@staticmethod`
      
      * make has_*(train/test/validation) to properties
      
      * remove `multi_target` `generate_until`
      
      * fix doc_to_target/multiple_targets handling; add tests
      
      * rename `multi_target` to `multiple_targets`
      
      * evaluate list when multiple targets
      
      * allow doc_to_target to return list
      
      * Remove gen_prefix space and add warning (#3239)
      
      * Remove gen_prefix space and add warning
      
      * fix null gen_prefix bug again
      
      * use git tests
      
      ---------
      Co-authored-by: Boaz Ben-Dov <bendboaz@gmail.com>
  3. 25 Sep, 2025 21 commits
  4. 23 Jul, 2025 1 commit
  5. 11 Jul, 2025 2 commits
  6. 25 Jun, 2025 1 commit
  7. 03 Jun, 2025 1 commit
  8. 21 May, 2025 1 commit
  9. 19 May, 2025 1 commit
  10. 15 May, 2025 1 commit
  11. 16 Apr, 2025 1 commit
    • Longbench bugfix (#2895) · 930d8378
      Baber Abbasi authored
      * add warning for default until
      
      * fix stop tokens; add vcsum
      
      * bugfix: fix doc_to_target to string
      
      * fix lsht, trec
      
      * add task to readme
      
      * add debugging logs for multiple input/output
  12. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
       Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
  13. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for long-context (and other synthetic tasks)
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  14. 14 Mar, 2025 1 commit
  15. 04 Mar, 2025 1 commit
  16. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  17. 14 Feb, 2025 1 commit
  18. 06 Feb, 2025 1 commit
  19. 28 Jan, 2025 1 commit