1. 05 Jul, 2025 1 commit
  2. 04 Jul, 2025 1 commit
  3. 03 Jul, 2025 3 commits
  4. 25 Jun, 2025 1 commit
  5. 23 Jun, 2025 1 commit
  6. 03 Jun, 2025 1 commit
  7. 21 May, 2025 1 commit
  8. 19 May, 2025 1 commit
  9. 15 May, 2025 1 commit
  10. 22 Apr, 2025 2 commits
  11. 16 Apr, 2025 1 commit
    • Longbench bugfix (#2895) · 930d8378
      Baber Abbasi authored
      * add warning for default until
      
      * fix stop tokens; add vcsum
      
      * bugfix: fix doc_to_target to string
      
      * fix lsht, trec
      
      * add task to readme
      
      * add debugging logs for multiple input/output
  12. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
       Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
  13. 18 Mar, 2025 1 commit
    • Add loncxt tasks (#2629) · 80a10075
      Baber Abbasi authored
      support for long-context (and other synthetic tasks)
      * add ruler
      * add longbench
      * pass `metadata` to TaskConfig
  14. 14 Mar, 2025 1 commit
  15. 11 Mar, 2025 1 commit
    • New healthcare benchmark: careqa (#2714) · 7c9fbcf8
      PabloAgustin authored
      
      
      * New healthcare benchmark: careqa
      
      * LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
      
      * Add fixes, READMES, and remove task_list.txt
      
      * pre-commit passed, add formatting updates; add nanmean agg_metric
      
      * Fix import error.
      
      * Wrapped imports in try excepts
      
      * Wrapped imports in try excepts; also metrics to catch bert_score import error
      
      * Try except to catch ImportErrors as well
      
      * use np.nan
      
      * pre-commit
      
      ---------
      Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
      Co-authored-by: Baber <baber@hey.com>
  16. 04 Mar, 2025 1 commit
  17. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
  18. 14 Feb, 2025 2 commits
  19. 06 Feb, 2025 1 commit
  20. 28 Jan, 2025 1 commit
  21. 19 Jan, 2025 1 commit
  22. 17 Jan, 2025 1 commit
  23. 15 Jan, 2025 2 commits
    • assistant prefill (#2615) · 703fbffd
      Baber Abbasi authored
      * add assistant prefix
      
      * add arc_challenge from llama
      
      * nit
      
      * nit
      
      * nit
      
      * add assistant prefix
      
      * add mmlu_llama
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
      
      * fix regex bug
      
      * add assistant_prefix to vllm
      
      * add `Question:`
      
      * add mmlu_pro
      
      * add fewshot assistant_prefix
      
      * use `assistant_prefill`
      
      * typehints
      
      * nits
      
      * nits
      
      * add to docs
      
      * add readme
    • Add HumanEval (#1992) · 4c11206b
      Hojin Lee authored
      
      
      * add custom filter
      
      * fix type casting of references
      
      * add humaneval
      
      * fix a bug in humaneval
      
      * add greedy version of humaneval
      
      * update tasks README
      
      * test humaneval
      
      * return multiple metrics
      
      * nit
      
      * add confirmation to run code tasks
      
      * nit
      
      * nit
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
  24. 04 Jan, 2025 1 commit
    • some minor logging nits (#2609) · 888ac292
      Baber Abbasi authored
      * remove yaml extension from phrases_va_common
      
      * remove yaml extension from winogenerated
      
      * remove yaml extension from phrases_es
      
      * no cache debug logging when not used
  25. 29 Nov, 2024 1 commit
  26. 28 Nov, 2024 1 commit
  27. 11 Nov, 2024 1 commit
  28. 07 Nov, 2024 1 commit
  29. 08 Oct, 2024 2 commits
  30. 07 Oct, 2024 1 commit
  31. 04 Oct, 2024 1 commit
  32. 17 Sep, 2024 1 commit
  33. 13 Sep, 2024 1 commit
    • Multimodal prototyping (#2243) · fb963f0f
      Lintang Sutawika authored
      
      
      * add WIP hf vlm class
      
      * add doc_to_image
      
      * add mmmu tasks
      
      * fix merge conflicts
      
      * add lintang's changes to hf_vlms.py
      
      * fix doc_to_image
      
      * added yaml_path for config-loading
      
      * revert
      
      * add line to process str type v
      
      * update
      
      * modeling cleanup
      
      * add aggregation for mmmu
      
      * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
      
      * implemented doc_to_image
      
      * update doc_to_image to accept list of features
      
      * update functions
      
      * readd image processed
      
      * update args process
      
      * bugfix for repeated images fed to model
      
      * push WIP loglikelihood code
      
      * commit most recent code (generative ; qwen2-vl testing)
      
      * preliminary image_token_id handling
      
      * small mmmu update: some qs have >4 mcqa options
      
      * push updated modeling code
      
      * use processor.apply_chat_template
      
      * add mathvista draft
      
      * nit
      
      * nit
      
      * ensure no footguns in text<>multimodal LM<>task incompatibility
      
      * add notification to readme regarding launch of prototype!
      
      * fix compatibility check
      
      * reorganize mmmu configs
      
      * chat_template=None
      
      * add interleave chat_template
      
      * add condition
      
      * add max_images; interleave=true
      
      * nit
      
      * testmini_mcq
      
      * nit
      
      * pass image string; convert img
      
      * add vllm
      
      * add init
      
      * vlm add multi attr
      
      * fixup
      
      * pass max images to vllm model init
      
      * nit
      
      * encoding to device
      
      * fix HFMultimodalLM.chat_template ?
      
      * add mmmu readme
      
      * remove erroneous prints
      
      * use HFMultimodalLM.chat_template ; restore tasks/__init__.py
      
      * add docstring for replace_placeholders in utils
      
      * fix `replace_placeholders`; set image_string=None
      
      * fix typo
      
      * cleanup + fix merge conflicts
      
      * update MMMU readme
      
      * del mathvista
      
      * add some sample scores
      
      * Update README.md
      
      * add log msg for image_string value
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
      Co-authored-by: Baber Abbasi <baber@eleuther.ai>
      Co-authored-by: Baber <baber@hey.com>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  34. 04 Sep, 2024 1 commit