- 07 Jul, 2025 4 commits
- 05 Jul, 2025 9 commits
- 25 Jun, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 03 Jun, 2025 1 commit
-
-
Baber Abbasi authored
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices
* add tests
-
- 21 May, 2025 1 commit
-
-
Baber Abbasi authored
This reverts commit 4dbd5ec9
-
- 19 May, 2025 1 commit
-
-
Baber Abbasi authored
* add `sglang-generate`
* nit
* nit
* nit
* pacify pre-commit
-
- 15 May, 2025 1 commit
-
-
Tingchen Fu authored
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* add warning in for default until
* fix stop tokens; add vcsum
* bugfix: fix doc_to_target to string
* fix lsht, trec
* add task to readme
* add debugging logs for multiple input/output
-
- 07 Apr, 2025 1 commit
-
-
Felipe Maia Polo authored
Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
* added option --examples
* specifying examples in dictionary
* run pre-commit - fix arg type
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* fixing bug when examples==None
* fixing bug when examples==None
* limit or examples must be None in simple_evaluate.py and in evaluator.py
* run pre-commit (fix formatting)
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* merge main and run pre-commit (fix formatting)
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
* Update __main__.py undefined "limit" and "examples"
* update branch, fix conflicts, run pre-commit
* nits
* nits
* change 'examples' to 'samples'
---------
Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
Co-authored-by: Baber <baber@hey.com>
-
- 18 Mar, 2025 1 commit
-
-
Baber Abbasi authored
support for longcontext (and other synthetic tasks)
* add ruler
* add longbench
* pass `metadata` to TaskConfig
-
- 14 Mar, 2025 1 commit
-
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model
* Beauty imports
* fix apply_chat_template args
* update default audio placeholders list
* add demo task - common_voice subset
* add audiolm_qwen libs to pyproject.toml
* pre-commit beautify
---------
Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
-
- 04 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
---------
Co-authored-by: Baber <baber@hey.com>
-
- 14 Feb, 2025 1 commit
-
-
Kiersten Stokes authored
-
- 06 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-
- 28 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* feat: drop Python 3.8 support
* feat: drop Python 3.8 tests
* pre-commit
* handle chat_template for multiple input
-
- 17 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* switch arg
-
- 15 Jan, 2025 2 commits
-
-
Baber Abbasi authored
* add assistant prefix
* add arc_challenge from llama
* nit
* nit
* nit
* add assistant prefix
* add mmlu_llama
* nit
* nit
* Revert "nit" This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
* fix regex bug
* add assistant_prefix to vllm
* add `Question:`
* add mmlu_pro
* add fewshot assistant_prefix
* use `assistant_prefill`
* typehints
* nits
* nits
* add to docs
* add readme
-
Hojin Lee authored
* add custom filter
* fix type casting of references
* add humaneval
* fix a bug in humaneval
* add greedy version of humaneval
* update tasks README
* test humaneval
* return multiple metrics
* nit
* add confirmation to run code tasks
* nit
* nit
---------
Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-
- 04 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* remove yaml extension from phraes_va_common
* remove yaml extension from winogenerated
* remove yaml extension from phrases_es
* no cache debug logging when not used
-
- 29 Nov, 2024 1 commit
-
-
Baber Abbasi authored
-
- 11 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* batch commit
* Revert "batch commit" This reverts commit d859d1ca.
* batch commit
* checkout from main
* checkout from main
* checkout from main
* checkout from main
* checkout from main
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* Chat template fix (#7)
* cleanup
* cleanup
* cleanup
* linting
* fix tests
* add ifeval install to new_task CI
* Revert "add ifeval install to new_task CI" This reverts commit 1d19449bb7fbfa05d51e7cd20950475eae533bf1.
* adds leaderboard tasks (#1)
* adds leaderboard tasks
* Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml
* add readme
* Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml
* modify readme
* fix bbh task
* fix bbh salient task
* modify the readme
* Delete lm_eval/tasks/leaderboard/ifeval/README.md
* Delete lm_eval/tasks/leaderboard/math/README.md
* add leaderboard to the tasks repertory
* add announcement about new leaderboard tasks
* linting
* Update README.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* installs ifeval dependency in new_task github workflow
---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* fix math parser
* fix math parser
* fix version
* add warning about chat template
---------
Co-authored-by: Nathan Habib <nathan.habib@huggingface.co>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: Nathan Habib <nathan.habib19@gmail.com>
-
- 07 Nov, 2024 1 commit
-
-
Baber Abbasi authored
-
- 08 Oct, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 07 Oct, 2024 1 commit
-
-
Baber Abbasi authored
* bugfix
* pre-commit
-
- 04 Oct, 2024 1 commit
-
-
Baber Abbasi authored
-
- 17 Sep, 2024 1 commit
-
-
Baber Abbasi authored
-
- 13 Sep, 2024 1 commit
-
-
Lintang Sutawika authored
* add WIP hf vlm class
* add doc_to_image
* add mmmu tasks
* fix merge conflicts
* add lintang's changes to hf_vlms.py
* fix doc_to_image
* added yaml_path for config-loading
* revert
* add line to process str type v
* update
* modeling cleanup
* add aggregation for mmmu
* rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
* implemented doc_to_image
* update doc_to_image to accept list of features
* update functions
* readd image processed
* update args process
* bugfix for repeated images fed to model
* push WIP loglikelihood code
* commit most recent code (generative; qwen2-vl testing)
* preliminary image_token_id handling
* small mmmu update: some qs have >4 mcqa options
* push updated modeling code
* use processor.apply_chat_template
* add mathvista draft
* nit
* nit
* ensure no footguns in text<>multimodal LM<>task incompatibility
* add notification to readme regarding launch of prototype!
* fix compatibility check
* reorganize mmmu configs
* chat_template=None
* add interleave chat_template
* add condition
* add max_images; interleave=true
* nit
* testmini_mcq
* nit
* pass image string; convert img
* add vllm
* add init
* vlm add multi attr
* fixup
* pass max images to vllm model init
* nit
* encoding to device
* fix HFMultimodalLM.chat_template ?
* add mmmu readme
* remove erroneous prints
* use HFMultimodalLM.chat_template; restore tasks/__init__.py
* add docstring for replace_placeholders in utils
* fix `replace_placeholders`; set image_string=None
* fix typo
* cleanup + fix merge conflicts
* update MMMU readme
* del mathvista
* add some sample scores
* Update README.md
* add log msg for image_string value
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Baber Abbasi <baber@eleuther.ai>
Co-authored-by: Baber <baber@hey.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 05 Aug, 2024 1 commit
-
-
Yu Shi Jie authored
* initialized mmlu_pro task
* added generative mmlu-pro
* added cot fewshot for mmlu-pro
* Initial commit
* updated mmlu-pro to take on 3 splits: test, val, dev
* mmlu-pro: added continuation and flan_cot_zeroshot
* added README.md for mmlu_pro
* removed
* update files
* moved files out, and removed unused versions
* updated
* mmlu_pro:
  - changed task 'other' to 'miscellaneous': there is already a group named 'other', and a task and group with the same alias (e.g. mmlu_pro_other_generative) throws an error
  - fixed yaml backslash escape for fewshot cot
* changed choices -> options in yaml config to fit dataset schema
* ONLY FOR DEFAULT: fixed yaml file to use variable number of choices
* mmlu-pro: fixed doc_to_text/choice/target configs for all variants
* mmlu-pro: minor fixes
* mmlu-pro/default: aligned with mmlu updates
* mmlu-pro: update yaml content in line with mmlu
* mmlu-pro: fixed mislabelling of task (math->chemistry)
* mmlu-pro: fixed yaml formatting
* add custom fewshot doc_to_text, target, and choice
* add process for each subtask
* add process for each subtask
* pre-commit
* pre-commit
* format
* resolved left out merge
* deleted folders + updated readme
* Update evaluator.py
* Update evaluator.py
---------
Co-authored-by: Yu Shi Jie <shijie@tensorplex.ai>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: root <root@455bdd73-01.cloud.together.ai>
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-