- 04 Apr, 2025 3 commits
-
-
Qubitium-ModelCloud authored
* add gsm8k platinum * only test splits * wrong dataset * link to blog * format
-
Nikodem Szwast authored
* update authentication methods, add support for deployment_id * run pre-commit on changed file
-
Michele Resta authored
* feat: initial commit with templates for evalita evaluation * fix: change rule for generate_until * feat: modified yaml to use reduced version of NER test datasets * feat: added templates to use reduced dataset for summarization (fanpage and ilpost) * Add Six Prompts for Each Multiple-Choice Task * fix: fastest eval for summarization * chore: linted with ruff * chore: linted with ruff --------- Co-authored-by: rzanoli <zanoli@fbk.eu>
-
- 03 Apr, 2025 1 commit
-
-
Lu Fang authored
Signed-off-by: Lu Fang <lufang@fb.com>
-
- 02 Apr, 2025 2 commits
-
-
Baber Abbasi authored
* add subtask scores * pacify pre-commit
-
Saibo-creator authored
* Add JSON schema benchmark * Update lm_eval/tasks/jsonschema_bench/metrics.py Thanks for catching this Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com> * run pre-commit * add description to task catalogue readme --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
- 01 Apr, 2025 2 commits
-
-
Daniel Holanda authored
-
Baber Abbasi authored
* sync with leaderboard * also output old metric * wrap old extraction in try except * better log
-
- 30 Mar, 2025 1 commit
-
-
Alexandre Marques authored
* llama-style MMLU CoT * Refactor MMLU CoT template YAML to simplify 'until' structure * Add GSM8K task configuration for LLaMA3 with few-shot examples * Fix missing newline at end of MMLU CoT YAML file * Add ARC-Challenge task configuration and processing utility * Add additional MMLU and ARC-Challenge task variants to README * Update README with notes on arc_challenge_llama dataset preprocessing
-
- 29 Mar, 2025 1 commit
-
-
Harsha authored
-
- 28 Mar, 2025 3 commits
-
-
Baber Abbasi authored
-
dazipe authored
* Changed default max_length from 2048 to 8192 and max_gen_toks from 256 to 2048 for MMLU Pro tasks. * Update lm_eval/tasks/mmlu_pro/_default_template_yaml * pre-commit * nit ---------
-
Hadi Abdine authored
* add Darija tasks * fix multiple groups issue in darijammlu * add MT to the description of the Darija tasks * Update README.md nit * fix the recursion error caused by the darija_summarization task * use a custom filter instead of the decorator for the strip function --------- Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
- 27 Mar, 2025 3 commits
- 26 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 25 Mar, 2025 1 commit
-
-
Alexandre Marques authored
* Multilingual MMLU * Refactor process_docs function calls for clarity and consistency
-
- 23 Mar, 2025 1 commit
-
-
Bruno Carneiro authored
I haven't had time to review the library that's replacing tj-actions or whether this change breaks anything, but the vulnerability is quite severe and I would rather the functionality be broken than risk compromise. **to do:** review this later
-
- 21 Mar, 2025 2 commits
-
-
Alexandre Marques authored
-
heli-qi authored
* update mmlu_prox configs * update tasks/README * correct hyphen to underline in task/README * update pre-commit codes
-
- 20 Mar, 2025 6 commits
-
-
Alexandre Marques authored
* Update generation_kwargs in default template to include additional end tokens * Update filter_list in MMLU Pro configuration to use strict_match * Update _default_template_yaml
-
Baber Abbasi authored
-
Baber Abbasi authored
-
Yifei Zhang authored
-
Kiersten Stokes authored
* Add markdown linter to pre-commit hooks * Reformat existing markdown (excluding lm_eval/tasks/*.md)
-
Alexandre Marques authored
* Update continuation template YAML for MMLU task with new generation and filtering options * Refactor filter_list structure in continuation template YAML for improved readability * Add 'take_first' function to filter_list in continuation template YAML * Update filter_list in continuation template YAML to use 'strict_match' and modify filtering functions * Add 'do_sample' option to generation_kwargs in MMLU template YAML
-
- 19 Mar, 2025 2 commits
-
-
Stella Biderman authored
-
Kiersten Stokes authored
-
- 18 Mar, 2025 8 commits
-
-
Jaedong Hwang authored
-
Surya Kasturi authored
* Allow writing config to wandb * set defaults * Update help * Update help
-
Baber Abbasi authored
* add changelog to readme template * add readme * add to task list
-
Baber Abbasi authored
* add min_pixels, max_pixels * fix
-
Baber Abbasi authored
support for long-context (and other synthetic) tasks * add ruler * add longbench * pass `metadata` to TaskConfig
-
Jonas Golde authored
* add MastermindEval benchmark * fill out checklist
-
Santiago Galiano Segura authored
* Add cocoteros_va dataset * Fix format in cocoteros_va.yml * Undo newline added * Execute pre-commit to fix format errors * Update catalan_bench.yaml version and add Changelog section into Readme.md
-
Baber Abbasi authored
* add __version__ * add version consistency check to publish action
-
- 17 Mar, 2025 3 commits
-
-
Kiersten Stokes authored
* Add support for token-based auth for watsonx models * Fix lint * Move dotenv import to inner scope * Improve readability of _verify_credentials
-
Angelika Romanou authored
* Add INCLUDE tasks * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
Avelina9X authored
* Update openllm.yaml to use train fewshot split for arc
-