- 25 Sep, 2025 6 commits
- 25 Aug, 2025 1 commit
-
-
William Held authored
* Anthropic Discrim Eval * Mixed Effects Regression * Actually wire it all upo * Operator Name Doesn't Exist on Github * Update lm_eval/tasks/discrim_eval/discrim_eval_implicit.yaml Co-authored-by:
Baber Abbasi <92168766+baberabb@users.noreply.github.com> * Update discrim_eval_implicit.yaml * Update discrim_eval_explicit.yaml * pacify pre-commit --------- Co-authored-by:
Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by:
Baber <baber@hey.com>
-
- 04 Aug, 2025 1 commit
-
-
Baber Abbasi authored
-
- 26 Jul, 2025 1 commit
-
-
Baber authored
-
- 23 Jul, 2025 2 commits
-
-
Baber Abbasi authored
* remove trust-remote-code * add W605 rule
-
Baber Abbasi authored
* Fix: pin datasets < 4.0 * fix * update type hints in HF * fix hellaswag path
-
- 22 Jul, 2025 1 commit
-
-
Svetlana Karimova authored
* Feat: add LIBRA benchmark * Feat: add dataset filter to LIBRA * Fix: formatting through pre-commit and main tasks README * Fix: resolve conflict * Fix: dataset name to real * Fix: delete unnececcary datasets and correct dependency --------- Co-authored-by:Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
- 06 Jul, 2025 2 commits
-
-
Baber Abbasi authored
* remove sparse-ml
-
Baber Abbasi authored
-
- 05 Jul, 2025 1 commit
-
-
Baber Abbasi authored
-
- 19 Jun, 2025 1 commit
-
-
Baber Abbasi authored
-
- 19 May, 2025 1 commit
-
-
Harsha authored
* adding ACPBench_hard * adding Clingo * changing tarski to tarski[clingo] * denoting the main variants in each paper
-
- 13 May, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by:kiersten-stokes <kierstenstokes@gmail.com>
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* switch MMLU to cais/mmlu * switch back to tj-actions/changed-files * cache HF folder
-
- 03 Apr, 2025 1 commit
-
-
Lu Fang authored
Signed-off-by:Lu Fang <lufang@fb.com>
-
- 20 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add markdown linter to pre-commit hooks * Reformat existing markdown (excluding lm_eval/tasks/*.md)
-
- 19 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
-
- 18 Mar, 2025 1 commit
-
-
Baber Abbasi authored
suport for longcontext (and other synthetic tasks) * add ruler * add longbench * pass `metadata` to TaskConfig
-
- 17 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add support for token-based auth for watsonx models * Fix lint * Move dotenv import to inner scope * Improve readability of _verify_credentials
-
- 14 Mar, 2025 1 commit
-
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model * Beauty imports * fix apply_chat_template args * update default audio placeholders list * add demo task - common_voice subset * add audiolm_qwen libs to pyproject.toml * pre-commit beautify --------- Co-authored-by:Alexandra Rak <rakalexandra@mail.ru>
-
- 05 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 04 Mar, 2025 2 commits
-
-
Kiersten Stokes authored
* Add a test for a custom unitxt task * Update task.py to bring in line with breaking change in v1.17.2 * Fix lint
-
Lucia Quirke authored
* Enable steering HF models Co-authored-by:
Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu> * increase HF download timeout * Update readme; improve steering vector device handling * Update latest news * remove HF timeout increase * fix tests * ignore sae lens test * fix accidental force push --------- Co-authored-by:
Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
-
- 21 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* add math_verify to minerva math * add math_verify to benchmark * fix error * increment version
-
- 17 Dec, 2024 2 commits
-
-
Baber Abbasi authored
* feat: drop Python 3.8 support * feat: drop Python 3.8 tests * pre-commit
-
Baber Abbasi authored
forgot to increment 0.4.6!
-
- 15 Nov, 2024 1 commit
-
-
Nikodem Szwast authored
* refactor code, fix config path bug * update types to be from typing lib * add pre-commit formatting * specify version of ibm_watsonx_ai package * adjust get_watsonx_credentials() function, add minor refactor to adress PR review comments * change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]
-
- 05 Nov, 2024 1 commit
-
-
mtkachenko authored
* add jaqket_v2 and jcommonsenseqa * remove comments * remove num_beams as it is incompatible with vllm * add jnli + refactor * rename jnla -> jnli * add jsquad + replace colon chars with the Japanese unicode * ignore whitespaces in generation tasks * add marc_ja * add xwinograd + simplify other yamls * add mgsm and xlsum * refactor xlsum * add ja_leaderboard tag * edit README.md * update README.md * add credit + minor changes * run ruff format * address review comments + add group * remove aggregate_metric_list * remove tags * update tasks/README.md
-
- 31 Oct, 2024 1 commit
-
-
Qubitium-ModelCloud authored
* support gptqmodel * code opt * add gptqmodel option * Update huggingface.py * Update pyproject.toml * gptqmodel version upgraded to 1.0.6 * GPTQModel version upgraded to 1.0.8 * Update pyproject.toml * fix ruff-format error * add gptqmodel test * Update gptqmodel test model * skip cuda * python3.8 compatible * Update README.md * Update README.md --------- Co-authored-by:CL-ModelCloud <cl@modelcloud.ai>
-
- 25 Oct, 2024 1 commit
-
-
Kiersten Stokes authored
* Update pyproject.toml with watsonx package extra Signed-off-by:
kiersten-stokes <kierstenstokes@gmail.com> * Remove unused function Signed-off-by:
kiersten-stokes <kierstenstokes@gmail.com> --------- Signed-off-by:
kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Oct, 2024 1 commit
-
-
Nikodem Szwast authored
* add support for IBM watsonx_llm * add ibm_watsonx_ai package to optional-dependencies * move global scope imports to inner scope * change cache to lru_cache * fix circular import * use 3.8 typing * use 3.8 typing --------- Co-authored-by:Baber <baber@hey.com>
-
- 08 Oct, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 05 Sep, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 28 Aug, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 01 Aug, 2024 1 commit
-
-
Nathan Weinberg authored
* refactor: move scipy and sklearn module imports to func imports Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> * refactor: consolidate weighted_f1_score func into lm_eval utils Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> * lint: allow for utils file to have unused imports this allows for shared functions to be defined only once while allowing for the YAML function importing to continue working Signed-off-by:
Nathan Weinberg <nweinber@redhat.com> --------- Signed-off-by:
Nathan Weinberg <nweinber@redhat.com>
-