- 04 Aug, 2025 1 commit
  - Baber authored
- 07 Jul, 2025 5 commits
- 05 Jul, 2025 15 commits
- 04 Jul, 2025 1 commit
  - Neel Gupta authored
    * [FIX] Initial code to disable multi-proc for stderr
    * add docs; align no-mp bootstrap with mp

    Co-authored-by: Baber <baber@hey.com>
- 03 Jul, 2025 4 commits
  - Ankush authored
    * fix(hf-gguf): skip gguf_file if external tokenizer is provided
    * docs(readme): add instructions for evaluating GGUF models with the Hugging Face backend
  - Baber Abbasi authored
    * use double quotes
  - Alex Stachowiak authored
    * Lazy-load submodules to reduce import time
    * pacify pre-commit

    Co-authored-by: Baber <baber@hey.com>
  - Blanca Calvo authored
    * truthfulqa-multi task
    * truthfulqa-multi with chat few-shot
    * few-shot chat implementation
    * changed until so it outputs lists
    * changed dataset location
    * added MT task
    * Create README.md
    * do not include MT
    * changes for PR
    * tag change
    * removed yaml extension
    * adding task to the table
    * fix task configs
    * add import exception

    Co-authored-by: Baber <baber@hey.com>
- 30 Jun, 2025 2 commits
  - jinze authored
    * Fix: Align the HumanEval dataset with official results.
      Details: (1) modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals"; (2) changed r.rfind("```") to r.find("```") so it locates the first "```" rather than the last one.
      Results: partially reproduced the official results: LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5).
      Ref: PR#2650
    * add changelog and version
    * add changelog
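The rfind-to-find change above matters whenever a completion contains more than one fence: cutting at the last "```" swallows any extra blocks the model appends after its solution. A minimal sketch of that extraction logic (a hypothetical helper, not the harness's actual code):

```python
def extract_code(completion: str) -> str:
    # Keep everything up to the FIRST closing fence. Using rfind("```")
    # instead would cut at the LAST fence, so any trailing blocks (e.g.
    # example usage the model added) would leak into the extracted code.
    idx = completion.find("```")
    return completion[:idx] if idx != -1 else completion


# A completion with the solution, then extra chatter and a second block:
sample = (
    "def add(a, b):\n    return a + b\n```\n"
    "Here is how to use it:\n```python\nprint(add(1, 2))\n```"
)
```

With find, only the solution survives; with rfind, the chatter and usage block would be kept too.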
  - Baber Abbasi authored
    * Try fixing issue 3026, caused by the quantization_config argument introduced in commit 758c5ed8. The argument is a dict, but for a GPTQ-quantized model it conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin. The current solution removes the quantization_config argument in HFLM._create_model() in lm_eval/models/huggingface.py; further modification is required to restore the functionality provided by the previous commit.
    * wrap quantization_config in AutoQuantizationConfig
    * handle quantization config not dict
    * wrap quantization_config in AutoQuantizationConfig if dict

    Co-authored-by: shanhx2000 <hs359@duke.edu>
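The wrap-if-dict fix reduces to a small dispatch step. A self-contained sketch of the idea, with AutoQuantizationConfig below standing in for the real transformers class (names illustrative, not the harness's actual code):

```python
class AutoQuantizationConfig:
    """Stand-in for transformers' AutoQuantizationConfig (illustrative only)."""

    @classmethod
    def from_dict(cls, d: dict) -> "AutoQuantizationConfig":
        obj = cls()
        obj.__dict__.update(d)
        return obj


def normalize_quantization_config(cfg):
    # Hugging Face model loading expects a QuantizationConfigMixin-like
    # object; a plain dict (as passed for GPTQ models in issue 3026)
    # must be wrapped before it reaches from_pretrained().
    if isinstance(cfg, dict):
        return AutoQuantizationConfig.from_dict(cfg)
    return cfg  # already a config object, or None: pass through unchanged
```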
- 25 Jun, 2025 3 commits
  - Younes B authored
    * add subfolder
    * lint
    * change it to empty string
    * fix typehints

    Co-authored-by: Baber <baber@hey.com>
  - Baber Abbasi authored
  - Kiersten Stokes authored
    Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
- 23 Jun, 2025 1 commit
  - NourFahmy authored
    * Fix Anthropic API compatibility issues in chat completions.
      Solves two compatibility issues between the LM Eval Harness and Anthropic's API: (1) Anthropic's Messages API does not accept the "type" field that other APIs may expect, which was previously included; (2) Anthropic requires stop sequences to contain non-whitespace characters. Tested with the most recent Anthropic models (claude-sonnet-4-0, claude-opus-4-0); resolved my local API errors.
    * pacify pre-commit
    * add type

    Co-authored-by: Baber <baber@hey.com>
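Both fixes amount to sanitizing the request before it is sent. A minimal sketch, assuming message dicts and a stop-sequence list in the usual chat-completions shape (field names illustrative):

```python
def sanitize_for_anthropic(messages: list[dict], stop: list[str]):
    # (1) Anthropic's Messages API rejects unexpected fields such as
    #     "type", so strip them from every message dict.
    clean_msgs = [{k: v for k, v in m.items() if k != "type"} for m in messages]
    # (2) Anthropic requires stop sequences to contain non-whitespace
    #     characters, so drop entries like "\n" or "  ".
    clean_stop = [s for s in stop if s.strip()]
    return clean_msgs, clean_stop
```

A whitespace-only stop sequence such as "\n" is silently dropped rather than forwarded, which is what resolved the local API errors described above.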
- 20 Jun, 2025 1 commit
  - Anna Fontana authored
    "arc_chalenge_chat" doesn't exist: it should presumably be "arc_challenge_chat", but that task is not implemented here either (see the arc task folder).
- 19 Jun, 2025 3 commits
  - Baber Abbasi authored
  - Maxim Evtush authored
  - Anna Fontana authored
    Wrong task name: mmlu_generation does not exist; mmlu_generative is the correct one.
- 16 Jun, 2025 2 commits
  - Baber Abbasi authored
    * fix longbench citation
  - fuder.eth authored
    * Update README.md
    * Update utils_mcq.py
- 12 Jun, 2025 1 commit
  - Kiersten Stokes authored
    Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
- 08 Jun, 2025 1 commit
  - Baber Abbasi authored
    * use all answers
    * use middle truncation
    * maybe fix classification score
    * strip classification preds
    * [vllm] remove stop tokens post-hoc
    * strip all preds
    * pacify pre-commit
    * start on truncation utility
    * add to readme
    * add a footgun doc
    * fix newline in yaml templates
    * do not strip code_sim preds!
    * fix pre-commit config
    * fix instruction warning
    * add note to longbench readme
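The "middle truncation" mentioned above keeps both ends of an over-long input and drops the middle, since long-context prompts often carry the task instruction up front and the question at the end. A generic sketch over token lists (not the harness's actual utility):

```python
def truncate_middle(tokens: list, max_len: int) -> list:
    # Keep the head and the tail; drop tokens from the middle so that
    # both the leading instruction and the trailing question survive.
    if len(tokens) <= max_len:
        return tokens
    half = max_len // 2
    # Tail gets the remaining budget so head + tail == max_len exactly.
    return tokens[:half] + tokens[len(tokens) - (max_len - half):]
```

For an odd budget the extra token goes to the tail, where the question usually sits.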