- 19 Jul, 2025 3 commits
Baber Abbasi authored
James A. Michaelov authored
* add multiblimp
* run linter
Avelina Asada Hadji-Kyriacou authored
* Update default.yaml
- 18 Jul, 2025 3 commits
Ramiro R. C. authored
* added headers and custom model name | fixed bug with trust_remote_code param
* linting
* removed custom model name | changed headers override
* add `header` to base TemplateAPI
* nit
---------
Co-authored-by: Baber <baber@hey.com>
mans authored
* fix request hanging when requesting the API
* pre-commit
---------
Co-authored-by: qinyidao <qinyidao@moonshot.cn>
Idan Tene authored
* Update utils.py
- 16 Jul, 2025 2 commits
philipdoldo authored
* Removed the "Let's think step by step." text from the start of the target entry in each of the samples, to prevent this phrase from being repeated twice in the few-shot prompts and to match the behavior of the original bbh repository. Worth noting this applied to only 26 of the 27 subtasks; the only one it did not apply to is boolean_expressions.yaml. In my opinion, boolean_expressions.yaml has an error in that it omits the "Remember that (i) ..." text after the final "A: Let's think step by step." in the prompt. Models like EleutherAI/gpt-neo-125m seem to always begin answers with this string anyway (copying what was done in the few-shot prompts), but it really should have been part of the prompt, much like "A: Let's think step by step." is included in the prompt for all of the cot tasks. However, the original bbh repo has the same issue, so keeping it this way is fine for consistency; it just seemed worth pointing out.
* feat: remove extra space from answers; add changelog
---------
Co-authored-by: Baber <baber@hey.com>
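For illustration, a minimal sketch of the cleanup this commit describes (the helper name is hypothetical; in the harness the fix lives in the task YAMLs, not in a function like this):

```python
# Hypothetical helper: drop the duplicated chain-of-thought prefix from a
# few-shot target so it appears only once, in the prompt's
# "A: Let's think step by step." line.
PREFIX = "Let's think step by step."

def clean_target(target: str) -> str:
    if target.startswith(PREFIX):
        return target[len(PREFIX):].lstrip()
    return target
```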
Baber Abbasi authored
* feat: add postprocessing for generated text to strip stop sequences and thinking tokens
* nit
* fix: trim leading whitespace after stripping thinking tokens from generation
* feat: add think_end_token to model_args
* nit
* nit
* nit
* add to readme
* nit
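A minimal sketch of this kind of post-processing (the function name, signature, and `</think>` default are illustrative, not the harness's actual API):

```python
def postprocess_generation(
    text: str,
    stop_sequences: list[str],
    think_end_token: str = "</think>",  # assumed default; per the commit, configurable via model_args
) -> str:
    # Strip the hidden reasoning block up to and including the think-end
    # token, then trim the leading whitespace left behind.
    if think_end_token in text:
        text = text.split(think_end_token, 1)[1].lstrip()
    # Truncate at the first stop sequence that appears in the output.
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text
```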
- 15 Jul, 2025 1 commit
MaYongQing authored
- 14 Jul, 2025 3 commits
Ankit Gola authored
Avelina Asada Hadji-Kyriacou authored
Atou Houdaifa authored
* add egy mmlu hellaswag
* add egymmlu egyhellaswag to tasks readme
* fix egymmlu config generation
* fix _generate_configs formatting
- 10 Jul, 2025 2 commits
Baber Abbasi authored
Baber Abbasi authored
* check for chat for warning
* add test
* remove yaml extension from some evalita configs
* move unitxt to own test script
* fix CI test
- 06 Jul, 2025 3 commits
Baber Abbasi authored
Baber Abbasi authored
* remove sparse-ml
Baber Abbasi authored
- 05 Jul, 2025 4 commits
achervyakov authored
* add image hashing
* remove unused params description
* use `LMEVAL_HASHMM` (default '1') to save raw images
---------
Co-authored-by: Baber <baber@hey.com>
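Roughly, the idea is to log a stable hash of each image rather than the raw bytes, gated by the `LMEVAL_HASHMM` environment variable. A sketch under that assumption; the helper names, the PIL round-trip, and the flag's exact polarity are all guesses, not the harness's implementation:

```python
import hashlib
import io
import os

from PIL import Image

def image_hash(img: Image.Image) -> str:
    # Serialize deterministically, then hash, so logs stay small and diffable.
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return hashlib.sha256(buf.getvalue()).hexdigest()

def serialize_image(img: Image.Image):
    # Default "1" keeps hashing on; another value would save raw images.
    if os.environ.get("LMEVAL_HASHMM", "1") == "1":
        return image_hash(img)
    return img
```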
Debjyoti Ray authored
* git push --force: correctly processes both formats of model_args, string and dictionary
* extract to function for better test
* nit
---------
Co-authored-by: Baber <baber@hey.com>
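A minimal sketch of accepting both forms (the function name is illustrative; the harness has its own arg-parsing utilities):

```python
def parse_model_args(model_args) -> dict:
    # Accept either a dict or a comma-separated "key=value" string, e.g.
    # "pretrained=gpt2,dtype=float16" -> {"pretrained": "gpt2", "dtype": "float16"}.
    if not model_args:
        return {}
    if isinstance(model_args, dict):
        return dict(model_args)
    parsed = {}
    for pair in model_args.split(","):
        key, _, value = pair.partition("=")
        parsed[key.strip()] = value.strip()
    return parsed
```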
Baber Abbasi authored
* delete unneeded files
Baber Abbasi authored
- 04 Jul, 2025 1 commit
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
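For context, the single-process path is just an ordinary bootstrap; a sketch under assumed names and defaults (the iteration count and seed are illustrative):

```python
import random
import statistics

def bootstrap_stderr(values: list[float], iters: int = 1000, seed: int = 1234) -> float:
    # Resample with replacement, recompute the mean each time, and report
    # the standard deviation of those estimates as the standard error.
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(iters)
    ]
    return statistics.stdev(estimates)
```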
- 03 Jul, 2025 4 commits
Ankush authored
* fix(hf-gguf): skip gguf_file if external tokenizer is provided
* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
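The guard is presumably along these lines (argument names follow common transformers conventions; the surrounding helper is hypothetical):

```python
def gguf_kwargs(gguf_file: str | None, tokenizer: str | None) -> dict:
    # Only forward gguf_file to the tokenizer loader when no external
    # tokenizer is supplied; otherwise load the tokenizer normally.
    if gguf_file is not None and tokenizer is None:
        return {"gguf_file": gguf_file}
    return {}
```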
Baber Abbasi authored
* use double quotes
Alex Stachowiak authored
* Lazy-load submodules to reduce import time
* pacify pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
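The standard-library way to do this is a module-level `__getattr__` (PEP 562); a sketch with illustrative submodule names, not necessarily how this commit implements it:

```python
# In a package's __init__.py: defer importing heavy submodules until
# first attribute access, cutting top-level import time.
import importlib

_SUBMODULES = {"tasks", "models", "api"}  # illustrative names

def __getattr__(name: str):
    if name in _SUBMODULES:
        return importlib.import_module(f".{name}", __name__)
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```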
Blanca Calvo authored
* truthfulqa-multi task
* truthfulqa-multi with chat few-shot
* few shot chat implementation
* changed until so it outputs lists
* changed dataset location
* added MT task
* Create README.md
* do not include MT
* changes for PR
* tag change
* removed yaml extension
* adding task to the table
* fix task configs
* add import exception
---------
Co-authored-by: Baber <baber@hey.com>
- 30 Jun, 2025 2 commits
jinze authored
* Fix: Align the Humaneval dataset with official results. Details: (1) modified "doc_to_text" and "gen_prefix" in the "humaneval_instruct.yaml" file to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals"; (2) changed r.rfind("```") to r.find("```") so that it locates the first "```", not the last one. Results: partially reproduces the official numbers: LLaMA3.1-8B-Instruct scores 66.5 (official result 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official result 80.5). Ref: PR#2650
* add changelog and version
* add changelog
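A sketch of why find() matters here (the extraction helper is illustrative): a completion can contain several triple-backtick fences, and the answer should be cut at the first closing fence, not the last.

```python
def extract_code(r: str) -> str:
    # Cut at the first closing fence; r.rfind("```") would instead cut at
    # the last fence and keep any extra prose or code blocks in between.
    end = r.find("```")
    return r[:end] if end != -1 else r
```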
Baber Abbasi authored
* Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8. The argument is a Dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin. The initial solution removed the quantization_config argument in HFLM._create_model() of lm_eval/models/huggingface.py, requiring further modification to restore the functionality provided by the previous commit.
* wrap quantization_config in AutoQuantizationConfig
* handle quantization config not dict
* wrap quantization_config in AutoQuantizationConfig if dict
---------
Co-authored-by: shanhx2000 <hs359@duke.edu>
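A sketch of the final fix (AutoQuantizationConfig is named in the commit; the import path and from_dict usage are assumptions about the transformers API):

```python
from transformers.quantizers.auto import AutoQuantizationConfig

def normalize_quantization_config(quantization_config):
    # Wrap plain dicts so downstream code gets the QuantizationConfigMixin
    # it expects; pass through objects (e.g. GPTQConfig) untouched.
    if isinstance(quantization_config, dict):
        return AutoQuantizationConfig.from_dict(quantization_config)
    return quantization_config
```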
- 25 Jun, 2025 3 commits
Younes B authored
* add subfolder
* lint
* change it to empty string
* fix typehints
---------
Co-authored-by: Baber <baber@hey.com>
Baber Abbasi authored
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
- 23 Jun, 2025 1 commit
NourFahmy authored
* Fix Anthropic API compatibility issues in chat completions. Solves two important compatibility issues between the LM Eval Harness and Anthropic's API: 1) the type field issue: Anthropic's Messages API doesn't accept the type field, previously included, that other APIs may expect; 2) the stop sequences issue: Anthropic requires stop sequences to contain non-whitespace characters. Tested with the most recent models from Anthropic (claude-sonnet-4-0, claude-opus-4-0); resolved my local API errors.
* pacify pre-commit
* add type
---------
Co-authored-by: Baber <baber@hey.com>
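Both fixes amount to sanitizing the request before it is sent; a sketch with a hypothetical helper name:

```python
def sanitize_anthropic_request(messages: list[dict], stop: list[str] | None):
    # Anthropic rejects whitespace-only stop sequences...
    stop = [s for s in (stop or []) if s.strip()]
    # ...and its Messages API does not accept a "type" field on messages.
    messages = [{k: v for k, v in m.items() if k != "type"} for m in messages]
    return messages, stop
```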
- 20 Jun, 2025 1 commit
Anna Fontana authored
"arc_chalenge_chat" doesn't exist: I think it should be "arc_challenge_chat", but this task is not implemented here (see arc task folder).
- 19 Jun, 2025 3 commits
Baber Abbasi authored
Maxim Evtush authored
Anna Fontana authored
Wrong task name: mmlu_generation doesn't exist -> mmlu_generative is the correct one
- 16 Jun, 2025 2 commits
Baber Abbasi authored
* fix longbench citation
fuder.eth authored
* Update README.md
* Update utils_mcq.py
- 12 Jun, 2025 1 commit
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
- 08 Jun, 2025 1 commit
Baber Abbasi authored
* use all answers
* use middle truncation
* maybe fix classification score
* strip classification preds
* [vllm] remove stop tokens post-hoc
* strip all preds
* pacify pre-commit
* start on truncation utility
* add to readme
* add a footgun doc
* fix newline in yaml templates
* do not strip code_sim preds!
* fix pre-commit config
* fix instruction warning
* add note to longbench readme
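Middle truncation keeps the head and tail of an over-long prompt and drops the middle, which LongBench-style tasks favor; a token-level sketch (the helper name and the exact split are illustrative):

```python
def truncate_middle(tokens: list[int], max_len: int) -> list[int]:
    # Keep the first and last halves of the budget; the middle of a very
    # long context is usually the least load-bearing part of the prompt.
    if len(tokens) <= max_len:
        return tokens
    head = max_len // 2
    tail = max_len - head
    return tokens[:head] + tokens[-tail:]
```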