- 10 Jul, 2025 3 commits
-
Baber authored
-
Baber Abbasi authored
-
Baber Abbasi authored
* check for chat for warning
* add test
* remove yaml extension from some evalita configs
* move unitxt to own test script
* fix CI test
-
- 06 Jul, 2025 3 commits
-
Baber Abbasi authored
-
Baber Abbasi authored
* remove sparse-ml
-
Baber Abbasi authored
-
- 05 Jul, 2025 4 commits
-
achervyakov authored
* add image hashing
* remove unused params description
* use `LMEVAL_HASHMM` (default '1') to save raw images
---------
Co-authored-by: Baber <baber@hey.com>
-
Debjyoti Ray authored
* correctly process both formats of model_args: string and dictionary
* extract to function for better test
* nit
---------
Co-authored-by: Baber <baber@hey.com>
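The commit above makes the CLI accept `model_args` either as a comma-separated string or as a dictionary. A minimal sketch of such a normalizer, assuming a hypothetical helper name (the harness's actual function may differ):

```python
from typing import Optional, Union


def normalize_model_args(model_args: Union[str, dict, None]) -> dict:
    """Accept model_args as 'key=value,key2=value2' or as a dict.

    Hypothetical sketch of the string/dict handling described in the
    commit, not the harness's real implementation.
    """
    if model_args is None:
        return {}
    if isinstance(model_args, dict):
        # already a dict: return a copy so callers can mutate safely
        return dict(model_args)
    result = {}
    for pair in model_args.split(","):
        if not pair:
            continue  # tolerate trailing commas
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result
```

Extracting this into a standalone function, as the commit does, makes both input formats trivial to unit-test.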
-
Baber Abbasi authored
* delete unneeded files
-
Baber Abbasi authored
-
- 04 Jul, 2025 6 commits
-
Neel Gupta authored
* [FIX] Initial code to disable multi-proc for stderr
* add docs; align no-mp bootstrap with mp
---------
Co-authored-by: Baber <baber@hey.com>
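The no-multiprocessing path above computes the same bootstrap standard error in a single process. A minimal sketch of that computation, with an assumed function name and defaults (not the harness's API):

```python
import random
import statistics


def bootstrap_stderr(values, iters=1000, seed=1234):
    """Single-process bootstrap estimate of the standard error of the mean.

    Illustrative sketch of what a no-multiprocessing bootstrap computes;
    the name, iteration count, and seeding are assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    n = len(values)
    # resample with replacement and record each resample's mean
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(iters)]
    # the spread of the resampled means approximates the stderr of the mean
    return statistics.stdev(means)
```

Seeding a private `random.Random` instance (rather than the global state) is one way to keep the single-process result aligned with a deterministic multiprocessing implementation.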
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
- 03 Jul, 2025 18 commits
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
# Conflicts:
#   lm_eval/__main__.py
-
Baber authored
-
Ankush authored
* fix(hf-gguf): skip gguf_file if external tokenizer is provided
* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
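The fix above stops forwarding `gguf_file` when the user supplies an external tokenizer, so the GGUF-embedded tokenizer is only used as a fallback. A hedged sketch of that conditional (names are illustrative, not the harness's code):

```python
def build_tokenizer_kwargs(gguf_file=None, tokenizer=None):
    """Forward gguf_file only when no external tokenizer is provided.

    Sketch of the condition described in the commit; the real code
    builds kwargs for transformers' AutoTokenizer.from_pretrained.
    """
    kwargs = {}
    if gguf_file is not None and tokenizer is None:
        # no external tokenizer: let the GGUF file supply one
        kwargs["gguf_file"] = gguf_file
    return kwargs
```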
-
Baber Abbasi authored
* use double quotes
-
Alex Stachowiak authored
* Lazy-load submodules to reduce import time
* pacify pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
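Lazy-loading, as in the commit above, defers a submodule's import until it is first used, so `import lm_eval` stays cheap. One generic way to sketch the technique with a proxy object (the harness may instead use a module-level `__getattr__`, PEP 562):

```python
import importlib


class LazyModule:
    """Proxy that defers the real import until first attribute access.

    Minimal sketch of lazy submodule loading; the harness's actual
    mechanism may differ.
    """

    def __init__(self, name):
        self._name = name
        self._module = None  # nothing imported yet

    def __getattr__(self, attr):
        # only reached when the attribute isn't on the proxy itself
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)
```

For example, `LazyModule("json")` costs nothing at construction; the real `json` module is imported only when an attribute such as `dumps` is first touched.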
-
Baber authored
-
Blanca Calvo authored
* truthfulqa-multi task
* truthfulqa-multi with chat few-shot
* few shot chat implementation
* changed until so it outputs lists
* changed dataset location
* added MT task
* Create README.md
* do not include MT
* changes for PR
* tag change
* removed yaml extension
* adding task to the table
* fix task configs
* add import exception
---------
Co-authored-by: Baber <baber@hey.com>
-
- 30 Jun, 2025 2 commits
-
jinze authored
* Fix: Align the Humaneval dataset with official results.
  Details: (1) modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt in "meta-llama/Llama-3.1-70B-Instruct-evals"; (2) changed r.rfind("```") to r.find("```") so it locates the first "```", not the last one.
  Results: partially reproduced the official results: LLaMA3.1-8B-Instruct scores 66.5 (official: 72.6) and LLaMA3.1-70B-Instruct scores 80.5 (official: 80.5). Ref: PR#2650
* add changelog and version
* add changelog
-
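The rfind-to-find change above matters when a model response contains prose or multiple fenced blocks: `rfind` locates the *last* fence and swallows everything in between. A hedged sketch of the extraction (illustrative names, not the harness's exact filter):

```python
FENCE = "`" * 3  # the Markdown code-fence marker


def extract_first_code_block(response: str) -> str:
    """Return the code inside the first fenced block of a model response.

    Sketch of the find-vs-rfind fix described in the commit; the real
    Humaneval postprocessing may differ in its edge-case handling.
    """
    start = response.find(FENCE)
    if start == -1:
        return response  # no fence at all: return the raw response
    # skip the opening fence line (which may carry a language tag)
    start = response.find("\n", start)
    if start == -1:
        return ""
    end = response.find(FENCE, start)  # first closing fence, not the last
    if end == -1:
        return response[start + 1 :]
    return response[start + 1 : end]
```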
Baber Abbasi authored
* Try fixing issue 3026, which is caused by the quantization_config argument introduced in commit 758c5ed8. The argument is a Dict, but for a GPTQ-quantized model this conflicts with the Hugging Face interface, which expects a QuantizationConfigMixin. The initial solution removes the quantization_config argument in HFLM._create_model() of lm_eval/models/huggingface.py; further modification is required to restore the functionality provided by the previous commit.
* wrap quantization_config in AutoQuantizationConfig
* handle quantization config that is not a dict
* wrap quantization_config in AutoQuantizationConfig if dict
---------
Co-authored-by: shanhx2000 <hs359@duke.edu>
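The final fix above coerces a dict-valued `quantization_config` into the config object the Hugging Face interface expects, and passes anything else through unchanged. A generic sketch of that coercion, with the factory injected so the example stays self-contained (in the harness it would be something like `AutoQuantizationConfig.from_dict` from transformers):

```python
def coerce_quantization_config(quantization_config, from_dict):
    """Wrap a plain-dict quantization_config via the given factory;
    pass through anything already a config object (or None).

    Sketch of the commit's 'wrap in AutoQuantizationConfig if dict'
    logic; from_dict stands in for the transformers factory.
    """
    if isinstance(quantization_config, dict):
        # only dicts need wrapping into a QuantizationConfigMixin-style object
        return from_dict(quantization_config)
    return quantization_config
```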
-
- 25 Jun, 2025 3 commits
-
Younes B authored
* add subfolder
* lint
* change it to empty string
* fix typehints
---------
Co-authored-by: Baber <baber@hey.com>
-
Baber Abbasi authored
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Jun, 2025 1 commit
-
Baber authored
-