- 24 Jul, 2025 7 commits
-
Baber authored
-
Baber authored
# Conflicts:
#   lm_eval/__main__.py
#   lm_eval/utils.py
-
Baber authored
# Conflicts:
#   .pre-commit-config.yaml
#   lm_eval/api/task.py
#   lm_eval/models/huggingface.py
#   lm_eval/models/vllm_causallms.py
#   pyproject.toml
-
Baber authored
-
Baber Abbasi authored
-
weiliang authored
-
Baber authored
- 23 Jul, 2025 11 commits
-
Baber Abbasi authored
* remove trust-remote-code
* add W605 rule
-
Michael Goin authored
`device` has been a deprecated argument for a few vLLM releases and is removed in 0.10.0: https://github.com/vllm-project/vllm/pull/21349
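One way a caller could cope with the removal described above is to drop the argument before passing keyword arguments on. The following is only a minimal sketch: the helper name `drop_removed_device_arg`, the idea of passing the version as a string, and the assumption that versions are plain dotted integers are all hypothetical and not taken from the commit or from vLLM.

```python
from typing import Any, Dict, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '0.10.0' into (0, 10, 0).

    Assumption: only numeric components are present (no 'rc' or 'dev' suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def drop_removed_device_arg(kwargs: Dict[str, Any], vllm_version: str) -> Dict[str, Any]:
    """Return a copy of kwargs without 'device' when targeting vLLM >= 0.10.0.

    Hypothetical helper for illustration; it does not import or call vLLM.
    """
    cleaned = dict(kwargs)  # copy so the caller's dict is left untouched
    if parse_version(vllm_version) >= (0, 10, 0):
        cleaned.pop("device", None)
    return cleaned
```

For example, `drop_removed_device_arg({"device": "cuda", "dtype": "auto"}, "0.10.0")` keeps only the `dtype` entry, while the same call with `"0.9.2"` returns both entries unchanged.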
-
Baber Abbasi authored
* Fix: pin datasets < 4.0
* fix
* update type hints in HF
* fix hellaswag path
-
Avelina Asada Hadji-Kyriacou authored
* added support for additional chat template arguments
* use `enable_thinking`
* add wrap logging function
* add `chat_template_args` back to HF
---------
Co-authored-by: Baber <baber@hey.com>
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
- 22 Jul, 2025 8 commits
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Svetlana Karimova authored
* Feat: add LIBRA benchmark
* Feat: add dataset filter to LIBRA
* Fix: formatting through pre-commit and main tasks README
* Fix: resolve conflict
* Fix: dataset name to real
* Fix: delete unnecessary datasets and correct dependency
---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
Geun, Lim authored
* Fix: extended to max_gen_toks 8192 for HRM8K math benchmarks
* Increased max_gen_toks to 2048 (matches Appendix B of original paper).
* Added Evaluation Settings and Changelog sections.
* add some logs
---------
Co-authored-by: Baber <baber@hey.com>
- 21 Jul, 2025 14 commits
-
Baber authored
feat: implement `check_gold_index_error` utility and refactor `process_results` for improved error handling; remove `generate_until` multiple-choice
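The commit message names the utility but does not show it. As a rough sketch only, assuming the check validates a gold label against the list of answer choices and flags labels that are out of range or unknown, it might look like the following. The function name `check_gold_index_error_sketch`, its signature, the `-100` sentinel, and the `(gold, error_flag)` return shape are assumptions made for illustration, not the harness's actual code.

```python
from typing import List, Sequence, Tuple, Union

INVALID_GOLD = -100  # hypothetical sentinel marking an unusable gold label

Gold = Union[int, str, List[int]]


def check_gold_index_error_sketch(
    choices: Sequence[str], gold: Gold
) -> Tuple[Union[int, List[int]], bool]:
    """Normalize a gold label and report whether it is invalid.

    Returns (normalized_gold, error_flag). Invalid entries are replaced
    with INVALID_GOLD so callers can skip or log them.
    """
    n = len(choices)
    if isinstance(gold, list):
        # Multiple gold indices: replace each out-of-range index.
        fixed = [g if 0 <= g < n else INVALID_GOLD for g in gold]
        return fixed, INVALID_GOLD in fixed
    if isinstance(gold, str):
        # Gold given as the answer text: map it to its position in choices.
        index = choices.index(gold) if gold in choices else INVALID_GOLD
    else:
        # Gold given as a single integer index.
        index = gold if 0 <= gold < n else INVALID_GOLD
    return index, index == INVALID_GOLD
```

Returning a flag rather than raising lets a results-processing step record the bad document and carry on with the rest of the evaluation, which is one plausible reading of "improved error handling" in the message.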
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
# Conflicts:
#   lm_eval/api/filter.py
#   lm_eval/api/metrics.py
#   lm_eval/api/task.py
#   lm_eval/filters/extraction.py
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored
-
Baber authored