- 08 Sep, 2025 1 commit
Baber authored
- 25 Aug, 2025 1 commit
Baber authored
- 04 Aug, 2025 7 commits
Baber authored
Baber authored
# Conflicts: # lm_eval/__init__.py # pyproject.toml
Baber Abbasi authored
parkhs21 authored
* improve include-path precedence handling * test: add task for test * add test for include path precedence handling * Refactor `test_include_path.py` --------- Co-authored-by:Baber <baber@hey.com>
Matthias Neumayer authored
The tasks are called without .yaml, just by the task name.
Idan Tene authored
* Update humaneval_64_instruct.yaml Sync doc_to_text with humaneval_instruct.yaml * Update humaneval_instruct.yaml Remove redundant (flawed) spaces * Update README.md * Bump task version
Felix Michalak authored
* Update continuation group names to fit Readme * added changelog to readme and switched datasets from hails to cais * added missing new line at end of readme
- 02 Aug, 2025 1 commit
Cyrus Leung authored
* Update vLLM compatibility Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> * add TokensPrompt to all generate calls --------- Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Baber <baber@hey.com>
- 30 Jul, 2025 1 commit
Baber authored
- 25 Jul, 2025 1 commit
Baber authored
- 24 Jul, 2025 7 commits
Baber authored
Baber authored
# Conflicts: # lm_eval/__main__.py # lm_eval/utils.py
Baber authored
# Conflicts: # .pre-commit-config.yaml # lm_eval/api/task.py # lm_eval/models/huggingface.py # lm_eval/models/vllm_causallms.py # pyproject.toml
Baber authored
Baber Abbasi authored
weiliang authored
Baber authored
- 23 Jul, 2025 11 commits
Baber Abbasi authored
* remove trust-remote-code * add W605 rule
Michael Goin authored
Device has been a deprecated arg for a few releases of vLLM and was removed in 0.10.0 https://github.com/vllm-project/vllm/pull/21349
Baber Abbasi authored
* Fix: pin datasets < 4.0 * fix * update type hints in HF * fix hellaswag path
Avelina Asada Hadji-Kyriacou authored
* added support for additional chat template arguments * use `enable_thinking` * add wrap logging function * add `chat_template_args` back to HF --------- Co-authored-by:Baber <baber@hey.com>
Baber authored
Baber authored
Baber authored
Baber authored
Baber authored
Baber authored
Baber authored
- 22 Jul, 2025 8 commits
Baber authored
Baber authored
Baber authored
Baber authored
Baber authored
Baber authored
Svetlana Karimova authored
* Feat: add LIBRA benchmark * Feat: add dataset filter to LIBRA * Fix: formatting through pre-commit and main tasks README * Fix: resolve conflict * Fix: dataset name to real * Fix: delete unnecessary datasets and correct dependency --------- Co-authored-by:Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Geun, Lim authored
* Fix: extended to max_gen_toks 8192 for HRM8K math benchmarks * • Increased max_gen_toks to 2048 (matches Appendix B of original paper). • Added Evaluation Settings and Changelog sections. * add some logs --------- Co-authored-by:Baber <baber@hey.com>
- 21 Jul, 2025 2 commits