- 20 Jan, 2025 4 commits
-
-
Gyouk Chu authored
* Update KorMedMCQA: ver 2.0 * Fix pre-commit formatting issues * Update KorMedMCQA v2.0 * pre-commit
-
Minho Ryu authored
-
Boda Sadallah authored
* point to the original ArabicMMLU dataset * create the new subtasks files * fix bug when the context field is empty
-
Minho Ryu authored
* add hrm8k benchmark for both Korean and English * apply precommit * revise tasks to make models not to directly answer; use zeroshot_cot if possible * add README * Add hrm8k on the task-list --------- Co-authored-by: Baber <baber@hey.com>
-
- 19 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update pre-commit
-
- 17 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* switch arg
-
- 15 Jan, 2025 4 commits
-
-
Baber Abbasi authored
* add assistant prefix * add arc_challenge from llama * nit * nit * nit * add assistant prefix * add mmlu_llama * nit * nit * Revert "nit" This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc. * fix regex bug * add assistant_prefix to vllm * add `Question:` * add mmlu_pro * add fewshot assistant_prefix * use `assistant_prefill` * typehints * nits * nits * add to docs * add readme
-
Shivansh Pachnanda authored
* Add MLQA * add mlqa_common_yaml * add 49 tests of mlqa family * update tasks/README.md --------- * fix: mlqa ast error * nit: removed .yaml ext from template_yaml * nit changes: minor modifications generate_tasks.py * deleted lm_eval/tasks/mlqa/mlqa_common_yaml.yaml * tests updated * nit
-
Hojin Lee authored
* add mbpp * fix some bugs * add README for mbpp * update README * nits --------- Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>
-
Hojin Lee authored
* add custom filter * fix type casting of references * add humaneval * fix a bug in humaneval * add greedy version of humaneval * update tasks README * test humaneval * return multiple metrics * nit * add confirmation to run code tasks * nit * nit --------- Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com> Co-authored-by: Baber <baber@hey.com>
-
- 07 Jan, 2025 3 commits
-
-
Wenyang LUO authored
* Fix the format of mgsm zh and ja. * Add change log to mgsm. * Add newline after changelog.
-
Petr Baudis authored
* fix(zeno): Generate unique ids in case of multiple filters * fix(zeno): Report even non-aggregable metrics, just not as metrics * pre-commit --------- Co-authored-by: Baber <baber@hey.com>
-
CL-ModelCloud authored
* hf support load gguf file * code review * code review * code clean up * note about use_fast compat with gguf --------- Co-authored-by: Qubitium-ModelCloud <qubitium@modelcloud.ai>
-
- 04 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* remove yaml extension from phrases_va_common * remove yaml extension from winogenerated * remove yaml extension from phrases_es * no cache debug logging when not used
-
- 02 Jan, 2025 1 commit
-
-
Baber Abbasi authored
* update evaluate; update construct requests * update construct requests to handle `apply_chat_template` kwarg
-
- 30 Dec, 2024 1 commit
-
-
Baber Abbasi authored
upgrade transformers and peft in CI
-
- 25 Dec, 2024 1 commit
-
-
Wang, Yi authored
* fix extra_match low if batch_size > 1 Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> * add sorting to logprobs * nit --------- Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Baber <baber@hey.com>
-
- 24 Dec, 2024 1 commit
-
-
Firoj Alam, Scientist, QCRI authored
* added aradice * Added ArabicMMLU Lev Configs * added ArabicMMLU egy configs * Added boolq configs * Added cultural bench configs * added openbookqa configs * Added PiQA configs * added winogrande configs * Added truthfulQA configs * Added aradice group config * Remove deleted files from repository * modified arabimmlu configs * modified metadata versions * fixed formatting using ruff * added aradice tasks information * pre-commit * Updated openbookqa utils * fixed formatting on obqa --------- Co-authored-by: Basel Mousi <bmousi@hbku.edu.qa> Co-authored-by: Baber <baber@hey.com>
-
- 20 Dec, 2024 1 commit
-
-
Sabrina J. Mielke authored
-
- 19 Dec, 2024 2 commits
-
-
Baber Abbasi authored
* add warning for truncation
-
shivalika-singh authored
* add global mmlu lite * add global mmlu lite * fix bugs * add task README.md * Update README.md * Update tasks README.md * Update README.md * update readme --------- Co-authored-by: shivi <shivalikasingh95@gmail.com>
-
- 17 Dec, 2024 2 commits
-
-
Baber Abbasi authored
* feat: drop Python 3.8 support * feat: drop Python 3.8 tests * pre-commit
-
Baber Abbasi authored
forgot to increment 0.4.6!
-
- 16 Dec, 2024 3 commits
-
-
Baber Abbasi authored
* fix `DeprecationWarning: invalid escape sequence '\s'` * add type hints * Revert "add type hints" This reverts commit 15d8abc626a84e97f8c238ddfbf9e243d6f6eb5c.
-
Baber Abbasi authored
* batch all rolling token windows * nit * copy to vllm * fix max_length for `get_rolling_token_windows` * bugfix * bugfix * add type hints
-
Rima Shahbazyan authored
* score readme added * generate until task's "until" parameter's default value fixed. * score mmlu-pro and agieval added * changed macro accuracy to micro for agieval * Always E removed from agi eval * redundancies removed * MATH added * minor cosmetic changes for math * Licenses added Readme updated * changes for flake8 + license header on math * Score added to readme and precommit was run. * Score added to readme and precommit was run. * Import error fixed * math task bugfix postprocess minor fix * CR for math added * math CR * math task bugfix postprocess minor fix CR for math added * Math cr fixed * mmlu_pro non_greedy task added * non greedy summarizer added * Non greedy for all score tasks * Bugfixes for non-greedy * fixing the until argument * undoing the change to "until" arguments default behaviour * minor fix in summarizer * log naming changes for better readability * math subtasks naming fix * agieval subtask naming fix * logging added for debugging * path issue fixed * minor fix * path fix * path fix * non_greedy_math minor fix * final changes * changed readme for non-greedy added Nvidia header added example script for non_greedy changed prompts to match that of trt runs * non greedy summarizer bugfix * non_greedy summarizer fixed
-
- 14 Dec, 2024 1 commit
-
-
Baber Abbasi authored
* make warning prominent * make warning prominent
-
- 13 Dec, 2024 1 commit
-
-
Yao Matrix authored
* initial support for optimum-intel ipex model. LM model as first step * format Signed-off-by: Yao Matrix <matrix.yao@intel.com> * pass dtype Signed-off-by: Yao Matrix <matrix.yao@intel.com> * update README Signed-off-by: Yao, Matrix <matrix.yao@intel.com> --------- Signed-off-by: Yao Matrix <matrix.yao@intel.com>
-
- 09 Dec, 2024 2 commits
-
-
Maanu Grover authored
* update import Signed-off-by: Maanu Grover <maanug@nvidia.com> * run formatting --------- Signed-off-by: Maanu Grover <maanug@nvidia.com>
-
Baber Abbasi authored
* left truncate for generate_until * pre-commit
-
- 05 Dec, 2024 1 commit
-
-
fzyzcjy authored
-
- 04 Dec, 2024 3 commits
-
-
Slawomir Strehlke authored
* Handle pipeline_parallel parameter * Add description of pipeline parallelism with OV models
-
Baber Abbasi authored
-
Baber Abbasi authored
* Update README.md add caching tip to readme * Update README.md add api link
-
- 03 Dec, 2024 2 commits
-
-
Trawinski, Dariusz authored
* avoid timeout errors with high concurrency in api_model * style * add timeout * add docs --------- Co-authored-by: Baber <baber@hey.com>
-
Naiara Perez authored
-
- 01 Dec, 2024 1 commit
-
-
Yoav Katz authored
Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514) * Moved to require unitxt installation and not download unitxt from HF hub. This has performance benefits and simplifies the code. Signed-off-by: Yoav Katz <katz@il.ibm.com> * Updated watsonx documentation * Updated installation instructions * Removed redundant comman * Allowed unitxt tasks to generate chat APIs Modified WatsonXI model to support chat apis * Removed print * Run precommit formatting --------- Signed-off-by: Yoav Katz <katz@il.ibm.com>
-
- 30 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* make utility function to handle `until` * fix text
-
- 29 Nov, 2024 1 commit
-
-
Baber Abbasi authored
-
- 28 Nov, 2024 1 commit
-
-
Baber Abbasi authored
* allow !function filters * bugfix * nit * add `filter` to logged samples * add `filter` and `metric` to logged samples to identification * convert `metric` to `metrics`: list
-