- 11 Mar, 2025 7 commits
-
-
Stella Biderman authored
-
Stella Biderman authored
-
PabloAgustin authored
* New healthcare benchmark: careqa
* LAUNCH_MN5_ACC <python main.py --config config/mn5.yml --models Llama-3.2-1B-Instruct --tasks careqa_open --num_fewshot 0>
* Add fixes, READMES, and remove task_list.txt
* pre-commit passed, add formatting updates; add nanmean agg_metric
* Fix import error.
* Wrapped imports in try excepts
* Wrapped imports in try excepts; also metrics to catch bert_score import error
* Try except to catch ImportErrors as well
* use np.nan
* pre-commit
---------
Co-authored-by: PabloAgustin <pablo.martin@bsc.es>
Co-authored-by: Baber <baber@hey.com>
-
Kajetan Dymkiewicz authored
* fix for mc2 calculation
* increment versions and changelog
---------
Co-authored-by: Baber <baber@hey.com>
-
Yotam Perlitz authored
* Filter new leaderboard_math_hard dataset to "Level 5" only
* align to linters
  Signed-off-by: Yotam Perlitz <y.perlitz@ibm.com>
---------
Signed-off-by: Yotam Perlitz <y.perlitz@ibm.com>
-
Giulio Lovisotto authored
-
Baber Abbasi authored
-
- 10 Mar, 2025 1 commit
-
-
Rui Vieira authored
-
- 06 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 05 Mar, 2025 3 commits
-
-
Baber Abbasi authored
* bug fix
* add warning for instruct models
* nit
-
Yongkeun Hwang authored
-
Baber Abbasi authored
-
- 04 Mar, 2025 3 commits
-
-
Baber Abbasi authored
-
Kiersten Stokes authored
* Add a test for a custom unitxt task
* Update task.py to bring in line with breaking change in v1.17.2
* Fix lint
-
Lucia Quirke authored
* Enable steering HF models
  Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
* increase HF download timeout
* Update readme; improve steering vector device handling
* Update latest news
* remove HF timeout increase
* fix tests
* ignore sae lens test
* fix accidental force push
---------
Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
-
- 03 Mar, 2025 3 commits
-
-
Baber Abbasi authored
-
Harsh Kohli authored
* Fix failing tests
* Resolved merge conflicts
* pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
-
Jinwei authored
* initial components to support sglang
* init of class SGLangLM
* draft for generate_until of SGLang model
* mock loglikelihood
* initial loglikelihood_tokens
* todo: fix bug of sglang engine init
* implement generation tasks and test
* support output type loglikelihood and loglikelihood_rolling (#1)
* .
* loglikelihood_rolling
* /
* support dp_size>1
* typo
* add tests and clean code
* skip tests of sglang for now
* fix OOM error of sglang pytest
* finish test for sglang
* add sglang to readme
* fix OOM of tests and clean SGLang model
* update readme
* clean pyproject and add tests for evaluator
* add accuracy tests and it passed locally
* add notes for test
* Update README.md update readme
* pre-commit
* add OOM guideline for sglang and fix readme error
* fix typo
* fix typo
* add readme
---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-
- 27 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* remove ray.remote resources
* remove kobtest tag (registered as group)
-
- 26 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-
- 25 Feb, 2025 4 commits
-
-
Jinwei authored
* initial components to support sglang
* init of class SGLangLM
* draft for generate_until of SGLang model
* mock loglikelihood
* initial loglikelihood_tokens
* todo: fix bug of sglang engine init
* implement generation tasks and test
* support output type loglikelihood and loglikelihood_rolling (#1)
* .
* loglikelihood_rolling
* /
* support dp_size>1
* typo
* add tests and clean code
* skip tests of sglang for now
* fix OOM error of sglang pytest
* finish test for sglang
* add sglang to readme
* fix OOM of tests and clean SGLang model
* update readme
* clean pyproject and add tests for evaluator
* add accuracy tests and it passed locally
* add notes for test
* Update README.md update readme
* pre-commit
---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
-
Minho Ryu authored
* add humaneval+ and mbpp+
* add newline at end of file
-
Kailashbuki authored
* Fix the import source for eval_logger
* fix logging
---------
Co-authored-by: Baber <baber@hey.com>
-
Santiago Galiano Segura authored
Co-authored-by: Robiert Sepulveda Torres <rsepulveda911112@gmail.com>
-
- 24 Feb, 2025 3 commits
-
-
Naiara Perez authored
* add Basque translation of ARC and PAWS to BasqueBench
* pre-commit
---------
Co-authored-by: Baber <baber@hey.com>
-
Jocelyn authored
* add o3-mini support
* fix linter tests
-
Naiara Perez authored
Added IberoBench citation info (https://aclanthology.org/2025.coling-main.699/) in corresponding READMEs (#2729)
-
- 23 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-
- 21 Feb, 2025 3 commits
-
-
Farhan Ahmed authored
-
Lintang Sutawika authored
* changed source of eval_logger
* allow eval_logger to be set from args
* removed verbosity arg from non-main methods
* fix logging
* pre-commit
* set verbosity in eval logger
* replace utils.eval_logger
* fix logging in main
* add logging to docs
* add logging message
* nit
* add logging to docs
* refactor setup_logging to utils
---------
Co-authored-by: Baber <baber@hey.com>
-
Baber Abbasi authored
* add math_verify to minerva math
* add math_verify to benchmark
* fix error
* increment version
-
- 17 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* fix vllm
* fix data_parallel
* copy to multimodal
-
- 14 Feb, 2025 4 commits
-
-
Baber Abbasi authored
* set target delimiter to empty string
* nit
* add warning
-
Baber Abbasi authored
-
Irina Proskurina authored
-
Kiersten Stokes authored
-
- 13 Feb, 2025 1 commit
-
-
James A. Michaelov authored
-
- 12 Feb, 2025 2 commits
-
-
achervyakov authored
-
Kiersten Stokes authored
-
- 11 Feb, 2025 1 commit
-
-
Baber Abbasi authored
-