- 27 Jun, 2025 3 commits
- 26 Jun, 2025 2 commits
- 25 Jun, 2025 3 commits
-
-
Younes B authored
* add subfolder * lint * change it to empty string * fix typehints --------- Co-authored-by:Baber <baber@hey.com>
-
Baber Abbasi authored
-
Kiersten Stokes authored
Signed-off-by:kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Jun, 2025 1 commit
-
-
NourFahmy authored
* Fix Anthropic API compatibility issues in chat completions solves two important compatibility issues between the LM Eval Harness and Anthropic's API: 1) The type field issue - Anthropic's Messages API doesn't accept the type field that other APIs might expect, that was previously included 2) The stop sequences issue - Anthropic requires stop sequences to contain non-whitespace characters tested with most recent models from anthopic; claude-sonnet-4-0, claude-opus-4-0, resolved my local api errors * pacufy pre-commit * add type --------- Co-authored-by:Baber <baber@hey.com>
-
- 20 Jun, 2025 1 commit
-
-
Anna Fontana authored
"arc_chalenge_chat" doesn't exist: I think it should be "arc_challenge_chat", but this task is not implemented here (see arc task folder).
-
- 19 Jun, 2025 3 commits
-
-
Baber Abbasi authored
-
Maxim Evtush authored
-
Anna Fontana authored
Wrong task name: mmlu_generation doesn't non exist -> mmlu_generative is the correct one
-
- 16 Jun, 2025 2 commits
-
-
Baber Abbasi authored
* fix longbech citation
-
fuder.eth authored
* Update README.md * Update utils_mcq.py
-
- 12 Jun, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by:kiersten-stokes <kierstenstokes@gmail.com>
-
- 08 Jun, 2025 1 commit
-
-
Baber Abbasi authored
* use all answers * use middle truncation * maybe fix classification score * strip classification preds * [vllm] remove stop tokens post-hoc * strip all preds * pacify pre-commit * start on truncation utility * add to readme * add a footgun doc * fix newline in yaml templates * do not strip code_sim preds! * fix pre-commit config * fix instruction warning * add not to longbench readme
-
- 03 Jun, 2025 4 commits
-
-
Baber Abbasi authored
-
Baber Abbasi authored
* feat: add mbpp_instruct * fix: update generation_kwargs to use an empty until list * fix: correct predictions formatting in pass_at_1 function * fix: improve code block extraction by checking first without opening backticks * fix mbpp `pass_at_1`
-
Younes B authored
-
Baber Abbasi authored
* fix: bug in acc_mutual_info slicing; add `target_delimiter` to uncond choices * add tests
-
- 02 Jun, 2025 2 commits
-
-
Yury Sulsky authored
-
Ivan Stankevich authored
* chore: clean up and extend .gitignore rules * pacify pre-commit --------- Co-authored-by:Baber <baber@hey.com>
-
- 26 May, 2025 2 commits
-
-
Boda Sadallah authored
* add arab_culture tasks * add target_delimeter and remove debugging code
-
Baber Abbasi authored
* add data_parallel for V1 * use Process instead of Queue * ray used if V0 DP * better error handling * fix truncation warning comparison
-
- 23 May, 2025 2 commits
-
-
Ameya Godbole authored
* FIX error due to grouping queries with different continuation length Make Collator choose query with the longest continuation as the candidate for generation * use max for key selection * added comments explaining variable cont length (identical ctx+cont[:-1]) --------- Co-authored-by:Baber <baber@hey.com>
-
fxmarty-amd authored
* fix arguments * pacify pre-commit --------- Co-authored-by:Baber <baber@hey.com>
-
- 22 May, 2025 1 commit
-
-
Baber Abbasi authored
changed multimodal check from strict equality
-
- 21 May, 2025 6 commits
-
-
Baber Abbasi authored
This reverts commit 4dbd5ec9
-
achervyakov authored
* first version of image resizing * fixed bug * clean up `resize_image` --------- Co-authored-by:
Artem Safin <artemsafin67@gmail.com> Co-authored-by:
Baber <baber@hey.com>
-
Baber Abbasi authored
* use images with apis * pacify pre-commit
-
Niccolò Ajroldi authored
* fix(output_path): support direct JSON file paths * fix linting * turn off external Lm tests for now * Update help text for `output_path` --------- Co-authored-by:Baber <baber@hey.com>
-
Hongseok Oh authored
-
Rob Geada authored
* Log tokenized request warning only once * Fix logging for concurrent usecase as well
-
- 19 May, 2025 3 commits
-
-
Baber Abbasi authored
-
Baber Abbasi authored
* add `sglang-generate` * nit * nit * nit * pacify pre-commit
-
Harsha authored
* adding ACPBench_hard * adding Clingo * changing tarski to tarski[clingo] * denoting the main variants in each paper
-
- 17 May, 2025 1 commit
-
-
Stella Biderman authored
This function was written years ago when the cost of running an OpenAI model was easy to compute. It is no longer viable to support this.
-
- 15 May, 2025 2 commits
-
-
Baber Abbasi authored
-
Filippo Momentè authored
* fix: pass device arg in model_ar in vllm_causallms * casting device arg to str in vLLM model args
-