- 23 Aug, 2025 1 commit
Baber Abbasi authored
* update math_verify
* remove normalization
* use full solution in `parse`
* update version
-
- 23 Jul, 2025 1 commit
Baber Abbasi authored
* remove trust-remote-code
* add W605 rule
-
- 25 Feb, 2025 1 commit
Kailashbuki authored
* Fix the import source for eval_logger
* fix logging
---------
Co-authored-by: Baber <baber@hey.com>
-
- 21 Feb, 2025 1 commit
Baber Abbasi authored
* add math_verify to minerva math
* add math_verify to benchmark
* fix error
* increment version
-
- 17 Oct, 2024 1 commit
Ranger authored
I found this bug by comparing the code of hendrycks_math and minerva_math.
-
- 31 May, 2024 1 commit
Clémentine Fourrier authored
* init test 1
* fix
* this format seems to be working - need to update all other tasks with the new format
* bbh with few shot format
* fix fewshot bbh
* add mmlu flan cot
* samples of cot
* kmmlu
* fix gsm8k
* update keys for mmlu
* minerva math
* bbh
* fix
* fix samples
* small fixes to templates
* last prompt format change
* fixing prompt
* fixed minerva math format
* rm accidental committed file
* added doc for few shot samples
* Update lm_eval/loggers/evaluation_tracker.py
* Update lm_eval/loggers/evaluation_tracker.py
* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* added check in sampler per code review
* added the system from a function, plus an example in minerva math
* style
* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* fix unit tests 1
* forcing use of test split
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 07 May, 2024 1 commit
Hailey Schoelkopf authored
* add Hendrycks MATH (no sympy checking) variant
* add readmes for MATH tasks
-
- 03 Mar, 2024 1 commit
Vicki Boykis authored
* setting trust_remote_code
* dataset list no notebooks
* respect trust remote code
* Address changes, move cli options and change datasets
* fix task for tests
* headqa
* remove kobest
* pin datasets and address comments
* clean up space
-
- 26 Feb, 2024 1 commit
LSinev authored
-
- 01 Feb, 2024 1 commit
Hailey Schoelkopf authored
* allow tasks to specify printed fewshot val
* fix to belebele
* update metadata field's documentation
-
- 11 Jan, 2024 1 commit
Hailey Schoelkopf authored
* fix incorrect lookback protections
* bump generate_until task versions
-
- 21 Dec, 2023 1 commit
Hailey Schoelkopf authored
* change version field formatting in metadata
* mention versioning in new task guide
* add instructions for changelog
* run linters
-
- 07 Dec, 2023 1 commit
Hailey Schoelkopf authored
-
- 28 Nov, 2023 2 commits
lintangsutawika authored
-
lintangsutawika authored
-
- 10 Nov, 2023 1 commit
lintangsutawika authored
-
- 17 Oct, 2023 1 commit
lintangsutawika authored
-
- 22 Sep, 2023 2 commits
haileyschoelkopf authored
-
haileyschoelkopf authored
-
- 21 Sep, 2023 1 commit
Hailey Schoelkopf authored
-
- 18 Sep, 2023 3 commits
- 14 Sep, 2023 1 commit
baberabb authored
-