Commits · 250a04ec79457da420bb3bb9e99a86eaa642f279 · gaoqiong / lm-evaluation-harness

22 Jul, 2025 1 commit

Fix: extended to max_gen_toks 2048 for HRM8K math benchmarks (#3124) · 250a04ec

Geun, Lim authored Jul 22, 2025



* Fix: extended to max_gen_toks 2048 for HRM8K math benchmarks

* Increased max_gen_toks to 2048 (matches Appendix B of the original paper) and added Evaluation Settings and Changelog sections.

* add some logs

---------
Co-authored-by: Baber <baber@hey.com>


20 Jan, 2025 1 commit

add hrm8k benchmark for both Korean and English (#2627) · a5c344cf

Minho Ryu authored Jan 21, 2025



* add hrm8k benchmark for both Korean and English

* apply precommit

* revise tasks so that models do not answer directly; use zeroshot_cot if possible

* add README

* Add hrm8k to the task list

---------
Co-authored-by: Baber <baber@hey.com>
