Unverified commit a7ca0435 authored by jinze, committed by GitHub

FixBug: Align the Humaneval with official results for Llama-3.1-70B-Instruct (#3092)

* Fix: Align the Humaneval dataset with official results

Details: (1) Modified "doc_to_text" and "gen_prefix" in "humaneval_instruct.yaml" to match the prompt used in "meta-llama/Llama-3.1-70B-Instruct-evals".

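(2) Changed r.rfind("```") to r.find("```") in build_predictions_instruct, so the response is cut at the first "```" rather than the last one. Because gen_prefix already opens the ```python block, the first fence in the response normally closes the completed function. Cutting at the last fence kept any trailing prose or extra code blocks in the program under test.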

Results: The official results are partially reproduced. Llama-3.1-8B-Instruct scores 66.5 (official: 72.6), and Llama-3.1-70B-Instruct scores 80.5 (official: 80.5).

Ref: PR#2650

* add changelog and version

* add changelog
parent fea4d11d
@@ -50,3 +50,5 @@ If other tasks on this dataset are already supported:
### Changelog
v2 20-MAR-2025: `humaneval_instruct`, `humaneval_instruct_64`: fixed typo in gen_prefix
v3 30-JUN-2025: Updated prompt generation and output parsing to align with the official `Llama-3.1-70B-Instruct-evals`. This corrects the prompt format and fixes a bug in locating the code block. See PR [#3092](https://github.com/EleutherAI/lm-evaluation-harness/pull/3092).
 include: humaneval.yaml
 task: humaneval_instruct
-doc_to_text: "Write a solution to the following problem and make sure that it passes the tests:\n```{{prompt}}"
-gen_prefix: "Here is the completed function:\n```python\n{{prompt}}\n"
+doc_to_text: 'Write a solution to the following problem and make sure that it passes the tests:\n```python\n{{ prompt }}\n```\n '
+gen_prefix: 'Here is the completed function:\n```python\n{{ prompt }}\n '
 filter_list:
   - name: "create_test"
     filter:
       - function: "custom"
         filter_fn: !function utils.build_predictions_instruct
 metadata:
-  version: 2.0
+  version: 3.0
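To make the prompt change concrete, here is a minimal sketch that renders the old and new templates for a toy function stub. It swaps Jinja for a small regex substitution and assumes every `\n` in the templates ends up as a real newline. YAML only interprets backslash escapes inside double-quoted scalars, so that assumption is worth checking against the loaded v3 config. `FENCE`, `render`, and the sample prompt are illustrative names, not part of lm-evaluation-harness.

```python
import re

FENCE = "`" * 3  # a Markdown code fence (three backticks)

# Python transcriptions of the templates, ASSUMING each \n is rendered as a newline.
OLD_DOC_TO_TEXT = (
    "Write a solution to the following problem and make sure that it passes the tests:\n"
    + FENCE + "{{prompt}}"
)
NEW_DOC_TO_TEXT = (
    "Write a solution to the following problem and make sure that it passes the tests:\n"
    + FENCE + "python\n{{ prompt }}\n" + FENCE + "\n "
)
NEW_GEN_PREFIX = "Here is the completed function:\n" + FENCE + "python\n{{ prompt }}\n "


def render(template: str, prompt: str) -> str:
    """Hypothetical stand-in for Jinja: fill the {{prompt}} / {{ prompt }} slot."""
    # A function replacement stops re.sub from treating backslashes in `prompt` as escapes.
    return re.sub(r"\{\{\s*prompt\s*\}\}", lambda _m: prompt, template)


# Toy HumanEval-style prompt (signature plus docstring), not taken from the dataset.
prompt = 'def add(a: int, b: int) -> int:\n    """Return a + b."""\n'

old_user = render(OLD_DOC_TO_TEXT, prompt)    # fence opened, never closed, no language tag
new_user = render(NEW_DOC_TO_TEXT, prompt)    # stub shown in a closed python block
new_prefix = render(NEW_GEN_PREFIX, prompt)   # python block left open for the model

print(new_user)
print(new_prefix)
```

In the v3 rendering, the user turn closes its fence while the prefix leaves its `python` block open. The model's continuation is therefore expected to be the function body followed by a closing fence, which is what the `find` change in `build_predictions_instruct` relies on.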
@@ -32,7 +32,7 @@ def build_predictions_instruct(
 ) -> list[list[str]]:
     return [
         [
-            doc["prompt"] + (r if r.rfind("```") == -1 else r[: r.rfind("```")])
+            doc["prompt"] + (r if r.find("```") == -1 else r[: r.find("```")])
             for r in resp
         ]
         for resp, doc in zip(resps, docs)
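The effect of switching from `rfind` to `find` shows up with a hypothetical response in which the model closes the prefilled `python` block and then adds prose and a second code block. The sketch mirrors the nested comprehension in `build_predictions_instruct` (one list of candidate programs per document). It assumes, as the diff suggests, that each response excludes the prefilled prompt, which is why `doc["prompt"]` is prepended. `truncate_at_fence` and `build_predictions_sketch` are hypothetical helpers, not the harness's own functions.

```python
FENCE = "`" * 3  # a Markdown code fence (three backticks)


def truncate_at_fence(response: str, first: bool = True) -> str:
    """Cut a response at a fence: first=True mirrors v3 (find), first=False mirrors v2 (rfind)."""
    i = response.find(FENCE) if first else response.rfind(FENCE)
    return response if i == -1 else response[:i]


def build_predictions_sketch(
    resps: list[list[str]], docs: list[dict], first: bool = True
) -> list[list[str]]:
    """Hypothetical stand-in for build_predictions_instruct: one list of programs per doc."""
    return [
        [doc["prompt"] + truncate_at_fence(r, first) for r in resp]
        for resp, doc in zip(resps, docs)
    ]


docs = [{"prompt": 'def add(a: int, b: int) -> int:\n    """Return a + b."""\n'}]
# One hypothetical sampled continuation: the function body, the closing fence,
# then prose and a second code block with example usage.
resps = [[
    "    return a + b\n" + FENCE + "\n\nExample:\n" + FENCE
    + "python\nprint(add(1, 2))\n" + FENCE + "\n"
]]

v3 = build_predictions_sketch(resps, docs, first=True)[0][0]   # prompt + body only
v2 = build_predictions_sketch(resps, docs, first=False)[0][0]  # keeps fences and prose

print(v3)
```

With `rfind`, the candidate program keeps the closing fence, the prose, and the opening of the example block, so it does not even compile. With `find`, it stops at the end of the function body.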