Commit c8489857, authored by Baber Abbasi, committed by GitHub

humaneval instruct (#2650)

* add instruct humaneval

* nit

* add to readme

* nit
parent 7a2ba052
@@ -8,6 +8,7 @@ We introduce Codex, a GPT language model fine-tuned on publicly available code f
Homepage: https://github.com/openai/human-eval
Note: For instruction-tuned models, we recommend the instruct variant, which uses a `gen_prefix` to ensure the model completes the partial code snippet (this might not work with all APIs).
## Citation
```
@@ -31,6 +32,8 @@ Homepage: https://github.com/openai/human-eval
- `humaneval` pass@1
- `humaneval_64` pass@64 variant
- `humaneval_instruct`: pass@1 with config more appropriate for instruct models. (implementation taken from llama [evals](https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-Instruct-evals/viewer/Llama-3.1-8B-Instruct-evals__human_eval__details?row=0))
- `humaneval_64_instruct`: pass@64 variant
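
For readers unfamiliar with the pass@1 and pass@64 labels above, here is a minimal sketch of the unbiased pass@k estimator described in the Codex paper, 1 - C(n-c, k) / C(n, k), for a single problem. The function name `pass_at_k_estimate` is hypothetical and is not the harness's own `pass_at_k` metric, which may be implemented differently.

```python
from math import comb


def pass_at_k_estimate(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which pass.

    Hypothetical helper for illustration only.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With a single sample, pass@1 is simply whether that sample passed.
single_pass = pass_at_k_estimate(n=1, c=1, k=1)
single_fail = pass_at_k_estimate(n=1, c=0, k=1)

# With 64 samples and k = 64, one passing sample is enough.
any_of_64 = pass_at_k_estimate(n=64, c=1, k=64)
```

When n equals k, the estimate reduces to "did any sample pass", which is why the 64-sample variants need many generations per problem.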
### Checklist
@@ -44,3 +47,5 @@ If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
### Changelog
include: humaneval_64.yaml
task: humaneval_64_instruct
doc_to_text: "Write a solution to the following problem and make sure that it passes the tests:\n```{{prompt}}"
gen_prefix: "Here is the completed function:\n```python\n{{prompt}}\n"
filter_list:
  - name: "create_test"
    filter:
      - function: "custom"
        filter_fn: !function utils.build_predictions_instruct
include: humaneval.yaml
task: humaneval_instruct
doc_to_text: "Write a solution to the following problem and make sure that it passes the tests:\n```{{prompt}}"
gen_prefix: "Here is the completed function:\n```python\n{{prompt}}\n"
filter_list:
  - name: "create_test"
    filter:
      - function: "custom"
        filter_fn: !function utils.build_predictions_instruct
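
To make the two templates above concrete, the following sketch renders them for a toy problem. It assumes the escapes in `gen_prefix` are meant as real newlines and that the prefix is placed at the start of the model's reply. The `render` helper and the sample `doc` are hypothetical stand-ins. The harness itself renders templates with Jinja, which this plain string substitution only approximates.

```python
FENCE = "`" * 3  # three backticks, built here to keep the example readable

DOC_TO_TEXT = (
    "Write a solution to the following problem and make sure that it "
    "passes the tests:\n" + FENCE + "{{prompt}}"
)
GEN_PREFIX = "Here is the completed function:\n" + FENCE + "python\n{{prompt}}\n"


def render(template: str, doc: dict) -> str:
    """Hypothetical stand-in for template rendering: fills {{prompt}} only."""
    return template.replace("{{prompt}}", doc["prompt"])


# Toy document in the shape the config expects (a "prompt" field).
doc = {"prompt": 'def add(a, b):\n    """Return a + b."""\n'}

user_turn = render(DOC_TO_TEXT, doc)          # what the model is asked
assistant_prefix = render(GEN_PREFIX, doc)    # how the model's reply is seeded

print(user_turn)
print("-" * 40)
print(assistant_prefix)
```

Because the seeded reply already contains the function signature inside an open code fence, the model only has to continue with the function body.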
@@ -25,3 +25,15 @@ def pass_at_k(references: list[str], predictions: list[list[str]], k: list[int]
def build_predictions(resps: list[list[str]], docs: list[dict]) -> list[list[str]]:
    return [[doc["prompt"] + r for r in resp] for resp, doc in zip(resps, docs)]


def build_predictions_instruct(
    resps: list[list[str]], docs: list[dict]
) -> list[list[str]]:
    # Keep only the text before the last ``` in each response, so a closing
    # fence and anything after it are not appended to the prompt.
    return [
        [
            doc["prompt"] + (r if r.rfind("```") == -1 else r[: r.rfind("```")])
            for r in resp
        ]
        for resp, doc in zip(resps, docs)
    ]
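
As a quick illustration of what the new filter does, the sketch below reproduces `build_predictions_instruct` from the diff (with the fence string factored into a `FENCE` constant for readability) and runs it on made-up responses. The sample `docs` and `resps` are hypothetical, not taken from the dataset.

```python
FENCE = "`" * 3  # three backticks


def build_predictions_instruct(
    resps: list[list[str]], docs: list[dict]
) -> list[list[str]]:
    # Same logic as the diff: cut each response at its last fence, if any,
    # then prepend the original prompt.
    return [
        [
            doc["prompt"] + (r if r.rfind(FENCE) == -1 else r[: r.rfind(FENCE)])
            for r in resp
        ]
        for resp, doc in zip(resps, docs)
    ]


docs = [{"prompt": "def add(a, b):\n"}]
resps = [
    [
        # Body, closing fence, then trailing prose: prose is cut off.
        "    return a + b\n" + FENCE + "\nThis function adds two numbers.",
        # No fence at all: the response is kept unchanged.
        "    return a + b\n",
        # Two fenced regions: only the text after the LAST fence is removed.
        "    return a + b\n" + FENCE + "\nExample:\n" + FENCE + "python\nadd(1, 2)\n" + FENCE,
    ]
]

preds = build_predictions_instruct(resps, docs)
for p in preds[0]:
    print(repr(p))
```

The third case suggests a limitation worth keeping in mind: since the cut is made at the last fence, a response that opens a second code block after the function still carries an earlier fence into the prediction.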