Unverified Commit d92595b6 authored by Connor-Shen's avatar Connor-Shen Committed by GitHub

[Feat] Support TACO (#966)

* [Feat] Support TACO

* update README

* update README
parent c78a4df9
@@ -24,9 +24,9 @@ print(next(iter(ds))["question"])
## Evaluation results
| dataset | metric | Qwen1.5-1.8B | Qwen1.5-7B | Qwen1.5-14B | Qwen1.5-72B | Baichuan2-7B | Baichuan2-13B | InternLM2-7B | InternLM2-20B |
|--------------------|--------|--------------|------------|-------------|-------------|--------------|---------------|--------------|---------------|
| APPS (testset5000) | pass@1 | 0.2 | 1.7 | 5.92 | 7.5 | 0 | 0.06 | 0 | 0 |

Please refer to Table 3 of [code llama](https://scontent-nrt1-2.xx.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=TxT1PKkNBZoAX8zMHbm&_nc_ht=scontent-nrt1-2.xx&oh=00_AfDmmQAPzqX1-QOKIDUV5lGKzaZqt0CZUVtxFjHtnh6ycQ&oe=65F5AF8F) for original results if needed.
# TACO
## Dataset Description
TACO (Topics in Algorithmic COde generation dataset) is a dataset focused on algorithmic code generation, designed to provide a more challenging training set and evaluation benchmark for code generation models. It consists of programming-competition problems that are harder and closer to real programming scenarios, and it emphasizes a model's understanding and reasoning in practical settings rather than merely the implementation of predefined functions.
* Larger scale: TACO includes a training set (25,443 problems) and a test set (1,000 problems), making it the largest code generation dataset currently available.
* Higher quality: Each problem in the TACO dataset is matched with a diverse set of solution answers, with answer sizes of up to 1.55M. This makes the model less prone to overfitting during training and makes evaluation results more reliable.
* Fine-grained labels: Each problem in the TACO dataset includes fine-grained labels such as task topics, algorithms, skills, and difficulty levels. These labels provide more accurate references for the training and evaluation of code generation models.
## Dataset Structure
```python
DatasetDict({
train: Dataset({
features: ['question', 'solutions', 'starter_code', 'input_output', 'difficulty', 'raw_tags', 'name', 'source', 'tags', 'skill_types', 'url', 'Expected Auxiliary Space', 'time_limit', 'date', 'picture_num', 'memory_limit', 'Expected Time Complexity'],
num_rows: 25443
})
test: Dataset({
features: ['question', 'solutions', 'starter_code', 'input_output', 'difficulty', 'raw_tags', 'name', 'source', 'tags', 'skill_types', 'url', 'Expected Auxiliary Space', 'time_limit', 'date', 'picture_num', 'memory_limit', 'Expected Time Complexity'],
num_rows: 1000
})
})
```
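Several columns in each record, such as `solutions` and `input_output`, are released as JSON-encoded strings rather than nested objects (the APPS-style convention), so they need an explicit decode before use. A minimal sketch with an invented toy record, not real TACO data; verify the encoding on the loaded rows before relying on it:

```python
import json

# Toy record mimicking part of the TACO schema above; the real rows come
# from datasets.load_dataset('BAAI/TACO'). All values here are invented.
record = {
    "question": "Given a list of integers, print their sum.",
    "solutions": json.dumps([
        "print(sum(map(int, input().split())))",
    ]),
    "input_output": json.dumps({
        "inputs": ["1 2 3\n"],
        "outputs": ["6\n"],
    }),
    "difficulty": "EASY",
}

# 'solutions' and 'input_output' are JSON strings, so decode them first.
solutions = json.loads(record["solutions"])
tests = json.loads(record["input_output"])

print(len(solutions))  # number of reference solutions for this problem
print(tests["inputs"][0].strip(), "->", tests["outputs"][0].strip())
```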
## How to Use
You can also restrict the download by difficulty (a list chosen from ["EASY", "MEDIUM", "MEDIUM_HARD", "HARD", "VERY_HARD"]; the default is ["ALL"]) or by skill (a list chosen from ["Data structures", "Sorting", "Range queries", "Complete search", "Amortized analysis", "Dynamic programming", "Bit manipulation", "Greedy algorithms"]; the default is ["ALL"]) by passing the corresponding list:
```python
from datasets import load_dataset
taco_difficulties = load_dataset('BAAI/TACO', difficulties=['EASY'], token=YOUR_HF_TOKEN)
```
```python
from datasets import load_dataset
taco_skills = load_dataset('BAAI/TACO', skills=['Sorting', 'Range queries'], token=YOUR_HF_TOKEN)
```
## Evaluation results
| dataset | metric | Qwen1.5-1.8B | Qwen1.5-7B | Qwen1.5-14B | Qwen1.5-72B | Baichuan2-7B | Baichuan2-13B | InternLM2-7B | InternLM2-20B |
|-----------------------|----------|-------------|-------------|-------------|-------------|--------------|--------------|-------------|-------------|
| TACO | pass@1 | 0.2 | 1 | 2 | 2 | 0.2 | 0.1 | 0.1 | 0.1 |
Please refer to [repo](https://github.com/FlagOpen/TACO/tree/main?tab=readme-ov-file) for original results if needed.
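The pass@1 numbers above follow the standard unbiased pass@k estimator (Chen et al., the Codex paper): given n generated samples per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). A self-contained sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: samples that pass all tests
    k: samples drawn; returns P(at least one of k passes)
       = 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k must contain a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per problem, 2 of them correct.
print(round(pass_at_k(10, 2, 1), 3))  # 0.2
```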
## Citation
```
@article{li2023taco,
title={TACO: Topics in Algorithmic COde generation dataset},
author={Rongao Li and Jie Fu and Bo-Wen Zhang and Tao Huang and Zhihong Sun and Chen Lyu and Guang Liu and Zhi Jin and Ge Li},
journal={arXiv preprint arXiv:2312.14852},
year={2023}
}
```
from mmengine.config import read_base
with read_base():
from .taco_gen_d82929 import TACO_datasets # noqa: F401, F403
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets import TACODataset, TACOEvaluator
TACO_reader_cfg = dict(input_columns=["question", "starter"], output_column="problem_id", train_split='test')
TACO_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template="\nQUESTION:\n{question} {starter}\nANSWER:\n"),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer, max_out_len=512),
)
TACO_eval_cfg = dict(evaluator=dict(type=TACOEvaluator), pred_role="BOT")
TACO_datasets = [
dict(
type=TACODataset,
abbr="TACO",
path='BAAI/TACO',
        num_repeats=1,
reader_cfg=TACO_reader_cfg,
infer_cfg=TACO_infer_cfg,
eval_cfg=TACO_eval_cfg,
)
]
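The inferencer fills the prompt template above with each row's fields. Rendering it by hand with a made-up question and empty starter code (hypothetical values, not real TACO rows) looks like:

```python
# Same template string as in TACO_infer_cfg above.
template = "\nQUESTION:\n{question} {starter}\nANSWER:\n"

# Hypothetical row; real values come from the BAAI/TACO 'test' split.
row = {"question": "Print the sum of two integers A and B.", "starter": ""}

prompt = template.format(**row)
print(prompt)
```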