Unverified Commit d92595b6 authored by Connor-Shen's avatar Connor-Shen Committed by GitHub

[Feat] Support TACO (#966)

* [Feat] Support TACO

* update README

* update README
parent c78a4df9
@@ -24,9 +24,9 @@ print(next(iter(ds))["question"])
## Evaluation results
| dataset | metric | Qwen1.5-1.8B | Qwen1.5-7B | Qwen1.5-14B | Qwen1.5-72B | Baichuan2-7B | Baichuan2-13B | InternLM2-7B | InternLM2-20B |
|--------------------|--------|--------------|------------|-------------|-------------|--------------|---------------|--------------|---------------|
| APPS (testset5000) | pass@1 | 0.2 | 1.7 | 5.92 | 7.5 | 0 | 0.06 | 0 | 0 |

Please refer to Table 3 of [code llama](https://scontent-nrt1-2.xx.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=TxT1PKkNBZoAX8zMHbm&_nc_ht=scontent-nrt1-2.xx&oh=00_AfDmmQAPzqX1-QOKIDUV5lGKzaZqt0CZUVtxFjHtnh6ycQ&oe=65F5AF8F) for original results if needed.
# TACO
## Dataset Description
TACO (Topics in Algorithmic COde generation dataset) is a dataset focused on algorithmic code generation, designed to provide a more challenging training set and evaluation benchmark for code generation models. It consists of programming-competition problems that are harder and closer to real programming scenarios, and it emphasizes a model's understanding and reasoning in practical settings rather than merely the implementation of predefined functions.
* Larger scale: TACO includes a training set (25,443 problems) and a test set (1,000 problems), making it the largest code generation dataset currently available.
* Higher quality: Each problem in the TACO dataset is matched with a diverse set of solution answers, with answer sizes of up to 1.55M. This makes the model less prone to overfitting during training and makes evaluation results more reliable.
* Fine-grained labels: Each problem in the TACO dataset includes fine-grained labels such as task topics, algorithms, skills, and difficulty levels. These labels provide more accurate references for the training and evaluation of code generation models.
## Dataset Structure
```python
DatasetDict({
train: Dataset({
features: ['question', 'solutions', 'starter_code', 'input_output', 'difficulty', 'raw_tags', 'name', 'source', 'tags', 'skill_types', 'url', 'Expected Auxiliary Space', 'time_limit', 'date', 'picture_num', 'memory_limit', 'Expected Time Complexity'],
num_rows: 25443
})
test: Dataset({
features: ['question', 'solutions', 'starter_code', 'input_output', 'difficulty', 'raw_tags', 'name', 'source', 'tags', 'skill_types', 'url', 'Expected Auxiliary Space', 'time_limit', 'date', 'picture_num', 'memory_limit', 'Expected Time Complexity'],
num_rows: 1000
})
})
```
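Several columns in each record, such as `solutions` and `input_output`, are released as JSON-encoded strings rather than nested objects (the APPS-style convention), so they need an explicit decode before use. A minimal sketch with an invented toy record, not real TACO data; verify the encoding on the loaded rows before relying on it:

```python
import json

# Toy record mimicking part of the TACO schema above; the real rows come
# from datasets.load_dataset('BAAI/TACO'). All values here are invented.
record = {
    "question": "Given a list of integers, print their sum.",
    "solutions": json.dumps([
        "print(sum(map(int, input().split())))",
    ]),
    "input_output": json.dumps({
        "inputs": ["1 2 3\n"],
        "outputs": ["6\n"],
    }),
    "difficulty": "EASY",
}

# 'solutions' and 'input_output' are JSON strings, so decode them first.
solutions = json.loads(record["solutions"])
tests = json.loads(record["input_output"])

print(len(solutions))  # number of reference solutions for this problem
print(tests["inputs"][0].strip(), "->", tests["outputs"][0].strip())
```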
## How to Use
You can also restrict the download by difficulty (a list chosen from ["EASY", "MEDIUM", "MEDIUM_HARD", "HARD", "VERY_HARD"]; the default is ["ALL"]) or by skill (a list chosen from ["Data structures", "Sorting", "Range queries", "Complete search", "Amortized analysis", "Dynamic programming", "Bit manipulation", "Greedy algorithms"]; the default is ["ALL"]) by passing the corresponding list:
```python
from datasets import load_dataset
taco_difficulties = load_dataset('BAAI/TACO', difficulties=['EASY'], token=YOUR_HF_TOKEN)
```
```python
from datasets import load_dataset
taco_skills = load_dataset('BAAI/TACO', skills=['Sorting', 'Range queries'], token=YOUR_HF_TOKEN)
```
## Evaluation results
| dataset | metric | Qwen1.5-1.8B | Qwen1.5-7B | Qwen1.5-14B | Qwen1.5-72B | Baichuan2-7B | Baichuan2-13B | InternLM2-7B | InternLM2-20B |
|-----------------------|----------|-------------|-------------|-------------|-------------|--------------|--------------|-------------|-------------|
| TACO | pass@1 | 0.2 | 1 | 2 | 2 | 0.2 | 0.1 | 0.1 | 0.1 |
Please refer to [repo](https://github.com/FlagOpen/TACO/tree/main?tab=readme-ov-file) for original results if needed.
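The pass@1 numbers above follow the standard unbiased pass@k estimator (Chen et al., the Codex paper): given n generated samples per problem of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). A self-contained sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: samples that pass all tests
    k: samples drawn; returns P(at least one of k passes)
       = 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k failing samples: any draw of k must contain a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per problem, 2 of them correct.
print(round(pass_at_k(10, 2, 1), 3))  # 0.2
```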
## Citation
```
@article{li2023taco,
title={TACO: Topics in Algorithmic COde generation dataset},
author={Rongao Li and Jie Fu and Bo-Wen Zhang and Tao Huang and Zhihong Sun and Chen Lyu and Guang Liu and Zhi Jin and Ge Li},
journal={arXiv preprint arXiv:2312.14852},
year={2023}
}
```
from mmengine.config import read_base
with read_base():
from .taco_gen_d82929 import TACO_datasets # noqa: F401, F403
from opencompass.openicl.icl_prompt_template import PromptTemplate
from opencompass.openicl.icl_retriever import ZeroRetriever
from opencompass.openicl.icl_inferencer import GenInferencer
from opencompass.datasets import TACODataset, TACOEvaluator
TACO_reader_cfg = dict(input_columns=["question", "starter"], output_column="problem_id", train_split='test')
TACO_infer_cfg = dict(
prompt_template=dict(
type=PromptTemplate,
template="\nQUESTION:\n{question} {starter}\nANSWER:\n"),
retriever=dict(type=ZeroRetriever),
inferencer=dict(type=GenInferencer, max_out_len=512),
)
TACO_eval_cfg = dict(evaluator=dict(type=TACOEvaluator), pred_role="BOT")
TACO_datasets = [
dict(
type=TACODataset,
abbr="TACO",
path='BAAI/TACO',
        num_repeats=1,
reader_cfg=TACO_reader_cfg,
infer_cfg=TACO_infer_cfg,
eval_cfg=TACO_eval_cfg,
)
]
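The inferencer fills the prompt template above with each row's fields. Rendering it by hand with a made-up question and empty starter code (hypothetical values, not real TACO rows) looks like:

```python
# Same template string as in TACO_infer_cfg above.
template = "\nQUESTION:\n{question} {starter}\nANSWER:\n"

# Hypothetical row; real values come from the BAAI/TACO 'test' split.
row = {"question": "Print the sum of two integers A and B.", "starter": ""}

prompt = template.format(**row)
print(prompt)
```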