Fix codexglue (#3238)

* Fix codex-glue/code2text group issue * Added README * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>

Fix codexglue (#3238)
* Fix codex-glue/code2text group issue * Added README * pacify pre-commit --------- Co-authored-by: Baber <baber@hey.com>
84aa9f95 · Gül Sena A · GitHub · 3a9bcc3f · 84aa9f95 · 84aa9f95
Unverified Commit 84aa9f95 authored Aug 27, 2025 by Gül Sena A Committed by GitHub Aug 27, 2025
9 changed files
--- a/lm_eval/tasks/code_x_glue/code-text/README.md
+++ b/lm_eval/tasks/code_x_glue/code-text/README.md
+# Task-name
+### Paper
+Title: `CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation`
+Abstract: https://arxiv.org/abs/2102.04664
+CodeXGLUE provides benchmark datasets for multiple code understanding and generation tasks, including generating docstrings in natural language from code snippets (code2text).
+### Citation
+```
+@inproceedings{DBLP:conf/nips/LuGRHSBCDJTLZSZ21,
+  author       = {Shuai Lu and
+                  Daya Guo and
+                  Shuo Ren and
+                  Junjie Huang and
+                  Alexey Svyatkovskiy and
+                  Ambrosio Blanco and
+                  Colin B. Clement and
+                  Dawn Drain and
+                  Daxin Jiang and
+                  Duyu Tang and
+                  Ge Li and
+                  Lidong Zhou and
+                  Linjun Shou and
+                  Long Zhou and
+                  Michele Tufano and
+                  Ming Gong and
+                  Ming Zhou and
+                  Nan Duan and
+                  Neel Sundaresan and
+                  Shao Kun Deng and
+                  Shengyu Fu and
+                  Shujie Liu},
+  editor       = {Joaquin Vanschoren and
+                  Sai{-}Kit Yeung},
+  title        = {CodeXGLUE: {A} Machine Learning Benchmark Dataset for Code Understanding
+                  and Generation},
+  booktitle    = {Proceedings of the Neural Information Processing Systems Track on
+                  Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December
+                  2021, virtual},
+  year         = {2021},
+  url          = {https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c16a5320fa475530d9583c34fd356ef5-Abstract-round1.html},
+  timestamp    = {Thu, 19 Dec 2024 22:07:31 +0100},
+  biburl       = {https://dblp.org/rec/conf/nips/LuGRHSBCDJTLZSZ21.bib},
+  bibsource    = {dblp computer science bibliography, https://dblp.org}
+}
+```
+### Groups and Tasks
+#### Groups
+* code2text
+#### Tasks
+* `code2text_go`: Generate docstring in natural language from Go code snippets.
+* `code2text_java`: Generate docstring in natural language from Java code snippets.
+* `code2text_javascript`: Generate docstring in natural language from JavaScript code snippets.
+* `code2text_php`: Generate docstring in natural language from PHP code snippets.
+* `code2text_python`: Generate docstring in natural language from Python code snippets.
+* `code2text_ruby`: Generate docstring in natural language from Ruby code snippets.
+### Checklist
+For adding novel benchmarks/datasets to the library:
+* [x] Is the task an existing benchmark in the literature?
+  * [x] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
--- a/lm_eval/tasks/code_x_glue/code-text/_codexglue.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/_codexglue.yaml
+group: code2text
+task:
+  - code2text_go
+  - code2text_java
+  - code2text_javascript
+  - code2text_php
+  - code2text_python
+  - code2text_ruby
+aggregate_metric_list:
+  - aggregation: mean
+    metric: !function bleu.smoothed_bleu_4
+    weight_by_size: true
+metadata:
+  version: 1.0
+# 449326
--- a/lm_eval/tasks/code_x_glue/code-text/_default_template_yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/_default_template_yaml
+training_split: train
+validation_split: validation
+test_split: test
+output_type: generate_until
+generation_kwargs:
+  num_beams: 10
+  max_gen_toks: 128
+  until:
+    - "</s>"
+doc_to_text: !function utils.doc_to_text
+doc_to_target: !function utils.doc_to_target
+metric_list:
+  - metric: !function bleu.smoothed_bleu_4
+    aggregation: mean
+    higher_is_better: True
+metadata:
+  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/go.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/go.yaml
-group:
-  - codexglue_code2text
-task: code2text_go
 dataset_path: CM/codexglue_code2text_go
-training_split: train
+task: code2text_go
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/java.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/java.yaml
-group:
-  - codexglue_code2text
-task: code2text_java
 dataset_path: CM/codexglue_code2text_java
-training_split: train
+task: code2text_java
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/javascript.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/javascript.yaml
-group:
-  - codexglue_code2text
-task: code2text_javascript
 dataset_path: CM/codexglue_code2text_javascript
-training_split: train
+task: code2text_javascript
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/php.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/php.yaml
-group:
-  - codexglue_code2text
-task: code2text_php
 dataset_path: CM/codexglue_code2text_php
-training_split: train
+task: code2text_php
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/python.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/python.yaml
-group:
-  - codexglue_code2text
-task: code2text_python
 dataset_path: CM/codexglue_code2text_python
-training_split: train
+task: code2text_python
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 1.0
--- a/lm_eval/tasks/code_x_glue/code-text/ruby.yaml
+++ b/lm_eval/tasks/code_x_glue/code-text/ruby.yaml
-group:
-  - codexglue_code2text
-task: code2text_ruby
 dataset_path: CM/codexglue_code2text_ruby
-training_split: train
+task: code2text_ruby
-validation_split: validation
+include: _default_template_yaml
-test_split: test
-output_type: generate_until
-generation_kwargs:
-  num_beams: 10
-  max_gen_toks: 128
-  until:
-    - "</s>"
-doc_to_text: !function utils.doc_to_text
-doc_to_target: !function utils.doc_to_target
-metric_list:
-  - metric: !function bleu.smoothed_bleu_4
-    aggregation: mean
-    higher_is_better: True
-metadata:
-  version: 3.0