Commit 68758761 authored by lintangsutawika

edit logiqa

parent 223a63c4
# LogiQA
### Paper
Title: `LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning`
Abstract: https://arxiv.org/abs/2007.08124
LogiQA is a dataset for testing human logical reasoning. It consists of 8,678 QA
instances, covering multiple types of deductive reasoning. Results show that
state-of-the-art neural models perform far worse than the human ceiling. The
dataset can also serve as a benchmark for re-investigating logical AI under the
deep learning NLP setting.
Homepage: https://github.com/lgw863/LogiQA-dataset
### Citation
```
@misc{liu2020logiqa,
    title={LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning},
    author={Jian Liu and Leyang Cui and Hanmeng Liu and Dandan Huang and Yile Wang and Yue Zhang},
    year={2020},
    eprint={2007.08124},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
### Groups and Tasks
#### Groups
* Not part of a group yet
#### Tasks
* `logiqa`
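For reference, a minimal sketch of running this task through the harness's Python API. The `simple_evaluate` call and its keyword arguments may differ between harness versions, and the model checkpoint below is only a placeholder.

```python
# Minimal sketch (not a canonical invocation): evaluate `logiqa` via the
# harness's Python entry point. The checkpoint is a placeholder and the exact
# `simple_evaluate` signature may vary across harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["logiqa"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["logiqa"])  # per-task metrics, e.g. accuracy
```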
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group:
- multiple_choice
task: logiqa
dataset_path: EleutherAI/logiqa
dataset_name: logiqa
......
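The YAML above is the task configuration for `logiqa`: `dataset_path` and `dataset_name` identify the Hugging Face dataset the harness loads. A minimal sketch of the equivalent `datasets` call follows; the split and field names inspected below are assumptions about the dataset layout, not something the config guarantees.

```python
# Minimal sketch: load the dataset referenced by the config above.
# "EleutherAI/logiqa" and the "logiqa" configuration name come straight from
# the YAML; the split and field names below are assumptions.
from datasets import load_dataset

logiqa = load_dataset("EleutherAI/logiqa", "logiqa")
print(logiqa)                 # available splits and their sizes
example = logiqa["test"][0]   # assumed split name
print(example.keys())         # assumed fields: context, question, options, label
```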
......
@@ -25,15 +25,19 @@ Homepage: https://github.com/csitfun/LogiQA2.0
doi={10.1109/TASLP.2023.3293046}}
```
### Groups and Tasks
#### Groups
* Not part of a group yet
#### Tasks
* `logiqa2_zh`: The original dataset in Chinese.
* `logiqa2_NLI`: The NLI version of the dataset converted from the MRC version.
* `logieval`: Prompt-based; https://github.com/csitfun/LogiEval
NOTE: These subtasks have not been verified yet.
### Checklist
......
group:
- greedy_until
task: logieval
dataset_path: baber/logiqa2
dataset_name: logieval
......
group:
- multiple_choice
task: logiqa2
dataset_path: baber/logiqa2
dataset_name: logiqa2
......
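The two configs above read from the same Hugging Face repository (`baber/logiqa2`) under different configuration names, and their group tags suggest different scoring modes: generation-based for `logieval` (`greedy_until`) versus loglikelihood-ranked answer options for `logiqa2` (`multiple_choice`). Below is a minimal sketch of evaluating both variants together; the model and keyword arguments are placeholders and version-dependent, and the other variants listed earlier (`logiqa2_zh`, `logiqa2_NLI`) would be selected by their task names in the same way.

```python
# Minimal sketch: evaluate the LogiQA 2.0 variants defined above in one run.
# Task names come from the YAML configs; the checkpoint is a placeholder and
# `limit` is only used here to keep the smoke test small.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["logiqa2", "logieval"],
    num_fewshot=0,
    limit=20,  # small subset for a quick check; drop for a full evaluation
)
for task_name, metrics in results["results"].items():
    print(task_name, metrics)
```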