# ARC

### Paper

Title: Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Abstract: https://arxiv.org/abs/1803.05457

The ARC dataset consists of 7,787 science exam questions drawn from a variety
of sources, including science questions provided under license by a research
partner affiliated with AI2. These are text-only, English-language exam questions
that span several grade levels as indicated in the files. Each question has a
multiple choice structure (typically 4 answer options). The questions are sorted
into a Challenge Set of 2,590 “hard” questions (those that both a retrieval and
a co-occurrence method fail to answer correctly) and an Easy Set of 5,197 questions.
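
The split rule above can be illustrated with a minimal sketch. The `Question` class, its field names, and the toy records are hypothetical and exist only for illustration; they are not taken from the dataset files or from the paper's code.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record: one flag per baseline described in the paragraph above.
    qid: str
    retrieval_correct: bool      # did the retrieval method answer correctly?
    cooccurrence_correct: bool   # did the co-occurrence method answer correctly?


def split_arc(questions):
    """Return (easy, challenge): Challenge holds questions both baselines got wrong."""
    easy, challenge = [], []
    for q in questions:
        if not q.retrieval_correct and not q.cooccurrence_correct:
            challenge.append(q)
        else:
            easy.append(q)
    return easy, challenge


# Toy data (made up) to show the rule in action.
toy = [
    Question("q1", True, False),
    Question("q2", False, False),
    Question("q3", True, True),
]
easy, challenge = split_arc(toy)
print([q.qid for q in easy], [q.qid for q in challenge])

# The published set sizes add up to the full dataset.
assert 2590 + 5197 == 7787
```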

Homepage: https://allenai.org/data/arc

Note: The 0-shot generation variants are based on the Llama 3.2 [implementation](https://huggingface.co/datasets/meta-llama/Llama-3.2-3B-Instruct-evals/viewer/Llama-3.2-3B-Instruct-evals__arc_challenge__details).

### Citation

```
@article{Clark2018ThinkYH,
  title={Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge},
  author={Peter Clark and Isaac Cowhey and Oren Etzioni and Tushar Khot and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.05457}
}
```

### Groups, Tags, and Tasks

#### Groups

None.

#### Tags

* `ai2_arc`: Evaluates `arc_easy` and `arc_challenge`
* `ai2_arc_generation`: Evaluates `arc_easy_generation` and `arc_challenge_generation`

#### Tasks

* `arc_easy`
* `arc_challenge`
* `arc_easy_generation`
* `arc_challenge_generation`
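
As a rough sketch of how the tags and tasks listed above relate, the snippet below expands a tag into its member tasks and assembles an evaluation command. It assumes this README belongs to EleutherAI's lm-evaluation-harness and that its `lm_eval` command-line tool is installed; `TAGS`, `expand`, `build_command`, and the `MODEL_NAME` placeholder are hypothetical names introduced here, not part of the library.

```python
# Tag membership as listed in this README.
TAGS = {
    "ai2_arc": ["arc_easy", "arc_challenge"],
    "ai2_arc_generation": ["arc_easy_generation", "arc_challenge_generation"],
}


def expand(names):
    """Replace each tag with its tasks; plain task names pass through unchanged."""
    tasks = []
    for name in names:
        for task in TAGS.get(name, [name]):
            if task not in tasks:  # avoid duplicates when a tag and its task are both given
                tasks.append(task)
    return tasks


def build_command(names, model_args="pretrained=MODEL_NAME"):
    """Build an argument list for the (assumed) `lm_eval` CLI; MODEL_NAME is a placeholder."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(names),
    ]


print(expand(["ai2_arc", "arc_challenge_generation"]))
print(" ".join(build_command(["ai2_arc"])))
```

The printed command is only assembled as a string here; running it requires the harness and a real model identifier in place of `MODEL_NAME`.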

### Checklist

For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?


If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?