Unverified commit a2cada5d authored by Jonathan Tow, committed by GitHub

Merge pull request #317 from EleutherAI/Mistobaan/add-pre-commit

Add pre-commit
parents 7a038118 83507c4b
[run]
# tasks that aren't wired up.
omit =
    lm_eval/tasks/quac.py
    lm_eval/tasks/storycloze.py
    lm_eval/tasks/cbt.py
@@ -25,4 +25,4 @@ exclude_lines =
    # Don't complain if tests don't hit defensive assertion code:
    raise AssertionError
    raise NotImplementedError
    return NotImplemented
\ No newline at end of file
[flake8]
ignore = E203, E266, E501, W503, F403, F401, C901
max-line-length = 127
max-complexity = 10
select = B,C,E,F,W,T4,B9
name: Pull Request
on: [pull_request]
jobs:
pre-commit:
runs-on: ubuntu-20.04
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: 3.8
- uses: pre-commit/action@v2.0.3
@@ -2,4 +2,4 @@ env
*.pyc
data/
lm_cache
.idea
\ No newline at end of file
# Ignore test linting to avoid conflicting changes to version stability.
exclude: ^tests/testdata/
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.1.0
hooks:
- id: check-added-large-files
- id: check-ast
- id: check-byte-order-marker
- id: check-case-conflict
- id: check-json
- id: check-merge-conflict
- id: check-symlinks
- id: check-yaml
- id: destroyed-symlinks
- id: detect-private-key
- id: end-of-file-fixer
- id: no-commit-to-branch
- id: requirements-txt-fixer
- id: trailing-whitespace
- id: fix-byte-order-marker
exclude: docs/CNAME
- id: fix-encoding-pragma
args: [--remove]
- id: mixed-line-ending
args: [--fix=lf]
- repo: https://gitlab.com/pycqa/flake8
rev: 3.7.9
hooks:
- id: flake8
- repo: https://github.com/psf/black
rev: 22.3.0
hooks:
- id: black
language_version: python3.8
- repo: https://github.com/codespell-project/codespell
rev: v2.1.0
hooks:
- id: codespell
exclude: >
(?x)^(
.*\.json|ignore.txt
)$
args: [--check-filenames, --check-hidden, --ignore-words=ignore.txt]
@@ -3,7 +3,7 @@
![](https://github.com/EleutherAI/lm-evaluation-harness/workflows/Build/badge.svg)
[![codecov](https://codecov.io/gh/EleutherAI/lm-evaluation-harness/branch/master/graph/badge.svg?token=JSG3O2427J)](https://codecov.io/gh/EleutherAI/lm-evaluation-harness)

## Overview

This project provides a unified framework to test autoregressive language models (GPT-2, GPT-3, GPTNeo, etc.) on a large number of different evaluation tasks.
@@ -403,7 +403,7 @@ the ngram files and info.json. See the above guide for ngram generation for the
python main.py \
    --model gpt2 \
    --device 0 \
    --tasks sciq \
    --decontamination_ngrams_path path/containing/training/set/ngrams
```
@@ -420,9 +420,9 @@ Both LMs (`lm_eval.models`) and Tasks (`lm_eval.tasks`) are kept in a registry d

The [GPT-3 Evaluations Project](https://github.com/EleutherAI/lm_evaluation_harness/projects/1) tracks our progress implementing new tasks. Right now, we are focused on getting all the datasets loaded so that we can dedupe against the training data. Implementing the actual evaluations is nice but not necessary at the current moment.

### Task Versioning

To help improve reproducibility, all tasks have a VERSION field. When run from the command line, this is reported in a column in the table, or in the "version" field in the evaluator return dict. The purpose of the version is so that if the task definition changes (i.e., to fix a bug), we can know exactly which metrics were computed using the old, buggy implementation and avoid unfair comparisons. To enforce this, there are unit tests that make sure the behavior of all tasks remains the same as when they were first implemented. Task versions start at 0, and each time a breaking change is made, the version is incremented by one.

When reporting eval harness results, please also report the version of each task. This can be done either with a separate column in the table, or by reporting the task name with the version appended, e.g. taskname-v0.
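A sketch of that convention (the task class and helper here are hypothetical, not code from the harness):

```python
class ExampleTask:
    # Hypothetical task: VERSION starts at 0 and is bumped on breaking changes,
    # e.g. after a prompt-formatting fix that alters metrics.
    VERSION = 1


def versioned_name(task_name: str, version: int) -> str:
    # Report results as "taskname-v<version>" so runs against different task
    # definitions are never compared unknowingly.
    return f"{task_name}-v{version}"


print(versioned_name("sciq", ExampleTask.VERSION))  # sciq-v1
```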
...
@@ -22,14 +22,14 @@ The basis for our decontamination procedure can be found in Appendix C of "Langu

## Implementation

Contamination detection can be found in "lm_eval/decontaminate.py" with supporting code in "lm_eval/decontamination/".

decontaminate.py does the following:
1. Build dictionaries of all ngrams and their corresponding evaluation/document ids.
2. Scan through sorted files containing training set n-grams.
3. If a match is found, the corresponding evaluation/document combinations are marked as contaminated.

"lm_eval/evaluator.py" can then produce a clean version of the benchmark by excluding the results of contaminated documents. For each metric, a clean version will be shown in the results with a "decontaminate" suffix.

This is disabled by default for new tasks. To support decontamination on a task, override the "should_decontaminate" and "doc_to_decontamination_query" methods. For more details see the [task guide](task_guide.md).
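A minimal sketch of those two overrides (the `question` field and the class name are illustrative assumptions, not harness code):

```python
class ExampleDecontamTask:
    def should_decontaminate(self):
        # Opt this task in to decontamination.
        return True

    def doc_to_decontamination_query(self, doc):
        # Return only the text to compare against training-set n-grams;
        # for a multiple-choice task the question alone is often enough.
        return doc["question"]


task = ExampleDecontamTask()
doc = {"question": "Who wrote Hamlet?", "choices": ["Shakespeare", "Marlowe"], "gold": 0}
print(task.doc_to_decontamination_query(doc))  # Who wrote Hamlet?
```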
@@ -73,4 +73,3 @@ python -m scripts/clean_training_data/compress_and_package \
```

Congratulations, the final directory can now be passed to lm-evaluation-harness with the "--decontamination_ngrams_path" argument.
@@ -16,7 +16,7 @@ pip install -e ".[dev]"

## Creating Your Task File

From the `lm-evaluation-harness` project root, copy over the `new_task.py` template to `lm_eval/tasks`.

```sh
cp templates/new_task.py lm_eval/tasks/<task-name>.py
```
@@ -52,7 +52,7 @@ For example, take the QuAC dataset. We have:
QuAC: Question Answering in Context
https://arxiv.org/abs/1808.07036

Question Answering in Context (QuAC) is a dataset for modeling, understanding, and
participating in information seeking dialog. Data instances consist of an interactive
dialog between two crowd workers: (1) a student who poses a sequence of freeform
questions to learn as much as possible about a hidden Wikipedia text, and (2)
@@ -72,7 +72,7 @@ Now let's walk through the actual implementation - from data handling to evaluat

### Downloading your Data

All data downloading and management is handled through the HuggingFace (**HF**) [`datasets`](https://github.com/huggingface/datasets) API. So, the first thing you should do is check to see if your task's dataset is already provided in their catalog [here](https://huggingface.co/datasets). If it's not in there, please consider adding it to their Hub to make it accessible to a wider user base by following their [new dataset guide](https://github.com/huggingface/datasets/blob/master/ADD_NEW_DATASET.md).

Now that you have your HF dataset, you need to assign its path and name to your `Task` in the following fields:

```python
@@ -116,7 +116,7 @@ These should return a Python iterable (`list` or `generator`) of `dict`s that ca

#### Processing Documents

At this point, you can also process each individual document to, for example, strip whitespace or "detokenize" its fields. Put the processing logic into `_process_doc` and map the functions across training/validation/test docs inside of the respective functions.

🔠 If your task is **multiple-choice**, we require you to format your documents such that they contain `gold` and `choices` fields. They can also have other fields, but those will be ignored by `MultipleChoiceTask`. `choices` should be a list of possible continuations, and `gold` should be an integer specifying the index of the correct completion.
See [this task](https://github.com/EleutherAI/lm-evaluation-harness/blob/6caa0afd96a7a7efb2ec4c1f24ad1756e48f3aa7/lm_eval/tasks/sat.py#L60) for an example. 🔠
@@ -154,7 +154,7 @@ Finally, be aware that the strings from `doc_to_text` and `doc_to_target` will b

### Decontamination

For background on decontamination, please see [this guide](./decontamination.md).

If you wish to support decontamination studies for your task, simply override the "should_decontaminate" method and return `True`.

You also need to override "doc_to_decontamination_query" and return the data you wish to compare against the training set. This doesn't necessarily need to be the full document or request; we leave this up to the implementor. For a multiple-choice evaluation, you could, for example, just return the question.
@@ -172,7 +172,7 @@ python -m scripts.write_out \
    --tasks <your-task> \
    --sets <train | val | test> \
    --num_fewshot K \
    --num_examples N \
    --description_dict_path <path>
```
@@ -199,9 +199,9 @@ def construct_requests(self, doc, ctx):
    """
    return ...
```
#### What's a `Request`? What's a `doc`?

To reiterate, a `doc` is just a `Dict` object that contains information about a document from your corpus. It can contain things like a prompt, question type information, answers, and anything else you think will be needed in order to assess your model for a given task. Keep in mind that its fields can be basically whatever you want (you can sort this out in `training_docs` / `validation_docs` / `test_docs` if you need to customise things - see above); just remember to be consistent with them throughout the rest of the `Task` you write up.

A `Request` is an object that takes the text prompt you want to present to a model and computes one of a few different types of response. These are evaluated lazily (meaning, only when the result is actually needed). If your task requires generating text, you'll need to return an `rf.greedy_until` request; otherwise, an `rf.loglikelihood` across all labels in a classification task will do.

The function `construct_requests` can return a list of `Request`s or an iterable; it's perfectly fine to `yield` them. This is particularly handy if you are creating more than one request per `doc` (usually because you're up to something like multi-task learning). The objects this function returns then get consumed one by one and turned into result objects.
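A sketch of a yielding `construct_requests` (with `rf.loglikelihood` stubbed out so the snippet runs standalone; in the harness, `rf` is the real request factory and its requests are evaluated lazily):

```python
class _StubRF:
    # Stand-in for the harness's request factory; records the call instead of
    # building a lazily-evaluated Request object.
    @staticmethod
    def loglikelihood(ctx, continuation):
        return ("loglikelihood", ctx, continuation)


rf = _StubRF()


def construct_requests(doc, ctx):
    # One loglikelihood request per candidate label; yielding is fine.
    for choice in doc["choices"]:
        yield rf.loglikelihood(ctx, " " + choice)


doc = {"choices": ["yes", "no"], "gold": 0}
reqs = list(construct_requests(doc, "Is water wet?\nAnswer:"))
print(len(reqs))  # 2
```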
@@ -232,7 +232,7 @@ def aggregation(self):
```

In `process_results`, model outputs are converted into metrics. These are per-document metrics, however; the `aggregation` function is used to work out what to do with them to create a corpus-level metric. Imagine you have a bunch of documents, for each of which you have calculated an F1 score. What should that mean overall? Should they be summed, averaged, the min/max found? This function handles that problem.

The contents of the function itself are pretty straightforward: it should simply return a dict that maps each metric label that could be returned by `process_results` to a function that can be used to aggregate that metric. That is to say, if the metrics that `process_results` could return are given by `{'a', 'b', 'c'}`, then all of these keys should be present in the dict returned by `aggregation`.

__NOTE__: See `lm_eval/metrics.py` for a few "built-in" aggregate metrics you can easily import. The standard metrics available in this package are generally based on `sklearn` functions, so if you are in any doubt about how to set things up, the documentation over there can be of assistance. If you need to write a custom metric, start by looking at the existing ones in `lm_eval/metrics.py` for an idea of what the function signature needs to be.
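Putting that contract together in a runnable sketch (a hypothetical binary task; `statistics.mean` stands in for the harness's built-in mean aggregation):

```python
from statistics import mean


def process_results(doc, results):
    # Per-document metric: 1.0 if the higher-loglikelihood label is the gold one.
    ll_yes, ll_no = results
    pred = 0 if ll_yes > ll_no else 1
    return {"acc": float(pred == doc["gold"])}


def aggregation():
    # Every metric key process_results can emit must appear here.
    return {"acc": mean}


per_doc = [
    process_results({"gold": 0}, (-1.2, -3.4)),  # correct -> 1.0
    process_results({"gold": 1}, (-0.5, -2.0)),  # wrong   -> 0.0
]
corpus_acc = aggregation()["acc"]([d["acc"] for d in per_doc])
print(corpus_acc)  # 0.5
```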
```python
@@ -295,6 +295,11 @@ class TaskName(...):
## Submitting your Task

Although we currently do not work behind a specific style guide, we'd appreciate it if you tidied up your file(s) with the `black` formatter (which should have been installed through `requirements.txt`). Keep things clean…ish 🙂. You can format your changes and perform standard flake8 checks by running the following commands:
```sh
pre-commit install
pre-commit run --all-files
```
Now push your work and make a pull request! Thanks for the contribution 👍. If there are any questions, leave a message in the `#lm-thunderdome` channel on the EAI discord.
ROUGE
rouge
nin
@@ -24,17 +24,17 @@ class LM(abc.ABC):
    @abstractmethod
    def loglikelihood(self, requests):
        """Compute log-likelihood of generating a continuation from a context.

        Downstream tasks should attempt to use loglikelihood instead of other
        LM calls whenever possible.

        :param requests: list
            A list of pairs (context, continuation)
            context: str
                Context string. Implementations of LM must be able to handle an
                empty context string.
            continuation: str
                The continuation over which log likelihood will be calculated. If
                there is a word boundary, the space should be in the continuation.
                For example, context="hello" continuation=" world" is correct.
        :return: list
            A list of pairs (logprob, isgreedy)
@@ -51,7 +51,7 @@ class LM(abc.ABC):
        - We will use the full max context length of the model.
        - For inputs that exceed the max context length, we divide the tokenized string into chunks of up to
          the max context length.
        - IMPORTANT: Each document's loglikelihood/perplexity is computed *separately*, unlike other implementations
          which may simply concatenate multiple documents together.
        - IMPORTANT: We maximize the amount of context for each prediction. Specifically, for inputs that we break into
          multiple chunks, the last input will still have a full-sized context.
@@ -97,7 +97,7 @@ class LM(abc.ABC):
        context: str
            Context string
        until: [str]
            The string sequences to generate until. These string sequences
            may each span across multiple tokens, or may be part of one token.
        :return: list
            A list of strings continuation
@@ -118,7 +118,6 @@ class LM(abc.ABC):


class BaseLM(LM):
    @property
    @abstractmethod
    def eot_token_id(self):
@@ -145,13 +144,16 @@ class BaseLM(LM):
        pass

    @abstractmethod
    def tok_encode(self, string: str):
        pass

    @abstractmethod
    def tok_decode(self, tokens: Iterable[int]):
        pass

    @abstractmethod
    def _model_generate(self, context, max_length, eos_token_id):
        pass
    @abstractmethod
    def _model_call(self, inps):
@@ -187,23 +189,30 @@ class BaseLM(LM):
        # TODO: automatic batch size detection for vectorization
        loglikelihoods = []
        for (string,) in tqdm(requests):
            rolling_token_windows = list(
                map(
                    utils.make_disjoint_window,
                    utils.get_rolling_token_windows(
                        token_list=self.tok_encode(string),
                        prefix_token=self.eot_token_id,
                        max_seq_len=self.max_length,
                        context_len=1,
                    ),
                )
            )

            rolling_token_windows = [(None,) + x for x in rolling_token_windows]

            # TODO: extract out this call so it only gets called once and also somehow figure out partial caching for
            # that
            string_nll = self._loglikelihood_tokens(
                rolling_token_windows, disable_tqdm=True
            )

            # discard is_greedy
            string_nll = [x[0] for x in string_nll]

            string_nll = sum(string_nll)
            loglikelihoods.append(string_nll)
@@ -223,10 +232,12 @@ class BaseLM(LM):
            toks = x[1] + x[2]
            return -len(toks), tuple(toks)

        # TODO: automatic (variable) batch size detection for vectorization
        re_ord = utils.Reorderer(requests, _collate)
        for chunk in utils.chunks(
            tqdm(re_ord.get_reordered(), disable=disable_tqdm), self.batch_size
        ):
            inps = []
            cont_toks_list = []
            inplens = []
@@ -252,44 +263,60 @@ class BaseLM(LM):
                # when too long to fit in context, truncate from the left
                inp = torch.tensor(
                    (context_enc + continuation_enc)[-(self.max_length + 1) :][:-1],
                    dtype=torch.long,
                ).to(self.device)
                (inplen,) = inp.shape

                cont = continuation_enc

                # since in _collate we make sure length is descending, the longest is always the first one.
                padding_length = (
                    padding_length if padding_length is not None else inplen
                )

                # pad length from seq to padding_length
                inp = torch.cat(
                    [
                        inp,  # [seq]
                        torch.zeros(padding_length - inplen, dtype=torch.long).to(
                            inp.device
                        ),  # [padding_length - seq]
                    ],
                    dim=0,
                )

                inps.append(inp.unsqueeze(0))  # [1, padding_length]
                cont_toks_list.append(cont)
                inplens.append(inplen)

            batched_inps = torch.cat(inps, dim=0)  # [batch, padding_length]
            multi_logits = F.log_softmax(
                self._model_call(batched_inps), dim=-1
            ).cpu()  # [batch, padding_length, vocab]

            for (cache_key, _, _), logits, inp, inplen, cont_toks in zip(
                chunk, multi_logits, inps, inplens, cont_toks_list
            ):

                # Slice to original seq length
                contlen = len(cont_toks)
                logits = logits[inplen - contlen : inplen].unsqueeze(
                    0
                )  # [1, seq, vocab]

                # Check if per-token argmax is exactly equal to continuation
                greedy_tokens = logits.argmax(dim=-1)
                cont_toks = torch.tensor(cont_toks, dtype=torch.long).unsqueeze(
                    0
                )  # [1, seq]
                max_equal = (greedy_tokens == cont_toks).all()

                # Obtain log-probs at the corresponding continuation token indices
                # last_token_slice = logits[:, -1, :].squeeze(0).tolist()
                logits = torch.gather(logits, 2, cont_toks.unsqueeze(-1)).squeeze(
                    -1
                )  # [1, seq]

                # Answer: (log prob, is-exact-match)
                answer = (float(logits.sum()), bool(max_equal))
@@ -300,10 +327,10 @@ class BaseLM(LM):
                res.append(answer)

        return re_ord.get_original(res)
    def greedy_until(self, requests):
        # TODO: implement fully general `until` that handles until that are
        # multiple tokens or that span multiple tokens correctly

        # TODO: extract to TokenizedLM?
...@@ -312,30 +339,34 @@ class BaseLM(LM): ...@@ -312,30 +339,34 @@ class BaseLM(LM):
def _collate(x): def _collate(x):
toks = self.tok_encode(x[0]) toks = self.tok_encode(x[0])
return len(toks), x[0] return len(toks), x[0]
reord = utils.Reorderer(requests, _collate)
for context, until in tqdm(reord.get_reordered()): re_ord = utils.Reorderer(requests, _collate)
for context, until in tqdm(re_ord.get_reordered()):
if isinstance(until, str): if isinstance(until, str):
until = [until] until = [until]
primary_until, = self.tok_encode(until[0]) (primary_until,) = self.tok_encode(until[0])
context_enc = torch.tensor([self.tok_encode(context)[self.max_gen_toks - self.max_length:]]).to(self.device)
cont = self._model_generate(context_enc, context_enc.shape[1] + self.max_gen_toks, primary_until) context_enc = torch.tensor(
[self.tok_encode(context)[self.max_gen_toks - self.max_length :]]
).to(self.device)
s = self.tok_decode(cont[0].tolist()[context_enc.shape[1]:]) cont = self._model_generate(
context_enc, context_enc.shape[1] + self.max_gen_toks, primary_until
)
s = self.tok_decode(cont[0].tolist()[context_enc.shape[1] :])
for term in until: for term in until:
s = s.split(term)[0] s = s.split(term)[0]
# partial caching # partial caching
self.cache_hook.add_partial("greedy_until", (context, until), s) self.cache_hook.add_partial("greedy_until", (context, until), s)
res.append(s) res.append(s)
return reord.get_original(res) return re_ord.get_original(res)
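The `until` handling truncates the decoded generation at the first occurrence of any stop sequence via repeated `str.split`. A standalone sketch of that post-processing step (toy strings, no model involved):

```python
def truncate_at_stops(s, until):
    # Cut the generation at the first occurrence of each stop sequence,
    # mirroring the `s = s.split(term)[0]` loop in greedy_until.
    if isinstance(until, str):
        until = [until]
    for term in until:
        s = s.split(term)[0]
    return s

generated = "Paris is the capital.\n\nQ: Next question"
truncated = truncate_at_stops(generated, ["\n\n", "Q:"])  # "Paris is the capital."
```

Because `split(term)[0]` is a no-op when `term` is absent, applying every stop sequence in order keeps only the text before the earliest one that actually occurs.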
class Task(abc.ABC):

@@ -383,7 +414,7 @@ class Task(abc.ABC):
    self._fewshot_docs = None

def download(self, data_dir=None, cache_dir=None, download_mode=None):
    """Downloads and returns the task dataset.
    Override this method to download the dataset from a custom API.

    :param data_dir: str

@@ -412,7 +443,7 @@ class Task(abc.ABC):

        name=self.DATASET_NAME,
        data_dir=data_dir,
        cache_dir=cache_dir,
        download_mode=download_mode,
    )

def should_decontaminate(self):

@@ -473,8 +504,10 @@ class Task(abc.ABC):
    return rnd.sample(self._training_docs, k)

def doc_to_decontamination_query(self, doc):
    print(
        "Override doc_to_decontamination_query with document specific decontamination query."
    )
    assert False

@abstractmethod
def doc_to_text(self, doc):
@@ -486,22 +519,22 @@ class Task(abc.ABC):

@abstractmethod
def construct_requests(self, doc, ctx):
    """Uses RequestFactory to construct Requests and returns an iterable of
    Requests which will be sent to the LM.

    :param doc:
        The document as returned from training_docs, validation_docs, or test_docs.
    :param ctx: str
        The context string, generated by fewshot_context. This includes the natural
        language description, as well as the few shot examples, and the question
        part of the document for `doc`.
    """
    pass

@abstractmethod
def process_results(self, doc, results):
    """Take a single document and the LM results and evaluates, returning a
    dict where keys are the names of submetrics and values are the values of
    the metric for that one document

    :param doc:

@@ -515,7 +548,7 @@ class Task(abc.ABC):
def aggregation(self):
    """
    :returns: {str: [metric_score] -> float}
        A dictionary where keys are the names of submetrics and values are
        functions that aggregate a list of metric scores
    """
    pass

@@ -524,22 +557,26 @@ class Task(abc.ABC):
def higher_is_better(self):
    """
    :returns: {str: bool}
        A dictionary where keys are the names of submetrics and values are
        whether a higher value of the submetric is better
    """
    pass

def fewshot_description(self):
    import warnings

    warnings.warn(
        "`fewshot_description` will be removed in future versions. Pass "
        "any custom descriptions to the `evaluate` function instead.",
        DeprecationWarning,
    )
    return ""

@utils.positional_deprecated
def fewshot_context(
    self, doc, num_fewshot, provide_description=None, rnd=None, description=None
):
    """Returns a fewshot context string that is made up of a prepended description
    (if provided), the `num_fewshot` number of examples, and an appended prompt example.

    :param doc: str
@@ -556,7 +593,9 @@ class Task(abc.ABC):

    :returns: str
        The fewshot context.
    """
    assert (
        rnd is not None
    ), "A `random.Random` generator argument must be provided to `rnd`"
    assert not provide_description, (
        "The `provide_description` arg will be removed in future versions. To prepend "
        "a custom description to the context, supply the corresponding string via the "

@@ -564,7 +603,9 @@ class Task(abc.ABC):

    )
    if provide_description is not None:
        # nudge people to not specify it at all
        print(
            "WARNING: provide_description is deprecated and will be removed in a future version in favor of description_dict"
        )

    description = description + "\n\n" if description else ""
@@ -577,7 +618,9 @@ class Task(abc.ABC):

    else:
        if self._fewshot_docs is None:
            self._fewshot_docs = list(
                self.validation_docs()
                if self.has_validation_docs()
                else self.test_docs()
            )

        fewshotex = rnd.sample(self._fewshot_docs, num_fewshot + 1)

@@ -585,23 +628,27 @@ class Task(abc.ABC):

        # get rid of the doc that's the one we're evaluating, if it's in the fewshot
        fewshotex = [x for x in fewshotex if x != doc][:num_fewshot]

    labeled_examples = (
        "\n\n".join(
            [
                self.doc_to_text(doc) + self.doc_to_target(doc)
                for doc in fewshotex
            ]
        )
        + "\n\n"
    )

    example = self.doc_to_text(doc)
    return description + labeled_examples + example
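The context assembly is plain string concatenation: an optional description, the few-shot examples joined by blank lines, then the query itself. A toy sketch of that layout, using hypothetical stand-ins for `doc_to_text`/`doc_to_target` output (a simplification of the method above, not its full logic):

```python
def build_context(description, fewshot_pairs, query):
    # Description and labeled examples are each followed by a blank line,
    # matching `description + labeled_examples + example` above.
    desc = description + "\n\n" if description else ""
    labeled = "\n\n".join(text + target for text, target in fewshot_pairs)
    labeled = labeled + "\n\n" if fewshot_pairs else ""
    return desc + labeled + query

ctx = build_context(
    "Answer the question.",
    [("Q: 2+2?\nA:", " 4")],
    "Q: 3+3?\nA:",
)
# ctx == "Answer the question.\n\nQ: 2+2?\nA: 4\n\nQ: 3+3?\nA:"
```

Note how each target string carries its leading space (`" 4"`), so the few-shot demonstration ends exactly where the model is expected to start generating for the final query.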
class MultipleChoiceTask(Task):
    def doc_to_target(self, doc):
        return " " + doc["choices"][doc["gold"]]

    def construct_requests(self, doc, ctx):
        lls = [
            rf.loglikelihood(ctx, " {}".format(choice))[0] for choice in doc["choices"]
        ]

        return lls
@@ -609,21 +656,21 @@ class MultipleChoiceTask(Task):

    def process_results(self, doc, results):
        gold = doc["gold"]

        acc = 1.0 if np.argmax(results) == gold else 0.0
        completion_len = np.array([float(len(i)) for i in doc["choices"]])
        acc_norm = 1.0 if np.argmax(results / completion_len) == gold else 0.0

        return {
            "acc": acc,
            "acc_norm": acc_norm,
        }
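`acc_norm` divides each choice's summed log-likelihood by the choice's character length before taking the argmax, which removes the bias toward short answers. A dependency-free sketch with made-up scores (the strings and numbers are illustrative only):

```python
choices = ["Paris", "Lyon and Marseille"]  # hypothetical answer strings
results = [-4.0, -6.0]                     # summed log-likelihood per choice
gold = 1

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

# Raw accuracy: the shorter answer wins on total log-likelihood.
acc = 1.0 if argmax(results) == gold else 0.0

# Length-normalized: divide each score by the choice's character length.
normed = [r / len(c) for r, c in zip(results, choices)]
acc_norm = 1.0 if argmax(normed) == gold else 0.0
```

Here the raw scores pick choice 0 (`-4.0 > -6.0`), but per-character the longer gold answer scores better (`-6/18 > -4/5`), so `acc` is 0 while `acc_norm` is 1.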
    def higher_is_better(self):
        return {
            "acc": True,
            "acc_norm": True,
        }

    def aggregation(self):
        return {
            "acc": mean,
@@ -632,7 +679,6 @@ class MultipleChoiceTask(Task):

class PerplexityTask(Task, abc.ABC):
    def should_decontaminate(self):
        """Whether this task supports decontamination against model training set."""
        return True

@@ -644,9 +690,15 @@ class PerplexityTask(Task, abc.ABC):

    assert k == 0
    return []
def fewshot_context(
    self, doc, num_fewshot, provide_description=None, rnd=None, description=None
):
    assert (
        num_fewshot == 0
    ), "The number of fewshot examples must be 0 for perplexity tasks."
    assert (
        rnd is not None
    ), "A `random.Random` generator argument must be provided to `rnd`."
    assert not provide_description, (
        "The `provide_description` arg will be removed in future versions. To prepend "
        "a custom description to the context, supply the corresponding string via the "

@@ -654,7 +706,9 @@ class PerplexityTask(Task, abc.ABC):

    )
    if provide_description is not None:
        # nudge people to not specify it at all
        print(
            "WARNING: provide_description is deprecated and will be removed in a future version in favor of description_dict"
        )

    return ""
@@ -680,7 +734,7 @@ class PerplexityTask(Task, abc.ABC):

    return req

def process_results(self, doc, results):
    (loglikelihood,) = results
    words = self.count_words(doc)
    bytes_ = self.count_bytes(doc)
    return {

@@ -702,23 +756,23 @@ class PerplexityTask(Task, abc.ABC):

@classmethod
def count_words(cls, doc):
    """Downstream tasks with custom word boundaries should override this!"""
    return len(re.split(r"\s+", doc))
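These counts feed the word- and byte-level perplexity aggregations; word boundaries default to a whitespace regex split, and `count_bytes` is assumed here to be the UTF-8 byte length (its body is outside this hunk). A quick sketch of both counters:

```python
import re

doc = "The quick brown fox"

# Default word count: split on runs of whitespace, as in count_words above.
n_words = len(re.split(r"\s+", doc))

# Byte count: length of the UTF-8 encoding (an assumption about count_bytes).
n_bytes = len(doc.encode("utf-8"))
```

For this ASCII sample the two counters give 4 words and 19 bytes; non-ASCII documents would have more bytes than characters.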
def hash_args(attr, args):
    dat = json.dumps([attr] + list(args))
    return hashlib.sha256(dat.encode("utf-8")).hexdigest()
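`hash_args` builds the cache key by JSON-serializing the method name plus its arguments and hashing the result, so identical requests map to the same key in the `CachingLM` store. The function is small enough to exercise directly (reproduced here for illustration):

```python
import hashlib
import json

def hash_args(attr, args):
    # JSON-serialize the method name plus its arguments, then hash.
    dat = json.dumps([attr] + list(args))
    return hashlib.sha256(dat.encode("utf-8")).hexdigest()

key_a = hash_args("loglikelihood", ("context", " answer"))
key_b = hash_args("loglikelihood", ("context", " answer"))
key_c = hash_args("greedy_until", ("context", " answer"))
```

The same `(attr, args)` pair always hashes to the same 64-hex-character digest, while changing the method name or any argument produces a different key; this determinism is what makes the on-disk cache reusable across runs.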
class CacheHook:
    def __init__(self, cachinglm):
        if cachinglm is None:
            self.dbdict = None
            return

        self.dbdict = cachinglm.dbdict

    def add_partial(self, attr, req, res):
        if self.dbdict is None:
            return

@@ -748,7 +802,7 @@ class CachingLM:

        def fn(requests):
            res = []
            remaining_reqs = []

            # figure out which ones are cached and which ones are new
            for req in requests:
                hsh = hash_args(attr, req)

@@ -761,7 +815,7 @@ class CachingLM:

                else:
                    res.append(None)
                    remaining_reqs.append(req)

            # actually run the LM on the requests that do not have cached results
            rem_res = getattr(self.lm, attr)(remaining_reqs)

@@ -779,41 +833,48 @@ class CachingLM:

            self.dbdict.commit()

            return res

        return fn

    def get_cache_hook(self):
        return CacheHook(self)
REQUEST_RETURN_LENGTHS = {
    "loglikelihood": 2,
    "greedy_until": None,
    "loglikelihood_rolling": None,
}

class Request:
    def __init__(self, request_type, args, index=None):
        if request_type not in REQUEST_RETURN_LENGTHS.keys():
            raise NotImplementedError(
                "The request type {} is not implemented!".format(request_type)
            )

        self.request_type = request_type
        self.args = args
        self.index = index

    def __iter__(self):
        if REQUEST_RETURN_LENGTHS[self.request_type] is None:
            raise IndexError("This request type does not return multiple arguments!")
        for i in range(REQUEST_RETURN_LENGTHS[self.request_type]):
            yield Request(self.request_type, self.args, i)

    def __getitem__(self, i):
        if REQUEST_RETURN_LENGTHS[self.request_type] is None:
            raise IndexError("This request type does not return multiple arguments!")
        return Request(self.request_type, self.args, i)

    def __eq__(self, other):
        return (
            self.request_type == other.request_type
            and self.args == other.args
            and self.index == other.index
        )

    def __repr__(self):
        return f"Req_{self.request_type}{self.args}[{self.index}]\n"
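Because a `loglikelihood` request resolves to two values (log-prob and exact-match flag), `REQUEST_RETURN_LENGTHS` lets a task index into the eventual result before it exists; `__getitem__` just records which slot is wanted. A trimmed-down sketch of that indexing behavior (a hypothetical `MiniRequest`, not the full class):

```python
RETURN_LENGTHS = {"loglikelihood": 2, "greedy_until": None}

class MiniRequest:
    # Minimal stand-in for Request: records type, args, and result slot.
    def __init__(self, request_type, args, index=None):
        self.request_type = request_type
        self.args = args
        self.index = index

    def __getitem__(self, i):
        if RETURN_LENGTHS[self.request_type] is None:
            raise IndexError("This request type does not return multiple arguments!")
        # No result is computed here; only the requested slot is remembered.
        return MiniRequest(self.request_type, self.args, i)

req = MiniRequest("loglikelihood", ("ctx", " cont"))
first = req[0]  # selects the log-prob slot of the future result
```

This is why `construct_requests` in `MultipleChoiceTask` can write `rf.loglikelihood(ctx, ...)[0]`: the indexing is deferred until the evaluator fills in actual results.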
@@ -823,6 +884,7 @@ class RequestFactory:

    def __getattr__(self, attr):
        def fn(*args):
            return Request(attr, args)

        return fn
@@ -2,5 +2,5 @@

This directory contains custom EleutherAI datasets not available in the HuggingFace `datasets` hub.
In the rare case that you need to add a custom dataset to this collection, follow the
HuggingFace `datasets` guide found [here](https://huggingface.co/docs/datasets/dataset_script).
\ No newline at end of file
@@ -68,61 +68,111 @@ class Arithmetic(datasets.GeneratorBasedBuilder):

ArithmeticConfig(
    name="arithmetic_2da",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_addition.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="2-digit addition",
),
ArithmeticConfig(
    name="arithmetic_2ds",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_subtraction.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="2-digit subtraction",
),
ArithmeticConfig(
    name="arithmetic_3da",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/three_digit_addition.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="3-digit addition",
),
ArithmeticConfig(
    name="arithmetic_3ds",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/three_digit_subtraction.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="3-digit subtraction",
),
ArithmeticConfig(
    name="arithmetic_4da",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/four_digit_addition.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="4-digit addition",
),
ArithmeticConfig(
    name="arithmetic_4ds",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/four_digit_subtraction.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="4-digit subtraction",
),
ArithmeticConfig(
    name="arithmetic_5da",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/five_digit_addition.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="5-digit addition",
),
ArithmeticConfig(
    name="arithmetic_5ds",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/five_digit_subtraction.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="5-digit subtraction",
),
ArithmeticConfig(
    name="arithmetic_2dm",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_multiplication.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="2-digit multiplication",
),
ArithmeticConfig(
    name="arithmetic_1dc",
    url="https://raw.githubusercontent.com/openai/gpt-3/master/data/single_digit_three_ops.jsonl",
    features=datasets.Features(
        {
            "context": datasets.Value("string"),
            "completion": datasets.Value("string"),
        }
    ),
    description="Single digit 3 operations",
),
]
@@ -155,9 +205,12 @@ class Arithmetic(datasets.GeneratorBasedBuilder):

with open(filepath, encoding="utf-8") as f:
    for key, row in enumerate(f):
        data = json.loads(row)
        context = (
            data["context"]
            .strip()
            .replace("\n\n", "\n")
            .replace("Q:", "Question:")
            .replace("A:", "Answer:")
        )
        completion = data["completion"]
        yield key, {"context": context, "completion": completion}
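The generator normalizes each raw JSONL record: strip outer whitespace, collapse double newlines, and expand the GPT-3-style `Q:`/`A:` markers. The same chain applied to a sample line (the arithmetic problem below is made up for illustration):

```python
import json

row = '{"context": "Q: What is 15 plus 12?\\nA:", "completion": " 27"}'
data = json.loads(row)

# Mirror the normalization chain from _generate_examples above.
context = (
    data["context"]
    .strip()
    .replace("\n\n", "\n")
    .replace("Q:", "Question:")
    .replace("A:", "Answer:")
)
completion = data["completion"]
# context == "Question: What is 15 plus 12?\nAnswer:"
```

The completion keeps its leading space so that `context + completion` reads naturally as a prompt plus answer.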
{"arithmetic_2da": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n2-digit addition", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_2da", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 96624, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_addition.jsonl": {"num_bytes": 138624, "checksum": 
"75a54b7a3db3b23369df74fe440c23025f3d3c51f664300bd3d56632b2617b3d"}}, "download_size": 138624, "post_processing_size": null, "dataset_size": 96624, "size_in_bytes": 235248}, "arithmetic_2ds": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n2-digit subtraction", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. 
Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_2ds", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 98216, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_subtraction.jsonl": {"num_bytes": 140216, "checksum": "da956066ff108c00b341d360567472784f5fd872d6465071b44a14291205bc03"}}, "download_size": 140216, "post_processing_size": null, "dataset_size": 98216, "size_in_bytes": 238432}, "arithmetic_3da": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n3-digit addition", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = 
{Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_3da", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 102612, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/three_digit_addition.jsonl": {"num_bytes": 144612, "checksum": "124865e30efd2abfbc1855dd34c218fc02d32d780ace970ab9b4ea3fa74c798b"}}, "download_size": 144612, "post_processing_size": null, "dataset_size": 102612, "size_in_bytes": 247224}, "arithmetic_3ds": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n3-digit subtraction", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin 
and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_3ds", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 104150, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/three_digit_subtraction.jsonl": {"num_bytes": 146150, "checksum": "7fc6aaedcb0e2bd17c398dd4147c5585b1e608278a8e98b914e69656707d6a29"}}, "download_size": 146150, "post_processing_size": null, "dataset_size": 104150, "size_in_bytes": 250300}, "arithmetic_4da": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n4-digit addition", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, 
Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_4da", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 108570, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/four_digit_addition.jsonl": {"num_bytes": 150570, "checksum": "459c6f75baa2e8d7cf50bdd07db6d0ca9133a6b137d95d09267db85b6e07f391"}}, "download_size": 150570, "post_processing_size": null, "dataset_size": 108570, "size_in_bytes": 259140}, "arithmetic_4ds": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n4-digit subtraction", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, 
Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_4ds", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 110150, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/four_digit_subtraction.jsonl": {"num_bytes": 152150, "checksum": "0c47db40a10c052ef0cf732a9ef2edaa53d66377d43eb47a9c382d33a8af7102"}}, "download_size": 152150, "post_processing_size": null, "dataset_size": 110150, "size_in_bytes": 262300}, "arithmetic_5da": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n5-digit addition", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and 
Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_5da", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 114476, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/five_digit_addition.jsonl": {"num_bytes": 156476, "checksum": "30ada42efe315b958c6e9649274005d3b720e50298e92c3a2d321f8996e58f54"}}, "download_size": 156476, "post_processing_size": null, "dataset_size": 114476, "size_in_bytes": 270952}, "arithmetic_5ds": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n5-digit subtraction", 
"citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_5ds", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 116119, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/five_digit_subtraction.jsonl": {"num_bytes": 158119, "checksum": "8b98ccfc943cbf9193bcf1984954aa0b1a4527016072d972a2b055cc1482ca3c"}}, "download_size": 158119, "post_processing_size": null, "dataset_size": 116119, "size_in_bytes": 274238}, "arithmetic_2dm": {"description": "A 
small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\n2-digit multiplication", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_2dm", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 100685, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/two_digit_multiplication.jsonl": {"num_bytes": 142685, "checksum": "5613d1d1cc3b2c03edc1990252247d34c10ec82944b2cdeb19e71b00f237f431"}}, 
"download_size": 142685, "post_processing_size": null, "dataset_size": 100685, "size_in_bytes": 243370}, "arithmetic_1dc": {"description": "A small battery of 10 tests that involve asking language models a simple arithmetic\nproblem in natural language.\n\nSingle digit 3 operations", "citation": "@inproceedings{NEURIPS2020_1457c0d6,\n author = {Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel and Wu, Jeffrey and Winter, Clemens and Hesse, Chris and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},\n booktitle = {Advances in Neural Information Processing Systems},\n editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. 
Lin},\n pages = {1877--1901},\n publisher = {Curran Associates, Inc.},\n title = {Language Models are Few-Shot Learners},\n url = {https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf},\n volume = {33},\n year = {2020}\n}\n", "homepage": "https://github.com/openai/gpt-3/tree/master/data", "license": "", "features": {"context": {"dtype": "string", "id": null, "_type": "Value"}, "completion": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "arithmetic", "config_name": "arithmetic_1dc", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 97651, "num_examples": 2000, "dataset_name": "arithmetic"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/gpt-3/master/data/single_digit_three_ops.jsonl": {"num_bytes": 139651, "checksum": "08b34e3272a8ff1d4932d63f251519d14c485c38d582366e1e323d0b859c3925"}}, "download_size": 139651, "post_processing_size": null, "dataset_size": 97651, "size_in_bytes": 237302}}
\ No newline at end of file
...@@ -50,13 +50,16 @@ _URLS = "https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccf ...@@ -50,13 +50,16 @@ _URLS = "https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccf
class ASDiv(datasets.GeneratorBasedBuilder): class ASDiv(datasets.GeneratorBasedBuilder):
""" ASDiv: A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers """ """ASDiv: A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers"""
VERSION = datasets.Version("0.0.1") VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [ BUILDER_CONFIGS = [
datasets.BuilderConfig(name="asdiv", version=VERSION, datasets.BuilderConfig(
description="A diverse corpus for evaluating and developing english math word problem solvers") name="asdiv",
version=VERSION,
description="A diverse corpus for evaluating and developing english math word problem solvers",
)
] ]
def _info(self): def _info(self):
...@@ -86,7 +89,9 @@ class ASDiv(datasets.GeneratorBasedBuilder): ...@@ -86,7 +89,9 @@ class ASDiv(datasets.GeneratorBasedBuilder):
name=datasets.Split.VALIDATION, name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples # These kwargs will be passed to _generate_examples
gen_kwargs={ gen_kwargs={
"filepath": os.path.join(data_dir, base_filepath, "dataset", "ASDiv.xml"), "filepath": os.path.join(
data_dir, base_filepath, "dataset", "ASDiv.xml"
),
"split": datasets.Split.VALIDATION, "split": datasets.Split.VALIDATION,
}, },
), ),
......
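The ASDiv hunks above wire `_generate_examples` up to `dataset/ASDiv.xml`. A minimal, self-contained sketch of how such a file can be parsed with the standard library — the tag names (`Problem`, `Body`, `Question`, `Solution-Type`, `Answer`, `Formula`) and the sample problem are assumptions inferred from the features declared in `_info`, not taken from the script itself:

```python
import xml.etree.ElementTree as ET

# Tiny inline stand-in for dataset/ASDiv.xml; the real file holds 2,305 problems.
XML = """
<Machine-Reading-Corpus-File>
  <ProblemSet>
    <Problem ID="nluds-0001" Grade="1">
      <Body>Seven red apples and two green apples are in the basket.</Body>
      <Question>How many apples are in the basket?</Question>
      <Solution-Type>Addition</Solution-Type>
      <Answer>9 (apples)</Answer>
      <Formula>7+2=9</Formula>
    </Problem>
  </ProblemSet>
</Machine-Reading-Corpus-File>
"""


def iter_examples(xml_text):
    """Yield (key, example) pairs shaped like the features in _info()."""
    root = ET.fromstring(xml_text)
    for idx, problem in enumerate(root.iter("Problem")):
        yield idx, {
            "body": problem.find("Body").text,
            "question": problem.find("Question").text,
            "solution_type": problem.find("Solution-Type").text,
            "answer": problem.find("Answer").text,
            "formula": problem.find("Formula").text,
        }
```

In the real builder, `_generate_examples` would do this iteration over the `filepath` passed in through `gen_kwargs`.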
{"asdiv": {"description": "ASDiv (Academia Sinica Diverse MWP Dataset) is a diverse (in terms of both language\npatterns and problem types) English math word problem (MWP) corpus for evaluating\nthe capability of various MWP solvers. Existing MWP corpora for studying AI progress\nremain limited either in language usage patterns or in problem types. We thus present\na new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem\ntypes taught in elementary school. Each MWP is annotated with its problem type and grade\nlevel (for indicating the level of difficulty).\n", "citation": "@misc{miao2021diverse,\n title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},\n author={Shen-Yun Miao and Chao-Chun Liang and Keh-Yih Su},\n year={2021},\n eprint={2106.15772},\n archivePrefix={arXiv},\n primaryClass={cs.AI}\n}\n", "homepage": "https://github.com/chaochun/nlu-asdiv-dataset", "license": "", "features": {"body": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "solution_type": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}, "formula": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "as_div", "config_name": "asdiv", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 501489, "num_examples": 2305, "dataset_name": "as_div"}}, "download_checksums": {"https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccfa5053194b25732534696b50.zip": {"num_bytes": 440966, "checksum": "8f1fe4f6d5f170ec1e24ab78c244153c14c568b1bb2b1dad0324e71f37939a2d"}}, "download_size": 440966, "post_processing_size": null, "dataset_size": 501489, "size_in_bytes": 942455}} {"asdiv": {"description": "ASDiv (Academia 
Sinica Diverse MWP Dataset) is a diverse (in terms of both language\npatterns and problem types) English math word problem (MWP) corpus for evaluating\nthe capability of various MWP solvers. Existing MWP corpora for studying AI progress\nremain limited either in language usage patterns or in problem types. We thus present\na new English MWP corpus with 2,305 MWPs that cover more text patterns and most problem\ntypes taught in elementary school. Each MWP is annotated with its problem type and grade\nlevel (for indicating the level of difficulty).\n", "citation": "@misc{miao2021diverse,\n title={A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers},\n author={Shen-Yun Miao and Chao-Chun Liang and Keh-Yih Su},\n year={2021},\n eprint={2106.15772},\n archivePrefix={arXiv},\n primaryClass={cs.AI}\n}\n", "homepage": "https://github.com/chaochun/nlu-asdiv-dataset", "license": "", "features": {"body": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "solution_type": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}, "formula": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "as_div", "config_name": "asdiv", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"validation": {"name": "validation", "num_bytes": 501489, "num_examples": 2305, "dataset_name": "as_div"}}, "download_checksums": {"https://github.com/chaochun/nlu-asdiv-dataset/archive/55790e5270bb91ccfa5053194b25732534696b50.zip": {"num_bytes": 440966, "checksum": "8f1fe4f6d5f170ec1e24ab78c244153c14c568b1bb2b1dad0324e71f37939a2d"}}, "download_size": 440966, "post_processing_size": null, "dataset_size": 501489, "size_in_bytes": 942455}}
\ No newline at end of file
...@@ -61,7 +61,7 @@ _EMPTY_ADDITIONAL_ANSWER = { ...@@ -61,7 +61,7 @@ _EMPTY_ADDITIONAL_ANSWER = {
"span_end": -1, "span_end": -1,
"span_text": "", "span_text": "",
"input_text": "", "input_text": "",
"turn_id": -1 "turn_id": -1,
} }
], ],
"1": [ "1": [
...@@ -70,7 +70,7 @@ _EMPTY_ADDITIONAL_ANSWER = { ...@@ -70,7 +70,7 @@ _EMPTY_ADDITIONAL_ANSWER = {
"span_end": -1, "span_end": -1,
"span_text": "", "span_text": "",
"input_text": "", "input_text": "",
"turn_id": -1 "turn_id": -1,
} }
], ],
"2": [ "2": [
...@@ -79,7 +79,7 @@ _EMPTY_ADDITIONAL_ANSWER = { ...@@ -79,7 +79,7 @@ _EMPTY_ADDITIONAL_ANSWER = {
"span_end": -1, "span_end": -1,
"span_text": "", "span_text": "",
"input_text": "", "input_text": "",
"turn_id": -1 "turn_id": -1,
} }
], ],
} }
...@@ -91,8 +91,9 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -91,8 +91,9 @@ class Coqa(datasets.GeneratorBasedBuilder):
VERSION = datasets.Version("0.0.1") VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [ BUILDER_CONFIGS = [
datasets.BuilderConfig(name="coqa", version=VERSION, datasets.BuilderConfig(
description="The CoQA dataset."), name="coqa", version=VERSION, description="The CoQA dataset."
),
] ]
def _info(self): def _info(self):
...@@ -101,41 +102,52 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -101,41 +102,52 @@ class Coqa(datasets.GeneratorBasedBuilder):
"id": datasets.Value("string"), "id": datasets.Value("string"),
"source": datasets.Value("string"), "source": datasets.Value("string"),
"story": datasets.Value("string"), "story": datasets.Value("string"),
"questions": datasets.features.Sequence({ "questions": datasets.features.Sequence(
"input_text": datasets.Value("string"), {
"turn_id": datasets.Value("int32"),
}),
"answers": datasets.features.Sequence({
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}),
"additional_answers": {
"0": datasets.features.Sequence({
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}),
"1": datasets.features.Sequence({
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"), "input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"), "turn_id": datasets.Value("int32"),
}), }
"2": datasets.features.Sequence({ ),
"answers": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"), "span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"), "span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"), "span_text": datasets.Value("string"),
"input_text": datasets.Value("string"), "input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"), "turn_id": datasets.Value("int32"),
}), }
} ),
}) "additional_answers": {
"0": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"1": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
"2": datasets.features.Sequence(
{
"span_start": datasets.Value("int32"),
"span_end": datasets.Value("int32"),
"span_text": datasets.Value("string"),
"input_text": datasets.Value("string"),
"turn_id": datasets.Value("int32"),
}
),
},
}
)
return datasets.DatasetInfo( return datasets.DatasetInfo(
description=_DESCRIPTION, description=_DESCRIPTION,
features=features, features=features,
...@@ -175,10 +187,7 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -175,10 +187,7 @@ class Coqa(datasets.GeneratorBasedBuilder):
source = row["source"] source = row["source"]
story = row["story"] story = row["story"]
questions = [ questions = [
{ {"input_text": q["input_text"], "turn_id": q["turn_id"]}
"input_text": q["input_text"],
"turn_id": q["turn_id"]
}
for q in row["questions"] for q in row["questions"]
] ]
answers = [ answers = [
...@@ -187,7 +196,7 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -187,7 +196,7 @@ class Coqa(datasets.GeneratorBasedBuilder):
"span_end": a["span_end"], "span_end": a["span_end"],
"span_text": a["span_text"], "span_text": a["span_text"],
"input_text": a["input_text"], "input_text": a["input_text"],
"turn_id": a["turn_id"] "turn_id": a["turn_id"],
} }
for a in row["answers"] for a in row["answers"]
] ]
...@@ -201,7 +210,7 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -201,7 +210,7 @@ class Coqa(datasets.GeneratorBasedBuilder):
"span_end": a0["span_end"], "span_end": a0["span_end"],
"span_text": a0["span_text"], "span_text": a0["span_text"],
"input_text": a0["input_text"], "input_text": a0["input_text"],
"turn_id": a0["turn_id"] "turn_id": a0["turn_id"],
} }
for a0 in row["additional_answers"]["0"] for a0 in row["additional_answers"]["0"]
], ],
...@@ -211,7 +220,7 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -211,7 +220,7 @@ class Coqa(datasets.GeneratorBasedBuilder):
"span_end": a1["span_end"], "span_end": a1["span_end"],
"span_text": a1["span_text"], "span_text": a1["span_text"],
"input_text": a1["input_text"], "input_text": a1["input_text"],
"turn_id": a1["turn_id"] "turn_id": a1["turn_id"],
} }
for a1 in row["additional_answers"]["1"] for a1 in row["additional_answers"]["1"]
], ],
...@@ -221,7 +230,7 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -221,7 +230,7 @@ class Coqa(datasets.GeneratorBasedBuilder):
"span_end": a2["span_end"], "span_end": a2["span_end"],
"span_text": a2["span_text"], "span_text": a2["span_text"],
"input_text": a2["input_text"], "input_text": a2["input_text"],
"turn_id": a2["turn_id"] "turn_id": a2["turn_id"],
} }
for a2 in row["additional_answers"]["2"] for a2 in row["additional_answers"]["2"]
], ],
...@@ -232,5 +241,5 @@ class Coqa(datasets.GeneratorBasedBuilder): ...@@ -232,5 +241,5 @@ class Coqa(datasets.GeneratorBasedBuilder):
"source": source, "source": source,
"questions": questions, "questions": questions,
"answers": answers, "answers": answers,
"additional_answers": additional_answers "additional_answers": additional_answers,
} }
{"coqa": {"description": "CoQA is a large-scale dataset for building Conversational Question Answering\nsystems. The goal of the CoQA challenge is to measure the ability of machines to\nunderstand a text passage and answer a series of interconnected questions that\nappear in a conversation.\n", "citation": "@misc{reddy2018coqa,\n title={CoQA: A Conversational Question Answering Challenge},\n author={Siva Reddy and Danqi Chen and Christopher D. Manning},\n year={2018},\n eprint={1808.07042},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://stanfordnlp.github.io/coqa/", "license": "", "features": {"id": {"dtype": "string", "id": null, "_type": "Value"}, "source": {"dtype": "string", "id": null, "_type": "Value"}, "story": {"dtype": "string", "id": null, "_type": "Value"}, "questions": {"feature": {"input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "answers": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "additional_answers": {"0": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "1": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, 
"_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "2": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "coqa", "config_name": "coqa", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 26250528, "num_examples": 7199, "dataset_name": "coqa"}, "validation": {"name": "validation", "num_bytes": 3765933, "num_examples": 500, "dataset_name": "coqa"}}, "download_checksums": {"https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json": {"num_bytes": 49001836, "checksum": "b0fdb2bc1bd38dd3ca2ce5fa2ac3e02c6288ac914f241ac409a655ffb6619fa6"}, "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json": {"num_bytes": 9090845, "checksum": "dfa367a9733ce53222918d0231d9b3bedc2b8ee831a2845f62dfc70701f2540a"}}, "download_size": 58092681, "post_processing_size": null, "dataset_size": 30016461, "size_in_bytes": 88109142}} {"coqa": {"description": "CoQA is a large-scale dataset for building Conversational Question Answering\nsystems. The goal of the CoQA challenge is to measure the ability of machines to\nunderstand a text passage and answer a series of interconnected questions that\nappear in a conversation.\n", "citation": "@misc{reddy2018coqa,\n title={CoQA: A Conversational Question Answering Challenge},\n author={Siva Reddy and Danqi Chen and Christopher D. 
Manning},\n year={2018},\n eprint={1808.07042},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://stanfordnlp.github.io/coqa/", "license": "", "features": {"id": {"dtype": "string", "id": null, "_type": "Value"}, "source": {"dtype": "string", "id": null, "_type": "Value"}, "story": {"dtype": "string", "id": null, "_type": "Value"}, "questions": {"feature": {"input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "answers": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "additional_answers": {"0": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "1": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, "_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}, "2": {"feature": {"span_start": {"dtype": "int32", "id": null, "_type": "Value"}, "span_end": {"dtype": "int32", "id": null, "_type": "Value"}, "span_text": {"dtype": "string", "id": null, "_type": "Value"}, "input_text": {"dtype": "string", "id": null, 
"_type": "Value"}, "turn_id": {"dtype": "int32", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "coqa", "config_name": "coqa", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 26250528, "num_examples": 7199, "dataset_name": "coqa"}, "validation": {"name": "validation", "num_bytes": 3765933, "num_examples": 500, "dataset_name": "coqa"}}, "download_checksums": {"https://nlp.stanford.edu/data/coqa/coqa-train-v1.0.json": {"num_bytes": 49001836, "checksum": "b0fdb2bc1bd38dd3ca2ce5fa2ac3e02c6288ac914f241ac409a655ffb6619fa6"}, "https://nlp.stanford.edu/data/coqa/coqa-dev-v1.0.json": {"num_bytes": 9090845, "checksum": "dfa367a9733ce53222918d0231d9b3bedc2b8ee831a2845f62dfc70701f2540a"}}, "download_size": 58092681, "post_processing_size": null, "dataset_size": 30016461, "size_in_bytes": 88109142}}
\ No newline at end of file
{"drop": {"description": "DROP is a QA dataset which tests comprehensive understanding of paragraphs. In \nthis crowdsourced, adversarially-created, 96k question-answering benchmark, a \nsystem must resolve multiple references in a question, map them onto a paragraph,\nand perform discrete operations over them (such as addition, counting, or sorting).\n", "citation": "@misc{dua2019drop,\n title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs}, \n author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},\n year={2019},\n eprint={1903.00161},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://allenai.org/data/drop", "license": "", "features": {"section_id": {"dtype": "string", "id": null, "_type": "Value"}, "passage": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "query_id": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, "hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "validated_answers": {"feature": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, 
"hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "drop", "config_name": "drop", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 108858121, "num_examples": 77409, "dataset_name": "drop"}, "validation": {"name": "validation", "num_bytes": 12560739, "num_examples": 9536, "dataset_name": "drop"}}, "download_checksums": {"https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip": {"num_bytes": 8308692, "checksum": "39d2278a29fd729de301b111a45f434c24834f40df8f4ff116d864589e3249d6"}}, "download_size": 8308692, "post_processing_size": null, "dataset_size": 121418860, "size_in_bytes": 129727552}} {"drop": {"description": "DROP is a QA dataset which tests comprehensive understanding of paragraphs. In \nthis crowdsourced, adversarially-created, 96k question-answering benchmark, a \nsystem must resolve multiple references in a question, map them onto a paragraph,\nand perform discrete operations over them (such as addition, counting, or sorting).\n", "citation": "@misc{dua2019drop,\n title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs}, \n author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},\n year={2019},\n eprint={1903.00161},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n", "homepage": "https://allenai.org/data/drop", "license": "", "features": {"section_id": {"dtype": "string", "id": null, "_type": "Value"}, "passage": {"dtype": "string", "id": null, "_type": "Value"}, "question": {"dtype": "string", "id": null, "_type": "Value"}, "query_id": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": 
null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, "hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "validated_answers": {"feature": {"number": {"dtype": "string", "id": null, "_type": "Value"}, "date": {"day": {"dtype": "string", "id": null, "_type": "Value"}, "month": {"dtype": "string", "id": null, "_type": "Value"}, "year": {"dtype": "string", "id": null, "_type": "Value"}}, "spans": {"feature": {"dtype": "string", "id": null, "_type": "Value"}, "length": -1, "id": null, "_type": "Sequence"}, "worker_id": {"dtype": "string", "id": null, "_type": "Value"}, "hit_id": {"dtype": "string", "id": null, "_type": "Value"}}, "length": -1, "id": null, "_type": "Sequence"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "drop", "config_name": "drop", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 108858121, "num_examples": 77409, "dataset_name": "drop"}, "validation": {"name": "validation", "num_bytes": 12560739, "num_examples": 9536, "dataset_name": "drop"}}, "download_checksums": {"https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip": {"num_bytes": 8308692, "checksum": "39d2278a29fd729de301b111a45f434c24834f40df8f4ff116d864589e3249d6"}}, "download_size": 8308692, "post_processing_size": null, "dataset_size": 121418860, "size_in_bytes": 129727552}}
\ No newline at end of file
...@@ -12,7 +12,7 @@ ...@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
# #
# Custom DROP dataet that, unlike HF, keeps all question-answer pairs # Custom DROP dataset that, unlike HF, keeps all question-answer pairs
# even if there are multiple types of answers for the same question. # even if there are multiple types of answers for the same question.
"""DROP dataset.""" """DROP dataset."""
...@@ -25,7 +25,7 @@ import datasets ...@@ -25,7 +25,7 @@ import datasets
_CITATION = """\ _CITATION = """\
@misc{dua2019drop, @misc{dua2019drop,
title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs}, title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner}, author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
year={2019}, year={2019},
eprint={1903.00161}, eprint={1903.00161},
...@@ -35,8 +35,8 @@ _CITATION = """\ ...@@ -35,8 +35,8 @@ _CITATION = """\
""" """
_DESCRIPTION = """\ _DESCRIPTION = """\
DROP is a QA dataset which tests comprehensive understanding of paragraphs. In DROP is a QA dataset which tests comprehensive understanding of paragraphs. In
this crowdsourced, adversarially-created, 96k question-answering benchmark, a this crowdsourced, adversarially-created, 96k question-answering benchmark, a
system must resolve multiple references in a question, map them onto a paragraph, system must resolve multiple references in a question, map them onto a paragraph,
and perform discrete operations over them (such as addition, counting, or sorting). and perform discrete operations over them (such as addition, counting, or sorting).
""" """
...@@ -50,17 +50,19 @@ _URLS = { ...@@ -50,17 +50,19 @@ _URLS = {
"drop": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip", "drop": "https://s3-us-west-2.amazonaws.com/allennlp/datasets/drop/drop_dataset.zip",
} }
_EMPTY_VALIDATED_ANSWER = [{ _EMPTY_VALIDATED_ANSWER = [
"number": "", {
"date": { "number": "",
"day": "", "date": {
"month": "", "day": "",
"year": "", "month": "",
}, "year": "",
"spans": [], },
"worker_id": "", "spans": [],
"hit_id": "" "worker_id": "",
}] "hit_id": "",
}
]
class Drop(datasets.GeneratorBasedBuilder): class Drop(datasets.GeneratorBasedBuilder):
...@@ -69,39 +71,44 @@ class Drop(datasets.GeneratorBasedBuilder): ...@@ -69,39 +71,44 @@ class Drop(datasets.GeneratorBasedBuilder):
VERSION = datasets.Version("0.0.1") VERSION = datasets.Version("0.0.1")
BUILDER_CONFIGS = [ BUILDER_CONFIGS = [
datasets.BuilderConfig(name="drop", version=VERSION, datasets.BuilderConfig(
description="The DROP dataset."), name="drop", version=VERSION, description="The DROP dataset."
),
] ]
def _info(self): def _info(self):
features = datasets.Features({ features = datasets.Features(
"section_id": datasets.Value("string"), {
"passage": datasets.Value("string"), "section_id": datasets.Value("string"),
"question": datasets.Value("string"), "passage": datasets.Value("string"),
"query_id": datasets.Value("string"), "question": datasets.Value("string"),
"answer": { "query_id": datasets.Value("string"),
"number": datasets.Value("string"), "answer": {
"date": { "number": datasets.Value("string"),
"day": datasets.Value("string"), "date": {
"month": datasets.Value("string"), "day": datasets.Value("string"),
"year": datasets.Value("string"), "month": datasets.Value("string"),
"year": datasets.Value("string"),
},
"spans": datasets.features.Sequence(datasets.Value("string")),
"worker_id": datasets.Value("string"),
"hit_id": datasets.Value("string"),
}, },
"spans": datasets.features.Sequence(datasets.Value("string")), "validated_answers": datasets.features.Sequence(
"worker_id": datasets.Value("string"), {
"hit_id": datasets.Value("string"), "number": datasets.Value("string"),
}, "date": {
"validated_answers": datasets.features.Sequence({ "day": datasets.Value("string"),
"number": datasets.Value("string"), "month": datasets.Value("string"),
"date": { "year": datasets.Value("string"),
"day": datasets.Value("string"), },
"month": datasets.Value("string"), "spans": datasets.features.Sequence(datasets.Value("string")),
"year": datasets.Value("string"), "worker_id": datasets.Value("string"),
}, "hit_id": datasets.Value("string"),
"spans": datasets.features.Sequence(datasets.Value("string")), }
"worker_id": datasets.Value("string"), ),
"hit_id": datasets.Value("string"), }
}), )
})
return datasets.DatasetInfo( return datasets.DatasetInfo(
description=_DESCRIPTION, description=_DESCRIPTION,
features=features, features=features,
...@@ -118,7 +125,9 @@ class Drop(datasets.GeneratorBasedBuilder): ...@@ -118,7 +125,9 @@ class Drop(datasets.GeneratorBasedBuilder):
name=datasets.Split.TRAIN, name=datasets.Split.TRAIN,
# These kwargs will be passed to _generate_examples # These kwargs will be passed to _generate_examples
gen_kwargs={ gen_kwargs={
"filepath": os.path.join(data_dir, "drop_dataset", "drop_dataset_train.json"), "filepath": os.path.join(
data_dir, "drop_dataset", "drop_dataset_train.json"
),
"split": "train", "split": "train",
}, },
), ),
...@@ -126,7 +135,9 @@ class Drop(datasets.GeneratorBasedBuilder): ...@@ -126,7 +135,9 @@ class Drop(datasets.GeneratorBasedBuilder):
name=datasets.Split.VALIDATION, name=datasets.Split.VALIDATION,
# These kwargs will be passed to _generate_examples # These kwargs will be passed to _generate_examples
gen_kwargs={ gen_kwargs={
"filepath": os.path.join(data_dir, "drop_dataset", "drop_dataset_dev.json"), "filepath": os.path.join(
data_dir, "drop_dataset", "drop_dataset_dev.json"
),
"split": "validation", "split": "validation",
}, },
), ),
......
{"gsm8k": {"description": "State-of-the-art language models can match human performance on many tasks, but \nthey still struggle to robustly perform multi-step mathematical reasoning. To \ndiagnose the failures of current models and support research, we introduce GSM8K,\na dataset of 8.5K high quality linguistically diverse grade school math word problems.\nWe find that even the largest transformer models fail to achieve high test performance, \ndespite the conceptual simplicity of this problem distribution.\n", "citation": "@misc{cobbe2021training,\n title={Training Verifiers to Solve Math Word Problems},\n author={Karl Cobbe and Vineet Kosaraju and Mohammad Bavarian and Jacob Hilton and Reiichiro Nakano and Christopher Hesse and John Schulman},\n year={2021},\n eprint={2110.14168},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n", "homepage": "https://github.com/openai/grade-school-math", "license": "", "features": {"question": {"dtype": "string", "id": null, "_type": "Value"}, "answer": {"dtype": "string", "id": null, "_type": "Value"}}, "post_processed": null, "supervised_keys": null, "task_templates": null, "builder_name": "gsm8_k", "config_name": "gsm8k", "version": {"version_str": "0.0.1", "description": null, "major": 0, "minor": 0, "patch": 1}, "splits": {"train": {"name": "train", "num_bytes": 3963202, "num_examples": 7473, "dataset_name": "gsm8_k"}, "test": {"name": "test", "num_bytes": 713732, "num_examples": 1319, "dataset_name": "gsm8_k"}}, "download_checksums": {"https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/train.jsonl": {"num_bytes": 4166206, "checksum": "17f347dc51477c50d4efb83959dbb7c56297aba886e5544ee2aaed3024813465"}, "https://raw.githubusercontent.com/openai/grade-school-math/master/grade_school_math/data/test.jsonl": {"num_bytes": 749738, "checksum": "3730d312f6e3440559ace48831e51066acaca737f6eabec99bccb9e4b3c39d14"}}, "download_size": 4915944, "post_processing_size": null, "dataset_size": 4676934, "size_in_bytes": 9592878}}
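The `download_checksums` entries above record a SHA-256 digest per source file. A minimal sketch of how such a checksum can be verified after download (the `verify` helper and file paths are illustrative, not part of the harness):

```python
# Sketch: verify a downloaded file against a recorded SHA-256 checksum,
# like the ones stored under "download_checksums" above.
import hashlib


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_checksum: str) -> bool:
    """True if the file at `path` matches the recorded digest."""
    return sha256_of(path) == expected_checksum
```

For example, `verify("data/train.jsonl", "17f347dc…")` would confirm the training split matches the digest recorded in this file.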