Commit d19e90de authored by lintangsutawika's avatar lintangsutawika
# The Pile

### Paper
Title: The Pile: An 800GB Dataset of Diverse Text for Language Modeling

Abstract: https://arxiv.org/abs/2101.00027

The Pile is an 825 GiB diverse, open-source language modelling dataset that consists of 22 smaller, high-quality datasets combined together. To score well on Pile ...

Homepage: https://pile.eleuther.ai/
### Citation
```
@article{gao2020pile,
  title={The {P}ile: An 800GB Dataset of Diverse Text for Language Modeling},
  author={Gao, Leo and Biderman, Stella and Black, Sid and Golding, Laurence and Hoppe, Travis and Foster, Charles and Phang, Jason and He, Horace and Thite, Anish and Nabeshima, Noa and Presser, Shawn and Leahy, Connor},
  journal={arXiv preprint arXiv:2101.00027},
  year={2020}
}
```
### Groups and Tasks
#### Groups
* `pile`
#### Tasks
* `pile_arxiv`
* `pile_bookcorpus2`
* `pile_books3`
* `pile_dm-mathematics`
* `pile_enron`
* `pile_europarl`
* `pile_freelaw`
* `pile_github`
* `pile_gutenberg`
* `pile_hackernews`
* `pile_nih-exporter`
* `pile_opensubtitles`
* `pile_openwebtext2`
* `pile_philpapers`
* `pile_pile-cc`
* `pile_pubmed-abstracts`
* `pile_pubmed-central`
* `pile_stackexchange`
* `pile_ubuntu-irc`
* `pile_uspto`
* `pile_wikipedia`
* `pile_youtubesubtitles`
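Each of these tasks scores a model by its rolling log-likelihood over whole documents: a document longer than the model's context window is split into windows, and the per-token log-likelihoods of all windows are summed. The sketch below illustrates that windowing idea; `rolling_windows`, `doc_loglikelihood`, and the `score_window` callback are hypothetical names for illustration, not the harness's actual implementation.

```python
def rolling_windows(tokens, max_len):
    """Split a token sequence into contiguous windows of at most max_len.

    Every token lands in exactly one window, so summing per-window
    log-likelihoods gives the log-likelihood of the full document.
    """
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def doc_loglikelihood(tokens, max_len, score_window):
    """Sum window log-likelihoods; score_window stands in for a model call
    that returns the total log-likelihood of one window."""
    return sum(score_window(window) for window in rolling_windows(tokens, max_len))
```

In practice a rolling implementation also carries over some left context between windows so later predictions stay conditioned on preceding text; this sketch omits that detail for brevity.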
### Checklist
For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
* [ ] Have you referenced the original paper that introduced the task?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
group:
  - pile
  - perplexity
  - loglikelihood_rolling
task: pile_arxiv
dataset_path: EleutherAI/pile
dataset_name: pile_arxiv
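For perplexity tasks like this one, the reported metrics are all derived from a document's summed log-likelihood. A minimal sketch of the usual definitions of word perplexity, byte perplexity, and bits per byte follows; the function name and the whitespace-based word count are assumptions for illustration, not the harness's exact code.

```python
import math

def perplexity_metrics(total_loglikelihood, text):
    """Derive perplexity-style metrics from a document's total
    natural-log likelihood under the model."""
    n_words = len(text.split())           # crude whitespace word count (assumption)
    n_bytes = len(text.encode("utf-8"))
    return {
        "word_perplexity": math.exp(-total_loglikelihood / n_words),
        "byte_perplexity": math.exp(-total_loglikelihood / n_bytes),
        # convert nats to bits, normalised by document length in bytes
        "bits_per_byte": -total_loglikelihood / (n_bytes * math.log(2)),
    }
```

Note that lower is better for all three: a model that assigns probability 1/2 to every byte scores exactly 1.0 bits per byte.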