* `wikitext`: measure perplexity on the Wikitext dataset, via rolling loglikelihoods.
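
As a rough illustration of computing perplexity from rolling loglikelihoods, the sketch below splits a long token sequence into consecutive windows, scores each window, sums the window loglikelihoods, and exponentiates the negative average. It is a minimal sketch under stated assumptions, not the harness's actual code: `split_into_windows`, `perplexity`, and `toy_score` are hypothetical names, and the uniform scorer stands in for a real language model. A real setup would usually also condition each window on some preceding context, which is omitted here.

```python
import math


def split_into_windows(token_ids, max_len):
    """Split tokens into consecutive, disjoint windows of at most max_len tokens,
    so that every token is scored exactly once."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


def perplexity(window_loglikelihoods, num_units):
    """Sum the per-window loglikelihoods and normalize by the number of units
    (tokens, words, or bytes, depending on the metric being reported)."""
    total = sum(window_loglikelihoods)
    return math.exp(-total / num_units)


def toy_score(window):
    """Hypothetical stand-in for a model: a uniform distribution over a
    4-symbol vocabulary assigns log(1/4) to every token."""
    return len(window) * math.log(0.25)


tokens = list(range(10))                      # a document of 10 tokens
windows = split_into_windows(tokens, max_len=4)
loglikelihoods = [toy_score(w) for w in windows]
ppl = perplexity(loglikelihoods, num_units=len(tokens))
print(len(windows), ppl)
```

Because the toy model is uniform over four symbols, the resulting perplexity is approximately 4 regardless of how the document is windowed, which makes it a convenient sanity check for the bookkeeping.
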
### Checklist
- [x] Is in Eval-harness v1.0?
* [x] Is the task an existing benchmark in the literature?
- [x] Has been checked for regression from v1.0?
* [x] Have you referenced the original paper that introduced the task?
- [ ] Has been checked for equivalence with original paper methodology?
* [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
- [ ] "Main" checked variant clearly denoted?

If other tasks on this dataset are already supported:

* [x] Is the "Main" variant of this task clearly denoted?
* [x] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?