This folder contains instructions and task setups for reproducing papers whose evaluations use non-standard setups. Tasks may already be supported in the library under `lm_eval/tasks`; tasks that are highly paper-specific may instead remain as YAMLs in the corresponding `examples/paper-title` folder (see the sketch at the end of this README for what such a YAML might look like).

## Verified Papers:

* [WIP] [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
  * Further details can be found in the `chain_of_thought` subfolder.

## Candidates to Support:

* Least-to-Most Prompting
* Algorithmic Prompting
* Other in-scope prompting techniques
  * Multi-turn prompting strategies are likely out of scope for the repository.
* Pythia Suite: Term frequencies over training
* All setups from the GPT-3 paper
  * Varying few-shot ordering and selection; varying the label choices for multiple-choice tasks
* Your Paper Here!
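
## Example Task YAML

As a rough illustration of what a paper-specific setup in an `examples/paper-title` folder could contain, below is a minimal sketch of a task YAML in the harness's standard config style. The task name, prompt template, and generation settings are made up for illustration, and key names may differ slightly across harness versions; consult the configs under `lm_eval/tasks` for authoritative examples.

```yaml
# Illustrative sketch only: a hypothetical chain-of-thought-style task config.
task: gsm8k_cot_example            # hypothetical task name
dataset_path: gsm8k                # HuggingFace dataset identifier
dataset_name: main
output_type: generate_until        # free-form generation rather than multiple choice
training_split: train
test_split: test
doc_to_text: "Q: {{question}}\nA: Let's think step by step."
doc_to_target: "{{answer}}"
generation_kwargs:
  until:
    - "Q:"                         # stop generating at the next question
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
num_fewshot: 8
```

A YAML like this can then be registered with the library (or pointed to directly, depending on your harness version) and run alongside the built-in tasks under `lm_eval/tasks`.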