This folder contains the instructions and task setups required to reproduce evaluations from specific papers, some of which use non-standard evaluation setups.
Tasks may already be supported in the library under `lm_eval/tasks`; if they are highly paper-specific, they may instead remain as YAMLs in the corresponding `examples/paper-title` folder.
## Verified Papers:
* [WIP] [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
  * Further details can be found in the `chain_of_thought` subfolder.
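
As a rough illustration of the technique, chain-of-thought prompting prepends worked reasoning steps to each few-shot exemplar so the model is encouraged to reason before giving its final answer. The snippet below is a minimal sketch of assembling such a prompt; the exemplar text and the `build_cot_prompt` helper are illustrative assumptions, not part of `lm_eval` or the paper's released prompts.

```python
# Minimal sketch of a chain-of-thought few-shot prompt.
# The exemplar below is illustrative only.
COT_EXEMPLARS = [
    {
        "question": "Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 balls is 6 more balls. 5 + 6 = 11.",
        "answer": "11",
    },
]


def build_cot_prompt(question: str, exemplars=COT_EXEMPLARS) -> str:
    """Prepend worked (question, reasoning, answer) exemplars to the target question."""
    blocks = [
        f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}."
        for ex in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


print(build_cot_prompt("A farmer has 12 eggs and sells 7. How many are left?"))
```
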
## Candidates to Support:
* Least-to-Most Prompting
* Algorithmic Prompting
* Other in-scope prompting techniques
  * Multi-turn prompting strategies are likely out of scope for the repository.
* Pythia Suite: Term Frequencies over training
* All setups from the GPT-3 paper
  * Varying few-shot example selection and ordering (see the sketch after this list)
  * Varying the label choices for multiple-choice tasks
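
As a sketch of the few-shot selection and ordering ablation, the snippet below seeds an RNG so that each seed yields a different exemplar subset and ordering. The `sample_fewshot` helper and the `pool` of formatted exemplars are hypothetical names used for illustration, not library APIs.

```python
import random


def sample_fewshot(candidate_pool, k, seed):
    """Select and order k few-shot exemplars deterministically for a given seed.

    Different seeds produce different selections and orderings, which is the
    axis varied in the GPT-3-style ablation. `candidate_pool` is assumed to be
    a list of already-formatted exemplar strings.
    """
    rng = random.Random(seed)
    return rng.sample(candidate_pool, k)


pool = [f"Example {i}" for i in range(20)]
for seed in range(3):
    print(seed, sample_fewshot(pool, k=4, seed=seed))
```
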