This folder contains instructions and task setups for reproducing papers whose evaluations use non-standard setups. Tasks may already be supported in the library under `lm_eval/tasks`; tasks that are highly paper-specific may instead remain as YAMLs in the corresponding `examples/paper-title` folder (see the sketch at the end of this README for what such a YAML might look like).

## Verified Papers:

* [WIP] [Chain-of-Thought Prompting Elicits Reasoning in Large Language Models](https://arxiv.org/abs/2201.11903)
  * Further details can be found in the `chain_of_thought` subfolder.

## Candidates to Support:

* Least-to-Most Prompting
* Algorithmic Prompting
* Other in-scope prompting techniques
  * Multi-turn prompting strategies are likely out of scope for the repository.
* Pythia Suite: Term frequencies over training
* All setups from the GPT-3 paper
  * Varying few-shot ordering and selection; varying the label choices for multiple-choice tasks
* Your Paper Here!
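
## Example Task YAML

As a rough illustration of what a paper-specific setup in an `examples/paper-title` folder could contain, below is a minimal sketch of a task YAML in the harness's standard config style. The task name, prompt template, and generation settings are made up for illustration, and key names may differ slightly across harness versions; consult the configs under `lm_eval/tasks` for authoritative examples.

```yaml
# Illustrative sketch only: a hypothetical chain-of-thought-style task config.
task: gsm8k_cot_example            # hypothetical task name
dataset_path: gsm8k                # HuggingFace dataset identifier
dataset_name: main
output_type: generate_until        # free-form generation rather than multiple choice
training_split: train
test_split: test
doc_to_text: "Q: {{question}}\nA: Let's think step by step."
doc_to_target: "{{answer}}"
generation_kwargs:
  until:
    - "Q:"                         # stop generating at the next question
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
num_fewshot: 8
```

A YAML like this can then be registered with the library (or pointed to directly, depending on your harness version) and run alongside the built-in tasks under `lm_eval/tasks`.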