# Evaluation Harness for Large Language Models

### Currently based on evaluations of GPT3 as mentioned in https://arxiv.org/pdf/2005.14165.pdf

## Summary (need to convert from google docs at some point):

https://docs.google.com/document/d/177dwJpH8GHebISXYZSn4NL98sXdCtQMH82b7O5F7jmw/edit?usp=sharing