# Evaluation Harness for Large Language Models

### Currently based on evaluations of GPT3 as mentioned in https://arxiv.org/pdf/2005.14165.pdf

## Summary (need to convert from google docs at some point):

https://docs.google.com/document/d/177dwJpH8GHebISXYZSn4NL98sXdCtQMH82b7O5F7jmw/edit?usp=sharing