# Evaluation Harness for Large Language Models

### Currently based on the GPT-3 evaluations described in [Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf)

## Summary

The summary currently lives in Google Docs and still needs to be converted into this README:
https://docs.google.com/document/d/177dwJpH8GHebISXYZSn4NL98sXdCtQMH82b7O5F7jmw/edit?usp=sharing