**WARNING**: This project is currently under active development. Interfaces and task implementations may change rapidly and without warning.
## Overview
This project provides a unified framework for testing autoregressive language models (GPT-2, GPT-3, GPT-Neo, etc.) on a large number of different evaluation tasks.
Features:
- 100+ tasks implemented
- Support for GPT-2, GPT-3, GPT-Neo, GPT-NeoX, and GPT-J, with a flexible, tokenization-agnostic interface
- Task versioning to ensure reproducibility
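To illustrate what a tokenization-agnostic interface and task versioning mean in practice, here is a minimal sketch. Every name in it (`LanguageModel`, `loglikelihood`, `CharUnigramModel`, `MultipleChoiceTask`, `VERSION`) is hypothetical and is not necessarily this project's actual API. Tasks hand the model plain strings and get back log-probabilities, so any model can plug in regardless of its tokenizer. Each result is stamped with the task version that produced it. The toy character-unigram model only stands in for a real LM and ignores the context entirely.

```python
import math
from abc import ABC, abstractmethod


class LanguageModel(ABC):
    """Hypothetical tokenization-agnostic model interface: tasks pass plain strings."""

    @abstractmethod
    def loglikelihood(self, context: str, continuation: str) -> float:
        """Return log P(continuation | context); tokenization is the model's concern."""


class CharUnigramModel(LanguageModel):
    """Toy stand-in for a real LM: scores characters independently and ignores context."""

    def __init__(self, corpus: str):
        self.counts = {}
        for ch in corpus:
            self.counts[ch] = self.counts.get(ch, 0) + 1
        self.total = len(corpus)
        self.vocab = len(self.counts) + 1  # +1 reserves probability mass for unseen characters

    def loglikelihood(self, context: str, continuation: str) -> float:
        # Add-one smoothing keeps unseen characters at a small, non-zero probability.
        denom = self.total + self.vocab
        return sum(math.log((self.counts.get(ch, 0) + 1) / denom) for ch in continuation)


class MultipleChoiceTask:
    """Hypothetical task; VERSION would be bumped whenever prompts or scoring change."""

    NAME = "toy_mc"
    VERSION = 0

    def __init__(self, docs):
        self.docs = docs  # each doc is (context, choices, gold_index)

    def evaluate(self, lm: LanguageModel) -> dict:
        correct = 0
        for context, choices, gold in self.docs:
            # Pick the choice to which the model assigns the highest log-likelihood.
            scores = [lm.loglikelihood(context, choice) for choice in choices]
            prediction = max(range(len(choices)), key=scores.__getitem__)
            correct += int(prediction == gold)
        # Stamp the result with the task version so scores are only compared like-for-like.
        return {"task": self.NAME, "version": self.VERSION, "acc": correct / len(self.docs)}


lm = CharUnigramModel("the cat sat on the mat")
task = MultipleChoiceTask([("The animal on the mat is a", [" cat", " xqz"], 0)])
print(task.evaluate(lm))  # {'task': 'toy_mc', 'version': 0, 'acc': 1.0}
```

Because the task only ever sees strings and floats, swapping `CharUnigramModel` for a wrapper around a real model would leave the task code unchanged.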
## Install
```bash
...
python main.py \
...
--tasks lambada,hellaswag
```
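The `--tasks` flag in the command above takes a comma-separated list of task names. The sketch below shows one plausible way such a value could be split and checked against a registry of known tasks before any evaluation runs. `parse_task_names`, `resolve_tasks`, and `TASK_REGISTRY` are hypothetical names for illustration, not this project's code.

```python
from __future__ import annotations


def parse_task_names(arg: str) -> list[str]:
    """Split a comma-separated --tasks value, dropping blanks and repeats but keeping order."""
    names: list[str] = []
    for raw in arg.split(","):
        name = raw.strip()
        if name and name not in names:
            names.append(name)
    return names


def resolve_tasks(arg: str, registry: dict) -> dict:
    """Map each requested name to its registry entry, failing loudly on unknown names."""
    names = parse_task_names(arg)
    unknown = [name for name in names if name not in registry]
    if unknown:
        raise ValueError(f"unknown task(s): {', '.join(unknown)}")
    return {name: registry[name] for name in names}


# Hypothetical registry; the real project's task table and task objects will differ.
TASK_REGISTRY = {"lambada": "<LAMBADA task>", "hellaswag": "<HellaSwag task>", "piqa": "<PIQA task>"}
print(list(resolve_tasks("lambada,hellaswag", TASK_REGISTRY)))  # ['lambada', 'hellaswag']
```

Validating every name up front means a typo in the list fails immediately instead of after earlier tasks have already spent compute.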
To evaluate mesh-transformer-jax models that are not available on the Hugging Face Hub, invoke the evaluation harness through [this script](https://github.com/kingoflolz/mesh-transformer-jax/blob/master/eval_harness.py).