To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command:
```bash
lm_eval \
    --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    ...
```
Additional arguments can be provided to the model constructor using the `--model_args` flag. Most notably, this supports the common practice of using the `revisions` feature on the Hub to store partially trained checkpoints, or to specify the datatype for running a model:
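A sketch of what that can look like, assuming the HF loader accepts `revision=` and `dtype=` keys in `--model_args` (the model, `step100000` revision, and task below are illustrative):

```bash
# illustrative model and revision; any Hub checkpoint with named revisions works the same way
lm_eval \
    --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype=float \
    --tasks lambada_openai
```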
...
Batch size selection can be automated by setting the `--batch_size` flag to `auto`. This will perform automatic detection of the largest batch size that will fit on your device. On tasks where there is a large difference between the longest and shortest example, it can be helpful to periodically recompute the largest batch size, to gain a further speedup. To do this, append `:N` to the above flag to automatically recompute the largest batch size `N` times. For example, to recompute the batch size 4 times, the command would be:
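(A sketch, reusing the GPT-J / `hellaswag` setup from the first example; any model and tasks work the same way.)

```bash
# same model and task as above; only --batch_size changes
lm_eval \
    --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --batch_size auto:4
```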
Alternatively, you can use `lm-eval` or `python -m lm_eval` instead of `lm_eval`; all three invoke the same CLI.
### Multi-GPU Evaluation with Hugging Face `accelerate`
...
With vLLM support installed (`pip install -e .[vllm]`), you can run the library as normal, for single-GPU or tensor-parallel inference, for example:
```bash
lm_eval \
    --model vllm \
    --model_args pretrained={model_name},tensor_parallel_size={number of GPUs to use},dtype=auto,gpu_memory_utilization=0.8 \
    --tasks lambada_openai
```
...
Our library supports language models served via the OpenAI Completions API as follows:
```bash
export OPENAI_API_SECRET_KEY=YOUR_KEY_HERE
lm_eval \
    --model openai-completions \
    --model_args engine=davinci \
    --tasks lambada_openai,hellaswag
```
...
To verify the data integrity of the tasks you're performing in addition to running the tasks themselves, you can use the `--check_integrity` flag:
```bash
lm_eval \
    --model openai \
    --model_args engine=davinci \
    --tasks lambada_openai,hellaswag \
    --check_integrity
```
...
For models loaded with the HuggingFace `transformers` library, any arguments provided via `--model_args` get passed to the relevant constructor directly. This means that anything you can do with `AutoModel` can be done with our library. For example, you can pass a local path via `pretrained=` or use models finetuned with [PEFT](https://github.com/huggingface/peft) by taking the call you would run to evaluate the base model and adding `,peft=PATH` to the `model_args` argument:
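A sketch of such a call, keeping `PATH` as the placeholder for your PEFT adapter and assuming the adapter was trained on top of the same base model:

```bash
# PATH is a placeholder for a local or Hub PEFT adapter directory
lm_eval \
    --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B,peft=PATH \
    --tasks hellaswag
```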
[GPTQ](https://github.com/PanQiWei/AutoGPTQ) quantized models can be loaded by specifying their file names in `,gptq=NAME` (or `,gptq=True` for default names) in the `model_args` argument:
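For instance, something along these lines, with `{model_name}` as a placeholder and `gptq=True` selecting the default quantized-weight file names as described above:

```bash
# {model_name} is a placeholder; gptq=True looks for the default GPTQ file names
lm_eval \
    --model hf \
    --model_args pretrained={model_name},gptq=True \
    --tasks hellaswag
```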