Update README.md

00c05d63 · Stella Biderman · GitHub · b189066d · 00c05d63
Unverified Commit 00c05d63 authored Nov 27, 2023 by Stella Biderman Committed by GitHub Nov 27, 2023

Showing with 20 additions and 41 deletions

README.md README.md +20 -41

--- a/README.md
+++ b/README.md
@@ -4,17 +4,16 @@

 This project provides a unified framework to test generative language models on a large number of different evaluation tasks.

-Features:
-
+**Features:**
 - Over 60 standard academic benchmarks for LLMs, with hundreds of subtasks and variants implemented.
 - Support for models loaded via [transformers](https://github.com/huggingface/transformers/) (including quantization via [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ)), [GPT-NeoX](https://github.com/EleutherAI/gpt-neox), and [Megatron-DeepSpeed](https://github.com/microsoft/Megatron-DeepSpeed/), with a flexible tokenization-agnostic interface.
 - Support for commercial APIs including [OpenAI](https://openai.com), [goose.ai](https://goose.ai), and [TextSynth](https://textsynth.com/).
 - Support for evaluation on adapters (e.g. LoRA) supported in [HuggingFace's PEFT library](https://github.com/huggingface/peft).
 - Support for local models and benchmarks.
 - Evaluation with publicly available prompts ensures reproducibility and comparability between papers.
+- Easy support for custom prompts and evaluation metrics.

-The Language Model Evaluation Harness is the backend for 🤗 Hugging Face's popular [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and is used internally by dozens of companies including NVIDIA, Cohere, Booz Allen Hamilton, and Mosaic ML.
-
+The Language Model Evaluation Harness is the backend for 🤗 Hugging Face's popular [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), has been used in [hundreds of papers](https://scholar.google.com/scholar?oi=bibs&hl=en&authuser=2&cites=15052937328817631261,4097184744846514103,17476825572045927382,18443729326628441434,12854182577605049984), and is used internally by dozens of companies including NVIDIA, Cohere, Booz Allen Hamilton, and Mosaic ML.

 ## Install

@@ -26,25 +25,22 @@ cd lm-evaluation-harness
 pip install -e .
 ```

-To install additional multilingual tokenization and text segmentation packages, you must install the package with the `multilingual` extra:
-
-```bash
-pip install -e ".[multilingual]"
-```
-
-To support loading GPTQ quantized models, install the package with the `gptq` extra:
+We also provide a number of optional dependencies. Extras can be installed via `pip install -e ".[NAME]"`:

-```bash
-pip install -e ".[gptq]"
-```
-
-
-Though we recommend only installing the extras you require, to install the package with all extras, run
-```bash
-pip install -e ".[all]"
-```
+| Name          | Use                                   |
+| ------------- | ------------------------------------- |
+| anthropic     | For using Anthropic's models          |
+| dev           | You probably don't want to use this   |
+| gptq          | For loading models with GPTQ          |
+| testing       | You probably don't want to use this   |
+| multilingual  | For multilingual tokenizers           |
+| openai        | For using OpenAI's models             |
+| promptsource  | For using PromptSource prompts        |
+| sentencepiece | For using the sentencepiece tokenizer |
+| vllm          | For loading models with vLLM          |
+| all           | Loads all extras                      |

-## Support
+### Support

 The best way to get support is to open an issue on this repo or join the [EleutherAI Discord server](https://discord.gg/eleutherai). The `#lm-thunderdome` channel is dedicated to developing this project and the `#release-discussion` channel is for receiving support for our releases.

@@ -54,7 +50,6 @@ The best way to get support is to open an issue on this repo or join the Eleuthe

 To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command:

-
 ```bash
 python -m lm_eval \
    --model hf \
@@ -92,9 +87,7 @@ Alternatively, you can use `lm-eval` or `lm_eval` instead of `python -m lm_eval`

 ### Multi-GPU Evaluation with Hugging Face `accelerate`

-To parallelize evaluation of HuggingFace models across multiple GPUs, we allow for two different types of multi-GPU evaluation.
-
-The first is performed by launching evaluation via the `accelerate` library as follows:
+To parallelize evaluation of HuggingFace models across multiple GPUs, we leverage the [accelerate 🚀](https://github.com/huggingface/accelerate) library as follows:

 ```
 accelerate launch -m lm_eval \
@@ -107,32 +100,18 @@ This will perform *data-parallel evaluation*: that is, placing a **single full c

 If your model is *too large to be run on a single one of your GPUs*, then you can use `accelerate` with Fully Sharded Data Parallel (FSDP), which splits the weights of the model across your data parallel ranks. To enable this, ensure you select `YES` when asked `Do you want to use FullyShardedDataParallel?` when running `accelerate config`. To enable memory-efficient loading, select `YES` when asked `Do you want each individually wrapped FSDP unit to broadcast module parameters from rank 0 at the start?`. This will ensure only the rank 0 process loads the model and then broadcasts the parameters to the other ranks, instead of having each rank load all parameters, which can lead to large RAM usage spikes around the start of the script that may cause errors.

-
-We also provide an second method to run these large models: use of the `parallelize` argument.
-```
-python -m lm_eval \
-    --model hf \
-    --model_args pretrained=EleutherAI/pythia-12b,parallelize=True
-    --tasks lambada_openai,arc_easy \
-    --batch_size 16
-```
-
 To pass even more advanced keyword arguments to `accelerate`, we allow for the following arguments as well:
 - `device_map_option`: how to split model weights across available GPUs. Defaults to "auto".
 - `max_memory_per_gpu`: the max GPU memory to use per GPU in loading the model.
 - `max_cpu_memory`: the max amount of CPU memory to use when offloading the model weights to RAM.
 - `offload_folder`: a folder where model weights will be offloaded to disk if needed.

-Note that this method naively splits models across GPUs, resulting in only a single GPU performing work at any point in time, and so is much slower than launching with `accelerate launch`, possibly by a factor of the total # of GPUs.
-
-**Note that this option requires launching evaluation via `python -m lm_eval` rather than `accelerate launch -m lm_eval`.**
-
 To use `accelerate` with the `lm-eval` command, use
 ```
 accelerate launch --no_python lm-eval --model ...
 ```

-### Tensor Parallel + Optimized Inference with vLLM
+#### Tensor Parallel + Optimized Inference with vLLM

 We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html).

@@ -166,7 +145,7 @@ A full accounting of the supported and planned libraries + APIs can be seen belo
 | Anthropic                   | :heavy_check_mark:              | `anthropic`                                                                      | [Supported Anthropic Engines](https://docs.anthropic.com/claude/reference/selecting-a-model)  | `generate_until` (no logprobs)                             |
 | GooseAI                     | :heavy_check_mark: (not separately maintained)  | `openai`, `openai-completions`, `gooseai` (same interface as OpenAI Completions) |                                                                                               | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
 | Textsynth                   | Needs testing                   | `textsynth`                                                                      | ???                                                                                           | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
-| Cohere                      | :hourglass: - blocked on Cohere API bug | N/A                                                                              | [All `cohere.generate()` engines](https://docs.cohere.com/docs/models)                        | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
+| Cohere                      | [:hourglass: - blocked on Cohere API bug](https://github.com/EleutherAI/lm-evaluation-harness/pull/395) | N/A                                                                              | [All `cohere.generate()` engines](https://docs.cohere.com/docs/models)                        | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
 | GGML/[Llama.cpp](https://github.com/ggerganov/llama.cpp) (via [llama-cpp-python](https://github.com/abetlen/llama-cpp-python))                        | :heavy_check_mark:              | `gguf`, `ggml`                                                                   | Llama-architecture models (Llama, Llama 2, Llemma, Mistral(?), Llama finetunes)               | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
 | vLLM                        | :heavy_check_mark:       | `vllm`                                                                           | [Most HF Causal Language Models](https://docs.vllm.ai/en/latest/models/supported_models.html) | `generate_until`, `loglikelihood`, `loglikelihood_rolling`                             |
 | Your inference server here! | ...                             | ...                                                                              | ...                                                                                           | ...                                                      |
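
The extras table added in this change implies installs of the form `pip install -e ".[NAME]"`. As a minimal sketch, assuming you want several extras in one install, the snippet below builds the bracketed spec from a list of names. The helper name `extras_spec` is hypothetical and not part of the harness; the extra names `gptq` and `vllm` are taken from the table. The script only prints the command (a dry run) and installs nothing.

```bash
# Hypothetical helper (not part of the harness): join extra names with commas
# into the ".[a,b]" form used for an editable install with extras.
extras_spec() {
    local IFS=,              # "$*" joins arguments with the first character of IFS
    printf '.[%s]' "$*"
}

# Dry run: print the command instead of executing it.
# Run the printed command from the repository root to install for real.
echo "pip install -e \"$(extras_spec gptq vllm)\""
```

pip accepts a comma-separated list inside the brackets, so a single editable install can pull in several extras without resorting to `all`.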