Unverified Commit a97fde23 authored by Hailey Schoelkopf, committed by GitHub

Add vLLM FAQs to README (#1625) (#1633)

parent cffc1bd3
@@ -144,6 +144,12 @@
To use vLLM, do `pip install lm_eval[vllm]`. For a full list of supported vLLM configurations, see the vLLM documentation.
vLLM occasionally differs in output from Hugging Face. We treat Hugging Face as the reference implementation, and provide a [script](./scripts/model_comparator.py) for checking the validity of vLLM results against HF.
> [!Tip]
> For fastest performance, we recommend using `--batch_size auto` for vLLM whenever possible, to leverage its continuous batching functionality!
> [!Tip]
> Passing `max_model_len=4096` or some other reasonable default to vLLM through `--model_args` can speed up evaluation or prevent out-of-memory errors when using automatic batch size. For example, Mistral-7B-v0.1 defaults to a maximum sequence length of 32k.
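As a sketch of how the two tips above combine when using the harness as a Python library (the task name is illustrative, and `gpu_memory_utilization=0.8` is an optional extra knob, not a requirement):

```python
import lm_eval
import lm_eval.tasks

# Load the stock task registry first (see the Tip on
# `initialize_tasks()` later in this README).
lm_eval.tasks.initialize_tasks()

# Run a vLLM-backed evaluation with continuous batching
# (`batch_size="auto"`) and a capped context window, which helps avoid
# out-of-memory errors for models with very long native contexts.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=mistralai/Mistral-7B-v0.1,"
        "max_model_len=4096,"  # cap the model's 32k native context
        "gpu_memory_utilization=0.8"
    ),
    tasks=["hellaswag"],  # illustrative task choice
    batch_size="auto",
)
```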
### Model APIs and Inference Servers
Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.
@@ -240,9 +246,6 @@
Additionally, one can provide a directory with `--use_cache` to cache the results of prior runs.
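For illustration, a sketch of the equivalent when calling the harness from Python, assuming `simple_evaluate` accepts the same `use_cache` path (the model, task, and cache location below are placeholders):

```python
import lm_eval
import lm_eval.tasks

lm_eval.tasks.initialize_tasks()  # load the stock task registry

# Cache model responses so repeated (model, task) pairs from prior
# runs are reused rather than recomputed.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder model
    tasks=["lambada_openai"],                        # placeholder task
    use_cache="lm_cache",                            # placeholder cache location
)
```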
For a full list of supported arguments, check out the [interface](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md) guide in our documentation!
> [!Tip]
> Running lm-evaluation-harness as an external library and can't find (almost) any tasks? Run `lm_eval.tasks.initialize_tasks()` to load the library's stock tasks before calling `lm_eval.evaluate()` or `lm_eval.simple_evaluate()`!
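For example, a minimal external-library sketch following this tip (the model and task choices are illustrative):

```python
import lm_eval
import lm_eval.tasks

# Without this call, the task registry is (almost) empty when the
# harness is used as a library rather than via the `lm_eval` CLI.
lm_eval.tasks.initialize_tasks()

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",  # illustrative model
    tasks=["hellaswag"],                             # illustrative task
)
print(results["results"])
```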
## Visualizing Results
You can seamlessly visualize and analyze the results of your evaluation harness runs using both Weights & Biases (W&B) and Zeno.