Unverified Commit 529f4805 authored by Jinwei, committed by GitHub

[Readme change for SGLang] fix error in readme and add OOM solutions for sglang (#2738)



* initial components to support sglang

* init of class SGLangLM

* draft for generate_until of SGLang model

* mock loglikelihood

* initial loglikelihood_tokens

* todo: fix bug of sglang engine init

* implement generation tasks and test

* support output type loglikelihood and loglikelihood_rolling (#1)

* .

* loglikelihood_rolling

* /

* support dp_size>1

* typo

* add tests and clean code

* skip tests of sglang for now

* fix OOM error of sglang pytest

* finish test for sglang

* add sglang to readme

* fix OOM of tests and clean SGLang model

* update readme

* clean pyproject and add tests for evaluator

* add accuracy tests and they passed locally

* add notes for test

* Update README.md

update readme

* pre-commit

* add OOM guideline for sglang and fix readme error

* fix typo

* fix typo

* add readme

---------
Co-authored-by: Xiaotong Jiang <xiaotong.jiang@databricks.com>
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
Co-authored-by: Baber <baber@hey.com>
parent a87fe425
@@ -250,10 +250,16 @@ To use SGLang as the evaluation backend, please **install it in advance** via SG
SGLang's server arguments differ slightly from those of other backends; see [here](https://docs.sglang.ai/backend/server_arguments.html) for more information. We provide an example of the usage here:
```bash
lm_eval --model sglang \
-    --model_args pretrained={model_name},dp_size={data_parallel_size},tp_size={tensor_parallel_size},dtype=auto,mem-fraction-static=0.9, \
+    --model_args pretrained={model_name},dp_size={data_parallel_size},tp_size={tensor_parallel_size},dtype=auto \
--tasks gsm8k_cot \
--batch_size auto
```
> [!Tip]
> When encountering out-of-memory (OOM) errors (especially on multiple-choice tasks), try these solutions, combined in the sketch after this list:
> 1. Use a manual `batch_size` rather than `auto`.
> 2. Lower the KV cache pool's memory usage by reducing `mem_fraction_static`, e.g. add it to your model arguments: `--model_args pretrained=...,mem_fraction_static=0.7`.
> 3. Increase the tensor parallel size `tp_size` (if multiple GPUs are available).
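
For instance, a run applying all three mitigations together might look like the following sketch; the model, task, and all values here are illustrative placeholders, not recommendations from this README:

```bash
# Illustrative only: manual batch size, reduced static memory pool,
# and tensor parallelism across 4 GPUs. Adjust values for your setup.
lm_eval --model sglang \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,tp_size=4,dtype=auto,mem_fraction_static=0.7 \
    --tasks mmlu \
    --batch_size 8
```

Lowering `mem_fraction_static` shrinks the memory pool SGLang reserves for model weights and the KV cache, leaving more headroom for intermediate buffers at the cost of some throughput.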
### Model APIs and Inference Servers
Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.