- 27 Feb, 2024 2 commits
- 26 Feb, 2024 3 commits
Woosuk Kwon authored
Philipp Moritz authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
Jared Moore authored
- 25 Feb, 2024 1 commit
Harry Mellor authored
- 23 Feb, 2024 1 commit
Woosuk Kwon authored
- 22 Feb, 2024 10 commits
zhaoyang-star authored
Ronen Schaffer authored
Woosuk Kwon authored
44670 authored
Woosuk Kwon authored
Massimiliano Pronesti authored
Woosuk Kwon authored
Roy authored
Mustafa Eyceoz authored
Ronen Schaffer authored
- 21 Feb, 2024 7 commits
Zhuohan Li authored
This version adds support for more models, including Gemma models (#2964) and OLMo models (#2832).
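For context, running one of the newly supported models follows vLLM's usual offline-inference pattern. The sketch below is a minimal illustration and not part of the original release note; the model id `google/gemma-2b` and the prompt are assumptions, and an OLMo checkpoint would be loaded the same way.

```python
# Minimal offline-inference sketch with a newly supported Gemma checkpoint.
# Assumes vLLM is installed and the weights are reachable on the Hugging Face Hub.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-2b")                      # assumed model id, for illustration only
params = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```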
Nick Hill authored
Woosuk Kwon authored
Zhuohan Li authored
Woosuk Kwon authored
Xiang Xu authored
Antoni Baum authored
- 20 Feb, 2024 3 commits
Antoni Baum authored
Zhuohan Li authored
James Whedbee authored
- 19 Feb, 2024 4 commits
Ronen Schaffer authored
Simon Mo authored
Isotr0py authored
Zhuohan Li authored
- 18 Feb, 2024 2 commits
Zhuohan Li authored
Mark Mozolewski authored
- 17 Feb, 2024 2 commits
jvmncs authored
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

The above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified LoRA modules. In this case sql-lora and sql-lora2 point to the same underlying LoRA, but this need not be the case. The LoRA config options take the same values they do in EngineArgs. No work has been done here to scope client permissions to specific models.
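For illustration, a client could then discover the served names and target a specific LoRA by passing its registered name as the model. The snippet below is a hedged sketch, not part of the original commit: it assumes the server above is reachable on localhost:8000 and exposes OpenAI-style `/v1/models` and `/v1/completions` routes; adjust the paths if your deployment differs.

```python
# Hypothetical client sketch: list served models, then route a request to a LoRA by name.
import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

# Expect one entry for the base model plus one per name passed to --lora-modules.
models = requests.get(f"{BASE_URL}/v1/models").json()
print([m["id"] for m in models["data"]])

# Selecting "sql-lora" (or "sql-lora2") routes the request through that LoRA adapter.
resp = requests.post(
    f"{BASE_URL}/v1/completions",
    json={
        "model": "sql-lora",
        "prompt": "Write a SQL query that counts users per country.",
        "max_tokens": 64,
    },
)
print(resp.json())
```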
Nick Hill authored
If the SamplingParams object passed to LLMEngine.add_request() is mutated after the call returns, it could affect the async sampling process for that request. Suggested by @Yard1 in https://github.com/vllm-project/vllm/pull/2514#discussion_r1490106059.
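The defensive-copy idea behind this is straightforward; the sketch below is a simplified stand-in (the class and method names mirror vLLM's, but the bodies are illustrative, not the actual diff): the engine snapshots the parameters at add_request() time, so later mutation by the caller cannot leak into in-flight async sampling.

```python
import copy
from dataclasses import dataclass

@dataclass
class SamplingParams:      # stand-in for vllm.SamplingParams
    temperature: float = 1.0
    max_tokens: int = 16

class LLMEngine:           # illustrative stand-in, not the real engine
    def __init__(self):
        self._requests = {}

    def add_request(self, request_id: str, prompt: str, params: SamplingParams) -> None:
        # Snapshot the caller's params so later mutation cannot affect this request.
        self._requests[request_id] = (prompt, copy.deepcopy(params))

engine = LLMEngine()
params = SamplingParams(temperature=0.7)
engine.add_request("req-1", "Hello", params)
params.temperature = 0.0   # mutated after add_request(); the engine's stored copy is unchanged
```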
- 16 Feb, 2024 2 commits
Woosuk Kwon authored
shiyi.c_98 authored
- 15 Feb, 2024 3 commits
Hongxia Yang authored
Philipp Moritz authored
Woosuk Kwon authored