- 22 Jul, 2024 1 commit

Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>

- 18 Jul, 2024 1 commit

Nick Hill authored
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

- 16 Jul, 2024 1 commit

Joe authored

- 09 Jul, 2024 1 commit

Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

- 03 Jul, 2024 1 commit

Cyrus Leung authored
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

- 26 Jun, 2024 1 commit

sasha0552 authored

- 05 Jun, 2024 1 commit

tomeras91 authored
[Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False) (#5278)

- 02 Jun, 2024 1 commit

Avinash Raj authored

- 30 May, 2024 1 commit

Breno Faria authored
Co-authored-by: Breno Faria <breno.faria@intrafind.com>

- 28 May, 2024 1 commit

Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>

- 25 May, 2024 1 commit

Eric Xihui Lin authored
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

- 17 May, 2024 1 commit

bofeng huang authored

- 11 May, 2024 1 commit

Chang Su authored

- 09 May, 2024 1 commit

Cyrus Leung authored

- 03 May, 2024 1 commit

Sebastian Schoennenbeck authored

- 27 Apr, 2024 1 commit

Cyrus Leung authored

- 23 Apr, 2024 2 commits

Jack Gordley authored

SangBin Cho authored

- 20 Apr, 2024 1 commit

Chirag Jain authored

- 18 Apr, 2024 2 commits

James Whedbee authored

Harry Mellor authored
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>

- 11 Apr, 2024 1 commit

Dylan Hawk authored
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>

- 05 Apr, 2024 1 commit

Thomas Parnell authored

- 29 Mar, 2024 1 commit

Roy authored

- 25 Mar, 2024 1 commit

SangBin Cho authored

- 21 Mar, 2024 1 commit

Roy authored

- 11 Mar, 2024 1 commit

Zhuohan Li authored

- 04 Mar, 2024 1 commit

Antoni Baum authored
Co-authored-by: Avnish Narayan <avnish@anyscale.com>

- 17 Feb, 2024 1 commit

jvmncs authored
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

If the user queries `/models`, the server above lists 3 separate entries: one for the base served model and one for each specified LoRA module. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`. No work has been done here to scope client permissions to specific models.
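The commit message describes two things. First, `--lora-modules` takes `name=path` pairs. Second, `/models` then reports the base model plus one entry per LoRA *name*, even when two names share a path. The sketch below models only that bookkeeping and assumes the `name=path` syntax shown in the command. `parse_lora_modules` and `served_model_ids` are hypothetical helpers written for illustration. They are not vLLM's argument parser or its `/models` handler.

```python
from typing import Dict, List


def parse_lora_modules(specs: List[str]) -> Dict[str, str]:
    """Hypothetical parser for `name=path` pairs like those given to --lora-modules."""
    modules: Dict[str, str] = {}
    for spec in specs:
        # Split on the first '=' only, so a path containing '=' stays intact.
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        if name in modules:
            raise ValueError(f"duplicate LoRA module name {name!r}")
        modules[name] = path
    return modules


def served_model_ids(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    # One entry for the base model, then one per LoRA *name* (not per unique path).
    return [base_model, *lora_modules]


# Usage mirroring the command above: two names that point at the same adapter path.
lora_path = "/tmp/sql-lora"  # placeholder path, assumed for illustration
mods = parse_lora_modules([f"sql-lora={lora_path}", f"sql-lora2={lora_path}"])
ids = served_model_ids("meta-llama/Llama-2-7b-hf", mods)
print(ids)  # ['meta-llama/Llama-2-7b-hf', 'sql-lora', 'sql-lora2']
```

Keying the mapping by name rather than by path is what produces three entries here even though only one adapter directory exists.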

- 19 Jan, 2024 1 commit

Simon Mo authored

- 17 Jan, 2024 1 commit

FlorianJoncour authored