- 05 Apr, 2024 1 commit
-
-
Sean Gallen authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 03 Apr, 2024 1 commit
-
-
Adrian Abeyta authored
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 02 Apr, 2024 2 commits
-
-
Roger Wang authored
-
bigPYJ1151 authored
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
-
- 30 Mar, 2024 1 commit
-
-
youkaichao authored
[Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 29 Mar, 2024 1 commit
-
-
yhu422 authored
-
- 28 Mar, 2024 1 commit
-
-
wenyujin333 authored
-
- 27 Mar, 2024 4 commits
-
-
Woosuk Kwon authored
-
Megha Agarwal authored
-
Woosuk Kwon authored
-
Jee Li authored
-
- 25 Mar, 2024 2 commits
-
-
SangBin Cho authored
-
youkaichao authored
-
- 21 Mar, 2024 1 commit
-
-
Lalit Pradhan authored
-
- 19 Mar, 2024 3 commits
-
-
Jim Burtoft authored
-
Jim Burtoft authored
-
Simon Mo authored
-
- 15 Mar, 2024 1 commit
-
-
laneeee authored
-
- 12 Mar, 2024 1 commit
-
-
Sherlock Xu authored
Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>
-
- 11 Mar, 2024 2 commits
-
-
Zhuohan Li authored
-
Philipp Moritz authored
-
- 08 Mar, 2024 1 commit
-
-
Roger Wang authored
-
- 04 Mar, 2024 2 commits
-
-
Jialun Lyu authored
-
Liangfu Chen authored
-
- 02 Mar, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 01 Mar, 2024 1 commit
-
-
Yuan Tang authored
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
-
- 28 Feb, 2024 1 commit
-
-
Ganesh Jagadeesan authored
-
- 27 Feb, 2024 2 commits
-
-
Woosuk Kwon authored
-
张大成 authored
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 25 Feb, 2024 1 commit
-
-
Harry Mellor authored
-
- 21 Feb, 2024 1 commit
-
-
Zhuohan Li authored
-
- 19 Feb, 2024 1 commit
-
-
Isotr0py authored
-
- 17 Feb, 2024 1 commit
-
-
jvmncs authored
how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case.

lora config values take the same values they do in EngineArgs.

no work has been done here to scope client permissions to specific models.
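
a rough illustration of the `name=path` convention described above, and of why `/models` would report three entries for this invocation. this is only a sketch, not the code from this commit: `parse_lora_modules` and `list_served_models` are hypothetical names made up for the example, and the path is a placeholder.

```python
from typing import Dict, List


def parse_lora_modules(specs: List[str]) -> Dict[str, str]:
    """Turn ['name=path', ...] into an ordered {name: path} mapping.

    Hypothetical helper: mirrors the shape of the --lora-modules values
    shown above, not vLLM's actual argument parser.
    """
    modules: Dict[str, str] = {}
    for spec in specs:
        # Split on the first '=' only, so paths may themselves contain '='.
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        if name in modules:
            raise ValueError(f"duplicate lora module name: {name!r}")
        modules[name] = path
    return modules


def list_served_models(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    """Return the base model id followed by one id per lora module."""
    return [base_model, *lora_modules]


if __name__ == "__main__":
    lora_path = "/path/to/llama-2-7b-sql-lora-test"  # placeholder path
    modules = parse_lora_modules(
        [f"sql-lora={lora_path}", f"sql-lora2={lora_path}"]
    )
    for model_id in list_served_models("meta-llama/Llama-2-7b-hf", modules):
        print(model_id)
```

two names may map to the same path, as with sql-lora and sql-lora2 here; only the names have to be distinct in this sketch.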
-
- 13 Feb, 2024 2 commits
-
-
Philipp Moritz authored
Co-authored-by: Roy <jasonailu87@gmail.com>
-
Simon Mo authored
-
- 12 Feb, 2024 1 commit
-
-
Philipp Moritz authored
-
- 11 Feb, 2024 1 commit
-
-
Hongxia Yang authored
-
- 07 Feb, 2024 1 commit
-
-
Philipp Moritz authored
-
- 04 Feb, 2024 1 commit
-
-
Massimiliano Pronesti authored
-
- 01 Feb, 2024 1 commit
-
-
Fengzhe Zhou authored
-