- 29 May, 2024 1 commit
-
-
Ronen Schaffer authored
-
- 28 May, 2024 1 commit
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
- 17 May, 2024 1 commit
-
-
Antoni Baum authored
-
- 16 May, 2024 2 commits
-
-
Alex Wu authored
-
Aurick Qiao authored
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
- 15 May, 2024 1 commit
-
-
Alex Wu authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
- 13 May, 2024 1 commit
-
-
Sanger Steel authored
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update `tensorizer` to version 2.9.0 (#4208)
-
- 11 May, 2024 1 commit
-
-
Chang Su authored
-
- 09 May, 2024 2 commits
-
-
Hao Zhang authored
Co-authored-by: Dash Desai <1723932+iamontheinet@users.noreply.github.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
Co-authored-by: Aurick Qiao <aurickq@users.noreply.github.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
-
Robert Shaw authored
-
- 02 May, 2024 1 commit
-
-
Danny Guinther authored
-
- 28 Apr, 2024 1 commit
-
-
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
-
- 23 Apr, 2024 1 commit
-
-
James Fleming authored
Co-authored-by: mgoin <michael@neuralmagic.com>
-
- 22 Apr, 2024 1 commit
-
-
Harry Mellor authored
Co-authored-by: Harry Mellor <hmellor@oxts.com>
-
- 16 Apr, 2024 1 commit
-
-
Antoni Baum authored
-
- 15 Apr, 2024 1 commit
-
-
Sanger Steel authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
-
- 14 Apr, 2024 1 commit
-
-
Sanger Steel authored
-
- 05 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 03 Apr, 2024 2 commits
-
-
Adrian Abeyta authored
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
-
zhuwenwen authored
-
- 28 Mar, 2024 2 commits
-
-
Woosuk Kwon authored
-
Simon Mo authored
-
- 25 Mar, 2024 2 commits
-
-
xwjiang2010 authored
-
SangBin Cho authored
-
- 22 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 16 Mar, 2024 2 commits
-
-
Simon Mo authored
-
Dinghow Yang authored
-
- 15 Mar, 2024 2 commits
-
-
Dinghow Yang authored
-
Dinghow Yang authored
-
- 14 Mar, 2024 1 commit
-
-
Allen.Dou authored
-
- 11 Mar, 2024 1 commit
-
-
DAIZHENWEI authored
-
- 02 Mar, 2024 1 commit
-
-
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
-
- 28 Feb, 2024 1 commit
-
-
Liangfu Chen authored
-
- 17 Feb, 2024 1 commit
-
-
jvmncs authored
how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one for each of the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case.

lora config values take the same values they do in EngineArgs.

no work has been done here to scope client permissions to specific models.
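
to make the `name=path` mapping concrete, here is a minimal sketch of how `--lora-modules` values could be turned into the list of model ids reported by `/models`. the names `LoRAModule`, `parse_lora_modules` and `listed_model_ids` are hypothetical and only for illustration; this is not vLLM's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoRAModule:
    """Hypothetical container for one `name=path` pair."""
    name: str
    path: str


def parse_lora_modules(specs: List[str]) -> List[LoRAModule]:
    """Split each `name=path` string at the first '='."""
    modules = []
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        modules.append(LoRAModule(name=name, path=path))
    return modules


def listed_model_ids(base_model: str, modules: List[LoRAModule]) -> List[str]:
    """Base model first, then one entry per lora module name."""
    return [base_model] + [m.name for m in modules]


if __name__ == "__main__":
    lora_path = "/path/to/llama-2-7b-sql-lora-test"  # placeholder path
    mods = parse_lora_modules(
        [f"sql-lora={lora_path}", f"sql-lora2={lora_path}"]
    )
    for model_id in listed_model_ids("meta-llama/Llama-2-7b-hf", mods):
        print(model_id)
```

two names may share one path, as in the example, because the listing is keyed by module name rather than by the adapter files on disk.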
-
- 02 Feb, 2024 1 commit
-
-
Cheng Su authored
-
- 31 Jan, 2024 1 commit
-
-
Robert Shaw authored
-
- 23 Jan, 2024 2 commits
-
-
Simon Mo authored
-
Antoni Baum authored
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
-
- 18 Jan, 2024 2 commits
-
-
Jason Zhu authored
-
shiyi.c_98 authored
Co-authored-by: DouHappy <2278958187@qq.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-