- 10 Dec, 2024 1 commit
guanyu authored
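test_audio: changed TEST_AUDIO_URLS to local paths; testchat: changed the path on line 360 to a local path; test_metrics: changed base_url on line 197 from 0.0.0.0 to localhost; test_run_batch: restored INPUT_BATCH, INVALID_INPUT_BATCH, and INPUT_EMBEDDING_BATCH to their original format; test_tokenizer_group: changed the gpt path on line 18 to the modified path; test_braodcast: changed the model check to if llava-hf/llava-1.5-7b-hf in model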
- 29 Nov, 2024 1 commit
zhuwenwen authored
- 27 Nov, 2024 1 commit
zhuwenwen authored
add VLLM_OPTEST_MODELS_PATH/OPTEST_MODELS_PATH to load models from a local path instead of the Hugging Face Hub
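The commit message does not say how the two variables are consumed. Here is a minimal sketch under stated assumptions: each variable holds a root directory, models are stored as `<root>/<model_id>`, and `VLLM_OPTEST_MODELS_PATH` is checked before `OPTEST_MODELS_PATH`. `resolve_model_path` is a hypothetical helper for illustration, not vLLM's API.

```python
import os
from typing import Mapping, Optional

# Assumed precedence (hypothetical): VLLM_OPTEST_MODELS_PATH first, then
# OPTEST_MODELS_PATH. The actual commit may order or use them differently.
_ENV_VARS = ("VLLM_OPTEST_MODELS_PATH", "OPTEST_MODELS_PATH")


def resolve_model_path(model_id: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Map a Hugging Face Hub model ID to a local directory when one is configured.

    Assumes local copies are laid out as <root>/<model_id>, for example
    <root>/llava-hf/llava-1.5-7b-hf. Returns the Hub ID unchanged when no
    configured root contains the model, so loading falls back to the Hub.
    """
    env = os.environ if env is None else env
    for var in _ENV_VARS:
        root = env.get(var)
        if root:  # skip unset or empty variables
            candidate = os.path.join(root, model_id)
            if os.path.isdir(candidate):
                return candidate
    return model_id
```

Falling back to the Hub ID keeps the tests usable on machines where neither variable is set.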
- 15 Nov, 2024 3 commits
- 23 Sep, 2024 1 commit
Jee Jee Li authored
- 18 Sep, 2024 2 commits
Aaron Pham authored
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Cyrus Leung authored
- 04 Sep, 2024 2 commits
Woosuk Kwon authored
alexeykondrat authored
Co-authored-by: Simon Mo <simon.mo@hey.com>
- 23 Aug, 2024 1 commit
Alexander Matveev authored
- 16 Aug, 2024 1 commit
jon-chuang authored
- 14 Aug, 2024 1 commit
Jee Jee Li authored
- 06 Aug, 2024 1 commit
Jee Jee Li authored
- 03 Aug, 2024 1 commit
Jee Jee Li authored
- 01 Aug, 2024 1 commit
Jee Jee Li authored
- 22 Jul, 2024 1 commit
Jiaxin Shan authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 09 Jul, 2024 1 commit
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 02 Jul, 2024 1 commit
Qubitium-ModelCloud authored
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
- 30 Jun, 2024 1 commit
SangBin Cho authored
Co-authored-by: sang <sangcho@anyscale.com>
- 29 Jun, 2024 1 commit
Joe Runde authored
Signed-off-by: Joe Runde <joe@joerun.de>
- 21 Jun, 2024 3 commits
rohithkrn authored
Jee Li authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
Jinzhen Lin authored
- 18 Jun, 2024 2 commits
sergey-tinkoff authored
Joe Runde authored
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
- 15 Jun, 2024 1 commit
Cyrus Leung authored
- 13 Jun, 2024 1 commit
youkaichao authored
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293)
- 10 Jun, 2024 1 commit
Cyrus Leung authored
- 07 Jun, 2024 1 commit
Antoni Baum authored
- 28 May, 2024 1 commit
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
- 22 May, 2024 2 commits
raywanb authored
SangBin Cho authored
- 21 May, 2024 1 commit
Isotr0py authored
- 18 May, 2024 1 commit
SangBin Cho authored
Currently we need to call the rotary embedding kernel for each LoRA, which makes it hard to serve multiple long-context LoRAs. This adds a batched rotary embedding kernel and pipes it through. It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor. Follow-up of https://github.com/vllm-project/vllm/pull/3095/files
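Here is a pure-Python sketch of the batching idea described above. It assumes linear RoPE scaling (a position p under factor s is rotated as p / s), NeoX-style pairing of dimensions, and one concatenated cos/sin cache where each scaling factor's table starts at its own row offset. The real change is a CUDA kernel. `build_cos_sin_cache` and `batched_rotary` are hypothetical names used only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Tuple[List[float], List[float]]  # (cos values, sin values) for one position


def build_cos_sin_cache(head_dim: int, max_pos: int, scaling_factors: Sequence[float],
                        base: float = 10000.0) -> Tuple[List[Row], Dict[float, int]]:
    """Build one cos/sin table per linear scaling factor and concatenate them.

    Returns the flat cache and, for each factor, the row where its table starts.
    """
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cache: List[Row] = []
    offsets: Dict[float, int] = {}
    for factor in scaling_factors:
        offsets[factor] = len(cache)
        # Assumed linear scaling: the table for factor s covers s * max_pos
        # positions, and position p is rotated as if it were p / s.
        for pos in range(int(max_pos * factor)):
            angles = [(pos / factor) * f for f in inv_freq]
            cache.append(([math.cos(a) for a in angles], [math.sin(a) for a in angles]))
    return cache, offsets


def batched_rotary(vectors: Sequence[Sequence[float]], positions: Sequence[int],
                   factors: Sequence[float], cache: List[Row],
                   offsets: Dict[float, int]) -> List[List[float]]:
    """Rotate every token in one pass, even when tokens use different factors.

    Each token reads row offsets[factor] + position of the shared cache.
    NeoX-style layout: element i is paired with element i + head_dim // 2.
    """
    out = []
    for x, pos, factor in zip(vectors, positions, factors):
        cos, sin = cache[offsets[factor] + pos]
        half = len(x) // 2
        x1, x2 = x[:half], x[half:]
        first = [a * c - b * s for a, b, c, s in zip(x1, x2, cos, sin)]
        second = [b * c + a * s for a, b, c, s in zip(x1, x2, cos, sin)]
        out.append(first + second)
    return out
```

Each token carries its own offset into the shared cache. Base-model tokens (factor 1.0) and tokens from a 2x long-context LoRA can therefore be rotated in one call instead of one call per LoRA.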
- 16 May, 2024 1 commit
Silencio authored
Co-authored-by: Silencio <silencio@adsl-99-6-187-6.dsl.irvnca.sbcglobal.net>
- 14 May, 2024 1 commit
Nick Hill authored
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
- 27 Apr, 2024 1 commit
Austin Veselka authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
- 24 Apr, 2024 1 commit
Woosuk Kwon authored