- 25 Feb, 2024 1 commit
  - Harry Mellor authored
- 22 Feb, 2024 3 commits
  - Ronen Schaffer authored
  - Woosuk Kwon authored
  - Massimiliano Pronesti authored
- 21 Feb, 2024 2 commits
  - Nick Hill authored
  - Antoni Baum authored
- 20 Feb, 2024 1 commit
  - Zhuohan Li authored
- 19 Feb, 2024 3 commits
  - Ronen Schaffer authored
  - Isotr0py authored
  - Zhuohan Li authored
- 17 Feb, 2024 1 commit
  - jvmncs authored

    How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

    ```terminal
    $ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
    $ python -m vllm.entrypoints.api_server \
        --model meta-llama/Llama-2-7b-hf \
        --enable-lora \
        --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
    ```

    The above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`. No work has been done here to scope client permissions to specific models.
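For illustration only (not part of the commit): a minimal client-side sketch of the `/models` check described above. The base URL and the OpenAI-style response shape are assumptions; adjust them to match your deployment.

```python
# Minimal sketch of verifying that the served LoRA modules show up in the
# model listing described above. The host/port and the OpenAI-style response
# shape ({"data": [{"id": ...}, ...]}) are assumptions, not from the commit.
import requests

BASE_URL = "http://localhost:8000"  # assumed default host/port

response = requests.get(f"{BASE_URL}/models")
response.raise_for_status()
model_ids = [entry["id"] for entry in response.json().get("data", [])]

# Expect the base model plus each --lora-modules entry to be listed.
print(model_ids)  # e.g. ["meta-llama/Llama-2-7b-hf", "sql-lora", "sql-lora2"]
```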
- 15 Feb, 2024 1 commit
  - Woosuk Kwon authored
- 13 Feb, 2024 1 commit
  - Terry authored

    * add mixtral lora support (see the usage sketch below)
    * formatting
    * fix incorrectly ported logic
    * polish tests
    * minor fixes and refactoring
    * minor fixes
    * formatting
    * rename and remove redundant logic
    * refactoring
    * refactoring
    * minor fix
    * minor refactoring
    * fix code smell
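As a rough usage sketch of what this commit enables (not code from the PR): serving a LoRA adapter on top of a Mixtral base model through vLLM's offline `LLM` API, following the pattern of the multilora inference example. The checkpoint name, adapter path, and prompt are placeholders.

```python
# Rough sketch of using a LoRA adapter with a Mixtral base model via vLLM's
# offline API, mirroring examples/multilora_inference.py. The checkpoint name,
# adapter path, and prompt below are placeholders, not values from the PR.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",  # Mixtral base model (placeholder)
    enable_lora=True,
)

outputs = llm.generate(
    ["Write a SQL query that counts users per country."],
    SamplingParams(temperature=0.0, max_tokens=128),
    # LoRARequest(adapter name, unique integer id, local path to the adapter)
    lora_request=LoRARequest("sql-adapter", 1, "/path/to/lora/adapter"),
)
print(outputs[0].outputs[0].text)
```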
- 06 Feb, 2024 2 commits
  - Lily Liu authored
  - Woosuk Kwon authored
- 05 Feb, 2024 1 commit
  - Hongxia Yang authored
- 01 Feb, 2024 1 commit
  - Kunshang Ji authored
    Co-authored-by: Jiang Li <jiang1.li@intel.com>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
- 31 Jan, 2024 2 commits
  - Philipp Moritz authored
  - Philipp Moritz authored
- 30 Jan, 2024 2 commits
  - Vladimir authored
  - wangding zeng authored
    Co-authored-by: roy <jasonailu87@gmail.com>
- 29 Jan, 2024 1 commit
  - zhaoyang-star authored
    Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
- 27 Jan, 2024 1 commit
  - Hanzhi Zhou authored
- 25 Jan, 2024 1 commit
  - Simon Mo authored
- 24 Jan, 2024 1 commit
  - Nikola Borisov authored
- 23 Jan, 2024 1 commit
  - Antoni Baum authored
    Co-authored-by: Chen Shen <scv119@gmail.com>
    Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
    Co-authored-by: Avnish Narayan <avnish@anyscale.com>
- 22 Jan, 2024 2 commits
  - Jason Zhu authored
    Add a 1-line docstring to explain why context_attention_fwd is called twice in test_prefix_prefill.py (#2553)
  - Cade Daniel authored
- 19 Jan, 2024 2 commits
  - Zhuohan Li authored
  - Simon Mo authored
- 18 Jan, 2024 1 commit
  - shiyi.c_98 authored
    Co-authored-by: DouHappy <2278958187@qq.com>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
- 17 Jan, 2024 2 commits
  - FlorianJoncour authored
  - Hyunsung Lee authored
- 14 Jan, 2024 1 commit
  - Simon Mo authored
- 12 Jan, 2024 1 commit
  - 陈序 authored

    * Align top_p and top_k with huggingface (see the sketch below)
    * remove _get_prompt_and_output_tokens
    * rename _apply_top_p_top_k
    * compare top_p top_k with hf
    * fix test errors
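For context, a minimal, self-contained sketch of HF-style top-k/top-p (nucleus) filtering on a single logits vector, illustrating the semantics the first bullet aligns with. This is an illustration, not vLLM's `_apply_top_p_top_k` implementation, and the helper name is made up.

```python
# Illustrative top-k / top-p filtering on a 1-D logits tensor, following the
# usual HuggingFace semantics (the first token crossing the top_p threshold is
# kept). Not the vLLM implementation; the function name is made up.
import torch

def top_k_top_p_filter(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0) -> torch.Tensor:
    logits = logits.clone()
    if top_k > 0:
        # Keep only the k largest logits; mask the rest to -inf.
        kth_value = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_value] = float("-inf")
    if top_p < 1.0:
        # Keep the smallest prefix of the descending-sorted distribution whose
        # cumulative probability exceeds top_p (the crossing token is kept).
        sorted_logits, sorted_indices = torch.sort(logits, descending=True)
        cumulative = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        to_remove = cumulative > top_p
        to_remove[1:] = to_remove[:-1].clone()
        to_remove[0] = False
        logits[sorted_indices[to_remove]] = float("-inf")
    return logits

# Example: filtered logits can then be sampled with torch.multinomial.
probs = torch.softmax(top_k_top_p_filter(torch.randn(32000), top_k=50, top_p=0.9), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```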
- 09 Jan, 2024 1 commit
  - Cade Daniel authored
- 04 Jan, 2024 1 commit
  - Woosuk Kwon authored
- 03 Jan, 2024 2 commits
  - Zhuohan Li authored
  - Jee Li authored
- 27 Dec, 2023 1 commit
  - Zhuohan Li authored