- 27 Feb, 2024 1 commit
Dylan Hawk authored
- 26 Feb, 2024 1 commit
Jared Moore authored
- 17 Feb, 2024 1 commit
jvmncs authored
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

The above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
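As a quick sanity check of the served model list, one can query the `/models` endpoint mentioned above; this sketch assumes the server is running locally on its default port 8000 (host and port are assumptions, not part of the original commit message):

```terminal
# Assumes the api_server above is reachable at localhost:8000 (default port)
$ curl http://localhost:8000/models
```

The response should contain 3 entries: the base model `meta-llama/Llama-2-7b-hf` plus the two LoRA modules, `sql-lora` and `sql-lora2`.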
- 25 Jan, 2024 1 commit
Simon Mo authored
- 19 Jan, 2024 1 commit
Simon Mo authored
- 17 Jan, 2024 1 commit
FlorianJoncour authored