Commits · 739b61a348afa5da297a80ff15f4e39d6e524b53 · OpenDAS / vllm_cscc

22 Jul, 2024 1 commit
- [Frontend] Refactor prompt processing (#4028) · 739b61a3
  Cyrus Leung authored Jul 23, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
18 Jul, 2024 1 commit
- [BugFix][Frontend] Use LoRA tokenizer in OpenAI APIs (#6227) · e2fbaee7
  Nick Hill authored Jul 18, 2024
```
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
```
16 Jul, 2024 1 commit
- [Bugfix][CI/Build] Test prompt adapters in openai entrypoint tests (#6419) · d92b3c5c
  Joe authored Jul 15, 2024
  
09 Jul, 2024 1 commit

- [CORE] Adding support for insertion of soft-tuned prompts (#4645) · 4d6ada94
  Swapnil Parekh authored Jul 09, 2024

  Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
  Co-authored-by: Joe G <joseph.granados@h2o.ai>
  Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

03 Jul, 2024 1 commit

- [Core] Dynamic image size support for VLMs (#5276) · 9831aec4
  Cyrus Leung authored Jul 03, 2024

  Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
  Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
  Co-authored-by: ywang96 <ywang@roblox.com>
  Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
  Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

26 Jun, 2024 1 commit
- [Frontend] Add tokenize/detokenize endpoints (#5054) · c54269d9
  sasha0552 authored Jun 26, 2024
  
05 Jun, 2024 1 commit
- [Frontend] OpenAI API server: Add `add_special_tokens` to ChatCompletionRequest (default False) (#5278) · f0a50054
  tomeras91 authored Jun 05, 2024
02 Jun, 2024 1 commit
- [Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643) · f790ad3c
  Avinash Raj authored Jun 02, 2024
  
30 May, 2024 1 commit
- [BUGFIX] [FRONTEND] Correct chat logprobs (#5029) · 87d41c84
  Breno Faria authored May 30, 2024
```
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
```
28 May, 2024 1 commit
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
25 May, 2024 1 commit

- [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) · 8e192ff9
  Eric Xihui Lin authored May 25, 2024

  Co-authored-by: beagleski <yunanzhang@microsoft.com>
  Co-authored-by: bapatra <bapatra@microsoft.com>
  Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
  Co-authored-by: Michael Goin <michael@neuralmagic.com>

17 May, 2024 1 commit
- [Frontend] OpenAI API server: Do not add bos token by default when encoding (#4688) · 0150a106
  bofeng huang authored May 17, 2024
  
11 May, 2024 1 commit
- [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) · e254497b
  Chang Su authored May 11, 2024
  
09 May, 2024 1 commit
- [Frontend] Move async logic outside of constructor (#4674) · f12b20de
  Cyrus Leung authored May 09, 2024
  
03 May, 2024 1 commit
- Fix/async chat serving (#2727) · f8e7adda
  Sebastian Schoennenbeck authored May 03, 2024
  
27 Apr, 2024 1 commit
- [Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355) · 8947bc3c
  Cyrus Leung authored Apr 27, 2024
  
23 Apr, 2024 2 commits
- [Bugfix] Fixing max token error message for openai compatible server (#4016) · d3c8180a
  Jack Gordley authored Apr 23, 2024
  
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
20 Apr, 2024 1 commit
- Pass `tokenizer_revision` when getting tokenizer in openai serving (#4214) · bc9df157
  Chirag Jain authored Apr 20, 2024
  
18 Apr, 2024 2 commits
- [Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149) · e1bb2fd5
  James Whedbee authored Apr 18, 2024
  
- Allow model to be served under multiple names (#2894) · 66ded030
  Harry Mellor authored Apr 18, 2024
```
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
```
11 Apr, 2024 1 commit
- Fix echo/logprob OpenAI completion bug (#3441) · 95e7d4a9
  Dylan Hawk authored Apr 11, 2024
```
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
```
05 Apr, 2024 1 commit
- Add option to completion API to truncate prompt tokens (#3144) · 1d7c940d
  Thomas Parnell authored Apr 05, 2024
  
29 Mar, 2024 1 commit
- [BugFix] Fix tokenizer out of vocab size (#3685) · 6110c39d
  Roy authored Mar 29, 2024
  
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
21 Mar, 2024 1 commit
- [Misc][Log] Add log for tokenizer length not equal to vocabulary size (#3500) · 86573234
  Roy authored Mar 21, 2024
  
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
04 Mar, 2024 1 commit
- Push logprob generation to LLMEngine (#3065) · 22de4523
  Antoni Baum authored Mar 04, 2024
```
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
```
17 Feb, 2024 1 commit

- multi-LoRA as extra models in OpenAI server (#2775) · 8f36444c
  jvmncs authored Feb 17, 2024

How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server lists three separate models when a user queries `/models`: one for the base served model and one for each specified LoRA module. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
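
The mapping from `--lora-modules` arguments to the model list described above can be sketched as follows. This is a minimal illustration, not vLLM's implementation: the helper names `parse_lora_modules` and `served_model_ids` are hypothetical, and the sketch assumes only that each module is given as a `name=path` pair and that the base model is listed first.

```python
from typing import Dict, List


def parse_lora_modules(specs: List[str]) -> Dict[str, str]:
    """Parse name=path pairs such as those passed to --lora-modules.

    Hypothetical helper for illustration only.
    """
    modules: Dict[str, str] = {}
    for spec in specs:
        name, sep, path = spec.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {spec!r}")
        modules[name] = path
    return modules


def served_model_ids(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    """Return the base model followed by one entry per LoRA module name."""
    return [base_model, *lora_modules]


# Same values as the command above: two names pointing at one LoRA path.
lora_path = "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/"
modules = parse_lora_modules([f"sql-lora={lora_path}", f"sql-lora2={lora_path}"])
model_ids = served_model_ids("meta-llama/Llama-2-7b-hf", modules)
print(model_ids)
```

Two names may share one path, as here, so the listing is keyed by module name rather than by the underlying adapter.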


19 Jan, 2024 1 commit
- refactor complemention api for readability (#2499) · dd7e8f5f
  Simon Mo authored Jan 18, 2024
  
17 Jan, 2024 1 commit
- OpenAI Server refactoring (#2360) · 14cc317b
  FlorianJoncour authored Jan 17, 2024
  