Commits · 03dccc886ef7e5d0dd67512f3e9748ee00c21fb2 · OpenDAS / vllm_cscc

11 Jun, 2024 2 commits
- [Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026) · 640052b0
  Cyrus Leung authored Jun 11, 2024
  
- [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (#5312) · 351d5e7b
  maor-ps authored Jun 11, 2024
```
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
```
10 Jun, 2024 1 commit
- [Feature][Frontend]: Continued `stream_options` implementation also in CompletionRequest (#5319) · 774d1035
  Itay Etelis authored Jun 10, 2024
  
07 Jun, 2024 1 commit
- [Feature][Frontend]: Add support for `stream_options` in `ChatCompletionRequest` (#5135) · baa15a9e
  Itay Etelis authored Jun 07, 2024
  
04 Jun, 2024 1 commit
- [Bugfix] Support `prompt_logprobs==0` (#5217) · 06b2550c
  Toshiki Kataoka authored Jun 04, 2024
  
03 Jun, 2024 1 commit
- [FRONTEND] OpenAI `tools` support named functions (#5032) · f775a07e
  Breno Faria authored Jun 04, 2024
  
30 May, 2024 1 commit
- [BUGFIX] [FRONTEND] Correct chat logprobs (#5029) · 87d41c84
  Breno Faria authored May 30, 2024
```
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
```
28 May, 2024 1 commit
- [Core] Consolidate prompt arguments to LLM engines (#4328) · 5ae5ed1e
  Cyrus Leung authored May 29, 2024
```
Co-authored-by: Roger Wang <ywang@roblox.com>
```
15 May, 2024 1 commit
- [Frontend] Re-enable custom roles in Chat Completions API (#4758) · fc0d9dfc
  Cyrus Leung authored May 16, 2024
  
13 May, 2024 1 commit

- [CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) · 350f9e10
  Cyrus Leung authored May 13, 2024

Since #4335 was merged, the definition of `ServerRunner` in the tests has been the same as the one in the OpenAI API test. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.)

Also, I have moved the test utilities file (`test_utils.py`) into the test directory (`tests/utils.py`), since none of its code is actually used in the main package. Note that I have added `__init__.py` to each test subpackage and updated the `ray.init()` call in the test utilities file so that `tests/utils.py` can be imported with a relative import.


11 May, 2024 1 commit
- [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) · e254497b
  Chang Su authored May 11, 2024
  
03 May, 2024 1 commit
- Fix/async chat serving (#2727) · f8e7adda
  Sebastian Schoennenbeck authored May 03, 2024
  
01 May, 2024 1 commit
- [Bugfix] Add validation for seed (#4529) · c47ba4aa
  sasha0552 authored May 01, 2024
  
30 Apr, 2024 1 commit
- [Frontend] Support complex message content for chat completions endpoint (#3467) · a4941404
  Florian Greinacher authored May 01, 2024
```
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
```
27 Apr, 2024 1 commit
- [Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355) · 8947bc3c
  Cyrus Leung authored Apr 27, 2024
  
20 Apr, 2024 1 commit
- [Bugfix] Add fix for JSON whitespace (#4189) · 138485a8
  Ayush Rautwar authored Apr 19, 2024
```
Co-authored-by: Ubuntu <ubuntu@ip-172-31-13-147.ec2.internal>
```
18 Apr, 2024 1 commit
- [Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149) · e1bb2fd5
  James Whedbee authored Apr 18, 2024
  
16 Apr, 2024 1 commit
- LM Format Enforcer Guided Decoding Support (#3868) · 05434764
  Noam Gat authored Apr 16, 2024
```
Co-authored-by: Simon Mo <simon.mo@hey.com>
```
11 Apr, 2024 2 commits
- Fix echo/logprob OpenAI completion bug (#3441) · 95e7d4a9
  Dylan Hawk authored Apr 11, 2024
```
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
```
- [Core][5/N] Fully working chunked prefill e2e (#3884) · 67b4221a
  SangBin Cho authored Apr 11, 2024
  
29 Mar, 2024 1 commit
- [BugFix][Frontend] Fix completion logprobs=0 error (#3731) · f510395b
  Roy authored Mar 30, 2024
  
25 Mar, 2024 2 commits
- [Bugfix] API stream returning two stops (#3450) · 0b4997e0
  Dylan Hawk authored Mar 25, 2024
```
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
```
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
16 Mar, 2024 1 commit
- Support arbitrary json_object in OpenAI and Context Free Grammar (#3211) · 120157fd
  Simon Mo authored Mar 16, 2024
  
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
04 Mar, 2024 1 commit
- Push logprob generation to LLMEngine (#3065) · 22de4523
  Antoni Baum authored Mar 04, 2024
```
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
```
29 Feb, 2024 1 commit
- Add guided decoding for OpenAI API server (#2819) · 703e42ee
  felixzhu555 authored Feb 29, 2024
```
Co-authored-by: br3no <breno@veltefaria.de>
Co-authored-by: simon-mo <simon.mo@hey.com>
```
27 Feb, 2024 1 commit
- Support logit bias for OpenAI API (#3027) · e0ade06d
  Dylan Hawk authored Feb 26, 2024
  
26 Feb, 2024 1 commit
- Add LogProbs for Chat Completions in OpenAI (#2918) · 70f3e8e3
  Jared Moore authored Feb 25, 2024
  
17 Feb, 2024 1 commit

- multi-LoRA as extra models in OpenAI server (#2775) · 8f36444c
  jvmncs authored Feb 17, 2024

How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate values if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
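
To make the "three separate values" above concrete, here is a minimal sketch of how `name=path` pairs of the shape passed to `--lora-modules` could map to the listed model ids. The helpers `parse_lora_modules` and `list_model_ids` are hypothetical names invented for this illustration; they are not taken from the vLLM code base and do not claim to mirror its implementation.

```python
from typing import Dict, List


def parse_lora_modules(pairs: List[str]) -> Dict[str, str]:
    """Parse `name=path` strings into a name -> path mapping (hypothetical helper)."""
    modules: Dict[str, str] = {}
    for pair in pairs:
        # Split on the first "=" only, so paths containing "=" stay intact.
        name, sep, path = pair.partition("=")
        if not sep or not name or not path:
            raise ValueError(f"expected name=path, got {pair!r}")
        modules[name] = path
    return modules


def list_model_ids(base_model: str, lora_modules: Dict[str, str]) -> List[str]:
    """Return the base model id followed by one id per LoRA module (hypothetical helper)."""
    return [base_model, *lora_modules]


# Same values as in the command above: two names pointing at one adapter path.
lora_path = "~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/"
modules = parse_lora_modules([f"sql-lora={lora_path}", f"sql-lora2={lora_path}"])
model_ids = list_model_ids("meta-llama/Llama-2-7b-hf", modules)
print(model_ids)
```

Two names may share one path, as here, because the mapping is keyed by name rather than by adapter location.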


25 Jan, 2024 1 commit
- Support Batch Completion in Server (#2529) · 3a7dd7e3
  Simon Mo authored Jan 24, 2024
  
19 Jan, 2024 1 commit
- refactor complemention api for readability (#2499) · dd7e8f5f
  Simon Mo authored Jan 18, 2024
  
17 Jan, 2024 1 commit
- OpenAI Server refactoring (#2360) · 14cc317b
  FlorianJoncour authored Jan 17, 2024
  