Commits · cd2f63fb362b2c53b993e7edf6565ab6a5f9f260 · OpenDAS / vllm_cscc

18 Apr, 2024 2 commits
- [CI/CD] add neuron docker and ci test scripts (#3571) · cd2f63fb
  Liangfu Chen authored Apr 18, 2024
  
  cd2f63fb
- [Typing] Mypy typing part 2 (#4043) · 533d2a1f
  SangBin Cho authored Apr 18, 2024
```
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
```
  533d2a1f
16 Apr, 2024 2 commits
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) · e95cd879
  Cade Daniel authored Apr 16, 2024
  
  e95cd879
- [Core] Fix engine-use-ray broken (#4105) · 4e7ee664
  SangBin Cho authored Apr 16, 2024
  
  4e7ee664
03 Apr, 2024 1 commit
- [Speculative decoding] Adding configuration object for speculative decoding (#3706) · 5757d90e
  Cade Daniel authored Apr 02, 2024
```
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
```
  5757d90e
29 Mar, 2024 1 commit
- Usage Stats Collection (#2852) · d8658c8c
  yhu422 authored Mar 28, 2024
  
  d8658c8c
25 Mar, 2024 2 commits
- [Feature] Add vision language model support. (#3042) · 64172a97
  xwjiang2010 authored Mar 25, 2024
  
  64172a97
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
22 Mar, 2024 1 commit
- [Hardware][Neuron] Refactor neuron support (#3471) · e90fc21f
  Zhuohan Li authored Mar 21, 2024
  
  e90fc21f
15 Mar, 2024 1 commit
- Fixes the misuse/mixuse of time.time()/time.monotonic() (#3220) · 14b8ae02
  Tao He authored Mar 16, 2024
```
Signed-off-by: Tao He <sighingnow@gmail.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
```
  14b8ae02
11 Mar, 2024 2 commits
- Add distributed model executor abstraction (#3191) · 4c922709
  Zhuohan Li authored Mar 11, 2024
  
  4c922709
- [BugFix] Fix get tokenizer when using ray (#3301) · 9e8744a5
  Roy authored Mar 11, 2024
  
  9e8744a5
04 Mar, 2024 2 commits
- Add health check, make async Engine more robust (#3015) · ff578cae
  Antoni Baum authored Mar 04, 2024
```
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
  ff578cae
- Push logprob generation to LLMEngine (#3065) · 22de4523
  Antoni Baum authored Mar 04, 2024
```
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
```
  22de4523
02 Mar, 2024 1 commit
- Add Automatic Prefix Caching (#2762) · ce4f5a29
  Sage Moore authored Mar 02, 2024

  Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
  Co-authored-by: Michael Goin <michael@neuralmagic.com>

  ce4f5a29
29 Feb, 2024 1 commit
- Add guided decoding for OpenAI API server (#2819) · 703e42ee
  felixzhu555 authored Feb 29, 2024
```
Co-authored-by: br3no <breno@veltefaria.de>
Co-authored-by: simon-mo <simon.mo@hey.com>
```
  703e42ee
31 Jan, 2024 1 commit
- fix some bugs (#2689) · c664b0e6
  zspo authored Feb 01, 2024
  
  c664b0e6
30 Jan, 2024 1 commit
- Fix 'Actor methods cannot be called directly' when using `--engine-use-ray` (#2664) · d79ced32
  Wen Sun authored Jan 31, 2024
```
* fix: engine-useray complain

* fix: typo
```
  d79ced32
28 Jan, 2024 1 commit
- Small async_llm_engine refactor (#2618) · 89be30fa
  Murali Andoorveedu authored Jan 27, 2024
  
  89be30fa
23 Jan, 2024 1 commit
- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024

  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>

  9b945daa
18 Jan, 2024 1 commit
- [Experimental] Prefix Caching Support (#1669) · d10f8e1d
  shiyi.c_98 authored Jan 17, 2024

  Co-authored-by: DouHappy <2278958187@qq.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

  d10f8e1d
12 Jan, 2024 1 commit
- [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) · 6549aef2
  Jiaxiang authored Jan 12, 2024
  
  6549aef2
05 Jan, 2024 1 commit
- Ensure metrics are logged regardless of requests (#2347) · d0215a58
  Iskren Ivov Chernev authored Jan 05, 2024
  
  d0215a58
03 Jan, 2024 1 commit
- Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) · fd4ea8ef
  Zhuohan Li authored Jan 04, 2024
  
  fd4ea8ef
26 Dec, 2023 1 commit
- [BUGFIX] Do not return ignored sentences twice in async llm engine (#2258) · e0ff9200
  Zhuohan Li authored Dec 26, 2023
  
  e0ff9200
14 Dec, 2023 1 commit
- Fix typing in AsyncLLMEngine & add toml to requirements-dev (#2100) · 6774bd50
  mezuzza authored Dec 14, 2023
  
  6774bd50
03 Dec, 2023 1 commit
- Fix num_gpus when TP > 1 (#1852) · 464dd985
  Woosuk Kwon authored Dec 03, 2023
  
  464dd985
16 Nov, 2023 1 commit
- TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c
  Zhuohan Li authored Nov 15, 2023

  TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

  Refactor the tensor parallelism, quantization, and weight-loading codes.

  Summary of the new features enabled by this PR:
  - **All models** are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - Model loading code became much simpler.
  - Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.

  7076fa1c
11 Nov, 2023 1 commit
- Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) · 1b290ace
  Dominik Schwabe authored Nov 11, 2023
  
  1b290ace
01 Nov, 2023 1 commit
- [BugFix] Set engine_use_ray=True when TP>1 (#1531) · 5687d584
  ljss authored Nov 01, 2023
  
  5687d584
03 Oct, 2023 1 commit
- Use monotonic time where appropriate (#1249) · acbed3ef
  Antoni Baum authored Oct 02, 2023
  
  acbed3ef
18 Sep, 2023 1 commit
- align llm_engine and async_engine. (#1081) · 95592fa0
  Roy authored Sep 19, 2023
  
  95592fa0
17 Sep, 2023 1 commit
- Remove AsyncLLMEngine busy loop, shield background task (#1059) · ff36139f
  Antoni Baum authored Sep 17, 2023
  
  ff36139f
15 Sep, 2023 1 commit
- Abort when coroutine is cancelled (#1020) · b9fe4616
  Jerry Yang authored Sep 15, 2023
  
  b9fe4616
12 Sep, 2023 1 commit
- add option to shorten prompt print in log (#991) · d6545ad2
  leiwen83 authored Sep 13, 2023

  Signed-off-by: Lei Wen <wenlei03@qiyi.com>
  Co-authored-by: Lei Wen <wenlei03@qiyi.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>

  d6545ad2
08 Sep, 2023 1 commit
- Start background task in `AsyncLLMEngine.generate` (#988) · 08043847
  Antoni Baum authored Sep 08, 2023
```
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
```
  08043847
07 Sep, 2023 1 commit
- Make `AsyncLLMEngine` more robust & fix batched abort (#969) · c07ece5c
  Antoni Baum authored Sep 07, 2023

  Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
  Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>

  c07ece5c
06 Sep, 2023 1 commit
- Use queue for finished requests (#957) · c9927c1a
  Antoni Baum authored Sep 05, 2023
  
  c9927c1a
05 Sep, 2023 2 commits
- fix: typo (#948) · 22379d55
  Wen Sun authored Sep 05, 2023
  
  22379d55
- Initialize AsyncLLMEngine bg loop correctly (#943) · 16967258
  Antoni Baum authored Sep 04, 2023
  
  16967258