Commits · f8ecb84c0283a7f1ba02ee732c9f044f8f9d36ee · kecinstone / 2024pra-vllm

23 Jan, 2024 1 commit
- [Experimental] Add multi-LoRA support (#1804) · 9b945daa
  Antoni Baum authored Jan 24, 2024
  
  Co-authored-by: Chen Shen <scv119@gmail.com>
  Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
  Co-authored-by: Avnish Narayan <avnish@anyscale.com>
18 Jan, 2024 1 commit
- [Experimental] Prefix Caching Support (#1669) · d10f8e1d
  shiyi.c_98 authored Jan 17, 2024
  
  Co-authored-by: DouHappy <2278958187@qq.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
12 Jan, 2024 1 commit
- [DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011) · 6549aef2
  Jiaxiang authored Jan 12, 2024
05 Jan, 2024 1 commit
- Ensure metrics are logged regardless of requests (#2347) · d0215a58
  Iskren Ivov Chernev authored Jan 05, 2024
03 Jan, 2024 1 commit
- Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) · fd4ea8ef
  Zhuohan Li authored Jan 04, 2024
26 Dec, 2023 1 commit
- [BUGFIX] Do not return ignored sentences twice in async llm engine (#2258) · e0ff9200
  Zhuohan Li authored Dec 26, 2023
14 Dec, 2023 1 commit
- Fix typing in AsyncLLMEngine & add toml to requirements-dev (#2100) · 6774bd50
  mezuzza authored Dec 14, 2023
03 Dec, 2023 1 commit
- Fix num_gpus when TP > 1 (#1852) · 464dd985
  Woosuk Kwon authored Dec 03, 2023
16 Nov, 2023 1 commit
- TP/quantization/weight loading refactor part 2 - Refactor quantized linear... · 7076fa1c
  Zhuohan Li authored Nov 15, 2023
  
  TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)
  
  Refactor the tensor parallelism, quantization, and weight-loading code.
  
  Summary of the new features enabled by this PR:
  - **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
  - The model-loading code became much simpler.
  - Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.
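
The last bullet concerns multi-query and grouped-query attention models that have fewer key/value heads than tensor-parallel ranks. The log does not include the diff. The sketch below illustrates one common way to assign heads in that case. It assumes that each rank then holds a single whole KV head and that each KV head is replicated on `tp_size // total_kv_heads` ranks. The helpers `kv_heads_per_rank` and `first_kv_head_for_rank` are hypothetical and are not vLLM functions.

```python
def kv_heads_per_rank(total_kv_heads: int, tp_size: int) -> tuple[int, int]:
    """Return (KV heads held by each rank, ranks holding each KV head).

    Hypothetical helper: assumes the counts divide evenly in whichever
    direction applies, and raises ValueError otherwise.
    """
    if tp_size >= total_kv_heads:
        # Fewer KV heads than ranks: each rank keeps one whole head, and
        # every head is replicated on tp_size // total_kv_heads ranks.
        if tp_size % total_kv_heads != 0:
            raise ValueError("tp_size must be a multiple of total_kv_heads")
        return 1, tp_size // total_kv_heads
    # More KV heads than ranks: split the heads evenly, no replication.
    if total_kv_heads % tp_size != 0:
        raise ValueError("total_kv_heads must be divisible by tp_size")
    return total_kv_heads // tp_size, 1


def first_kv_head_for_rank(rank: int, total_kv_heads: int, tp_size: int) -> int:
    """Index of the first KV head stored on `rank` (hypothetical helper)."""
    per_rank, replicas = kv_heads_per_rank(total_kv_heads, tp_size)
    if replicas > 1:
        # Consecutive groups of `replicas` ranks share the same head.
        return rank // replicas
    return rank * per_rank


# 2 KV heads on 8 ranks: ranks 0-3 hold head 0, ranks 4-7 hold head 1.
print([first_kv_head_for_rank(r, 2, 8) for r in range(8)])
```

Under these assumptions, a single-KV-head (MQA) model on 4 ranks gives `(1, 4)`, so every rank stores a copy of head 0. Replicating whole heads avoids splitting one head's projection across ranks.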

11 Nov, 2023 1 commit
- Run default _AsyncLLMEngine._run_workers_async in threadpool (#1628) · 1b290ace
  Dominik Schwabe authored Nov 11, 2023
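
The commit above runs `_AsyncLLMEngine._run_workers_async` in a thread pool by default. The diff is not shown here. What follows is a generic asyncio sketch of that technique, assuming the worker calls are blocking functions. `blocking_worker_call` and `run_workers_async` are hypothetical stand-ins, not the engine's code. Passing `None` as the executor to `loop.run_in_executor` selects the event loop's default `ThreadPoolExecutor`.

```python
import asyncio
import time


def blocking_worker_call(x: int, delay: float = 0.05) -> int:
    """Hypothetical stand-in for a synchronous, blocking worker method."""
    time.sleep(delay)
    return x * 2


async def run_workers_async(values: list[int]) -> list[int]:
    loop = asyncio.get_running_loop()
    # Executor None = the loop's default ThreadPoolExecutor, so each blocking
    # call runs in a worker thread instead of stalling the event loop.
    futures = [
        loop.run_in_executor(None, blocking_worker_call, v) for v in values
    ]
    # gather() returns results in the same order as the input futures.
    return list(await asyncio.gather(*futures))


print(asyncio.run(run_workers_async([1, 2, 3])))  # [2, 4, 6]
```

Because the blocking calls run in threads, other coroutines on the loop, such as request handlers, are not stalled while the workers run.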
01 Nov, 2023 1 commit
- [BugFix] Set engine_use_ray=True when TP>1 (#1531) · 5687d584
  ljss authored Nov 01, 2023
03 Oct, 2023 1 commit
- Use monotonic time where appropriate (#1249) · acbed3ef
  Antoni Baum authored Oct 02, 2023
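
The log does not show which timestamps the commit above changed. As a general illustration, assuming a per-request timer, durations such as time since arrival are best measured with `time.monotonic()`. By contrast, `time.time()` follows the wall clock and can jump when the system clock is adjusted. `RequestTimer` is a hypothetical name.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RequestTimer:
    """Hypothetical per-request timer used only for this illustration."""

    # time.monotonic() never goes backwards, unlike time.time(), which can
    # jump when the system clock is changed (manually or by NTP).
    arrival: float = field(default_factory=time.monotonic)

    def elapsed(self) -> float:
        """Seconds since the request arrived; never negative."""
        return time.monotonic() - self.arrival


timer = RequestTimer()
time.sleep(0.01)
print(f"elapsed: {timer.elapsed():.3f} s")
```

Wall-clock time is still appropriate for values that must be shown as calendar timestamps. It is only differences between two readings that should come from the monotonic clock.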
18 Sep, 2023 1 commit
- align llm_engine and async_engine. (#1081) · 95592fa0
  Roy authored Sep 19, 2023
17 Sep, 2023 1 commit
- Remove AsyncLLMEngine busy loop, shield background task (#1059) · ff36139f
  Antoni Baum authored Sep 17, 2023
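
The commit title above names two techniques. The first replaces a polling loop with one that sleeps until work arrives. The second wraps the background task in `asyncio.shield`. The sketch below shows both; `BackgroundEngine` and its methods are hypothetical and do not reproduce `AsyncLLMEngine`. When idle, the loop blocks on an `asyncio.Event`. Callers await the task through `asyncio.shield`, so cancelling one caller does not cancel the loop shared by every request.

```python
import asyncio
from typing import List, Optional


class BackgroundEngine:
    """Hypothetical engine: one background task serves all submitted items."""

    def __init__(self) -> None:
        self._pending: List[int] = []
        self._new_work = asyncio.Event()
        self._task: Optional[asyncio.Task] = None
        self.processed: List[int] = []

    def start(self) -> None:
        # Must be called from inside a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._run())

    def submit(self, item: int) -> None:
        self._pending.append(item)
        self._new_work.set()  # wake the loop rather than letting it poll

    async def _run(self) -> None:
        while True:
            # Idle: sleep until submit() sets the event (no busy loop).
            await self._new_work.wait()
            self._new_work.clear()
            while self._pending:
                self.processed.append(self._pending.pop(0))
                await asyncio.sleep(0)  # yield to other coroutines per step

    async def wait_background(self) -> None:
        if self._task is None:
            raise RuntimeError("call start() first")
        # shield(): if this caller is cancelled, the shared task keeps running.
        await asyncio.shield(self._task)


async def demo() -> List[int]:
    engine = BackgroundEngine()
    engine.start()
    engine.submit(1)
    waiter = asyncio.ensure_future(engine.wait_background())
    await asyncio.sleep(0.01)
    waiter.cancel()  # cancel one caller only
    try:
        await waiter
    except asyncio.CancelledError:
        pass
    engine.submit(2)  # still handled: the background task survived
    await asyncio.sleep(0.01)
    engine._task.cancel()
    return engine.processed


print(asyncio.run(demo()))  # [1, 2]
```

Without `shield`, cancelling `waiter` would also cancel the background task, and item 2 would never be processed.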
15 Sep, 2023 1 commit
- Abort when coroutine is cancelled (#1020) · b9fe4616
  Jerry Yang authored Sep 15, 2023
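
One reading of the commit above is this: when the coroutine consuming a request's output is cancelled, for example because the client disconnected, the engine should abort that request instead of continuing to generate for it. The following is a minimal sketch under that reading. It assumes an async-generator `generate` and a hypothetical `TinyEngine.abort` method; neither is the actual vLLM code.

```python
import asyncio
from typing import AsyncIterator, Set


class TinyEngine:
    """Hypothetical engine used only to illustrate abort-on-cancel."""

    def __init__(self) -> None:
        self.aborted: Set[str] = set()

    def abort(self, request_id: str) -> None:
        # Assumption: a real engine would also drop the request from scheduling.
        self.aborted.add(request_id)

    async def generate(self, request_id: str) -> AsyncIterator[int]:
        step = 0
        try:
            while True:
                await asyncio.sleep(0.01)  # stand-in for waiting on a token
                yield step
                step += 1
        except asyncio.CancelledError:
            # The consuming coroutine was cancelled (e.g. client went away):
            # abort the request, then let the cancellation propagate.
            self.abort(request_id)
            raise


async def consume(engine: TinyEngine) -> None:
    async for _ in engine.generate("req-0"):
        pass


async def demo() -> Set[str]:
    engine = TinyEngine()
    task = asyncio.ensure_future(consume(engine))
    await asyncio.sleep(0.05)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return engine.aborted


print(asyncio.run(demo()))  # {'req-0'}
```

Re-raising after `abort` matters. Swallowing `CancelledError` would make the cancelled task look as if it had finished normally.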
12 Sep, 2023 1 commit
- add option to shorten prompt print in log (#991) · d6545ad2
  leiwen83 authored Sep 13, 2023
  
  Signed-off-by: Lei Wen <wenlei03@qiyi.com>
  Co-authored-by: Lei Wen <wenlei03@qiyi.com>
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
08 Sep, 2023 1 commit
- Start background task in `AsyncLLMEngine.generate` (#988) · 08043847
  Antoni Baum authored Sep 08, 2023
  
  Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
07 Sep, 2023 1 commit
- Make `AsyncLLMEngine` more robust & fix batched abort (#969) · c07ece5c
  Antoni Baum authored Sep 07, 2023
  
  Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
  Co-authored-by: Avnish Narayan <38871737+avnishn@users.noreply.github.com>
06 Sep, 2023 1 commit
- Use queue for finished requests (#957) · c9927c1a
  Antoni Baum authored Sep 05, 2023
05 Sep, 2023 2 commits
- fix: typo (#948) · 22379d55
  Wen Sun authored Sep 05, 2023
- Initialize AsyncLLMEngine bg loop correctly (#943) · 16967258
  Antoni Baum authored Sep 04, 2023
04 Sep, 2023 1 commit
- Refactor AsyncLLMEngine (#880) · ce741ba3
  Antoni Baum authored Sep 03, 2023
20 Jul, 2023 1 commit
- Ray placement group support (#397) · 9925c179
  Antoni Baum authored Jul 19, 2023
03 Jul, 2023 3 commits
- Fix an endless loop issue when engine_step throws a RuntimeError (#339) · 7717d083
  coolcloudcol authored Jul 04, 2023
- [Quality] Add CI for formatting (#343) · 42e0c1df
  Zhuohan Li authored Jul 03, 2023
- [Quality] Add code formatter and linter (#326) · d6fa1be3
  Zhuohan Li authored Jul 03, 2023
22 Jun, 2023 1 commit
- [Bugfix] Fix a bug in RequestOutput.finished (#202) · 14f0b39c
  Woosuk Kwon authored Jun 22, 2023
17 Jun, 2023 2 commits
- Change the name to vLLM (#150) · 0b98ba15
  Woosuk Kwon authored Jun 17, 2023
- Rename servers to engines (#152) · e5464ee4
  Zhuohan Li authored Jun 17, 2023
16 Jun, 2023 1 commit
- Rename servers and change port numbers to reduce confusion (#149) · eedb46bf
  Zhuohan Li authored Jun 17, 2023
15 Jun, 2023 1 commit
- Add script for benchmarking serving throughput (#145) · 311490a7
  Woosuk Kwon authored Jun 14, 2023
07 Jun, 2023 1 commit
- Add docstrings for LLMServer and related classes and examples (#142) · 42983742
  Zhuohan Li authored Jun 07, 2023
05 Jun, 2023 1 commit
- Fix various issues of async servers (#135) · 1a956e13
  Zhuohan Li authored Jun 05, 2023
24 May, 2023 1 commit
- OpenAI Compatible Frontend (#116) · 057daef7
  Zhuohan Li authored May 23, 2023
22 May, 2023 1 commit
- Introduce LLM class for offline inference (#115) · 655a5e48
  Woosuk Kwon authored May 21, 2023
21 May, 2023 1 commit
- Implement stop strings and best_of (#114) · f746ced0
  Woosuk Kwon authored May 21, 2023
20 May, 2023 1 commit
- Refactor system architecture (#109) · c3442c1f
  Woosuk Kwon authored May 20, 2023