Commits · 8423aef4c867818524e90b2e2e58730b6ee5592c · OpenDAS / vllm_cscc

31 Aug, 2024 1 commit
- [BugFix][Core] Multistep Fix Crash on Request Cancellation (#8059) · 8423aef4
  Robert Shaw authored Aug 31, 2024
  
30 Aug, 2024 1 commit
- [Core] Logprobs support in Multi-step (#7652) · 428dd144
  afeldman-nm authored Aug 29, 2024
  
27 Aug, 2024 1 commit
- [Core] Asynchronous Output Processor (#7049) · 2eedede8
  Megha Agarwal authored Aug 26, 2024
  Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
21 Aug, 2024 1 commit
- [mypy] Enable following imports for entrypoints (#7248) · baaedfdb
  Cyrus Leung authored Aug 21, 2024
  Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
  Co-authored-by: Fei <dfdfcai4@gmail.com>
02 Jul, 2024 1 commit
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
  Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
11 Jun, 2024 1 commit
- [Misc] Various simplifications and typing fixes (#5368) · a0086298
  Nick Hill authored Jun 10, 2024
  
18 May, 2024 1 commit
- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024
  Currently, the rotary embedding kernel has to be called separately for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.
  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.
  Follow-up of https://github.com/vllm-project/vllm/pull/3095/files

03 May, 2024 1 commit
- [Speculative decoding] Support target-model logprobs (#4378) · ab502751
  Cade Daniel authored May 03, 2024
  
26 Apr, 2024 1 commit
- [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) · 603ad848
  SangBin Cho authored Apr 26, 2024
  
23 Apr, 2024 1 commit
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
16 Apr, 2024 1 commit
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) · e95cd879
  Cade Daniel authored Apr 16, 2024
  