Commits · 3f8d42c81fe8d3842a9e05c9f5d98290b7f79736 · OpenDAS / vllm_cscc

20 Jul, 2024 1 commit
- Pipeline Parallel: Guard for KeyErrors at request abort (#6587) · 3f8d42c8
  Travis Johnson authored Jul 19, 2024
```
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
```
11 Jul, 2024 1 commit
- [ BugFix ] Prompt Logprobs Detokenization (#6223) · 7ed6a4f0
  Robert Shaw authored Jul 11, 2024
```
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
```
02 Jul, 2024 1 commit
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
05 Jun, 2024 1 commit
- [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226) · 974fc9b8
  zifeitong authored Jun 04, 2024
  
18 May, 2024 1 commit

- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

  Currently we need to call the rotary embedding kernel once for each LoRA, which makes it hard to serve multiple long-context LoRAs. Add a batched rotary embedding kernel and pipe it through.

  It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

  Follow-up to https://github.com/vllm-project/vllm/pull/3095/files

26 Apr, 2024 1 commit
- [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) · 603ad848
  SangBin Cho authored Apr 26, 2024
  
23 Apr, 2024 1 commit
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
21 Apr, 2024 1 commit
- Make initialization of tokenizer and detokenizer optional (#3748) · a37d815b
  GeauxEric authored Apr 21, 2024
```
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
```
16 Apr, 2024 1 commit
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) · e95cd879
  Cade Daniel authored Apr 16, 2024
  