Commits · 776dbd74f1d6a42a1e71c3b18a0d28e61f2e9ea5 · OpenDAS / vllm_cscc

16 Oct, 2024 1 commit
- [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) · 776dbd74
  Russell Bryant authored Oct 16, 2024
```
Signed-off-by: Russell Bryant <rbryant@redhat.com>
```
11 Oct, 2024 1 commit
- [misc] hide best_of from engine (#9261) · cbc2ef55
  youkaichao authored Oct 10, 2024
```
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
```
07 Oct, 2024 1 commit
- [core] remove beam search from the core (#9105) · 18b296fd
  youkaichao authored Oct 06, 2024
  
03 Sep, 2024 1 commit
- [Bugfix] Fix single output condition in output processor (#7881) · 0fbc6696
  Woosuk Kwon authored Sep 02, 2024
  
30 Aug, 2024 1 commit
- [Core] Logprobs support in Multi-step (#7652) · 428dd144
  afeldman-nm authored Aug 29, 2024
  
27 Aug, 2024 1 commit
- [Core] Asynchronous Output Processor (#7049) · 2eedede8
  Megha Agarwal authored Aug 26, 2024
```
Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
```
04 Aug, 2024 1 commit
- [core][misc] simply output processing with shortcut code path (#7117) · 83c644fe
  youkaichao authored Aug 04, 2024
  
20 Jul, 2024 1 commit
- Pipeline Parallel: Guard for KeyErrors at request abort (#6587) · 3f8d42c8
  Travis Johnson authored Jul 19, 2024
```
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
```
11 Jul, 2024 1 commit
- [ BugFix ] Prompt Logprobs Detokenization (#6223) · 7ed6a4f0
  Robert Shaw authored Jul 11, 2024
```
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
```
02 Jul, 2024 1 commit
- [Core] Pipeline Parallel Support (#4412) · c5832d2a
  Murali Andoorveedu authored Jul 02, 2024
```
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
```
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
05 Jun, 2024 1 commit
- [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226) · 974fc9b8
  zifeitong authored Jun 04, 2024
  
18 May, 2024 1 commit

- [Lora] Support long context lora (#4787) · 2e9a2227
  SangBin Cho authored May 18, 2024

  Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through.

  It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors.

  Follow up of https://github.com/vllm-project/vllm/pull/3095/files


26 Apr, 2024 1 commit
- [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) · 603ad848
  SangBin Cho authored Apr 26, 2024
  
23 Apr, 2024 1 commit
- [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) · 0ae11f78
  SangBin Cho authored Apr 23, 2024
  
21 Apr, 2024 1 commit
- Make initialization of tokenizer and detokenizer optional (#3748) · a37d815b
  GeauxEric authored Apr 21, 2024
```
Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
```
16 Apr, 2024 1 commit
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) · e95cd879
  Cade Daniel authored Apr 16, 2024
  