Commits · ae151d73be479e9c0caa2fdfc30b17f073018ef3 · OpenDAS / vllm_cscc

10 Jul, 2024 1 commit
- [Speculative Decoding] Enabling bonus token in speculative decoding for KV... · ae151d73
  sroy745 authored Jul 10, 2024
```
[Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765)
```
  ae151d73
28 Jun, 2024 1 commit
- [Spec Decode] Introduce DraftModelRunner (#5799) · b2c62023
  Cody Yu authored Jun 28, 2024
  
  b2c62023
25 Jun, 2024 1 commit
- [Speculative Decoding] Support draft model on different tensor-parallel size... · 2ce5d668
  Woo-Yeon Lee authored Jun 25, 2024
```
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)
```
  2ce5d668
15 Jun, 2024 1 commit
- [mypy] Enable type checking for test directory (#5017) · 0e9164b4
  Cyrus Leung authored Jun 15, 2024
  
  0e9164b4
05 Jun, 2024 1 commit
- [Speculative Decoding] Add `ProposerWorkerBase` abstract class (#5252) · faf71bcd
  Nick Hill authored Jun 05, 2024
  
  faf71bcd
15 May, 2024 1 commit

- [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode... · 65bf2ac1
  SangBin Cho authored May 15, 2024

[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)

This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attention metadata for prefill and decode into a single class and allows slicing it when running the attention backend.

It also refactors subquery_start_loc, which was not refactored in the previous PR.

65bf2ac1

10 May, 2024 1 commit

- [Core] Fix circular reference which leaked llm instance in local dev env (#4737) · 6a0f6172
  SangBin Cho authored May 10, 2024

Storing an exception frame is extremely prone to circular references because the frame holds references to objects.

When tensorizer is not installed, the LLM instance leaks because the error frame holds references to various modules, which creates a circular reference.

I also found that spec decoding has a circular reference issue, which I solved using weakref.proxy.

6a0f6172
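
The weakref.proxy approach mentioned in this commit message can be illustrated with a minimal sketch. The `Engine` and `Worker` classes below are hypothetical stand-ins, not the actual vLLM classes, and the immediate-free behaviour shown relies on CPython reference counting.

```python
import gc
import weakref


class Worker:
    """Hypothetical child object that needs to call back into its owner."""

    def __init__(self, owner):
        # weakref.proxy forwards attribute access to the owner without
        # holding a strong reference, so no owner <-> worker cycle forms.
        self.owner = weakref.proxy(owner)

    def owner_name(self):
        return self.owner.name


class Engine:
    """Hypothetical owner that holds a strong reference to its worker."""

    def __init__(self, name):
        self.name = name
        self.worker = Worker(self)


def engine_freed_without_gc():
    """Return True if the engine is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cycle collector for this check
    try:
        engine = Engine("demo")
        assert engine.worker.owner_name() == "demo"
        probe = weakref.ref(engine)  # lets us observe when the engine dies
        del engine  # drops the only strong reference
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

If `Worker` stored `owner` directly instead of a proxy, the two objects would reference each other and the engine would stay alive until the cycle collector ran.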

04 May, 2024 1 commit
- [Misc][Refactor] Introduce ExecuteModelData (#4540) · bc8ad684
  Cody Yu authored May 03, 2024
  
  bc8ad684
01 May, 2024 1 commit
- [Speculative decoding] Add ngram prompt lookup decoding (#4237) · b38e42fb
  leiwen83 authored May 02, 2024
```
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
```
  b38e42fb
23 Apr, 2024 1 commit
- [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) · 62b8aebc
  Cade Daniel authored Apr 23, 2024
  
  62b8aebc
18 Apr, 2024 1 commit
- [Typing] Mypy typing part 2 (#4043) · 533d2a1f
  SangBin Cho authored Apr 18, 2024
```
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
```
  533d2a1f
16 Apr, 2024 1 commit
- [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) · e95cd879
  Cade Daniel authored Apr 16, 2024
  
  e95cd879
25 Mar, 2024 1 commit
- [CI] Try introducing isort. (#3495) · 01bfb22b
  SangBin Cho authored Mar 25, 2024
  
  01bfb22b
22 Mar, 2024 1 commit
- [Hardware][Neuron] Refactor neuron support (#3471) · e90fc21f
  Zhuohan Li authored Mar 21, 2024
  
  e90fc21f
11 Mar, 2024 1 commit
- Re-enable the 80 char line width limit (#3305) · 2f8844ba
  Zhuohan Li authored Mar 10, 2024
  
  2f8844ba
09 Mar, 2024 1 commit
- [Speculative decoding 3/9] Worker which speculates, scores, and applies rejection sampling (#3103) · 8437bae6
  Cade Daniel authored Mar 08, 2024
  
  8437bae6