- 29 May, 2024 2 commits
Cyrus Leung authored
afeldman-nm authored
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
- 28 May, 2024 1 commit
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
- 15 May, 2024 1 commit
SangBin Cho authored
[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)
This PR combines prepare_prompt and prepare_decode into a single API. It also coalesces the attn metadata for prefill and decode into a single class that can be sliced when running the attention backend, and refactors subquery_start_loc, which was not refactored in the previous PR.
- 14 May, 2024 1 commit
Kuntai Du authored
[Core][Hash][Automatic Prefix caching] Accelerating the hashing function by avoiding deep copies (#4696)
- 11 May, 2024 1 commit
Chang Su authored
- 08 May, 2024 2 commits
Cody Yu authored
Co-authored-by: Cade Daniel <edacih@gmail.com>
youkaichao authored
- 07 May, 2024 1 commit
youkaichao authored
- 04 May, 2024 1 commit
Cody Yu authored
- 03 May, 2024 2 commits
Cade Daniel authored
Lily Liu authored
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
- 28 Apr, 2024 1 commit
Ronen Schaffer authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
- 26 Apr, 2024 1 commit
SangBin Cho authored
- 23 Apr, 2024 1 commit
SangBin Cho authored
- 19 Apr, 2024 1 commit
Uranus authored
Co-authored-by: Zhong Wang <wangzhong@infini-ai.com>
- 16 Apr, 2024 1 commit
Cade Daniel authored
- 12 Apr, 2024 1 commit
SangBin Cho authored
- 11 Apr, 2024 2 commits
Nick Hill authored
SangBin Cho authored
- 05 Apr, 2024 1 commit
SangBin Cho authored
- 28 Mar, 2024 2 commits
SangBin Cho authored
Cade Daniel authored
- 26 Mar, 2024 1 commit
Nick Hill authored
Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>
- 25 Mar, 2024 4 commits
xwjiang2010 authored
Swapnil Parekh authored
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
SangBin Cho authored
Woosuk Kwon authored
- 20 Mar, 2024 1 commit
Antoni Baum authored
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
- 11 Mar, 2024 1 commit
Zhuohan Li authored
- 09 Mar, 2024 1 commit
Cade Daniel authored
- 07 Mar, 2024 1 commit
jacobthebanana authored
Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) (#3263)
- 06 Mar, 2024 1 commit
Cade Daniel authored
- 05 Mar, 2024 1 commit
Nick Hill authored
- 04 Mar, 2024 1 commit
Antoni Baum authored
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
- 03 Mar, 2024 1 commit
Zhuohan Li authored
- 02 Mar, 2024 1 commit
Sage Moore authored
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
- 21 Feb, 2024 2 commits
Nick Hill authored
Antoni Baum authored
- 01 Feb, 2024 1 commit
zspo authored