- 06 Aug, 2024 1 commit
-
-
afeldman-nm authored
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
-
- 01 Aug, 2024 1 commit
-
-
youkaichao authored
-
- 22 Jul, 2024 1 commit
-
-
Jiaxin Shan authored
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
-
- 19 Jul, 2024 1 commit
-
-
Antoni Baum authored
-
- 02 Jul, 2024 1 commit
-
-
Alexander Matveev authored
-
- 15 Jun, 2024 2 commits
-
-
Cyrus Leung authored
-
leiwen83 authored
Signed-off-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 12 Jun, 2024 1 commit
-
-
SangBin Cho authored
-
- 03 Jun, 2024 1 commit
-
-
Kaiyang Chen authored
-
- 29 May, 2024 2 commits
-
-
Cyrus Leung authored
-
afeldman-nm authored
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837)
-
- 28 May, 2024 2 commits
-
-
Cyrus Leung authored
Co-authored-by: Roger Wang <ywang@roblox.com>
-
Michał Moskal authored
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
-
- 24 May, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
-
- 13 May, 2024 2 commits
-
-
SangBin Cho authored
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
-
Cyrus Leung authored
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for the OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it has only been repeated twice so far, I will add another similar test suite in #4200, which would duplicate the code a third time.) I have also moved the test utilities file (test_utils.py) under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file so that tests/utils.py can be imported with a relative import.
-
- 10 May, 2024 1 commit
-
-
Robert Shaw authored
-
- 08 May, 2024 1 commit
-
-
youkaichao authored
-
- 07 May, 2024 2 commits
-
-
youkaichao authored
-
youkaichao authored
-
- 02 May, 2024 1 commit
-
-
SangBin Cho authored
-
- 01 May, 2024 1 commit
-
-
leiwen83 authored
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Sage Moore <sagemoore@utexas.edu>
-
- 23 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 22 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 16 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 11 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 09 Apr, 2024 1 commit
-
-
Cade Daniel authored
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
-
- 05 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 03 Apr, 2024 1 commit
-
-
SangBin Cho authored
-
- 02 Apr, 2024 1 commit
-
-
Cade Daniel authored
[Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup (#3783)
-
- 01 Apr, 2024 1 commit
-
-
Cade Daniel authored
-
- 28 Mar, 2024 2 commits
-
-
SangBin Cho authored
-
Cade Daniel authored
-
- 25 Mar, 2024 1 commit
-
-
SangBin Cho authored
-
- 22 Mar, 2024 1 commit
-
-
Thomas Parnell authored
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
-
- 20 Mar, 2024 2 commits
-
-
SangBin Cho authored
-
ElizaWszola authored
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
-
- 13 Mar, 2024 1 commit
-
-
Breno Faria authored
-
- 11 Mar, 2024 1 commit
-
-
Zhuohan Li authored
-
- 06 Mar, 2024 1 commit
-
-
Cade Daniel authored
-