- 20 Oct, 2024 1 commit
Chen Zhang authored
- 11 Oct, 2024 1 commit
Tyler Michael Smith authored
- 18 Sep, 2024 1 commit
Cyrus Leung authored
- 06 Aug, 2024 1 commit
afeldman-nm authored
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
- 08 Jul, 2024 1 commit
afeldman-nm authored
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
- 28 Jun, 2024 1 commit
Ilya Lavrenov authored
- 04 Jun, 2024 1 commit
afeldman-nm authored
[Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210)
- 22 May, 2024 1 commit
Cody Yu authored