[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)

Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
vllm/worker/utils.py (new file, mode 100644)