[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 (#15211)
Signed-off-by: h-sugi <h.sugi@ieee.org>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>