Commit 64da65b3, authored by shiyi.c_98, committed by GitHub

Prefix Caching: fix T4 Triton error (#2517)

parent 5255d99d
......
@@ -618,7 +618,9 @@ if triton.__version__ >= "2.1.0":
             b_ctx_len,
             max_input_len,
             alibi_slopes=None):
-        BLOCK = 128
+        cap = torch.cuda.get_device_capability()
+        BLOCK = 128 if cap[0] >= 8 else 64
         # shape constraints
         Lq, Lk, Lv = q.shape[-1], k.shape[-1], v.shape[-1]
         assert Lq == Lk and Lk == Lv
......
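
The change above selects the Triton block size from the GPU's compute capability instead of hard-coding 128. A T4 reports compute capability 7.5, so it takes the 64 branch, while devices with major version 8 or higher keep 128. The following is a minimal sketch of that selection as a standalone helper so it can be checked without a GPU. `select_block_size` is a hypothetical name that does not appear in the commit. The commit does not say why the smaller block is needed on older GPUs, and the sketch makes no claim about that.

```python
from typing import Tuple


def select_block_size(capability: Tuple[int, int]) -> int:
    """Return the block size used in the diff for a (major, minor) capability.

    Hypothetical helper: it mirrors the expression
    `BLOCK = 128 if cap[0] >= 8 else 64` from the commit.
    """
    major, _minor = capability
    return 128 if major >= 8 else 64


if __name__ == "__main__":
    # Capabilities listed here are examples; (7, 5) is what a T4 reports.
    for cap in [(7, 0), (7, 5), (8, 0), (9, 0)]:
        print(cap, select_block_size(cap))
```

In the patched function the tuple comes from `torch.cuda.get_device_capability()`, which returns the `(major, minor)` pair for the current CUDA device.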