Commit 8ba8a855 authored by 王敏

[fix] Resolve w8a8 low-latency cudagraph startup error

parent 0ae68da1
@@ -471,9 +471,9 @@ def apply_int8_linear(
 m_=m
 #best_config=W8A8_TRITONJSON.triton_json_dict[0][f"{m}_{n}_{k}"]
 elif m<=64:
-m_= (m + 3) & -4 # round up to the nearest multiple of 4
+m_= (m + 3) // 4 * 4 #(m + 3) & -4 # round up to the nearest multiple of 4
 elif m<=160:
-m_=(m + 7) & -8
+m_= (m + 7) // 8 * 8 #(m + 7) & -8
 elif m<200: #256
 m_=160
......
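
The change above swaps a bitwise round-up for a floor-division round-up. For plain non-negative Python integers the two forms give the same result, so the padded size `m_` should be unchanged. The commit does not say why the bitwise form failed at cudagraph startup; one possible reason, stated here only as an assumption, is that `m` is not a plain `int` at that point and does not support `&`. A minimal sketch of the equivalence, using the hypothetical helper names `round_up_bitwise` and `round_up_floordiv` (not part of the repository):

```python
def round_up_bitwise(m: int, multiple: int) -> int:
    """Round m up with the mask trick; multiple must be a power of two."""
    return (m + multiple - 1) & -multiple


def round_up_floordiv(m: int, multiple: int) -> int:
    """Round m up with floor division; works for any positive multiple."""
    return (m + multiple - 1) // multiple * multiple


# Both forms agree over the ranges checked here (1..64 for 4, 65..160 for 8).
assert all(round_up_bitwise(m, 4) == round_up_floordiv(m, 4) for m in range(1, 65))
assert all(round_up_bitwise(m, 8) == round_up_floordiv(m, 8) for m in range(65, 161))

print(round_up_floordiv(33, 4))   # 36
print(round_up_floordiv(100, 8))  # 104
```

The floor-division form uses only `+`, `//` and `*`, and unlike the mask trick it does not require the multiple to be a power of two.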