Commit 30132cd1 (unverified), authored by Xiao Li, committed by GitHub

Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030)


Signed-off-by: Xiao Li <ilx@meta.com>
parent cbd95a2d
...
@@ -248,7 +248,7 @@ def apply_top_k_top_p(
     if p is None and k is None:
         return logits
-    if HAS_TRITON and logits.shape[0] >= 8:
+    if HAS_TRITON and logits.shape[0] >= 8 and logits.is_cuda:
         return apply_top_k_top_p_triton(logits, k, p)
     # Use pytorch sort implementation for small batch sizes.
...
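
The effect of the added `logits.is_cuda` check can be sketched in isolation. The snippet below is an illustration only, not the project's code: `FakeLogits` and `choose_backend` are hypothetical names, and `FakeLogits` is a stand-in exposing just the `shape` and `is_cuda` attributes the condition reads, so the sketch runs without PyTorch, Triton, or a GPU. `HAS_TRITON` is assumed to be `True` here.

```python
from dataclasses import dataclass

# Stand-in for the module-level flag used in the patched file (assumed True).
HAS_TRITON = True


@dataclass
class FakeLogits:
    """Hypothetical stand-in with only the attributes the guard inspects."""
    shape: tuple
    is_cuda: bool


def choose_backend(logits, has_triton=HAS_TRITON):
    """Return which path the patched condition would select."""
    # Same three-part condition as the patched line: Triton available,
    # batch size of at least 8, and logits resident on a CUDA device.
    if has_triton and logits.shape[0] >= 8 and logits.is_cuda:
        return "triton"
    # Otherwise fall through to the PyTorch sort implementation.
    return "pytorch"


if __name__ == "__main__":
    print(choose_backend(FakeLogits((16, 32000), is_cuda=True)))
    print(choose_backend(FakeLogits((16, 32000), is_cuda=False)))
```

Before the change, the second call (a large batch that is not on a CUDA device) would have been routed to the Triton kernel. With the extra check it takes the PyTorch path instead.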