Commit 76f44c2a authored by Richard Zou, committed by GitHub

Fix deepseek-v3 with torch.compile in PyTorch 2.6. (#5213)

parent 1078396f
@@ -177,7 +177,8 @@ TORCH_LIBRARY_FRAGMENT(sgl_kernel, m) {
    */
   m.def(
       "bmm_fp8(Tensor A, Tensor B, Tensor! D, Tensor A_scale, Tensor B_scale, Tensor workspace_buffer, int "
-      "cublas_handle, int cuda_stream) -> ()");
+      "cublas_handle, int cuda_stream) -> ()",
+      {at::Tag::needs_fixed_stride_order});
   m.impl("bmm_fp8", torch::kCUDA, &bmm_fp8);
   m.def(
......
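
Reviewer note: the change above tags `bmm_fp8` with `at::Tag::needs_fixed_stride_order`, which PyTorch documents as asking Inductor to pass tensors with the same stride order the op saw in eager mode. The sketch below is not taken from sgl-kernel; it is a minimal, standalone illustration of why a kernel that reads raw storage can be sensitive to stride order. `View2D` and both functions are hypothetical names, and whether `bmm_fp8` misbehaves in exactly this way under `torch.compile` is an assumption.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a tensor: raw storage plus per-dimension strides.
struct View2D {
  const float* data;
  std::size_t rows, cols;
  std::size_t stride_row, stride_col;

  // Stride-aware element access.
  float at(std::size_t r, std::size_t c) const {
    return data[r * stride_row + c * stride_col];
  }
};

// Hypothetical "kernel" that ignores strides and assumes row-major storage,
// the way a hand-written kernel taking a raw pointer might.
float first_row_sum_assuming_row_major(const View2D& v) {
  float sum = 0.0f;
  for (std::size_t c = 0; c < v.cols; ++c) {
    sum += v.data[c];  // only correct when stride_col == 1
  }
  return sum;
}

// Stride-aware reference: correct for any layout.
float first_row_sum_strided(const View2D& v) {
  float sum = 0.0f;
  for (std::size_t c = 0; c < v.cols; ++c) {
    sum += v.at(0, c);
  }
  return sum;
}

// Two layouts of the same logical 2x3 matrix [[1, 2, 3], [4, 5, 6]].
inline const std::array<float, 6> kRowMajor{1, 2, 3, 4, 5, 6};
inline const std::array<float, 6> kColMajor{1, 4, 2, 5, 3, 6};

inline View2D row_major_view() { return {kRowMajor.data(), 2, 3, 3, 1}; }
inline View2D col_major_view() { return {kColMajor.data(), 2, 3, 1, 2}; }
```

Both views describe the same logical matrix, so the stride-aware sum agrees on them, while the raw-pointer version only matches when the layout is the one it was written for. Pinning the stride order at the op boundary keeps a compiler from handing such a kernel a layout it does not expect.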