    discover: Disable flash attention for Jetson Xavier (CC 7.2) · aa45f7ce
    Jesse Gross authored
    GGML picks the wrong kernel and these systems fail with:
    Sep 28 22:25:39 xavier ollama[48999]: //ml/backend/ggml/ggml/src/ggml-cuda/fattn-wmma-f16.cu:437:
    ERROR: CUDA kernel flash_attn_ext_f16 has no device code compatible with CUDA arch 720. ggml-cuda.cu
    was compiled for: __CUDA_ARCH_LIST__
    
    Fixes #12442