    runner.go: Better abstract vision model integration · c826e574
    Jesse Gross authored
    
    
    - Update mllama to take the cross-attention state as embeddings in
      a batch, more similar to how Llava handles it. This improves
      integration with the input cache.
    - Pass locations in a prompt for embeddings using tags similar to Llava.
    - Abstract interface to vision models so the main runner accesses Clip
      and Mllama similarly.
    Co-authored-by: Michael Yang <mxyng@pm.me>
image_test.go 2.43 KB