"torchvision/transforms/v2/_deprecated.py" did not exist on "7de6317154a008d555ec63b590dda9e747a06ed6"
Jesse Gross authored
- Update mllama to take the cross attention state as embeddings in a batch, more similar to how Llava handles it. This improves integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract interface to vision models so the main runner accesses Clip and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>
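
As an illustration of passing embedding locations in a prompt via tags, here is a minimal Python sketch. It is not the code from this commit: the `[img-N]` tag syntax and the names `IMAGE_TAG`, `TextPart`, `ImagePart`, and `split_prompt` are assumptions made for the example. It splits a prompt into text runs and image references so a runner could place each image's embeddings at the tagged position.

```python
import re
from dataclasses import dataclass
from typing import List, Union

# Assumed tag syntax, for illustration only; the commit message says only
# that the tags are "similar to Llava".
IMAGE_TAG = re.compile(r"\[img-(\d+)\]")


@dataclass
class TextPart:
    text: str  # plain text to be tokenized as usual


@dataclass
class ImagePart:
    image_id: int  # id of the image whose embeddings belong at this position


def split_prompt(prompt: str) -> List[Union[TextPart, ImagePart]]:
    """Split a prompt into ordered text runs and image references."""
    parts: List[Union[TextPart, ImagePart]] = []
    pos = 0
    for match in IMAGE_TAG.finditer(prompt):
        # Keep any text that precedes the tag.
        if match.start() > pos:
            parts.append(TextPart(prompt[pos:match.start()]))
        parts.append(ImagePart(int(match.group(1))))
        pos = match.end()
    # Keep any text that follows the last tag.
    if pos < len(prompt):
        parts.append(TextPart(prompt[pos:]))
    return parts
```

With this sketch, `split_prompt("Describe [img-0] briefly")` yields a text part, an image reference with id 0, and a trailing text part, in prompt order.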
c826e574