    ollamarunner: Separate text and multimodal graphs · 3c14461d
    Jesse Gross authored
    For some multimodal models (such as gemma3), we create a single
    graph that generates the image embedding and then use this in the
    text model. The embedding tensor is completely opaque to the runner.
    
    However, this doesn't work if we need to use the embedding in multiple
    batches. This can arise if the embedding is larger than the batch size.
    In these cases (as with llama4), we would like to create views that
    are more appropriately sized. However, if we do this then the original
    source tensor is used in multiple graphs, which isn't allowed. To
    avoid that problem, models with this pattern compute the embedding
    tensor on first use and recreate the individual views as needed,
    so there is no longer a single combined vision-and-text graph.
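    The batch-sized views over an oversized embedding can be sketched as
    below. This is a hypothetical stand-in (a flat float32 slice rather
    than ollama's actual ml.Tensor type), illustrating only the slicing
    pattern the message describes: one embedding, several batch-sized
    windows, no copying.

```go
package main

import "fmt"

// embedding is a hypothetical stand-in for an opaque multimodal
// embedding tensor: a flat slice of per-token vectors.
type embedding struct {
	data   []float32
	tokens int
	dim    int
}

// view returns a window of `count` token vectors starting at token
// `start`, without copying, mirroring how a runner might slice an
// embedding larger than the batch size across several batches.
func (e *embedding) view(start, count int) []float32 {
	return e.data[start*e.dim : (start+count)*e.dim]
}

func main() {
	// 6 image tokens of dim 4 with batch size 4: the embedding
	// does not fit in one batch and spans two.
	e := &embedding{data: make([]float32, 6*4), tokens: 6, dim: 4}
	batchSize := 4
	for start := 0; start < e.tokens; start += batchSize {
		count := min(start+batchSize, e.tokens) - start
		v := e.view(start, count)
		fmt.Printf("batch at token %d: %d values\n", start, len(v))
	}
}
```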
    
    This codifies the pattern of separating vision and text graphs. The
    logic of computing tensors on demand is moved to the runner, so models
    no longer have to worry about this. It also gives the runner visibility
    into the multimodal tensors, which is important for memory management.
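    The compute-on-first-use logic moved into the runner can be sketched
    as a lazy, cached evaluation. The type and field names here are
    hypothetical, not the runner's actual API; the point is that the
    vision graph runs at most once, however many batches later view the
    result.

```go
package main

import "fmt"

// lazyEmbedding computes its tensor the first time any batch needs it
// and caches the result, so a single vision-graph evaluation serves
// every batch that views part of the embedding. Hypothetical sketch.
type lazyEmbedding struct {
	compute func() []float32 // runs the vision graph; invoked at most once
	cached  []float32
}

func (l *lazyEmbedding) get() []float32 {
	if l.cached == nil {
		l.cached = l.compute()
	}
	return l.cached
}

func main() {
	calls := 0
	emb := &lazyEmbedding{compute: func() []float32 {
		calls++
		return make([]float32, 8)
	}}
	emb.get() // first batch triggers the vision graph
	emb.get() // later batches reuse the cached tensor
	fmt.Println("vision graph evaluated", calls, "time(s)")
}
```

    Because the runner owns the cached tensor, it can also account for
    its memory, which is the visibility the message refers to.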
model.go 7.77 KB