  1. 13 Dec, 2025 2 commits
  2. 12 Dec, 2025 1 commit
  3. 09 Dec, 2025 1 commit
  4. 08 Dec, 2025 1 commit
    • refactor rope · 603ceefa
      Michael Yang authored
      change to a flatter directory structure and group the options with the
      function
      
      update models to call rope in one place
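The refactor above moves RoPE into a flatter layout and keeps its options next to the function that consumes them, with models calling rope through one entry point. A minimal sketch of that shape in Go, assuming illustrative names (`RopeOptions`, `ApplyRope` are not Ollama's actual identifiers) and a plain slice in place of a backend tensor:

```go
package main

import (
	"fmt"
	"math"
)

// RopeOptions groups the rotary position embedding parameters with the
// function that uses them, mirroring the "options with the function" layout.
type RopeOptions struct {
	Dim  int     // number of leading dimensions to rotate (must be even)
	Base float64 // frequency base, commonly 10000
}

// ApplyRope rotates consecutive pairs of elements in x by a
// position-dependent angle. This is the standard RoPE formulation,
// sketched on a slice so every model can call it in one place.
func ApplyRope(x []float64, pos int, opts RopeOptions) []float64 {
	out := make([]float64, len(x))
	copy(out, x)
	for i := 0; i+1 < opts.Dim; i += 2 {
		theta := float64(pos) / math.Pow(opts.Base, float64(i)/float64(opts.Dim))
		sin, cos := math.Sin(theta), math.Cos(theta)
		out[i] = x[i]*cos - x[i+1]*sin
		out[i+1] = x[i]*sin + x[i+1]*cos
	}
	return out
}

func main() {
	opts := RopeOptions{Dim: 4, Base: 10000}
	// At position 0 the rotation angle is zero, so the input is unchanged.
	fmt.Println(ApplyRope([]float64{1, 0, 1, 0}, 0, opts))
}
```

Centralizing the call means a model only supplies its options struct; the rotation math lives in one package instead of being repeated per model.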
  5. 28 Oct, 2025 2 commits
  6. 19 Sep, 2025 1 commit
  7. 17 Sep, 2025 1 commit
  8. 16 Sep, 2025 1 commit
  9. 15 Sep, 2025 1 commit
  10. 04 Sep, 2025 1 commit
  11. 20 May, 2025 1 commit
  12. 15 May, 2025 1 commit
    • ollamarunner: Separate text and multimodal graphs · 3c14461d
      Jesse Gross authored
      For some multimodal models (such as gemma3), we create a single
      graph that generates the image embedding and then use it in the
      text model. The embedding tensor is completely opaque to the runner.
      
      However, this doesn't work if we need to use the embedding in multiple
      batches, which can arise if the embedding is larger than the batch size.
      In these cases (as with llama4), we would like to create views that
      are more appropriately sized. However, if we do this, the original
      source tensor is used in multiple graphs, which isn't allowed. To
      avoid that problem, models with this pattern compute the embedding
      tensor on first use and recreate the individual views. There is no
      longer a single combined vision-and-text graph.
      
      This codifies the pattern of separating vision and text graphs. The
      logic of computing tensors on demand is moved to the runner, so models
      no longer have to worry about this. It also gives the runner visibility
      into the multimodal tensors, which is important for memory management.
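The compute-on-first-use pattern the commit describes can be sketched as a small lazy wrapper in the runner: the vision graph runs once, its result is cached, and batch-sized views are cut from the cached tensor. All names below (`Tensor`, `multimodalEntry`, `views`) are illustrative, not the real ollamarunner types, and a plain slice stands in for a backend tensor:

```go
package main

import "fmt"

// Tensor stands in for an opaque backend tensor; a slice keeps the
// sketch self-contained.
type Tensor []float32

// multimodalEntry lazily materializes an image embedding the first time
// any view of it is needed, so the vision graph runs at most once even
// when the embedding spans multiple text batches.
type multimodalEntry struct {
	compute func() Tensor // runs the vision graph
	cached  Tensor
}

// views slices the embedding into batch-sized chunks, computing the
// source tensor on first use. Because the views are recreated from the
// cache, the source tensor is never shared between two live graphs.
func (m *multimodalEntry) views(batchSize int) []Tensor {
	if m.cached == nil {
		m.cached = m.compute() // first use: run the vision graph once
	}
	var out []Tensor
	for i := 0; i < len(m.cached); i += batchSize {
		end := i + batchSize
		if end > len(m.cached) {
			end = len(m.cached)
		}
		out = append(out, m.cached[i:end])
	}
	return out
}

func main() {
	runs := 0
	e := &multimodalEntry{compute: func() Tensor {
		runs++
		return make(Tensor, 10) // pretend 10-element embedding
	}}
	v := e.views(4)
	_ = e.views(4)            // second call reuses the cache
	fmt.Println(len(v), runs) // 3 views; vision graph ran once
}
```

Keeping this logic in the runner, as the commit does, also gives the runner visibility into the multimodal tensors for memory accounting.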
  13. 13 May, 2025 1 commit
  14. 25 Apr, 2025 1 commit
  15. 03 Apr, 2025 2 commits
  16. 02 Apr, 2025 1 commit
  17. 20 Mar, 2025 1 commit
  18. 14 Mar, 2025 1 commit
    • ml: Allow models to constrain inputs to a single batch · 9679f401
      Jesse Gross authored
      Models may require that a set of inputs all be processed as part
      of the same batch. For example, if an image has multiple patches
      with fully connected attention between them, we should not split
      the batch in the middle of an image.
      
      Fixes #9697
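The constraint above can be sketched as a batching loop that never cuts a marked run of inputs: if the run (e.g. an image's patches) will not fit in the remaining space of the current batch, the batch is flushed first. The `input` struct and `SameBatch` field below are illustrative stand-ins, not the actual `ml` package API:

```go
package main

import "fmt"

// input is a sketch of a runner input; SameBatch > 0 marks the start of
// a run of inputs (e.g. image patches with fully connected attention)
// that must all be processed in the same batch.
type input struct {
	Token     int
	SameBatch int // length of the run starting here, 0 for ordinary tokens
}

// split packs inputs into batches of at most batchSize without cutting a
// SameBatch run in the middle: a run that wouldn't fit in the remaining
// space starts a fresh batch instead.
func split(inputs []input, batchSize int) [][]input {
	var batches [][]input
	var cur []input
	for i := 0; i < len(inputs); {
		run := 1
		if inputs[i].SameBatch > 0 {
			run = inputs[i].SameBatch
		}
		if len(cur) > 0 && len(cur)+run > batchSize {
			batches = append(batches, cur) // flush rather than split the run
			cur = nil
		}
		cur = append(cur, inputs[i:i+run]...)
		i += run
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	// Two text tokens, then an "image" of 3 patches that must stay together.
	ins := []input{{Token: 1}, {Token: 2}, {Token: 100, SameBatch: 3}, {Token: 101}, {Token: 102}}
	for _, b := range split(ins, 4) {
		fmt.Println(len(b)) // 2, then 3: the image is not split mid-run
	}
}
```

With a batch size of 4, a naive splitter would put two patches in the first batch and one in the second; honoring the constraint yields a 2-token batch followed by a 3-patch batch.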
  19. 12 Mar, 2025 1 commit
  20. 11 Mar, 2025 13 commits