server/sched.go · 82a02e18d96ce2fff1791e6d1a080d3afa11370e · OpenDAS / ollama

sched: Lift parallel restriction for multimodal models except mllama · 6cd56687

Jesse Gross authored Oct 30, 2024

The Go runner can handle parallel requests for most multimodal
models. Now that we no longer risk falling back to server.cpp, this
restriction can be lifted.

However, the new mllama model can't support parallel requests, so we
need to keep the restriction for it.
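
As a rough illustration of the rule described above, here is a minimal sketch rather than the actual sched.go code. The helper name effectiveParallel and its parameters are hypothetical, and it assumes that a model exposes a list of family names in which "mllama" identifies the restricted model.

```go
package main

import "fmt"

// effectiveParallel is a hypothetical helper (not taken from sched.go).
// It returns how many parallel requests a runner should accept, given
// the model's family names and the requested parallelism.
func effectiveParallel(families []string, requested int) int {
	// Treat anything below 1 as a single slot.
	if requested < 1 {
		requested = 1
	}
	// Assumption: the family name "mllama" marks the one model that
	// still cannot serve parallel requests.
	for _, f := range families {
		if f == "mllama" {
			return 1
		}
	}
	// Other models, including other multimodal ones, keep the
	// requested parallelism.
	return requested
}

func main() {
	fmt.Println(effectiveParallel([]string{"llama", "clip"}, 4))
	fmt.Println(effectiveParallel([]string{"mllama"}, 4))
}
```

The check is keyed on the family name alone, so other multimodal models are no longer affected just because they carry a vision projector.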


sched.go 29 KB
