    perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
    Daniel Hiltgen authored
    * perf: build graph for next batch in parallel to keep GPU busy
    
    This refactors the main run loop of the ollama runner to perform the
    GPU-intensive tasks (Compute+Floats) in a goroutine, so the next batch can be
    prepared in parallel and the GPU spends less time stalled waiting for its
    next batch of work.
    
    * tests: tune integration tests for ollama engine
    
    This tunes the integration tests to focus more on models supported
    by the new engine.
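A minimal sketch of the pipelining idea in the perf change above, written in Go because the ollama runner is Go. The names `batch`, `buildGraph`, `compute`, and `runPipelined` are hypothetical and are not ollama's API. `compute` stands in for the GPU-intensive Compute+Floats step, and the sketch assumes the next batch can be prepared without waiting on the current batch's output, which a real runner may not be able to do.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a prepared unit of work
// (for example, a built compute graph plus its inputs).
type batch struct {
	id     int
	inputs []float32
}

// buildGraph is a hypothetical CPU-side preparation step for batch id.
func buildGraph(id int) batch {
	inputs := make([]float32, 4)
	for i := range inputs {
		inputs[i] = float32(id*10 + i)
	}
	return batch{id: id, inputs: inputs}
}

// compute is a hypothetical stand-in for the GPU-intensive Compute+Floats step.
func compute(b batch) float32 {
	var sum float32
	for _, v := range b.inputs {
		sum += v
	}
	return sum
}

// runPipelined computes batch i in a separate goroutine while the calling
// goroutine builds batch i+1, so the compute step does not wait on graph building.
// Assumption: building batch i+1 does not depend on the result of batch i.
func runPipelined(n int) []float32 {
	results := make([]float32, 0, n)
	if n <= 0 {
		return results
	}
	next := buildGraph(0)
	for i := 0; i < n; i++ {
		cur := next
		done := make(chan float32, 1) // buffered so the goroutine never blocks on send
		go func() {
			done <- compute(cur)
		}()
		// Overlap: prepare the next batch while the current one is computing.
		if i+1 < n {
			next = buildGraph(i + 1)
		}
		// Wait for the current batch before starting the next iteration.
		results = append(results, <-done)
	}
	return results
}

func main() {
	fmt.Println(runPipelined(3)) // prints [6 46 86]
}
```

The buffered channel doubles as the synchronization point: receiving from `done` guarantees the current batch has finished before its result is recorded, while the `buildGraph` call in between is the work that overlaps with computation.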
max_queue_test.go 3.51 KB