1. 29 Aug, 2025 1 commit
    • perf: build graph for next batch async to keep GPU busy (#11863) · 517807cd
      Daniel Hiltgen authored
      * perf: build graph for next batch in parallel to keep GPU busy
      
      This refactors the main run loop of the ollama runner to perform the main
      GPU-intensive tasks (Compute+Floats) in a goroutine so we can prepare the
      next batch in parallel, reducing the time the GPU stalls waiting for the
      next batch of work.
      
      * tests: tune integration tests for ollama engine
      
      This tunes the integration tests to focus more on models supported
      by the new engine.
  2. 16 Apr, 2025 1 commit
  3. 08 Apr, 2025 1 commit
  4. 02 Apr, 2025 1 commit
  5. 10 Dec, 2024 1 commit
  6. 22 Nov, 2024 1 commit
  7. 05 Aug, 2024 1 commit
  8. 22 Jul, 2024 1 commit
  9. 16 May, 2024 1 commit
    • Skip max queue test on remote · 7f2fbad7
      Daniel Hiltgen authored
      This test needs to be able to adjust the queue size down from
      our default setting for a reliable test, so it needs to skip on
      remote test execution mode.
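
      A skip of this kind might look like the sketch below. It assumes remote
      execution mode is signalled by an environment variable; the variable
      name TEST_REMOTE_SERVER and the helper shouldSkipMaxQueue are
      hypothetical and are not taken from the commit.

```go
package main

import (
	"fmt"
	"os"
)

// remoteEnvVar is a hypothetical variable name used only for this sketch.
const remoteEnvVar = "TEST_REMOTE_SERVER"

// shouldSkipMaxQueue reports whether the max queue test should be skipped.
// When the variable is set, the test targets an existing remote server whose
// queue size cannot be lowered, so the test would not be reliable.
func shouldSkipMaxQueue(getenv func(string) string) bool {
	return getenv(remoteEnvVar) != ""
}

func main() {
	if shouldSkipMaxQueue(os.Getenv) {
		fmt.Println("skip: max queue test needs a local server with an adjustable queue size")
		return
	}
	fmt.Println("run: max queue test")
}
```

      In a real Go test the same check would typically call t.Skip with an
      explanatory message instead of printing.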
  10. 05 May, 2024 1 commit