1. 17 Mar, 2025 1 commit
  2. 13 Mar, 2025 1 commit
  3. 12 Mar, 2025 3 commits
  4. 10 Mar, 2025 2 commits
  5. 07 Mar, 2025 1 commit
    • sample: improve ollama engine sampler performance (#9374) · 0682dae0
      Parth Sareen authored
      This change brings in various interface cleanups and greatly improves the performance of the sampler.
      
      Tested with llama3.2 on a local machine.
      Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
      Without topK, performance is ~110 tokens/s.
  6. 25 Feb, 2025 1 commit