1. 20 Mar, 2025 1 commit
  2. 17 Mar, 2025 1 commit
  3. 13 Mar, 2025 1 commit
  4. 12 Mar, 2025 3 commits
  5. 10 Mar, 2025 2 commits
  6. 07 Mar, 2025 1 commit
    • sample: improve ollama engine sampler performance (#9374) · 0682dae0
      Parth Sareen authored
      This change brings in various interface cleanups and greatly improves the performance of the sampler.
      
      Tested with llama3.2 on a local machine.
      Improves performance from ~70 tokens/s to ~135 tokens/s with topK(40) enabled.
      Without topK, performance is ~110 tokens/s.
  7. 25 Feb, 2025 1 commit