    sample: improve ollama engine sampler performance (#9374) · 0682dae0
    Parth Sareen authored
    This change brings in various interface cleanups and greatly improves the performance of the sampler.
    
    Tested with llama3.2 on a local machine.
    Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
    Without topK, performance is ~110 tokens/s.
transforms_test.go 3.98 KB