sample/transforms.go · a1cda80bcb0b47d493be9dc061a2dfa8a0ddd61c · OpenDAS / ollama

sample: improve ollama engine sampler performance (#9374) · 0682dae0

Parth Sareen authored Mar 07, 2025

This change brings in various interface cleanups and greatly improves the performance of the sampler.

Tested with llama3.2 on a local machine.
Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
Without topK, performance is ~110 tokens/s.


transforms.go 4.36 KB
