- 08 Dec, 2025 (1 commit)
Michael Yang authored
Change to a flatter directory structure and group the options with the function. Update models to call rope in one place.

- 02 Dec, 2025 (1 commit)
Patrick Devine authored
This change:
* fixes rope scaling in the mistral converter
* updates ministral to include llama4 scaling
* includes a new ministral parser for parsing reasoning and tool calling

Co-authored-by: jmorganca <jmorganca@gmail.com>

- 13 Nov, 2025 (1 commit)
Michael Yang authored
* use slice/chunks
* bert
* llama4
* gemma3n
* gptoss
* mistral3
* qwen3vl
* qwen25vl
* deepseek2
* remove unused ops

- 28 Oct, 2025 (1 commit)
Michael Yang authored

- 23 Sep, 2025 (1 commit)
Michael Yang authored

- 17 Sep, 2025 (1 commit)
Michael Yang authored
* fix(llama): rope scale
* spm llama
* skip moe models
* cleanup

- 16 Sep, 2025 (1 commit)
Michael Yang authored
* use ggml_*_split activations when possible
* forward qkv

- 15 Sep, 2025 (1 commit)
Michael Yang authored
This cleans up the model interface slightly without much impact on other areas.

- 29 Aug, 2025 (1 commit)
Daniel Hiltgen authored
* perf: build graph for next batch in parallel to keep GPU busy

  This refactors the main run loop of the ollama runner to perform the main GPU-intensive tasks (Compute+Floats) in a goroutine, so the next batch can be prepared in parallel. This reduces the time the GPU stalls waiting for the next batch of work.

* tests: tune integration tests for ollama engine

  This tunes the integration tests to focus more on models supported by the new engine.
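
The pipelining idea in the first item can be sketched in Go. This is a minimal illustration, not the runner's code: `batch`, `buildBatch`, `compute`, and `runPipelined` are hypothetical names, the batches are assumed to be independent of each other's results, and `compute` only stands in for the GPU-bound Compute+Floats step. Batch i runs in a goroutine while the loop builds batch i+1, and the loop waits before launching the next compute so at most one batch is in flight.

```go
package main

import (
	"fmt"
	"sync"
)

// batch is a hypothetical stand-in for one prepared unit of work
// (in the runner, the built compute graph and its inputs).
type batch struct {
	id     int
	inputs []float32
}

// buildBatch stands in for the CPU-side work of preparing the next batch.
func buildBatch(id int) batch {
	return batch{id: id, inputs: []float32{float32(id), float32(id) + 0.5}}
}

// compute stands in for the GPU-intensive Compute+Floats step.
func compute(b batch) float32 {
	var sum float32
	for _, v := range b.inputs {
		sum += v
	}
	return sum
}

// runPipelined computes batch i in a goroutine while the caller builds
// batch i+1, so preparation overlaps with computation.
func runPipelined(n int) []float32 {
	results := make([]float32, n)
	if n == 0 {
		return results
	}
	next := buildBatch(0)
	for i := 0; i < n; i++ {
		cur := next
		var wg sync.WaitGroup
		wg.Add(1)
		go func() {
			defer wg.Done()
			results[cur.id] = compute(cur)
		}()
		if i+1 < n {
			next = buildBatch(i + 1) // overlaps with compute(cur)
		}
		wg.Wait() // keep at most one batch in flight
	}
	return results
}

func main() {
	fmt.Println(runPipelined(3))
}
```

In a real decode loop the next batch usually depends on the token sampled from the previous one, so only the preparation work that does not need that result can overlap; the sketch leaves that detail out.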

- 25 Aug, 2025 (1 commit)
Michael Yang authored

- 22 May, 2025 (1 commit)
Jesse Gross authored
FromFloatSlice and FromIntSlice return an error if the shape doesn't match the passed data or if memory can't be allocated. Since these are inputs, the memory being allocated is system memory rather than VRAM. In many cases, the caller can't really handle the error and panics. Empty and Zeros already panic directly if they can't allocate memory. This makes things consistent by having FromFloatSlice and FromIntSlice panic as well, removing a fair amount of error-handling code. This is also consistent with how Go typically handles these situations.
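
A minimal sketch of the panic-on-bad-input convention, assuming a hypothetical `tensor` type and `fromFloatSlice` helper (these are not ollama's `ml` package API). Only the shape check is modeled; allocation failure is not simulated.

```go
package main

import "fmt"

// tensor is a hypothetical minimal tensor type, not ollama's ml.Tensor.
type tensor struct {
	shape []int
	data  []float32
}

// fromFloatSlice is a hypothetical constructor that panics when the shape
// does not match the data, instead of returning an error that the caller
// would likely turn into a panic anyway.
func fromFloatSlice(data []float32, shape ...int) *tensor {
	n := 1
	for _, d := range shape {
		n *= d
	}
	if n != len(data) {
		panic(fmt.Sprintf("shape %v needs %d elements, got %d", shape, n, len(data)))
	}
	return &tensor{shape: shape, data: data}
}

func main() {
	x := fromFloatSlice([]float32{1, 2, 3, 4, 5, 6}, 2, 3)
	fmt.Println(x.shape, len(x.data))
}
```

Callers then need no `if err != nil` branch for a condition they could not recover from.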

- 20 May, 2025 (1 commit)
Michael Yang authored

- 19 May, 2025 (1 commit)
Michael Yang authored
* fix llama model
* fix mistral3.1 model

  do not set default vision layers

- 16 May, 2025 (1 commit)
Michael Yang authored
* get eos_token_id from generation_config.json
* refactor
* include both ids and strings in trace
* comments
* remove special case for gemma3 special vocab (#10743)
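
In Hugging Face `generation_config.json` files, `eos_token_id` may be a single integer or a list of integers, so a reader has to accept both. The sketch below is a hypothetical helper (`eosTokenIDs` is not a function from the converter) that decodes the field with `encoding/json` and normalizes it to a slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eosTokenIDs is a hypothetical helper that reads eos_token_id from the
// contents of a generation_config.json file. The field may be absent,
// a single integer, or a list of integers.
func eosTokenIDs(raw []byte) ([]int32, error) {
	var cfg struct {
		EOSTokenID any `json:"eos_token_id"`
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	switch v := cfg.EOSTokenID.(type) {
	case nil:
		return nil, nil // absent or null
	case float64: // encoding/json decodes JSON numbers into float64
		return []int32{int32(v)}, nil
	case []any:
		ids := make([]int32, 0, len(v))
		for _, e := range v {
			f, ok := e.(float64)
			if !ok {
				return nil, fmt.Errorf("eos_token_id: unexpected element %v", e)
			}
			ids = append(ids, int32(f))
		}
		return ids, nil
	default:
		return nil, fmt.Errorf("eos_token_id: unexpected type %T", v)
	}
}

func main() {
	ids, err := eosTokenIDs([]byte(`{"eos_token_id": [128001, 128009]}`))
	fmt.Println(ids, err)
}
```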

- 15 May, 2025 (1 commit)
Jesse Gross authored
For some multimodal models (such as gemma3), we create a single graph that generates the image embedding and then uses it in the text model. The embedding tensor is completely opaque to the runner.

However, this doesn't work if we need to use the embedding in multiple batches, which can happen if the embedding is larger than the batch size. In these cases (as with llama4), we would like to create views that are more appropriately sized. However, if we do this, the original source tensor is used in multiple graphs, which isn't allowed. To avoid that problem, models with this pattern compute the embedding tensor on first use and recreate the individual views. There is no longer a single vision and text graph.

This codifies the pattern of separating vision and text graphs. The logic of computing tensors on demand is moved to the runner, so models no longer have to worry about it. It also gives the runner visibility into the multimodal tensors, which is important for memory management.
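
The compute-on-first-use pattern can be sketched as follows. This is an illustration under stated assumptions: `lazyEmbedding` and `view` are hypothetical names, the embedding is a plain `[][]float32` rather than a backend tensor, and the rule that one source tensor cannot be shared across graphs is not modeled. The vision step runs once, and each batch takes a view sized to fit it.

```go
package main

import "fmt"

// lazyEmbedding is a hypothetical holder for a multimodal embedding. The
// vision work runs only when a batch first asks for part of the embedding.
type lazyEmbedding struct {
	compute  func() [][]float32 // runs the vision graph; one row per position
	rows     [][]float32
	computed bool
}

// view returns rows [start, end), computing the full embedding on first
// use and slicing the cached result for every later batch.
func (e *lazyEmbedding) view(start, end int) [][]float32 {
	if !e.computed {
		e.rows = e.compute()
		e.computed = true
	}
	return e.rows[start:end]
}

func main() {
	runs := 0
	emb := &lazyEmbedding{compute: func() [][]float32 {
		runs++
		out := make([][]float32, 5)
		for i := range out {
			out[i] = []float32{float32(i)}
		}
		return out
	}}

	// Split a 5-row embedding across batches of at most 2 rows.
	const total, batchSize = 5, 2
	for start := 0; start < total; start += batchSize {
		end := start + batchSize
		if end > total {
			end = total
		}
		fmt.Println(start, end, emb.view(start, end))
	}
	fmt.Println("vision graph runs:", runs)
}
```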

- 13 May, 2025 (1 commit)
Michael Yang authored

- 25 Apr, 2025 (1 commit)
Michael Yang authored

- 24 Apr, 2025 (1 commit)
Parth Sareen authored

- 03 Apr, 2025 (1 commit)
Bruce MacDonald authored
Mistral is a popular research lab making open-source models. This updates the forward pass of llama-architecture models to support both llama and mistral models by accounting for additional metadata present in mistral models and by finding the correct dimensions for the output projection.