Unverified Commit d2768c22 authored by Graham King's avatar Graham King Committed by GitHub

fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch (#1011)

This avoids having to pass the `--model-config` parameter to dynamo-run when using llamacpp.
parent e9cb035a
...@@ -201,13 +201,12 @@ cargo build --features llamacpp[,cuda|metal|vulkan] -p dynamo-run
 ```
 ```
-dynamo-run out=llamacpp ~/llms/Llama-3.2-3B-Instruct-Q6_K.gguf
+dynamo-run out=llamacpp ~/llms/gemma-3-1b-it-q4_0.gguf
+dynamo-run out=llamacpp ~/llms/Qwen3-0.6B-Q8_0.gguf # From https://huggingface.co/ggml-org
 ```
 Note that in some cases we are unable to extract the tokenizer from the GGUF, and so a Hugging Face checkout of a matching model must also be passed. Dynamo will use the weights from the GGUF and the pre-processor (`tokenizer.json`, etc) from the `--model-config`:
 ```
-dynamo-run out=llamacpp ~/llms/gemma-3-1b-it-q4_0.gguf --model-config ~/llms/gemma-3-1b-it
 dynamo-run out=llamacpp ~/llms/Llama-4-Scout-17B-16E-Instruct-UD-IQ1_S.gguf --model-config ~/llms/Llama-4-Scout-17B-16E-Instruct
 ```
......
...@@ -56,6 +56,8 @@ pub enum GGUFArchitecture {
     Phi3,
     Starcoder2,
     Qwen2,
+    Qwen3,
+    Gemma3,
 }
 // Wraps from_str() for some convenience:
......
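To illustrate how the two new enum variants fit in, here is a minimal, hypothetical sketch of a `GGUFArchitecture` enum with a `FromStr` implementation parsing the architecture string found in GGUF metadata. The variant list is taken from the diff above; the string spellings, error type, and overall shape are assumptions for illustration and may not match the actual dynamo-run source.

```rust
use std::str::FromStr;

// Architectures recognized when reading GGUF metadata.
// Variant names follow the diff; the full enum in the repo is longer.
#[derive(Debug, PartialEq)]
enum GGUFArchitecture {
    Phi3,
    Starcoder2,
    Qwen2,
    Qwen3,  // newly added: tokenizer can now be extracted from the GGUF
    Gemma3, // newly added: tokenizer can now be extracted from the GGUF
}

impl FromStr for GGUFArchitecture {
    type Err = String;

    // Hypothetical mapping from the GGUF architecture string to a variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "phi3" => Ok(Self::Phi3),
            "starcoder2" => Ok(Self::Starcoder2),
            "qwen2" => Ok(Self::Qwen2),
            "qwen3" => Ok(Self::Qwen3),
            "gemma3" => Ok(Self::Gemma3),
            other => Err(format!("unknown GGUF architecture: {other}")),
        }
    }
}

fn main() {
    // With Qwen3 and Gemma3 recognized, the embedded tokenizer can be used
    // directly instead of requiring a separate --model-config checkout.
    assert_eq!("qwen3".parse::<GGUFArchitecture>(), Ok(GGUFArchitecture::Qwen3));
    assert_eq!("gemma3".parse::<GGUFArchitecture>(), Ok(GGUFArchitecture::Gemma3));
    assert!("unknown".parse::<GGUFArchitecture>().is_err());
}
```

An unrecognized architecture falls through to the error arm, which is presumably where the `--model-config` fallback described in the docs still applies.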