    feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e
    Graham King authored
    . New mistralrs and llamacpp version
    . mistralrs: Handle Gemma 3 and Llama 4 as vision models
    . Update the dynamo-run docs to use Qwen 3
    . Our pre-processor now supports Llama 4's newer multi-modal `config.json`
    . Upgrade minijinja to handle Qwen 3's prompt template
    
    For Llama 4 we'll need to limit the max seq len. vLLM says:
    > To serve at least one request with the model's max seq len (10485760), (240.00 GiB KV cache is needed,...
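    As an illustration, capping the context length when serving with vLLM might look like the following; the model ID and the 8192 limit are assumptions for the example, not values from this change:

    ```shell
    # Sketch: cap the max model length so the KV cache fits in GPU memory,
    # instead of vLLM defaulting to the model's full 10485760-token window.
    # Model name and limit are illustrative assumptions.
    vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct --max-model-len 8192
    ```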
    
    I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.