Commits · 21fce9ba0dee13319fe66525bf2c7dc55109af21 · OpenDAS / dynamo


25 Feb, 2026 1 commit
- feat: Tiktoken support (#6460) · 21fce9ba
  Nikita authored Feb 25, 2026
```
Signed-off-by: Nikita Sukharev <kaonael@gmail.com>
```
15 Sep, 2025 1 commit
- fix: Handle invalid JSON in config.json (#3043) · b1186aee
  Graham King authored Sep 15, 2025
```
Signed-off-by: Graham King <grahamk@nvidia.com>
```
22 May, 2025 1 commit

feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32

Graham King authored May 22, 2025

Example:
```
dynamo-run out=<engine> <model> --kv-cache-block-size 64
```

In a distributed system, set this flag on the worker node; the value is propagated to ingress via the model deployment card.

The block size was previously hard-coded to 16, which is now the default.

- Load context_length from model. Closes #1172
- Store context length and KV cache block size in Model Deployment Card #1170
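
To illustrate why the block size and the context length travel together in the card, here is a minimal Python sketch of the block arithmetic. It is not Dynamo's code: the helper name `blocks_needed` is hypothetical, and it assumes each KV cache block holds exactly `block_size` tokens, which the commit message does not spell out.

```python
# Hypothetical illustration, not taken from the Dynamo source.
# Assumption: one KV cache block stores `block_size` tokens.

DEFAULT_KV_CACHE_BLOCK_SIZE = 16  # the default named in the commit message


def blocks_needed(num_tokens: int, block_size: int = DEFAULT_KV_CACHE_BLOCK_SIZE) -> int:
    """Return how many whole blocks are needed to hold `num_tokens` tokens."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Integer ceiling division: a partially filled block still occupies a block.
    return (num_tokens + block_size - 1) // block_size


if __name__ == "__main__":
    for size in (16, 64):
        print(f"block size {size}: {blocks_needed(1000, size)} blocks for 1000 tokens")
```

Under that assumption, a larger block size means fewer blocks to track per request, at the cost of more unused slots in the last block.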

08 May, 2025 1 commit

feat: Qwen3, Gemma3 and Llama4 support (#1002) · ceaeba3e

Graham King authored May 08, 2025

- New mistralrs and llamacpp version
- mistralrs: Handle Gemma 3 and Llama 4 as vision models
- Update the dynamo-run docs to use Qwen 3
- Our pre-processor now supports Llama 4's newer multi-modal `config.json`
- Upgrade minijinja to handle Qwen 3's prompt template

For Llama 4 we'll need to limit the maximum sequence length (max seq len). vllm says:
> To serve at least one request with the models's max seq len (10485760), (240.00 GiB KV cache is needed,...
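
As a rough back-of-the-envelope check on the quoted figures, the sketch below derives the implied KV cache cost per token and estimates the cache needed for a shorter limit. It assumes the requirement scales linearly with sequence length for a single request, which is a simplification; the function name `kv_cache_gib` and the example limit of 131072 tokens are illustrative, not values from the commit.

```python
# Rough estimate only. Assumption: KV cache size for one request scales
# linearly with sequence length; real engines may differ.

GIB = 1024 ** 3

# Figures quoted from the vllm message above.
MAX_SEQ_LEN = 10_485_760
KV_CACHE_GIB_AT_MAX = 240.0

# Implied KV cache bytes per token for this model and configuration.
BYTES_PER_TOKEN = KV_CACHE_GIB_AT_MAX * GIB / MAX_SEQ_LEN


def kv_cache_gib(seq_len: int) -> float:
    """Estimate the KV cache (GiB) needed to serve one request of `seq_len` tokens."""
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    return seq_len * BYTES_PER_TOKEN / GIB


if __name__ == "__main__":
    print(f"bytes per token: {BYTES_PER_TOKEN:.0f}")
    # 131072 is a hypothetical lower limit chosen for illustration.
    print(f"KV cache at 131072 tokens: {kv_cache_gib(131072):.2f} GiB")
```
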

I was able to run Llama 4 with llamacpp and a quantized GGUF, with Dynamo doing the pre-processing.

25 Feb, 2025 1 commit

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
