1. 23 Apr, 2026 1 commit
  2. 12 Mar, 2026 1 commit
  3. 06 Mar, 2026 2 commits
  4. 02 Jan, 2026 1 commit
  5. 26 Nov, 2025 1 commit
  6. 03 Nov, 2025 1 commit
  7. 08 Oct, 2025 1 commit
  8. 30 Sep, 2025 1 commit
  9. 23 Sep, 2025 1 commit
  10. 15 Sep, 2025 1 commit
  11. 03 Sep, 2025 1 commit
  12. 27 Aug, 2025 1 commit
  13. 22 Aug, 2025 1 commit
  14. 20 Aug, 2025 1 commit
  15. 15 Aug, 2025 1 commit
  16. 13 Aug, 2025 1 commit
  17. 11 Aug, 2025 1 commit
  18. 18 Jul, 2025 1 commit
  19. 15 Jul, 2025 1 commit
  20. 30 Jun, 2025 1 commit
    • Graham King's avatar
      chore(dynamo-run): Refactor to library (#1687) · 92f06b0e
      Graham King authored
      Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.
      
      Example usage:
      
      1. Create a `LocalModel`:
      
      ```
          let local_model = LocalModelBuilder::default()
      	.model_path("Qwen/Qwen3-0.6B")
      	.http_port(8080)
      	.build().await?;
      ```
      
      2. Make an engine:
      
      ```
          let engine_config = EngineConfig::StaticFull {
      	engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
      	model: Box::new(local_model),
          };
      ```
      
      3. Connect it to an input and run it
      
      ```
          dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
      ```
      
      For https://github.com/ai-dynamo/dynamo/issues/1647
      
      Code Rabbit summary, thanks:
        * Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
        * Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
        * Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
        * Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
        * Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
        * Streamlined configuration and validation for flags and router settings.
        * Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.
      92f06b0e
  21. 11 Jun, 2025 1 commit
  22. 04 Jun, 2025 1 commit
    • Graham King's avatar
      feat: Support larger Gemma 3 models (#1359) · cfd12d7f
      Graham King authored
      Publish `generation_config.json` from worker to ingress, as part of Model Deployment Card. That allows ingress to read key fields out of it. Gemma 3 4B+ has some important information that's only in there.
      cfd12d7f
  23. 22 May, 2025 1 commit
    • Graham King's avatar
      feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
      Graham King authored
      Example:
      ```
      dynamo-run out=<engine> <model> --kv-cache-block-size 64
      ```
      
      In a distributed system this goes on the worker node and is propagated to ingress via the model deployment card.
      
      Previously hard coded to 16, which is now the default.
      
      - Load context_length from model. Closes #1172
      - Store context length and KV cache block size in Model Deployment Card #1170
      183f2b32
  24. 21 May, 2025 3 commits
  25. 14 May, 2025 1 commit
  26. 09 May, 2025 1 commit
  27. 06 May, 2025 1 commit
    • Graham King's avatar
      feat: dynamo-run <-> python interop (#934) · 99cd9d85
      Graham King authored
      Adding this to a Python script makes it register on the network so that `dynamo-run` can discover it and send it requests:
      ```
      from dynamo.llm import register_llm
      
      MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
      await register_llm(endpoint, MODEL, 3)
      ```
      
      Full vllm example, with pre-processing in dynamo:
      - `dynamo-run in=text out=dyn://dynamo.backend.generate`
      - `cd lib/bindings/python/examples/hello_world`
      - `python server_vllm.py`
      
      This builds on top of the work to move pre-processor to ingress side. It means we can decouple Rust and Python using NATS as the bus.
      
      The `register_llm` call does this:
      
      - Download the model from HF if necessary
      - Load the model deployment card from the HF folder or extract from GGUF
      - Push the tokenizer config etc into NATS object store so ingress can access it from a different machine
      - Publish the model deployment card to ETCD
      99cd9d85
  28. 29 Apr, 2025 1 commit
    • Abrar Shivani's avatar
      feat: Add request template support for default inference parameters (#841) · adad2ecd
      Abrar Shivani authored
      Adds support for specifying default request parameters through a json template file that can be applied across all inference requests. This enables consistent parameter settings while still allowing per-request overrides.
      
      Changes:
      - Add --request-template CLI flag to specify template file path
      - Integrate template support in HTTP, batch and text input modes
      - Template values can be overridden by individual request parameters
      - Example template.json:
      ```
      {
          "model": "Qwen2.5-3B-Instruct",
          "temperature": 0.7,
          "max_completion_tokens": 4096
      }
      ```
      adad2ecd
  29. 25 Apr, 2025 1 commit
    • Graham King's avatar
      chore: Publish Model Deployment Card to NATS (#799) · d346782c
      Graham King authored
      This will allow an ingress-side pre-processor to see it without needing a model checkout.
      
      Currently pre-processing is done in the worker, which has access to the model deployment card ("MDC") files (`config.json`, `tokenizer.json` and `tokenizer_config.json`) locally. We want to move the pre-processor to the ingress side to support KV routing. That requires ingress side (i.e the HTTP server), on a different machine than the worker to be able to see those three files.
      
      To support that this PR makes the worker upload the contents of those files to the NATS object store, and publishes the MDC with those NATS urls to the key-value store. 
      
      The key-value store has an interface so any store (nats, etcd, redis, etc) can be supported. Implementations for memory and NATS are provided.
      
      Fetching the MDC from the store, doing pre-processing ingress side, and publishing a card backed by a GGUF, are all for a later commit.
      
      Part of #743 
      d346782c
  30. 04 Apr, 2025 1 commit
  31. 24 Mar, 2025 1 commit
  32. 14 Mar, 2025 1 commit
  33. 09 Mar, 2025 1 commit
  34. 08 Mar, 2025 1 commit
  35. 05 Mar, 2025 1 commit
  36. 25 Feb, 2025 2 commits