  1. 17 Oct, 2025 1 commit
  2. 16 Oct, 2025 1 commit
  3. 07 Oct, 2025 3 commits
  4. 03 Oct, 2025 1 commit
  5. 30 Sep, 2025 2 commits
  6. 24 Sep, 2025 1 commit
  7. 18 Sep, 2025 1 commit
  8. 17 Sep, 2025 1 commit
  9. 05 Sep, 2025 1 commit
  10. 03 Sep, 2025 3 commits
  11. 22 Aug, 2025 2 commits
  12. 19 Aug, 2025 1 commit
  13. 15 Aug, 2025 2 commits
  14. 14 Aug, 2025 1 commit
  15. 06 Aug, 2025 1 commit
  16. 23 Jul, 2025 1 commit
  17. 18 Jul, 2025 2 commits
  18. 03 Jul, 2025 1 commit
  19. 26 Jun, 2025 1 commit
  20. 04 Jun, 2025 2 commits
  21. 02 Jun, 2025 2 commits
  22. 22 May, 2025 1 commit
    • feat(dynamo-run): Allow setting KV cache block size (#1175) · 183f2b32
      Graham King authored
      Example:
      ```
      dynamo-run out=<engine> <model> --kv-cache-block-size 64
      ```
      
      In a distributed system, this flag is set on the worker node and is propagated to the ingress node via the model deployment card.
      
      The block size was previously hard-coded to 16, which is now the default.
      
      - Load context_length from model. Closes #1172
      - Store context length and KV cache block size in Model Deployment Card #1170
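
To illustrate what the block size controls, here is a minimal sketch. It assumes the block size counts tokens per KV cache block, as in paged KV caches. The `DeploymentCard` class, its field names, and the context length of 8192 are hypothetical stand-ins for illustration, not the actual dynamo types or values.

```python
import math
from dataclasses import dataclass


@dataclass
class DeploymentCard:
    """Hypothetical stand-in for the model deployment card (not the real type)."""
    context_length: int            # loaded from the model
    kv_cache_block_size: int = 16  # default matches the previously hard-coded value


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Number of fixed-size KV cache blocks needed to hold num_tokens tokens."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return math.ceil(num_tokens / block_size)


# Illustrative values only: a worker started with --kv-cache-block-size 64.
card = DeploymentCard(context_length=8192, kv_cache_block_size=64)
max_blocks = blocks_needed(card.context_length, card.kv_cache_block_size)
```

A larger block size means fewer blocks to track per sequence, at the cost of more unused slots in the last, partially filled block.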
  23. 21 May, 2025 2 commits
  24. 19 May, 2025 2 commits
    • feat: Support multiple models on single ingress node (#1127) · aeb79e62
      Graham King authored
      We can now do this:
      
      - Node 1:
      
      ```
      dynamo-run in=http out=dyn
      ```
      
      - Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:
      
      ```
      dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
      ```
      
      - Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:
      
      ```
      dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
      ```
      
      The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.
      
      As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously the ingress node could only route to a single pipeline.
      
      Also:
      - Refactor endpoint / instance naming now that I understand them
      - Fix removing models when their instance stops.
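
The discovery step above can be sketched as follows. This is a rough illustration, not the actual implementation: it assumes a `dyn://` URL has the form `namespace.component.endpoint` (with the namespace naming the pipeline), and the `EndpointId`, `parse_dyn_url`, and `group_by_pipeline` names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class EndpointId(NamedTuple):
    namespace: str  # the pipeline, e.g. "nemotron_ultra"
    component: str  # e.g. "backend"
    endpoint: str   # e.g. "generate"


def parse_dyn_url(url: str) -> EndpointId:
    """Split dyn://namespace.component.endpoint into its three parts."""
    prefix = "dyn://"
    if not url.startswith(prefix):
        raise ValueError(f"not a dyn:// URL: {url!r}")
    parts = url[len(prefix):].split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected dyn://namespace.component.endpoint, got {url!r}")
    return EndpointId(*parts)


def group_by_pipeline(urls: List[str]) -> Dict[str, List[EndpointId]]:
    """Group discovered instances by pipeline so requests can be routed per model."""
    groups: Dict[str, List[EndpointId]] = defaultdict(list)
    for url in urls:
        ep = parse_dyn_url(url)
        groups[ep.namespace].append(ep)
    return dict(groups)


# The four worker instances from nodes 2 to 5 in the example above.
discovered = [
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_ultra.backend.generate",
    "dyn://nemotron_super.backend.generate",
    "dyn://nemotron_super.backend.generate",
]
pipelines = group_by_pipeline(discovered)
```

Here `pipelines` has one entry per pipeline, each holding two instances that the ingress node could choose between.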
    • feat: Add OpenAI Embeddings interface in rust lib (#1110) · 73fdfb8a
      Tom O'Brien authored
      Implements OpenAI embeddings (interface only).
      
      - Adds ModelType::Embedding
      - Adds OpenAI embedding request/response structs
      - Adds support for embedding model discovery
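
For orientation, the request and response shapes of the public OpenAI embeddings API can be sketched as below. This is a Python illustration of the JSON layout only; the class names are hypothetical, the optional request fields are omitted, and the Rust structs added in this commit may be organized differently.

```python
from dataclasses import asdict, dataclass
from typing import List, Union


@dataclass
class EmbeddingRequest:
    model: str
    input: Union[str, List[str]]  # a single string or a batch of strings


@dataclass
class Embedding:
    index: int               # position of the matching input in the request
    embedding: List[float]
    object: str = "embedding"


@dataclass
class Usage:
    prompt_tokens: int
    total_tokens: int


@dataclass
class EmbeddingResponse:
    model: str
    data: List[Embedding]
    usage: Usage
    object: str = "list"


# Illustrative values only; a real model returns much longer vectors.
response = EmbeddingResponse(
    model="example-embedding-model",
    data=[Embedding(index=0, embedding=[0.1, -0.2, 0.3])],
    usage=Usage(prompt_tokens=4, total_tokens=4),
)
payload = asdict(response)  # nested dict, ready for JSON serialization
```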
  25. 15 May, 2025 2 commits
    • chore: Prevent duplicate components with different models. (#1103) · 641234cd
      Graham King authored
      Each namespace is for a single pipeline, so a component must be model-unique. This means we can have several components with the same name running the same model (data parallel), with their traffic routed according to `--router-mode`. We cannot have several components with the same name running different models.
      
      Add an `ensure_unique` check to prevent that happening.
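
A minimal sketch of what such a check might look like, assuming a registry keyed by namespace and component name. Only the name `ensure_unique` comes from the commit; the `ComponentRegistry` and `DuplicateComponentError` names and the method signature are hypothetical, and the real check is not shown here.

```python
from typing import Dict, Tuple


class DuplicateComponentError(Exception):
    """Raised when a component name is reused with a different model."""


class ComponentRegistry:
    def __init__(self) -> None:
        # (namespace, component) -> model name
        self._models: Dict[Tuple[str, str], str] = {}

    def ensure_unique(self, namespace: str, component: str, model: str) -> None:
        """Allow repeat registrations of the same model, reject a different one."""
        key = (namespace, component)
        existing = self._models.setdefault(key, model)
        if existing != model:
            raise DuplicateComponentError(
                f"{namespace}.{component} already serves {existing!r}, "
                f"cannot also serve {model!r}"
            )


registry = ComponentRegistry()
registry.ensure_unique("nemotron_ultra", "backend", "NemotronUltra")
registry.ensure_unique("nemotron_ultra", "backend", "NemotronUltra")  # data parallel: fine
```

Registering `NemotronSuper` under the same namespace and component would raise `DuplicateComponentError` in this sketch.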
    • fix: Fix default RouterMode value (#1092) · 889ab67e
      Graham King authored
      The Python bindings use the default value for RouterMode. Previously that was Random (good), but it had become None (bad).
      
      Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.
      
      Also clean up two noisy log messages when waiting for KV routing metrics to start in the worker.
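
The class of bug described above can be illustrated with a small sketch. It is a Python analogy, not the actual Rust code: the config class names are hypothetical, and only the `Random` and `KV` modes named in this commit are modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RouterMode(Enum):
    RANDOM = "random"
    KV = "kv"


@dataclass
class OptionalRouterConfig:
    # Wrapping the mode in an optional makes "unset" the default,
    # so callers relying on defaults silently get None.
    mode: Optional[RouterMode] = None


@dataclass
class RouterConfig:
    # A non-optional field with an explicit default restores Random
    # for callers that rely on the default value.
    mode: RouterMode = RouterMode.RANDOM


before = OptionalRouterConfig().mode
after = RouterConfig().mode
```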
  26. 14 May, 2025 1 commit
  27. 01 May, 2025 1 commit