1. 16 Sep, 2025 1 commit
  2. 03 Sep, 2025 1 commit
  3. 02 Sep, 2025 1 commit
  4. 28 Aug, 2025 1 commit
  5. 23 Aug, 2025 1 commit
  6. 22 Aug, 2025 1 commit
  7. 21 Aug, 2025 2 commits
  8. 19 Aug, 2025 3 commits
  9. 18 Aug, 2025 1 commit
  10. 14 Aug, 2025 1 commit
  11. 07 Aug, 2025 2 commits
  12. 05 Aug, 2025 4 commits
  13. 28 Jul, 2025 1 commit
  14. 23 Jul, 2025 1 commit
  15. 18 Jul, 2025 2 commits
  16. 17 Jul, 2025 1 commit
  17. 16 Jul, 2025 1 commit
  18. 07 Jul, 2025 1 commit
  19. 01 Jul, 2025 1 commit
  20. 24 Jun, 2025 1 commit
  21. 13 Jun, 2025 1 commit
  22. 23 May, 2025 1 commit
  23. 22 May, 2025 2 commits
    • feat(dynamo-run): Allow setting context-length (#1157) · 6d5da821
      Graham King authored
      Llama 4 has a very large context length (aka n_ctx, model_max_length, max_model_len), and vllm won't start unless it can allocate enough KV cache for the entire context.
      
      Allow passing `--context-length <N>` to `dynamo-run` to limit it so long-context models will fit.
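      A minimal sketch of how an engine wrapper might pick the context length to size its KV cache for. It assumes the wrapper receives the `--context-length` value as an `Option<u32>` and already knows the model's built-in maximum. The function `effective_context_length` and its clamping rule are hypothetical, not `dynamo-run` code.

      ```rust
      /// Hypothetical helper: choose the context length an engine should
      /// allocate KV cache for. `cli_override` stands for `--context-length <N>`.
      pub fn effective_context_length(cli_override: Option<u32>, model_max: u32) -> u32 {
          match cli_override {
              // Assumption: a user value above the model's maximum is clamped,
              // since the model cannot use context beyond that length.
              Some(n) => n.min(model_max),
              // No flag given: fall back to the model's built-in maximum.
              None => model_max,
          }
      }
      ```

      Passing a small `N` shrinks the KV cache the engine must reserve, which is what lets a long-context model such as Llama 4 start on limited GPU memory.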
      
      Future todo:
      - Restrict every request's `max_tokens` to below the context length. Our pre-processor should do this by setting `stop_conditions.max_tokens`. The mistralrs engine wrapper must do it itself because it does not use the pre-processor.
      - mistralrs and llamacpp currently have a hard-coded max context length if one is not provided on the command line. Change those to be the model's built-in max, read from the GGUF or tokenizer_config.json.
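      The first todo item amounts to capping each request so that prompt tokens plus generated tokens fit in the context window. A sketch under stated assumptions: the prompt's token count is already known, and the result would be written to `stop_conditions.max_tokens`. The helper `clamp_max_tokens` is hypothetical.

      ```rust
      /// Hypothetical sketch: cap a request's `max_tokens` so that
      /// prompt tokens + generated tokens never exceed `context_length`.
      /// Returns `None` if the prompt alone already fills the context.
      pub fn clamp_max_tokens(
          requested: Option<u32>,
          prompt_tokens: u32,
          context_length: u32,
      ) -> Option<u32> {
          // Room left in the context window after the prompt (None on underflow).
          let remaining = context_length.checked_sub(prompt_tokens)?;
          if remaining == 0 {
              return None;
          }
          // If the client set no limit, allow all remaining space.
          Some(requested.map_or(remaining, |n| n.min(remaining)))
      }
      ```

      Returning `None` when the prompt fills the window lets the caller reject the request rather than start a zero-token generation.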
    • jmswen
  24. 19 May, 2025 1 commit
    • feat: Support multiple models on single ingress node (#1127) · aeb79e62
      Graham King authored
      We can now do this:
      
      - Node 1:
      
      ```
      dynamo-run in=http out=dyn
      ```
      
      - Node 2 and 3, two instances of component 'backend' in the nemotron_ultra pipeline:
      
      ```
      dynamo-run in=dyn://nemotron_ultra.backend.generate out=vllm /data/models/NemotronUltra
      ```
      
      - Node 4 and 5, two instances of the 'backend' component in nemotron_super pipeline:
      
      ```
      dynamo-run in=dyn://nemotron_super.backend.generate out=vllm /data/models/NemotronSuper
      ```
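      Each `in=dyn://...` value above has three dot-separated parts. Reading them as `<namespace>.<component>.<endpoint>`, where the namespace is the pipeline (`nemotron_ultra`), the component is `backend` and the endpoint is `generate`, a minimal parser could look like this. `EndpointPath` and `parse_dyn_url` are hypothetical names, not `dynamo-runtime` APIs.

      ```rust
      /// Hypothetical representation of `dyn://<namespace>.<component>.<endpoint>`.
      #[derive(Debug, PartialEq)]
      pub struct EndpointPath {
          pub namespace: String, // the pipeline, e.g. "nemotron_ultra"
          pub component: String, // e.g. "backend"
          pub endpoint: String,  // e.g. "generate"
      }

      /// Parse an `in=dyn://...` value; returns None if the shape is wrong.
      pub fn parse_dyn_url(s: &str) -> Option<EndpointPath> {
          let rest = s.strip_prefix("dyn://")?;
          let parts: Vec<&str> = rest.split('.').collect();
          // Require exactly three non-empty parts.
          match parts.as_slice() {
              [ns, comp, ep] if !ns.is_empty() && !comp.is_empty() && !ep.is_empty() => {
                  Some(EndpointPath {
                      namespace: ns.to_string(),
                      component: comp.to_string(),
                      endpoint: ep.to_string(),
                  })
              }
              _ => None,
          }
      }
      ```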
      
      The ingress node will discover all four instances and route correctly. We have been planning for this for a long time now.
      
      As part of this, auto-discovery is now always `out=dyn`, with no extra URL parts. Previously it could only route to a single pipeline.
      
      Also:
      - Refactor endpoint / instance naming now that I understand them
      - Fix removing models when their instance stops.
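      Routing several pipelines through one ingress, and removing a model when its instances stop, can be pictured as a map from model name to live instance IDs. This is a sketch assuming numeric instance IDs and one model name per instance; `ModelRegistry` and its methods are hypothetical, not Dynamo's actual types.

      ```rust
      use std::collections::{BTreeSet, HashMap};

      /// Hypothetical ingress-side registry: model name -> live instance IDs.
      #[derive(Default)]
      pub struct ModelRegistry {
          instances: HashMap<String, BTreeSet<u64>>,
      }

      impl ModelRegistry {
          /// Called when discovery sees a new worker instance serving `model`.
          pub fn add_instance(&mut self, model: &str, instance_id: u64) {
              self.instances.entry(model.to_string()).or_default().insert(instance_id);
          }

          /// Called when an instance stops. The model itself is dropped once
          /// its last instance is gone, so the ingress stops advertising it.
          pub fn remove_instance(&mut self, model: &str, instance_id: u64) {
              let now_empty = match self.instances.get_mut(model) {
                  Some(set) => {
                      set.remove(&instance_id);
                      set.is_empty()
                  }
                  None => false,
              };
              if now_empty {
                  self.instances.remove(model);
              }
          }

          /// Instances a request for `model` may be routed to, in ID order.
          pub fn instances_for(&self, model: &str) -> Vec<u64> {
              self.instances
                  .get(model)
                  .map(|s| s.iter().copied().collect())
                  .unwrap_or_default()
          }

          /// Currently served model names, sorted.
          pub fn models(&self) -> Vec<String> {
              let mut names: Vec<String> = self.instances.keys().cloned().collect();
              names.sort();
              names
          }
      }
      ```

      With the four backend instances from the example registered, stopping both `nemotron_super` instances drops that model entirely, while `nemotron_ultra` keeps routing to its two instances.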
  25. 15 May, 2025 2 commits
    • Ryan McCormick
    • fix: Fix default RouterMode value (#1092) · 889ab67e
      Graham King authored
      The Python bindings use the default value for RouterMode. Previously that was Random (good), but it had recently become None (bad).
      
      Remove the option and clean up the duplicate RouterMode. I was trying to avoid putting the `KV` enum in dynamo-runtime. Turns out adding those two characters gives us a healthy simplification, and restores the old default router value.
      
      Also clean up two noisy log messages emitted while the worker waits for KV routing metrics to start.
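      The fix depends on what `RouterMode::default()` returns. A sketch assuming a plain Rust enum: `Random` and `KV` come from the commit message, while `RoundRobin` is an assumed extra variant for illustration. The `#[default]` attribute on a unit variant (stable since Rust 1.62) makes the intended default explicit, so a caller that relies on the default, as the Python bindings do, gets `Random`.

      ```rust
      /// Hypothetical single RouterMode enum shared by all callers.
      #[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
      pub enum RouterMode {
          /// Pick a random worker instance. Marked as the default.
          #[default]
          Random,
          /// Assumed variant for illustration: cycle through instances in order.
          RoundRobin,
          /// Route using KV cache metrics reported by workers.
          KV,
      }
      ```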
  26. 14 May, 2025 1 commit
  27. 29 Apr, 2025 1 commit
    • chore: Split PushRouter from Client (#817) · a1a10365
      Graham King authored
      In a distributed system we don't know if the remote workers need pre-processing done ingress-side or not. Previously Client required us to decide this before discovering the remote endpoints, which was fine because pre-processing was worker-side.
      
      As part of moving pre-processing back to ingress-side we need to split this into two steps:
      - Client discovers the endpoints, and (later PR) will fetch their Model Deployment Card.
      - PushRouter will use the Model Deployment Card to decide if they need pre-processing or not, which affects the types of the generic parameters.
      
      Part of #743
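      The two-step split can be sketched with plain types. Everything here is hypothetical: `Endpoint`, the `requires_preprocessing` field on `ModelDeploymentCard`, and the two request types stand in for Dynamo's real ones. What it illustrates is that the router's generic request type is chosen only after discovery, from a card fetched at runtime, so the choice goes through an enum rather than a type parameter fixed on `Client`.

      ```rust
      use std::marker::PhantomData;

      /// Hypothetical discovered endpoint.
      #[derive(Debug, Clone)]
      pub struct Endpoint {
          pub id: u64,
      }

      /// Step 1: the client only discovers endpoints; it makes no
      /// assumption about which request type they accept.
      pub struct Client {
          pub endpoints: Vec<Endpoint>,
      }

      /// Hypothetical subset of a Model Deployment Card. `true` means the
      /// workers expect the ingress to have pre-processed the request.
      pub struct ModelDeploymentCard {
          pub requires_preprocessing: bool,
      }

      /// Illustrative request types the router may push.
      pub struct TextRequest(pub String); // raw, not yet tokenized
      pub struct PreprocessedRequest(pub Vec<u32>); // tokenized ingress-side

      /// Step 2: a router generic over the request type it pushes.
      pub struct PushRouter<Req> {
          pub client: Client,
          _req: PhantomData<Req>,
      }

      /// Lets the generic parameter be picked at runtime.
      pub enum AnyRouter {
          Raw(PushRouter<TextRequest>),
          Preprocessed(PushRouter<PreprocessedRequest>),
      }

      /// Decide the request type from the card fetched after discovery.
      pub fn build_router(client: Client, card: &ModelDeploymentCard) -> AnyRouter {
          if card.requires_preprocessing {
              AnyRouter::Preprocessed(PushRouter { client, _req: PhantomData })
          } else {
              AnyRouter::Raw(PushRouter { client, _req: PhantomData })
          }
      }
      ```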
  28. 18 Apr, 2025 1 commit
  29. 04 Apr, 2025 1 commit
    • chore: Upgrade Rust to 1.86 (#518) · e99aa1e1
      Graham King authored
      Also upgrade the cargo resolver to v3, the default.
      
      New clippy lints:
      - `next_back()` instead of `last()` for a double-ended iterator. That avoids walking the whole list.
      - `repeat_n` instead of `repeat().take()`. That avoids one clone.
      - Doc comment indentation
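      The two iterator lints side by side, using only the standard library (`std::iter::repeat_n` has been stable since Rust 1.82). The function names are illustrative.

      ```rust
      use std::iter;

      /// Final item of any double-ended iterator. The default
      /// `Iterator::last` walks every element from the front;
      /// `next_back()` takes the final element from the back end directly.
      pub fn final_item<I: DoubleEndedIterator>(mut it: I) -> Option<I::Item> {
          it.next_back()
      }

      /// `n` copies of `s`. `iter::repeat(s).take(n)` clones `s` for every
      /// item, while `iter::repeat_n(s, n)` moves the original value into
      /// the last item, saving one clone.
      pub fn copies(s: String, n: usize) -> Vec<String> {
          iter::repeat_n(s, n).collect()
      }
      ```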
  30. 08 Mar, 2025 1 commit