1. 07 Apr, 2025 1 commit
    • feat(dynamo-run): Basic routing choice (#524) · ec2e7307
      Graham King authored
      As a first step towards KV routing:
      - Introduce a `--router-mode` flag in dynamo-run that currently supports only random and round-robin. Not that interesting yet.
      - Make the vllm engine publish the KV events received from our patched vllm.
      
      Now we "just" need to connect the two. Easy, right?
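
To illustrate what the two router modes named above amount to, here is a minimal sketch. It is not the dynamo-run implementation: `make_router`, the worker names, and the mode strings `"random"` and `"round-robin"` are assumptions chosen for the example.

```python
import itertools
import random
from typing import Callable, Optional, Sequence


def make_router(
    mode: str,
    workers: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Callable[[], str]:
    """Return a function that picks the worker for the next request.

    Hypothetical helper: the mode strings are assumed, not taken from dynamo-run.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    if mode == "round-robin":
        # Cycle through the workers in order, wrapping around at the end.
        cycle = itertools.cycle(workers)
        return lambda: next(cycle)
    if mode == "random":
        # Pick a worker uniformly at random for each request.
        chooser = rng or random.Random()
        return lambda: chooser.choice(workers)
    raise ValueError(f"unknown router mode: {mode!r}")


pick = make_router("round-robin", ["worker-0", "worker-1", "worker-2"])
first_four = [pick() for _ in range(4)]
```

Neither mode looks at request content or worker state, which is why KV-aware routing still needs the published KV events to be connected to the router.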
  2. 03 Apr, 2025 1 commit
  3. 25 Mar, 2025 1 commit
  4. 24 Mar, 2025 1 commit
  5. 19 Mar, 2025 1 commit
  6. 17 Mar, 2025 1 commit
    • fix(vllm,sglang): Let the engine enforce max tokens (#216) · 05765cd4
      Graham King authored
      Previously, several parts of the stack ensured that max tokens (for a single request) was set.

      Now only text input sets it (to 8k). Everything else leaves it as is, potentially unset. The engines themselves have very small defaults: 16 for vllm and 128 for sglang.
      
      Also fix the dynamo-run CUDA startup message so that it prints only if we're using an engine that would benefit from it (mistralrs, llamacpp).
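
The max-tokens behavior described above can be sketched roughly as follows. This is an illustration, not the project's code: the function names, the `input_kind` values, and reading "8k" as 8192 are all assumptions, as is the choice to let an explicit request value take precedence for text input.

```python
from typing import Optional

# "8k" in the commit message; the exact value 8192 is an assumption.
TEXT_INPUT_MAX_TOKENS = 8192

# Engine defaults as stated in the commit message.
ENGINE_DEFAULT_MAX_TOKENS = {"vllm": 16, "sglang": 128}


def request_max_tokens(input_kind: str, requested: Optional[int]) -> Optional[int]:
    """Value the stack forwards to the engine (hypothetical helper)."""
    if requested is not None:
        return requested  # an explicit per-request value passes through unchanged
    if input_kind == "text":
        return TEXT_INPUT_MAX_TOKENS  # only text input fills in a default
    return None  # everything else is left unset


def effective_max_tokens(engine: str, input_kind: str, requested: Optional[int]) -> int:
    """Limit that ends up applied once the engine fills in its own default."""
    forwarded = request_max_tokens(input_kind, requested)
    if forwarded is not None:
        return forwarded
    return ENGINE_DEFAULT_MAX_TOKENS[engine]
```

Under these assumptions, a request that arrives without a limit through any input other than text is capped at the engine's small default, so callers that want longer outputs would need to set the value themselves.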
  7. 14 Mar, 2025 1 commit
  8. 13 Mar, 2025 2 commits
  9. 11 Mar, 2025 1 commit
  10. 08 Mar, 2025 1 commit
  11. 05 Mar, 2025 1 commit
  12. 04 Mar, 2025 1 commit
  13. 28 Feb, 2025 1 commit