1. 07 Apr, 2025 1 commit
    • feat(dynamo-run): Basic routing choice (#524) · ec2e7307
      Graham King authored
      As a first step towards KV routing:
      - Introduce a `--router-mode` flag in dynamo-run that only supports random and round-robin right now. Not that interesting yet.
      - Make the vllm engine publish the KV events received from our patched vllm.
      
      Now we "just" need to connect the two. Easy, right?
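
The two routing modes named above can be sketched in a few lines. This is only an illustration of random versus round-robin selection, not the dynamo-run implementation: the `make_router` helper, the mode strings, and the worker names are all hypothetical.

```python
import itertools
import random
from typing import Callable, Optional, Sequence


def make_router(
    mode: str,
    workers: Sequence[str],
    rng: Optional[random.Random] = None,
) -> Callable[[], str]:
    """Return a function that picks the next worker for a request.

    Hypothetical helper: the mode strings "random" and "round-robin"
    are assumed here, not taken from the dynamo-run source.
    """
    if not workers:
        raise ValueError("at least one worker is required")
    if mode == "round-robin":
        # Cycle through the workers in order, wrapping around at the end.
        cycle = itertools.cycle(workers)
        return lambda: next(cycle)
    if mode == "random":
        # Pick uniformly at random, independently for every request.
        generator = rng or random.Random()
        return lambda: generator.choice(workers)
    raise ValueError(f"unknown router mode: {mode!r}")


if __name__ == "__main__":
    pick = make_router("round-robin", ["worker-a", "worker-b", "worker-c"])
    print([pick() for _ in range(5)])
```

Neither mode looks at worker state, which is why the commit calls them a first step: KV-aware routing would need the published KV events as an extra input to the choice.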
  2. 04 Apr, 2025 1 commit
    • feat: Python decorator dynamo_worker takes optional `static` parameter without etcd (#494) · 88ad3425
      Graham King authored
      Adds `@dynamo_worker(static = True)` to create a static worker which has a predictable name and hence does not require discovery or `etcd` to be running. There can only be a single static worker per namespace / component / endpoint trio.
      
      This contrasts with the default dynamic `dynamo_worker` endpoints we have now, which get a unique random name (based on namespace/component/endpoint) and are discovered by ingress components using etcd.
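
To make the static/dynamic distinction concrete, here is a minimal sketch of the naming idea only. The `instance_name` function, the `/` separator, and the UUID suffix are assumptions for illustration; the commit does not show how names are actually formed.

```python
import uuid


def instance_name(
    namespace: str, component: str, endpoint: str, static: bool = False
) -> str:
    """Build a worker name from the namespace/component/endpoint trio.

    Hypothetical scheme: a static worker gets the bare, predictable name,
    so a client can address it without a discovery lookup. A dynamic
    worker gets a random suffix, so clients must find it through discovery.
    """
    base = f"{namespace}/{component}/{endpoint}"
    if static:
        # Predictable: the same trio always yields the same name, which is
        # why only one static worker can exist per trio.
        return base
    # Unique per instance: many dynamic workers can share one trio.
    return f"{base}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    print(instance_name("ns", "backend", "generate", static=True))
    print(instance_name("ns", "backend", "generate"))
```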
      
      Also change the hello_world example to use `dynamo_worker(static = True)` so that it is exercised and demonstrated somewhere.
      
      For NIM.
  3. 24 Mar, 2025 1 commit
  4. 08 Mar, 2025 1 commit
  5. 07 Mar, 2025 1 commit
  6. 05 Mar, 2025 2 commits
  7. 04 Mar, 2025 1 commit
  8. 27 Feb, 2025 2 commits
  9. 25 Feb, 2025 6 commits
  10. 21 Feb, 2025 1 commit
    • feat(tio): Distributed inference! (#235) · 32a748e4
      Graham King authored
      Add support in tio for distributed components and discovery.
      
      Node 1:
      ```
      tio in=http out=tdr://ns/backend/mistralrs
      ```
      
      Node 2:
      ```
      tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
      ```
      
      This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint, and one of them will be picked at random each time.
      
      The parts of `ns/backend/mistralrs` are purely symbolic: pick anything, as long as it has three parts and matches on both nodes.
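
The three-part rule can be sketched as a small parser. This is an illustrative sketch, not tio's actual parsing code: `parse_endpoint` and `EndpointPath` are hypothetical, and labelling the three parts namespace, component, and endpoint is an assumption.

```python
from typing import NamedTuple


class EndpointPath(NamedTuple):
    # Field names are assumed labels for the three symbolic parts.
    namespace: str
    component: str
    endpoint: str


def parse_endpoint(url: str, scheme: str = "tdr://") -> EndpointPath:
    """Split a URL such as tdr://ns/backend/mistralrs into three parts."""
    if not url.startswith(scheme):
        raise ValueError(f"expected a {scheme} URL, got {url!r}")
    parts = url[len(scheme):].split("/")
    # Exactly three non-empty parts are required; their values are free-form.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected three path parts, got {url!r}")
    return EndpointPath(*parts)


if __name__ == "__main__":
    print(parse_endpoint("tdr://ns/backend/mistralrs"))
```

Both nodes would have to produce the same parsed value for the ingress side to find the worker side.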