Commits · 32a748e4030859dbb2a4dd9eaac389e2c84966b3 · OpenDAS / dynamo

21 Feb, 2025 2 commits

feat(tio): Distributed inference! (#235) · 32a748e4

Graham King authored Feb 21, 2025

Add support in tio for distributed components and discovery.

Node 1:
```
tio in=http out=tdr://ns/backend/mistralrs
```

Node 2:
```
tio in=tdr://ns/backend/mistralrs out=mistralrs ~/llm_models/Llama-3.2-3B-Instruct
```

This will use etcd to auto-discover the model and NATS to talk to it. You can run multiple workers on the same endpoint, and tio will pick one of them at random each time.
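To make the random worker choice more concrete, here is a minimal Rust sketch of picking one instance roughly uniformly at random from those discovered for an endpoint. It is an assumption, not the tio implementation: etcd discovery and the NATS transport are left out, `pick_random` is a hypothetical helper, and the randomness comes from the standard library's `RandomState` (documented as initialized with random keys), so the example needs no `rand` crate.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical helper: choose one of the worker instances registered for an
/// endpoint, or None if no worker is currently registered.
fn pick_random<T>(instances: &[T]) -> Option<&T> {
    if instances.is_empty() {
        return None;
    }
    // A new RandomState is initialized with random keys, so finishing an empty
    // hasher yields a pseudo-random u64 (fine for load spreading, not for crypto).
    let r = RandomState::new().build_hasher().finish();
    // Map the random value onto a valid index (tiny modulo bias is acceptable here).
    let idx = (r % instances.len() as u64) as usize;
    instances.get(idx)
}
```

Calling it once per request spreads load across workers without any coordination between clients; a round-robin or least-loaded policy would instead need shared state.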

The `ns/backend/mistralrs` names are purely symbolic: pick anything, as long as it has three parts and matches the other node.
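A small sketch of the three-part naming rule, assuming the address is split on `/` after a `tdr://` prefix. The `EndpointPath` type and `parse_tdr` function are hypothetical, not tio's actual parser; the sketch only checks that there are exactly three non-empty parts, since the names carry no meaning beyond having to match on both nodes.

```rust
/// Hypothetical representation of a `tdr://namespace/component/endpoint` address.
#[derive(Debug, PartialEq)]
struct EndpointPath {
    namespace: String,
    component: String,
    endpoint: String,
}

/// Parse `tdr://a/b/c`, requiring exactly three non-empty parts.
fn parse_tdr(addr: &str) -> Result<EndpointPath, String> {
    let rest = addr
        .strip_prefix("tdr://")
        .ok_or_else(|| format!("missing tdr:// scheme: {addr}"))?;
    let parts: Vec<&str> = rest.split('/').collect();
    match parts.as_slice() {
        // Exactly three segments, none of them empty (so `ns//x` is rejected).
        [ns, comp, ep] if !ns.is_empty() && !comp.is_empty() && !ep.is_empty() => {
            Ok(EndpointPath {
                namespace: ns.to_string(),
                component: comp.to_string(),
                endpoint: ep.to_string(),
            })
        }
        _ => Err(format!("expected exactly three parts, got: {rest}")),
    }
}
```

Because the derived `PartialEq` compares all three fields, two nodes agree on an endpoint only if every part matches.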


feat: event plane + count · 3b7a462d

Ryan Olson authored Feb 21, 2025


Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


20 Feb, 2025 3 commits
- feat(tio): Defaults for in and out, support HF repos (#223) · 7ab5df5d
  Graham King authored Feb 20, 2025

  You can now run an HF repo directly:

  ```
  tio ~/llm_models/Llama-3.2-1B-Instruct/
  ```

  or a GGUF:

  ```
  tio ~/llm_models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
  ```

  Also clean up kv_router so I can merge.
- feat: add cli args for example http service (#221) · 73c10ae9
  Biswa Panda authored Feb 20, 2025
  Co-authored-by: Biswa Ranjan Panda <biswaranjanp@nvidia.com>
- feat: add local model card (#216) · 65a2dfab
  Biswa Panda authored Feb 20, 2025
  
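The first 20 Feb commit lets `tio` take either a Hugging Face repo directory or a single GGUF file as its model argument. A minimal sketch of how such an argument could be told apart, assuming a directory means an HF repo and a `.gguf` extension means a GGUF file; `ModelSource` and `classify_model` are hypothetical names, not tio code.

```rust
use std::path::Path;

/// Hypothetical classification of the positional model argument.
#[derive(Debug, PartialEq)]
enum ModelSource {
    HfRepoDir,
    GgufFile,
    Unknown,
}

fn classify_model(path: &Path) -> ModelSource {
    if path.is_dir() {
        // A directory is assumed to be a downloaded HF repo (config, tokenizer, weights).
        ModelSource::HfRepoDir
    } else if path
        .extension()
        .map_or(false, |ext| ext.eq_ignore_ascii_case("gguf"))
    {
        // A file ending in .gguf (case-insensitive) is treated as a GGUF model.
        ModelSource::GgufFile
    } else {
        ModelSource::Unknown
    }
}
```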
19 Feb, 2025 1 commit
- test: add unit tests for RuntimeConfig (#215) · 7f85dcc3
  Thomas Montfort authored Feb 19, 2025
  
18 Feb, 2025 2 commits

feat: http + llmctl (#181) · d0d35a9e
Ryan Olson authored Feb 18, 2025
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>

feat: Add KV publisher and receiver. Add KV aware routing example. · 8588e33a

GuanLuo authored Feb 18, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: aflowers <aflowers@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: hongkuanz <hongkuanz@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>


14 Feb, 2025 2 commits

fix: Unique IDs for mistralrs requests (#186) · 45b3505c
Graham King authored Feb 14, 2025
Upgrade mistralrs to latest.
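The title of #186 only says that mistralrs requests now get unique IDs. One common way to do that in Rust is a process-wide atomic counter; the sketch below assumes that approach and a hypothetical `next_request_id` helper, which may differ from what the commit actually does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical process-wide counter shared by every request.
static NEXT_REQUEST_ID: AtomicUsize = AtomicUsize::new(0);

/// Return a fresh ID. fetch_add is a single atomic read-modify-write, so two
/// threads can never observe the same value (until the counter wraps at usize::MAX).
fn next_request_id() -> usize {
    NEXT_REQUEST_ID.fetch_add(1, Ordering::Relaxed)
}
```

`Ordering::Relaxed` is enough here because only the uniqueness of each value matters, not its ordering relative to other memory operations.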

feat: Add a mistralrs engine to tio (#178) · 2f700421

Graham King authored Feb 14, 2025

This allows us to run a real model.

Build:
```
cargo build --release --features mistralrs,cuda
```

Run:
```
./target/release/tio in=text out=mistralrs --model-path Llama-3.2-1B-Instruct-Q4_K_M.gguf
```
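The `in=`/`out=` arguments used in these commits name an input source (`text`, `http`, a `tdr://` endpoint) and an output engine (`mistralrs`, a `tdr://` endpoint). As a hypothetical sketch, not tio's real argument parser, they could be split off from positional arguments like this; `TioArgs` and `parse_args` are illustrative names, and flags such as `--model-path` are simply left in `positional` rather than interpreted.

```rust
/// Hypothetical container for tio-style arguments.
#[derive(Debug, Default, PartialEq)]
struct TioArgs {
    input: Option<String>,
    output: Option<String>,
    positional: Vec<String>,
}

/// Split `in=<source>` and `out=<engine>` tokens from everything else.
fn parse_args<I, S>(args: I) -> TioArgs
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    let mut parsed = TioArgs::default();
    for arg in args {
        let arg: String = arg.into();
        if let Some(v) = arg.strip_prefix("in=") {
            parsed.input = Some(v.to_string());
        } else if let Some(v) = arg.strip_prefix("out=") {
            parsed.output = Some(v.to_string());
        } else {
            // Anything else (model path, flags) is kept in order for later handling.
            parsed.positional.push(arg);
        }
    }
    parsed
}
```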

Why [mistral.rs](https://github.com/EricLBuehler/mistral.rs)?

- It has no dependencies. You don't need a container or a virtual env to get started.
- It supports CUDA, Metal (macOS) and CPU-only. Everyone can join the AI revolution.
- It starts fast and serves fast (with CUDA). That makes it fun to experiment with.
- It runs many models, not just Mistral; that's just its name.


13 Feb, 2025 1 commit
- fix: tcp updates + initial zmq (#176) · 2fd6592f
  Ryan Olson authored Feb 13, 2025
  
12 Feb, 2025 1 commit

fix: tcp retry and error handling updates (#169) · dddebc0d

Ryan Olson authored Feb 12, 2025


Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


11 Feb, 2025 2 commits
- chore: Again: Add rust-toolchain so we're all on the same version (#160) · e1bd07fe
  Graham King authored Feb 11, 2025
  
- chore: update rust versions to v0.2.0 (#155) · 2e409565
  Anant Sharma authored Feb 10, 2025
  Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
10 Feb, 2025 1 commit

feat: OpenAI compatible http service (#123) · ffc6dde1

Ryan Olson authored Feb 10, 2025


Signed-off-by: Ryan Olson <ryanolson@users.noreply.github.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>
Co-authored-by: Neelay Shah <neelays@nvidia.com>
