Commits · 48a59890e057acb141fa29e8306de92708a0f55d · OpenDAS / dynamo

13 Mar, 2025 1 commit

feat(mistralrs): Let the engine enforce max tokens (#134) · 404a78e9

Graham King authored Mar 13, 2025

Previously we tokenized and counted tokens to stop when max tokens was reached. Now we let the mistral.rs engine enforce the limit, which saves the extra tokenization step.

Also, dynamo-run now lists the compiled-in engines in its help message, plus some minor lint fixes.


11 Mar, 2025 1 commit

feat(dynamo-run): Upgrade mistral.rs (#97) · d99b188d

Graham King authored Mar 11, 2025

- Latest from repo, many improvements
- Support most of the OpenAI request features (temperature, top_p, etc.)
- Download models from Hugging Face if necessary


08 Mar, 2025 1 commit

chore: rename dynamo (#44) · 602352ce

Neelay Shah authored Mar 08, 2025

Co-authored-by: Biswa Panda <biswa.panda@gmail.com>

05 Mar, 2025 2 commits

fix: mistralrs use auto device map (#31) · 46ed649c

Graham King authored Mar 05, 2025

Fixes a panic.

refactor: rename triton_distributed to dynemo (#22) · 1af7433b

Neelay Shah authored Mar 05, 2025

Co-authored-by: Graham King <grahamk@nvidia.com>

27 Feb, 2025 2 commits

feat: llama.cpp engine for tio (#298) · e584e96f

Graham King authored Feb 27, 2025

Docs in README.

refactor: rename ChatCompletionResponseDelta to NvCreateChatCompletionStreamResponse (#292) · 110f3f8c

Paul Hendricks authored Feb 27, 2025

26 Feb, 2025 1 commit

refactor: using async_openai · 86aff237

Paul Hendricks authored Feb 26, 2025

Co-authored-by: Graham King <grahamk@nvidia.com>

25 Feb, 2025 1 commit

refactor: move libs to lib dir · 08fcd7e9

Neelay Shah authored Feb 24, 2025


Signed-off-by: Neelay Shah <neelays@nvidia.com>
Co-authored-by: Ryan McCormick <rmccormick@nvidia.com>


20 Feb, 2025 1 commit

feat(tio): Defaults for in and out, support HF repos (#223) · 7ab5df5d

Graham King authored Feb 20, 2025

You can now run an HF repo directly:
```
tio ~/llm_models/Llama-3.2-1B-Instruct/
```

or a GGUF
```
tio ~/llm_models/Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

Also clean up kv_router so I can merge.


14 Feb, 2025 2 commits

fix: Unique IDs for mistralrs requests (#186) · 45b3505c

Graham King authored Feb 14, 2025

Upgrade mistralrs to latest.

feat: Add a mistralrs engine to tio (#178) · 2f700421

Graham King authored Feb 14, 2025

This allows us to run a real model.

Build:
```
cargo build --release --features mistralrs,cuda
```

Run:
```
./target/release/tio in=text out=mistralrs --model-path Llama-3.2-1B-Instruct-Q4_K_M.gguf
```

Why [mistral.rs](https://github.com/EricLBuehler/mistral.rs)?

- It has no dependencies. You don't need a container or a virtual env to get started.
- It supports CUDA, Metal (macOS) and CPU-only. Everyone can join the AI revolution.
- It starts fast and serves fast (with CUDA). That makes it fun to experiment with.
- It runs many models, not just Mistral; that's just its name.
