Commit 00e54337 authored by Graham King, committed by GitHub

chore: Make debug profile use all optimizations (#317)

It hardly slows the build down, and it makes things run much faster. That allows us to switch to the debug (default) profile for development, and keep the release profile for, well, releasing.

Motivated by changes in https://github.com/ai-dynamo/dynamo/pull/279
parent d4d93b6a
@@ -71,6 +71,11 @@ prometheus = { version = "0.13" }
 [profile.dev.package]
 insta.opt-level = 3
+
+[profile.dev]
+# release level optimizations otherwise everything feels slow
+opt-level = 3
+
 [profile.release]
 # These make the build much slower but shrink the binary, and could help performance
 codegen-units = 1
 lto = true
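Taken together, this hunk leaves the manifest with profile sections roughly like the following (a sketch reconstructed from the diff, not copied from the repo):

```toml
[profile.dev]
# release level optimizations otherwise everything feels slow
opt-level = 3

[profile.dev.package]
insta.opt-level = 3

[profile.release]
# These make the build much slower but shrink the binary, and could help performance
codegen-units = 1
lto = true
```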
@@ -116,24 +116,26 @@ Optionally can run `cargo build` from any location with arguments:
- Linux with GPU and CUDA (tested on Ubuntu):
```
-cargo build --release --features mistralrs,cuda
+cargo build --features cuda
```
- macOS with Metal:
```
-cargo build --release --features mistralrs,metal
+cargo build --features metal
```
- CPU only:
```
-cargo build --release --features mistralrs
+cargo build
```
-The binary will be called `dynamo-run` in `target/release`
+The binary will be called `dynamo-run` in `target/debug`
```
-cd target/release
+cd target/debug
```
+Build with `--release` for a smaller binary and better performance, but longer build times. The binary will be in `target/release`.
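The debug-by-default workflow described here boils down to one rule: the output path follows the profile. A small illustrative helper (hypothetical, not part of the repo) makes that mapping explicit:

```python
# Illustrative only: where cargo places the dynamo-run binary
# depending on whether --release is passed to the build.
def binary_path(release: bool = False) -> str:
    profile = "release" if release else "debug"
    return f"target/{profile}/dynamo-run"

print(binary_path())      # default (debug) profile
print(binary_path(True))  # release profile
```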
## sglang
1. Setup the python virtual env:
@@ -149,7 +151,7 @@ uv pip install "sglang[all]==0.4.2" --find-links https://flashinfer.ai/whl/cu124
2. Build
```
-cargo build --release --features sglang
+cargo build --features sglang
```
3. Run
@@ -168,7 +170,7 @@ dynamo-run in=none out=sglang --model-path ~/llm_models/DeepSeek-R1-Distill-Llam
## llama_cpp
-- `cargo build --release --features llamacpp,cuda`
+- `cargo build --features llamacpp,cuda`
- `dynamo-run out=llama_cpp --model-path ~/llm_models/Llama-3.2-3B-Instruct-Q6_K.gguf --model-config ~/llm_models/Llama-3.2-3B-Instruct/`
@@ -197,7 +199,7 @@ uv pip install vllm==0.7.3 setuptools
Build:
```
-cargo build --release --features vllm
+cargo build --features vllm
```
Run (still inside that virtualenv) - HF repo:
@@ -230,7 +232,7 @@ You can provide your own engine in a Python file. The file must provide a generator:
async def generate(request):
```
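The required entry point can be sketched as a minimal engine file. Only the `async def generate(request)` signature comes from the text above; the request shape, the dict-chunk responses, and the echo behaviour are illustrative assumptions, not the real dynamo protocol:

```python
# engine.py -- minimal bring-your-own-engine sketch.
# Only the `generate(request)` generator signature is from the docs;
# the request/response shapes below are illustrative assumptions.
async def generate(request):
    # A real engine would run inference here and stream tokens back.
    # This sketch just echoes the request as a single response chunk.
    yield {"text": f"echo: {request}"}
```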
-Build: `cargo build --release --features python`
+Build: `cargo build --features python`
### Python does the pre-processing
@@ -343,7 +345,7 @@ TensorRT-LLM. Requires `clang` and `libclang-dev`.
Build:
```
-cargo build --release --features trtllm
+cargo build --features trtllm
```
Run:
......