Commit 05465f78 authored by Dmitry Tokarev, committed by GitHub

docs: Updated macOS build instructions for dynamo-run. (#131)

parent cab65e1a
@@ -57,8 +57,10 @@ RUN apt-get install -y linux-tools-common linux-tools-generic ethtool iproute2
RUN apt-get install -y dkms linux-headers-generic
RUN apt-get install -y meson ninja-build uuid-dev gdb
RUN apt install -y libglib2.0-0
RUN wget ${NSYS_URL}${NSYS_PKG} &&\
    apt install -y ./${NSYS_PKG} &&\
    rm ${NSYS_PKG}
RUN cd /usr/local/src && \
    curl -fSsL "https://content.mellanox.com/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-ubuntu24.04-x86_64.tgz" -o mofed.tgz && \
@@ -66,7 +68,7 @@ RUN cd /usr/local/src && \
    cd MLNX_OFED_LINUX-* && \
    apt-get update && apt-get install -y --no-install-recommends \
    ./DEBS/libibverbs* ./DEBS/ibverbs-providers* ./DEBS/librdmacm* ./DEBS/libibumad* && \
    rm -rf /var/lib/apt/lists/* /usr/local/src/* mofed.tgz
ENV LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64 \
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
@@ -212,26 +214,26 @@ COPY LICENSE /workspace/
# Build Rust runtime
COPY lib/runtime /workspace/lib/runtime
RUN cd lib/runtime && \
    cargo build --jobs 2 --release --locked && cargo doc --no-deps
# Build OpenAI HTTP Service binaries
COPY lib/llm /workspace/lib/llm
COPY components /workspace/components
RUN cd components && \
    cargo build --jobs 2 --release && \
    cp target/release/http /usr/local/bin/
# Build Dynamo Run binaries
COPY launch /workspace/launch
RUN cd launch && \
    cargo build --jobs 2 --release --features mistralrs,sglang,vllm,python && \
    cp target/release/dynamo-run /usr/local/bin/ && \
    cp target/release/llmctl /usr/local/bin/
# Generate C bindings for kv cache routing in vLLM
COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c && \
    cargo build --jobs 2 --release --locked && cargo doc --no-deps
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Build dynamo wheel
...
...@@ -4,9 +4,20 @@ ...@@ -4,9 +4,20 @@
## Setup
Libraries (Ubuntu):
```
apt install -y build-essential libhwloc-dev libudev-dev pkg-config libssl-dev libclang-dev protobuf-compiler python3-dev
```
Libraries (macOS):
```
brew install cmake protobuf
# Install Xcode from the App Store and check that Metal is accessible:
xcrun -sdk macosx metal
# You may also need to install the Xcode Command Line Tools:
xcode-select --install
```
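As an optional sanity check, a short shell sketch (a hypothetical helper, not part of the repo; tool names taken from the package lists above) can confirm the native build dependencies are on `PATH`:

```shell
# Report any required build tool that is not on PATH
for tool in cc pkg-config protoc; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If nothing is printed, the toolchain is in place.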
Install Rust:
@@ -17,48 +28,78 @@ source $HOME/.cargo/env
## Build
Navigate to the `launch/` directory:
```
cd launch/
```
Optionally, `cargo build` can be run from any location with these arguments:
```
--target-dir /path/to/target_directory    # specify a target directory with write privileges
--manifest-path /path/to/project/Cargo.toml    # if cargo build is run outside of the launch/ directory
```
- Linux with GPU and CUDA (tested on Ubuntu):
```
cargo build --release --features mistralrs,cuda
```
- macOS with Metal:
```
cargo build --release --features mistralrs,metal
```
- CPU only:
```
cargo build --release --features mistralrs
```
The binary will be called `dynamo-run` in `target/release`:
```
cd target/release
```
## Quickstart
### Automatically download a model from [Hugging Face](https://huggingface.co/models)
NOTE: for gated models (e.g. meta-llama/Llama-3.2-3B-Instruct) you must have an `HF_TOKEN` environment variable set.
```
./dynamo-run <HUGGING_FACE_ORGANIZATION/MODEL_NAME>
```
For example, this will download Qwen2.5 3B from Hugging Face (a 6 GiB download) and start it in interactive text mode:
`./dynamo-run Qwen/Qwen2.5-3B-Instruct`
The parameter can be the ID of a Hugging Face repository (it will be downloaded), a GGUF file, or a folder containing safetensors, config.json, etc. (a locally checked-out Hugging Face repository).
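The three accepted parameter forms can be illustrated with a small Python sketch (a hypothetical heuristic for exposition only, not dynamo-run's actual resolution logic):

```python
import os

def classify_model_arg(arg: str) -> str:
    """Hypothetical heuristic mirroring the three accepted parameter forms."""
    if os.path.isdir(arg):
        return "local checkout"       # folder with safetensors, config.json, ...
    if arg.endswith(".gguf"):
        return "gguf file"
    return "huggingface repo id"      # e.g. Qwen/Qwen2.5-3B-Instruct, downloaded on demand

print(classify_model_arg("Qwen/Qwen2.5-3B-Instruct"))        # huggingface repo id
print(classify_model_arg("Llama-3.2-3B-Instruct-Q4_K_M.gguf"))  # gguf file
```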
## Download a model from Hugging Face
One of these models should be high quality and fast on almost any machine: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF
Download a model file:
```
curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
```
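A truncated or HTML-error download is a common failure mode. GGUF files begin with the 4-byte ASCII magic `GGUF`, so a quick Python check (a sketch, not part of dynamo-run) can confirm the file is intact before running it:

```python
def looks_like_gguf(path: str) -> bool:
    # GGUF files begin with the ASCII magic "GGUF"
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# e.g. looks_like_gguf("Llama-3.2-3B-Instruct-Q4_K_M.gguf")
```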
## Run a model from a local file

*Text interface*
```
dynamo-run Llama-3.2-3B-Instruct-Q4_K_M.gguf  # or a path to a Hugging Face repo checkout instead of the GGUF
```
*HTTP interface*
```
dynamo-run in=http Llama-3.2-3B-Instruct-Q4_K_M.gguf
```
*List the models*
```
curl localhost:8080/v1/models
```
*Send a request*
```
curl -d '{"model": "Llama-3.2-3B-Instruct-Q4_K_M", "max_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions
```
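The same request can be sent from Python using only the standard library; this is a sketch of the curl command above (the final call assumes `dynamo-run in=http ...` is running locally, so it is left commented out):

```python
import json
import urllib.request

# Build the same chat-completion payload as the curl command above
payload = {
    "model": "Llama-3.2-3B-Instruct-Q4_K_M",
    "max_tokens": 2049,
    "messages": [{"role": "user", "content": "What is the capital of South Africa?"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires the HTTP server to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```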
*Multi-node*
@@ -196,21 +237,21 @@ Example engine:
import asyncio
async def generate(request):
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"The","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" capital","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" of","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" France","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" is","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" Paris","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":".","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"","role":"assistant"},"finish_reason":"stop"}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
```
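To see what a client receives from such an engine, the chunks can be drained with a short driver. This is a hypothetical sketch: `fake_engine` below mimics the example engine's chunk shape with only two chunks, and `collect` concatenates the streamed delta contents:

```python
import asyncio

async def fake_engine(request):
    # Two chunks in the same shape as the example engine above
    yield {"id": "1", "choices": [{"index": 0, "delta": {"content": "Paris", "role": "assistant"}}]}
    yield {"id": "1", "choices": [{"index": 0, "delta": {"content": "", "role": "assistant"}, "finish_reason": "stop"}]}

async def collect(engine, request):
    # Concatenate delta contents until the engine signals finish_reason=stop
    parts = []
    async for chunk in engine(request):
        choice = chunk["choices"][0]
        parts.append(choice["delta"]["content"])
        if choice.get("finish_reason") == "stop":
            break
    return "".join(parts)

print(asyncio.run(collect(fake_engine, {})))  # Paris
```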
Command line arguments are passed to the Python engine like this:
...