Commit 05465f78 authored by Dmitry Tokarev, committed by GitHub

docs: Updated macOS build instructions for dynamo-run. (#131)

parent cab65e1a
@@ -57,8 +57,10 @@ RUN apt-get install -y linux-tools-common linux-tools-generic ethtool iproute2
RUN apt-get install -y dkms linux-headers-generic
RUN apt-get install -y meson ninja-build uuid-dev gdb
RUN apt install -y libglib2.0-0
RUN wget ${NSYS_URL}${NSYS_PKG} &&\
    apt install -y ./${NSYS_PKG} &&\
    rm ${NSYS_PKG}
RUN cd /usr/local/src && \
    curl -fSsL "https://content.mellanox.com/ofed/MLNX_OFED-${MOFED_VERSION}/MLNX_OFED_LINUX-${MOFED_VERSION}-ubuntu24.04-x86_64.tgz" -o mofed.tgz && \
@@ -66,7 +68,7 @@ RUN cd /usr/local/src && \
    cd MLNX_OFED_LINUX-* && \
    apt-get update && apt-get install -y --no-install-recommends \
    ./DEBS/libibverbs* ./DEBS/ibverbs-providers* ./DEBS/librdmacm* ./DEBS/libibumad* && \
    rm -rf /var/lib/apt/lists/* /usr/local/src/* mofed.tgz
ENV LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64 \
    LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
@@ -212,26 +214,26 @@ COPY LICENSE /workspace/
# Build Rust runtime
COPY lib/runtime /workspace/lib/runtime
RUN cd lib/runtime && \
    cargo build --jobs 2 --release --locked && cargo doc --no-deps
# Build OpenAI HTTP Service binaries
COPY lib/llm /workspace/lib/llm
COPY components /workspace/components
RUN cd components && \
    cargo build --jobs 2 --release && \
    cp target/release/http /usr/local/bin/
# Build Dynamo Run binaries
COPY launch /workspace/launch
RUN cd launch && \
    cargo build --jobs 2 --release --features mistralrs,sglang,vllm,python && \
    cp target/release/dynamo-run /usr/local/bin/ && \
    cp target/release/llmctl /usr/local/bin/
# Generate C bindings for kv cache routing in vLLM
COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c && \
    cargo build --jobs 2 --release --locked && cargo doc --no-deps
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Build dynamo wheel
...
...@@ -4,9 +4,20 @@ ...@@ -4,9 +4,20 @@
## Setup
Libraries (Ubuntu):
```
apt install -y build-essential libhwloc-dev libudev-dev pkg-config libssl-dev libclang-dev protobuf-compiler python3-dev
```
Libraries (macOS):
```
brew install cmake protobuf
# Install Xcode from the App Store and check that Metal is accessible:
xcrun -sdk macosx metal
# You may also need to install the Xcode Command Line Tools:
xcode-select --install
```
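As an optional sanity check, a short shell sketch (a hypothetical helper, not part of the repo; tool names taken from the package lists above) can confirm the native build dependencies are on `PATH`:

```shell
# Report any required build tool that is not on PATH
for tool in cc pkg-config protoc; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done
```

If nothing is printed, the toolchain is in place.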
Install Rust:
@@ -17,48 +28,78 @@ source $HOME/.cargo/env
## Build
Navigate to the `launch/` directory:
```
cd launch/
```
Optionally, `cargo build` can be run from any location with these arguments:
```
--target-dir /path/to/target_directory    # specify a target directory with write privileges
--manifest-path /path/to/project/Cargo.toml    # if cargo build is run outside of the launch/ directory
```
- Linux with GPU and CUDA (tested on Ubuntu):
```
cargo build --release --features mistralrs,cuda
```
- macOS with Metal:
```
cargo build --release --features mistralrs,metal
```
- CPU only:
```
cargo build --release --features mistralrs
```
The binary will be called `dynamo-run` in `target/release`:
```
cd target/release
```
## Quickstart
### Automatically download a model from [Hugging Face](https://huggingface.co/models)
NOTE: for gated models (e.g. meta-llama/Llama-3.2-3B-Instruct) you must have an `HF_TOKEN` environment variable set.
```
./dynamo-run <HUGGING_FACE_ORGANIZATION/MODEL_NAME>
```
For example, this will download Qwen2.5 3B from Hugging Face (a 6 GiB download) and start it in interactive text mode:
`./dynamo-run Qwen/Qwen2.5-3B-Instruct`
The parameter can be the ID of a Hugging Face repository (it will be downloaded), a GGUF file, or a folder containing safetensors, config.json, etc. (a locally checked-out Hugging Face repository).
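The three accepted parameter forms can be illustrated with a small Python sketch (a hypothetical heuristic for exposition only, not dynamo-run's actual resolution logic):

```python
import os

def classify_model_arg(arg: str) -> str:
    """Hypothetical heuristic mirroring the three accepted parameter forms."""
    if os.path.isdir(arg):
        return "local checkout"       # folder with safetensors, config.json, ...
    if arg.endswith(".gguf"):
        return "gguf file"
    return "huggingface repo id"      # e.g. Qwen/Qwen2.5-3B-Instruct, downloaded on demand

print(classify_model_arg("Qwen/Qwen2.5-3B-Instruct"))        # huggingface repo id
print(classify_model_arg("Llama-3.2-3B-Instruct-Q4_K_M.gguf"))  # gguf file
```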
## Download a model from Hugging Face
One of these models should be high quality and fast on almost any machine: https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF
Download a model file:
```
curl -L -o Llama-3.2-3B-Instruct-Q4_K_M.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf?download=true"
```
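A truncated or HTML-error download is a common failure mode. GGUF files begin with the 4-byte ASCII magic `GGUF`, so a quick Python check (a sketch, not part of dynamo-run) can confirm the file is intact before running it:

```python
def looks_like_gguf(path: str) -> bool:
    # GGUF files begin with the ASCII magic "GGUF"
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# e.g. looks_like_gguf("Llama-3.2-3B-Instruct-Q4_K_M.gguf")
```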
## Run a model from a local file

*Text interface*
```
dynamo-run Llama-3.2-3B-Instruct-Q4_K_M.gguf  # or a path to a Hugging Face repo checkout instead of the GGUF
```
*HTTP interface*
```
dynamo-run in=http Llama-3.2-3B-Instruct-Q4_K_M.gguf
```
*List the models*
```
curl localhost:8080/v1/models
```
*Send a request*
```
curl -d '{"model": "Llama-3.2-3B-Instruct-Q4_K_M", "max_tokens": 2049, "messages":[{"role":"user", "content": "What is the capital of South Africa?" }]}' -H 'Content-Type: application/json' http://localhost:8080/v1/chat/completions
```
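The same request can be sent from Python using only the standard library; this is a sketch of the curl command above (the final call assumes `dynamo-run in=http ...` is running locally, so it is left commented out):

```python
import json
import urllib.request

# Build the same chat-completion payload as the curl command above
payload = {
    "model": "Llama-3.2-3B-Instruct-Q4_K_M",
    "max_tokens": 2049,
    "messages": [{"role": "user", "content": "What is the capital of South Africa?"}],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Requires the HTTP server to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```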
*Multi-node*
@@ -196,21 +237,21 @@ Example engine:
import asyncio
async def generate(request):
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"The","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" capital","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" of","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" France","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" is","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":" Paris","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":".","role":"assistant"}}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
    await asyncio.sleep(0.1)
    yield {"id":"1","choices":[{"index":0,"delta":{"content":"","role":"assistant"},"finish_reason":"stop"}],"created":1841762283,"model":"Llama-3.2-3B-Instruct","system_fingerprint":"local","object":"chat.completion.chunk"}
```
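To see what a client receives from such an engine, the chunks can be drained with a short driver. This is a hypothetical sketch: `fake_engine` below mimics the example engine's chunk shape with only two chunks, and `collect` concatenates the streamed delta contents:

```python
import asyncio

async def fake_engine(request):
    # Two chunks in the same shape as the example engine above
    yield {"id": "1", "choices": [{"index": 0, "delta": {"content": "Paris", "role": "assistant"}}]}
    yield {"id": "1", "choices": [{"index": 0, "delta": {"content": "", "role": "assistant"}, "finish_reason": "stop"}]}

async def collect(engine, request):
    # Concatenate delta contents until the engine signals finish_reason=stop
    parts = []
    async for chunk in engine(request):
        choice = chunk["choices"][0]
        parts.append(choice["delta"]["content"])
        if choice.get("finish_reason") == "stop":
            break
    return "".join(parts)

print(asyncio.run(collect(fake_engine, {})))  # Paris
```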
Command line arguments are passed to the Python engine like this:
...