Commit 602352ce authored by Neelay Shah, committed by GitHub

chore: rename dynamo (#44)


Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
parent ecf53ce2
@@ -19,7 +19,7 @@
**/*.plan
**/.cache/*
**/*onnx*
# Engine must be allowed because code contains dynamo_engine.py
**/*tensorrtllm_engines*
**/*tensorrtllm_models*
**/*tensorrtllm_checkpoints*
......
@@ -23,4 +23,4 @@ jobs:
env:
NVBUILD_VERBOSITY: DETAILED
timeout-minutes: 2
working-directory: /workspace
\ No newline at end of file
@@ -40,7 +40,7 @@ jobs:
pre-merge-rust:
runs-on: ubuntu-latest
strategy:
matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynamo-run', 'components/metrics', 'examples/rust'] }
permissions:
contents: read
steps:
......
@@ -17,7 +17,7 @@ limitations under the License.
# Open Source License Attribution
Dynamo uses Open Source components. You can find the details of these open-source projects along with license information below.
We are grateful to the developers for their contributions to open source and acknowledge these below.
## nats-py - [Apache License 2.0](https://github.com/nats-io/nats.py/blob/main/LICENSE)
......
# CODEOWNERS file for Dynamo
#
# For more information about CODEOWNERS files, see:
# https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
......
@@ -15,17 +15,17 @@ See the License for the specific language governing permissions and
limitations under the License.
-->
# Dynamo
<h4> A Datacenter Scale Distributed Inference Serving Framework </h4>
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub Release](https://img.shields.io/github/v/release/dynemo-ai/dynamo)](https://github.com/dynemo-ai/dynemo/releases/latest)
Dynamo is a flexible, component-based, data center scale
inference serving framework designed to leverage the strengths of the
standalone Dynamo Inference Server while expanding its capabilities
to meet the demands of complex use cases including those of Generative
AI. It is designed to enable developers to implement and customize
routing, load balancing, scaling and workflow definitions at the data
@@ -36,17 +36,17 @@ center scale without sacrificing performance or ease of use.
> rapid-prototyping stage and we are actively looking for feedback and
> collaborators.
## Building Dynamo
### Requirements
Dynamo development and examples are container-based.
* [Docker](https://docs.docker.com/get-started/get-docker/)
* [buildx](https://github.com/docker/buildx)
### Development
You can build the Dynamo container using the build scripts
in `container/` (or directly with `docker build`).
We provide 3 types of builds:
@@ -62,9 +62,9 @@ For example, if you want to build a container for the `STANDARD` backends you ca
Please see the instructions in the corresponding example for specific build instructions.
## Running Dynamo for Local Testing and Development
You can run the Dynamo container using the run scripts in
`container/` (or directly with `docker run`).
The run script offers a few common workflows:
@@ -72,7 +72,7 @@ The run script offers a few common workflows:
1. Running a command in a container and exiting.
```
./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
```
2. Starting an interactive shell.
@@ -95,7 +95,7 @@ deployment instructions.
## Rust Based Runtime
Dynamo has a new Rust-based distributed runtime whose
implementation is under development. The Rust-based runtime enables
serving arbitrary Python code as well as native Rust. Please note the
APIs are subject to change.
@@ -114,7 +114,7 @@ bindings.
An intermediate example expanding further on the concepts introduced
in the Hello World example. In this example, we demonstrate
[Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
application of the components defined in Dynamo.
# Disclaimers
......
dynmo->dynamo
@@ -1005,7 +1005,7 @@ dependencies = [
]
[[package]]
name = "dynamo-llm"
version = "0.2.1"
dependencies = [
"anyhow",
@@ -1020,7 +1020,7 @@ dependencies = [
"chrono",
"cmake",
"derive_builder",
"dynamo-runtime",
"either",
"erased-serde",
"futures",
@@ -1054,7 +1054,7 @@ dependencies = [
]
[[package]]
name = "dynamo-runtime"
version = "0.2.1"
dependencies = [
"anyhow",
@@ -2202,8 +2202,8 @@ dependencies = [
"async-nats",
"axum 0.6.20",
"clap",
"dynamo-llm",
"dynamo-runtime",
"futures",
"opentelemetry",
"opentelemetry-prometheus",
......
@@ -22,8 +22,8 @@ license = "Apache-2.0"
[dependencies]
# local
dynamo-runtime = { path = "../../lib/runtime" }
dynamo-llm = { path = "../../lib/llm" }
# workspace - todo
......
@@ -12,16 +12,16 @@ This will:
For example:
```bash
# For more details, try DYN_LOG=debug
DYN_LOG=info cargo run --bin metrics -- --namespace dynamo --component backend --endpoint generate
# 2025-02-26T18:45:05.467026Z INFO metrics: Creating unique instance of Metrics at dynamo/components/metrics/instance
# 2025-02-26T18:45:05.472146Z INFO metrics: Scraping service dynamo_backend_720278f8 and filtering on subject dynamo_backend_720278f8.generate
# ...
```
With no matching endpoints running to collect stats from, you should see warnings in the logs:
```bash
2025-02-26T18:45:06.474161Z WARN metrics: No endpoints found matching subject dynamo_backend_720278f8.generate
```
After a matching endpoint gets started, you should see the warnings stop
@@ -30,7 +30,7 @@ when the endpoint gets automatically discovered.
When stats are found from target endpoints, the metrics component will
aggregate them and publish them to a prometheus server running on `localhost:9091/metrics` by default:
```
2025-02-28T04:05:58.077901Z INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
```
To see the metrics being published in prometheus format, you can run:
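As a note on the numbers in the sample log line: `load_avg: 53.0` and `load_std: 24.0` are consistent with the population mean and standard deviation of the two workers' `kv_active_blocks` values (77 and 29). A minimal sketch of that aggregation, assuming load is derived from `kv_active_blocks` (the authoritative definition lives in the metrics component itself):

```python
import math

def aggregate_load(kv_active_blocks):
    # Population mean and standard deviation over per-worker KV block usage.
    # Assumption: this matches load_avg/load_std in the log above; the real
    # metric definition may differ.
    n = len(kv_active_blocks)
    avg = sum(kv_active_blocks) / n
    std = math.sqrt(sum((x - avg) ** 2 for x in kv_active_blocks) / n)
    return avg, std

# Values taken from the two workers in the sample log line
avg, std = aggregate_load([77, 29])
print(avg, std)  # 53.0 24.0
```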
......
@@ -14,10 +14,10 @@
// limitations under the License.
use async_nats::service::endpoint::Stats;
use dynamo_llm::kv_router::{
    protocols::ForwardPassMetrics, scheduler::KVHitRateEvent, KV_HIT_RATE_SUBJECT,
};
use dynamo_runtime::{
    component::Namespace,
    logging,
    pipeline::{
@@ -123,7 +123,7 @@ fn mock_stats_handler(_stats: Stats) -> serde_json::Value {
}
async fn backend(runtime: DistributedRuntime) -> Result<()> {
    let namespace = runtime.namespace("dynamo")?;
    // Spawn background task for publishing KV hit rate events
    let namespace_clone = namespace.clone();
......
@@ -20,11 +20,11 @@ use prometheus::{register_counter_vec, register_gauge_vec};
use serde::{Deserialize, Serialize};
use std::net::SocketAddr;
use dynamo_llm::kv_router::protocols::ForwardPassMetrics;
use dynamo_llm::kv_router::scheduler::Endpoint;
use dynamo_llm::kv_router::scoring::ProcessedEndpoints;
use dynamo_runtime::{distributed::Component, service::EndpointInfo, utils::Duration, Result};
/// Configuration for LLM worker load capacity metrics
#[derive(Debug, Clone, Serialize, Deserialize)]
......
@@ -27,9 +27,9 @@
//! - ISL Blocks: Cumulative count of total blocks in all KV hit rate events
//! - Overlap Blocks: Cumulative count of blocks that were already in the KV cache
use clap::Parser;
use dynamo_llm::kv_router::scheduler::KVHitRateEvent;
use dynamo_llm::kv_router::KV_HIT_RATE_SUBJECT;
use dynamo_runtime::{
    error, logging,
    traits::events::{EventPublisher, EventSubscriber},
    utils::{Duration, Instant},
@@ -57,7 +57,7 @@ struct Args {
    endpoint: String,
    /// Namespace to operate in
    #[arg(long, env = "DYN_NAMESPACE", default_value = "dynamo")]
    namespace: String,
    /// Polling interval in seconds (minimum 1 second)
......
@@ -16,7 +16,7 @@
ARG BASE_IMAGE="nvcr.io/nvidia/tritonserver"
ARG BASE_IMAGE_TAG="25.01-py3"
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo
# TODO: non root user by default
@@ -34,7 +34,7 @@ RUN rustup toolchain install 1.85.0-x86_64-unknown-linux-gnu
# Install OpenAI-compatible frontend and its dependencies from triton server
# repository. These are used to have a consistent interface, schema, and FastAPI
# app between Triton Core and Dynamo implementations.
ARG OPENAI_SERVER_TAG="r25.01"
RUN mkdir -p /opt/tritonserver/python && \
    cd /opt/tritonserver/python && \
@@ -78,7 +78,7 @@ ARG TENSORRTLLM_SKIP_CLONE=
ENV FRAMEWORK=${FRAMEWORK}
RUN --mount=type=bind,source=./container/deps/requirements.tensorrtllm.txt,target=/tmp/requirements.txt \
    --mount=type=bind,source=./container/deps/clone_tensorrtllm.sh,target=/tmp/clone_tensorrtllm.sh \
    if [[ "$FRAMEWORK" == "TENSORRTLLM" ]] ; then pip install --timeout=2000 -r /tmp/requirements.txt; if [ ${TENSORRTLLM_SKIP_CLONE} -ne 1 ] ; then /tmp/clone_tensorrtllm.sh --tensorrtllm-backend-repo-tag ${TENSORRTLLM_BACKEND_REPO_TAG} --tensorrtllm-backend-rebuild ${TENSORRTLLM_BACKEND_REBUILD} --dynamo-llm-path /opt/dynamo/llm_binding ; fi ; fi
RUN --mount=type=bind,source=./container/deps/requirements.standard.txt,target=/tmp/requirements.txt \
@@ -106,7 +106,7 @@ ENV VLLM_GENERATE_WORKERS=${VLLM_FRAMEWORK:+1}
ENV VLLM_BASELINE_TP_SIZE=${VLLM_FRAMEWORK:+1}
ENV VLLM_CONTEXT_TP_SIZE=${VLLM_FRAMEWORK:+1}
ENV VLLM_GENERATE_TP_SIZE=${VLLM_FRAMEWORK:+1}
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
ENV PYTHONUNBUFFERED=1
# Install NATS - pointing toward NATS github instead of binaries.nats.dev due to server instability
@@ -154,7 +154,7 @@ RUN cd examples/rust && \
    cp target/release/http /usr/local/bin/ && \
    cp target/release/llmctl /usr/local/bin/
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Generate C bindings. Note that this is required for TRTLLM backend re-build
@@ -162,30 +162,30 @@ COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c/ && \
    cargo build --release --locked && cargo doc --no-deps
# Install uv, create virtualenv for general use, and build dynamo wheel
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
RUN mkdir /opt/dynamo && \
    uv venv /opt/dynamo/venv --python 3.12 && \
    source /opt/dynamo/venv/bin/activate && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo*cp312*.whl && \
    cd /workspace/deploy/dynamo/sdk && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo_sdk*any.whl
# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
    mkdir /opt/dynamo/bindings/lib && \
    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
    cp -r lib/bindings/c/include /opt/dynamo/bindings/.
# Install dynamo.runtime and dynamo.llm wheels globally in container for tests that
# currently run without virtual environment activated.
# TODO: In future, we may use a virtualenv for everything and remove this.
RUN cd /opt/dynamo/bindings/wheels && \
    pip install dynamo*cp312*.whl && \
    pip install /workspace/dist/dynamo_sdk*any.whl
# Copy everything in after install steps to avoid re-running build/install
# commands on unrelated changes in other dirs.
......
@@ -24,17 +24,17 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
# Install uv and create virtualenv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
RUN mkdir /opt/dynamo && \
    uv venv /opt/dynamo/venv --python 3.12
# Activate virtual environment
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Install patched vllm - keep this early in Dockerfile to avoid
# rebuilds from unrelated source code changes
ARG VLLM_REF="v0.7.2"
ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
    bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
@@ -92,7 +92,7 @@ RUN cd examples/rust && \
    cp target/release/http /usr/local/bin/ && \
    cp target/release/llmctl /usr/local/bin/
# TODO: Build dynamo-run
# COPY applications/...
# Generate C bindings for kv cache routing in vLLM
@@ -100,29 +100,29 @@ COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c && \
    cargo build --release --locked && cargo doc --no-deps
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Build dynamo wheel
RUN source /opt/dynamo/venv/bin/activate && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo*cp312*.whl && \
    cd /workspace/deploy/dynamo/sdk && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo_sdk*any.whl
# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
    mkdir /opt/dynamo/bindings/lib && \
    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
    cp -r lib/bindings/c/include /opt/dynamo/bindings/.
# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
# FIXME: Copy more specific folders in for dev/debug after directory restructure
COPY . /workspace
# FIXME: May want a modification with dynamo banner on entry
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
@@ -140,10 +140,10 @@ RUN apt update -y && \
    echo "set -g mouse on" >> /root/.tmux.conf
# Set environment variables
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
# Copy binaries
COPY --from=dev /usr/local/bin/http /usr/local/bin/http
@@ -170,7 +170,7 @@ COPY examples/python_rs/llm/vllm /workspace/examples/python_rs/llm/vllm
WORKDIR /workspace
# FIXME: May want a modification with dynamo banner on entry
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
@@ -151,11 +151,11 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
# Install uv and create virtualenv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
RUN mkdir /opt/dynamo && \
    uv venv /opt/dynamo/venv --python 3.12
# Activate virtual environment
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Common dependencies
@@ -165,7 +165,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
# Install patched vllm - keep this early in Dockerfile to avoid
# rebuilds from unrelated source code changes
ARG VLLM_REF="v0.7.2"
ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
    bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
@@ -230,30 +230,29 @@ COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c && \
    cargo build --release --locked && cargo doc --no-deps
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Build dynamo wheel
RUN source /opt/dynamo/venv/bin/activate && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo*cp312*.whl && \
    cd /workspace/deploy/dynamo/sdk && \
    uv build --wheel --out-dir /workspace/dist && \
    uv pip install /workspace/dist/dynamo_sdk*any.whl
# Package the bindings
RUN mkdir -p /opt/dynamo/bindings/wheels && \
    mkdir /opt/dynamo/bindings/lib && \
    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
    cp -r lib/bindings/c/include /opt/dynamo/bindings/.
# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
# FIXME: Copy more specific folders in for dev/debug after directory restructure
COPY . /workspace
# FIXME: May want a modification with dynamo banner on entry
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
@@ -271,10 +270,10 @@ RUN apt update -y && \
 echo "set -g mouse on" >> /root/.tmux.conf
 # Set environment variables
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
 ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 # Copy binaries
 COPY --from=dev /usr/local/bin/http /usr/local/bin/http
@@ -301,7 +300,7 @@ COPY examples/python_rs/llm/vllm_nixl /workspace/examples/python_rs/llm/vllm_nix
 WORKDIR /workspace
-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
@@ -16,7 +16,7 @@
 TENSORRTLLM_BACKEND_REPO_TAG=
 TENSORRTLLM_BACKEND_REBUILD=
-DYNEMO_LLM_PATH=
+DYNAMO_LLM_PATH=
 GIT_TOKEN=
 GIT_REPO=
@@ -43,9 +43,9 @@ get_options() {
 missing_requirement $1
 fi
 ;;
---dynemo-llm-path)
+--dynamo-llm-path)
 if [ "$2" ]; then
-DYNEMO_LLM_PATH=$2
+DYNAMO_LLM_PATH=$2
 shift
 else
 missing_requirement $1
@@ -147,7 +147,7 @@ if [ ! -z ${TENSORRTLLM_BACKEND_REBUILD} ]; then
 # Build the backend
 (cd inflight_batcher_llm/src \
-&& cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNEMO_LLM_PATH=$DYNEMO_LLM_PATH .. \
+&& cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNAMO_LLM_PATH=$DYNAMO_LLM_PATH .. \
 && make install \
 && cp libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/ \
 && cp trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/ \
...
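For context, the renamed `--dynamo-llm-path` option above follows the usual value-taking long-option pattern in the script's `get_options` loop. A self-contained sketch of that pattern (a hypothetical standalone parser, not the real build script):

```shell
#!/usr/bin/env bash
# Minimal sketch of a value-taking long option, mirroring the
# --dynamo-llm-path case in the build script's option loop.
DYNAMO_LLM_PATH=""

parse_args() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --dynamo-llm-path)
                # The option requires a value; consume it or fail.
                if [ "$2" ]; then
                    DYNAMO_LLM_PATH=$2
                    shift
                else
                    echo "error: $1 requires a value" >&2
                    return 1
                fi
                ;;
        esac
        shift
    done
}

parse_args --dynamo-llm-path /opt/dynamo-llm
echo "$DYNAMO_LLM_PATH"  # prints /opt/dynamo-llm
```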
 diff --git a/vllm/config.py b/vllm/config.py
-index 9ba49757..3ec4bbab 100644
+index 9ba49757..5e1cf249 100644
 --- a/vllm/config.py
 +++ b/vllm/config.py
 @@ -2620,6 +2620,9 @@ class KVTransferConfig(BaseModel):
@@ -41,7 +41,7 @@ index 9ba49757..3ec4bbab 100644
 f"and `kv_both`")
 -        if self.kv_connector is not None and self.kv_role is None:
-+        if self.kv_connector is not None and self.kv_connector != "DynemoNixlConnector" and self.kv_role is None:
++        if self.kv_connector is not None and self.kv_connector != "DynamoNixlConnector" and self.kv_role is None:
 raise ValueError("Please specify kv_disagg_role when kv_connector "
 "is set, supported roles are `kv_producer`, "
 "`kv_consumer`, and `kv_both`")
@@ -54,7 +54,7 @@ index 9ba49757..3ec4bbab 100644
 def need_kv_parallel_group(self) -> bool:
 # for those database-based connector, vLLM does not need to create
 # parallel group, and in that case the kv parallel size will be 1.
-+        if self.kv_connector == "DynemoNixlConnector":
++        if self.kv_connector == "DynamoNixlConnector":
 +            return False
 return self.kv_connector is not None and self.kv_parallel_size > 1
@@ -271,7 +271,7 @@ index c5b3b04f..c72001f7 100644
 self.block_tables: Dict[SeqId, BlockTable] = {}
 diff --git a/vllm/core/event_manager.py b/vllm/core/event_manager.py
 new file mode 100644
-index 00000000..8699ca06
+index 00000000..d3706700
 --- /dev/null
 +++ b/vllm/core/event_manager.py
 @@ -0,0 +1,102 @@
@@ -287,7 +287,7 @@ index 00000000..8699ca06
 +logger = logging.getLogger(__name__)
 +
 +
-+class DynemoResult:
++class DynamoResult:
 +    OK = 0
 +    ERR = 1
 +
@@ -300,12 +300,12 @@ index 00000000..8699ca06
 +
 +        try:
 +            self.lib = ctypes.CDLL(lib_path)
-+            self.lib.dynemo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
+            self.lib.dynamo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
-+            self.lib.dynemo_llm_init.restype = c_uint32
++            self.lib.dynamo_llm_init.restype = c_uint32
 +
-+            result = self.lib.dynemo_llm_init(namespace.encode(),
++            result = self.lib.dynamo_llm_init(namespace.encode(),
 +                component.encode(), worker_id)
-+            if result == DynemoResult.OK:
++            if result == DynamoResult.OK:
 +                logger.info(
 +                    "KVCacheEventManager initialized successfully. Ready to publish KV Cache Events"
 +                )
@@ -316,7 +316,7 @@ index 00000000..8699ca06
 +            print(f"Failed to load {lib_path}")
 +            raise e
 +
-+        self.lib.dynemo_kv_event_publish_stored.argtypes = [
++        self.lib.dynamo_kv_event_publish_stored.argtypes = [
 +            ctypes.c_uint64,  # event_id
 +            ctypes.POINTER(ctypes.c_uint32),  # token_ids
 +            ctypes.POINTER(ctypes.c_size_t),  # num_block_tokens
@@ -325,14 +325,14 @@ index 00000000..8699ca06
 +            ctypes.POINTER(ctypes.c_uint64),  # parent_hash
 +            ctypes.c_uint64,  # lora_id
 +        ]
-+        self.lib.dynemo_kv_event_publish_stored.restype = ctypes.c_uint32  # dynemo_llm_result_t
++        self.lib.dynamo_kv_event_publish_stored.restype = ctypes.c_uint32  # dynamo_llm_result_t
 +
-+        self.lib.dynemo_kv_event_publish_removed.argtypes = [
++        self.lib.dynamo_kv_event_publish_removed.argtypes = [
 +            ctypes.c_uint64,  # event_id
 +            ctypes.POINTER(ctypes.c_uint64),  # block_ids
 +            ctypes.c_size_t,  # num_blocks
 +        ]
-+        self.lib.dynemo_kv_event_publish_removed.restype = ctypes.c_uint32  # dynemo_llm_result_t
++        self.lib.dynamo_kv_event_publish_removed.restype = ctypes.c_uint32  # dynamo_llm_result_t
 +
 +        self.event_id_counter = 0
 +
@@ -346,7 +346,7 @@ index 00000000..8699ca06
 +            if parent is not None else None)
 +
 +        # Publish the event
-+        result = self.lib.dynemo_kv_event_publish_stored(
++        result = self.lib.dynamo_kv_event_publish_stored(
 +            self.event_id_counter,  # uint64_t event_id
 +            token_ids_arr,  # const uint32_t *token_ids
 +            num_block_tokens,  # const uintptr_t *num_block_tokens
@@ -356,7 +356,7 @@ index 00000000..8699ca06
 +            0,  # uint64_t lora_id
 +        )
 +
-+        if result == DynemoResult.OK:
++        if result == DynamoResult.OK:
 +            logger.debug(f"Store - Published KV Event: {block.content_hash}")
 +        else:
 +            logger.debug(
@@ -365,13 +365,13 @@ index 00000000..8699ca06
 +        self.event_id_counter += 1
 +
 +    def enqueue_removed_event(self, block_hash: PrefixHash):
-+        result = self.lib.dynemo_kv_event_publish_removed(
++        result = self.lib.dynamo_kv_event_publish_removed(
 +            self.event_id_counter,
 +            (ctypes.c_uint64 * 1)(block_hash),
 +            1,
 +        )
 +
-+        if result == DynemoResult.OK:
++        if result == DynamoResult.OK:
 +            logger.debug(f"Remove - Published KV Event: {block_hash}")
 +        else:
 +            logger.debug(f"Remove - Failed to Publish KV Event: {block_hash}")
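The event manager patched above binds to the renamed C API through `ctypes`, declaring `argtypes`/`restype` before each call. That declare-then-call pattern can be exercised against the standard C math library instead of the Dynamo shared object (a minimal sketch; nothing here is Dynamo-specific):

```python
import ctypes
import ctypes.util

# Load a shared library and declare a C function's signature before
# calling it -- the same pattern KVCacheEventManager uses for
# dynamo_llm_init and the kv-event publish functions.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.pow.argtypes = [ctypes.c_double, ctypes.c_double]
libm.pow.restype = ctypes.c_double

print(libm.pow(2.0, 10.0))  # 1024.0
```

Declaring the signature up front is what lets ctypes marshal Python floats to C `double` and interpret the return value; without `restype`, the result would come back as a truncated `int`.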
@@ -764,7 +764,7 @@ index 00000000..9b938039
 \ No newline at end of file
 diff --git a/vllm/distributed/device_communicators/nixl.py b/vllm/distributed/device_communicators/nixl.py
 new file mode 100644
-index 00000000..523d58d4
+index 00000000..87020367
 --- /dev/null
 +++ b/vllm/distributed/device_communicators/nixl.py
 @@ -0,0 +1,405 @@
@@ -799,7 +799,7 @@ index 00000000..523d58d4
 +    num_blocks: int
 +
 +
-+class DynemoNixlConnector:
++class DynamoNixlConnector:
 +    def __init__(self, vllm_config: VllmConfig, engine_id: str, rank: int):
 +        self.vllm_config = vllm_config
 +        if NixlWrapper is None:
@@ -1173,11 +1173,11 @@ index 00000000..523d58d4
 +        else:
 +            self._transfers[req_id] = running_reqs
 +        return done_req_ids
-diff --git a/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
+diff --git a/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
 new file mode 100644
-index 00000000..2319867a
+index 00000000..7b3344f8
 --- /dev/null
-+++ b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
++++ b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
 @@ -0,0 +1,350 @@
 +# SPDX-License-Identifier: Apache-2.0
 +"""
@@ -1209,7 +1209,7 @@ index 00000000..2319867a
 +logger = init_logger(__name__)
 +
 +
-+class DynemoConnector(KVConnectorBase):
++class DynamoConnector(KVConnectorBase):
 +
 +    def __init__(
 +        self,
@@ -1223,16 +1223,16 @@ index 00000000..2319867a
 +        self.tp_size = config.parallel_config.tensor_parallel_size
 +        self.rank = rank
 +
-+        if self.config.kv_connector != "DynemoNcclConnector":
++        if self.config.kv_connector != "DynamoNcclConnector":
-+            raise NotImplementedError("Only DynemoNcclConnector is supported by the DynemoConnector class")
++            raise NotImplementedError("Only DynamoNcclConnector is supported by the DynamoConnector class")
 +
 +        from vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe import (
 +            PyNcclPipe)
-+        from vllm.distributed.kv_transfer.kv_pipe.dynemo_nccl_pipe import (
++        from vllm.distributed.kv_transfer.kv_pipe.dynamo_nccl_pipe import (
-+            DynemoNcclDataPlane)
++            DynamoNcclDataPlane)
 +
 +        logger.info(
-+            "Initializing DynemoNcclConnector under kv_transfer_config %s",
++            "Initializing DynamoNcclConnector under kv_transfer_config %s",
 +            self.config)
 +
 +        self.lookup_buffer_size = self.config.kv_buffer_size
@@ -1264,7 +1264,7 @@ index 00000000..2319867a
 +            port_offset=port_offset_base,
 +        )
 +
-+        self.data_plane = DynemoNcclDataPlane(
++        self.data_plane = DynamoNcclDataPlane(
 +            data_pipe=self.data_pipe,
 +            port=self._get_data_plane_port(self.global_kv_rank),
 +        )
@@ -1530,7 +1530,7 @@ index 00000000..2319867a
 +        self.config.kv_consumers_pipeline_parallel_size = kv_config_enhanced["kv_consumers_pipeline_parallel_size"]
 +        self.config.kv_producers_parallel_size = kv_config_enhanced["kv_producers_parallel_size"]
 diff --git a/vllm/distributed/kv_transfer/kv_connector/factory.py b/vllm/distributed/kv_transfer/kv_connector/factory.py
-index fe480533..f4775663 100644
+index fe480533..c82fda80 100644
 --- a/vllm/distributed/kv_transfer/kv_connector/factory.py
 +++ b/vllm/distributed/kv_transfer/kv_connector/factory.py
 @@ -27,13 +27,13 @@ class KVConnectorFactory:
@@ -1555,11 +1555,11 @@ index fe480533..f4775663 100644
 "SimpleConnector")
 +
 +KVConnectorFactory.register_connector(
-+    "DynemoNcclConnector",
++    "DynamoNcclConnector",
-+    "vllm.distributed.kv_transfer.kv_connector.dynemo_connector",
++    "vllm.distributed.kv_transfer.kv_connector.dynamo_connector",
-+    "DynemoConnector")
++    "DynamoConnector")
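The registration above relies on vLLM's lazy-import factory: each connector is stored as a module path and class name, and the import happens only when the connector is first created. A minimal standalone sketch of that pattern (the `ConnectorFactory` here is hypothetical; the demo registers a stdlib class rather than a real connector):

```python
import importlib


class ConnectorFactory:
    """Registry mapping a name to (module path, class name)."""
    _registry: dict = {}

    @classmethod
    def register_connector(cls, name: str, module_path: str, class_name: str) -> None:
        # Store only the import coordinates; defer the import itself.
        cls._registry[name] = (module_path, class_name)

    @classmethod
    def create_connector(cls, name: str, *args, **kwargs):
        module_path, class_name = cls._registry[name]
        module = importlib.import_module(module_path)  # lazy import on first use
        return getattr(module, class_name)(*args, **kwargs)


# Exercise the pattern with a standard-library class standing in for a connector.
ConnectorFactory.register_connector("OrderedDictConnector", "collections", "OrderedDict")
obj = ConnectorFactory.create_connector("OrderedDictConnector")
print(type(obj).__name__)  # OrderedDict
```

Deferring the import keeps optional backends (and their heavy dependencies) out of the process until a config actually selects them.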
 diff --git a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
-index 2033e976..e0537903 100644
+index 2033e976..ddebb68e 100644
 --- a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
 +++ b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
 @@ -8,13 +8,15 @@ MooncakePipe.
@@ -1886,7 +1886,7 @@ index 2033e976..e0537903 100644
 +            world_group.broadcast_object(kv_config_enhanced)
 +
 +        else:
-+            raise NotImplementedError("MooncakeConnector is not supported in Dynemo patch")
++            raise NotImplementedError("MooncakeConnector is not supported in Dynamo patch")
 +    else:
 +        kv_config_enhanced = world_group.broadcast_object()
 +        logger.info("kv_config_enhanced: %s", kv_config_enhanced)
@@ -2175,11 +2175,11 @@ index 40589fb3..da2829cf 100644
 """Receive a tensor (can be None) from the pipeline.
 Returns:
-diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
+diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
 new file mode 100644
-index 00000000..58d0d28c
+index 00000000..3ee0fa78
 --- /dev/null
-+++ b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
++++ b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
 @@ -0,0 +1,124 @@
 +import logging
 +import threading
@@ -2195,7 +2195,7 @@ index 00000000..58d0d28c
 +logger = logging.getLogger(__name__)
 +
 +
-+class DynemoNcclDataPlane:
++class DynamoNcclDataPlane:
 +    def __init__(
 +        self,
 +        data_pipe: PyNcclPipe,
@@ -2531,7 +2531,7 @@ index 321902d1..b8937ef8 100644
 def ensure_model_parallel_initialized(
 diff --git a/vllm/engine/llm_engine.py b/vllm/engine/llm_engine.py
-index d82d9ad9..cc02b029 100644
+index d82d9ad9..53cace75 100644
 --- a/vllm/engine/llm_engine.py
 +++ b/vllm/engine/llm_engine.py
 @@ -2,13 +2,17 @@
@@ -2614,7 +2614,7 @@ index d82d9ad9..cc02b029 100644
 +        self.engine_id = str(uuid.uuid4())
 +        self._nixl_agents_names: Optional[List[str]] = None
-+        if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
++        if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
 +            self._nixl_agents_names = self._initialize_nixl()
 +
 +        self._request_notif_counter = defaultdict(lambda: -self.parallel_config.tensor_parallel_size)
@@ -2946,7 +2946,7 @@ index 3cf1850e..6b90ece7 100644
 +    kv_active_blocks: int
 +    kv_total_blocks: int
 diff --git a/vllm/engine/multiprocessing/client.py b/vllm/engine/multiprocessing/client.py
-index 85b5f31e..3f8b8fad 100644
+index 85b5f31e..da207947 100644
 --- a/vllm/engine/multiprocessing/client.py
 +++ b/vllm/engine/multiprocessing/client.py
 @@ -8,6 +8,7 @@ from typing import (Any, AsyncGenerator, Dict, Iterator, List, Mapping,
@@ -3028,7 +3028,7 @@ index 85b5f31e..3f8b8fad 100644
 +
 +    @property
 +    def using_nixl_connector(self) -> bool:
-+        return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector"
++        return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector"
 +
 @staticmethod
 def is_unsupported_config(engine_args: AsyncEngineArgs):
@@ -3656,7 +3656,7 @@ index 534b9e60..18675d2f 100644
 @property
 def is_first_multi_step(self) -> bool:
 diff --git a/vllm/worker/model_runner.py b/vllm/worker/model_runner.py
-index 12baecde..489d3b77 100644
+index 12baecde..a3f2c464 100644
 --- a/vllm/worker/model_runner.py
 +++ b/vllm/worker/model_runner.py
 @@ -1824,6 +1824,9 @@ class ModelRunner(GPUModelRunnerBase[ModelInputForGPUWithSamplingMetadata]):
@@ -3664,7 +3664,7 @@ index 12baecde..489d3b77 100644
 if self.vllm_config.kv_transfer_config is None:
 return False
 +
-+        if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
++        if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
 +            return False
 prefill_meta = model_input.attn_metadata.prefill_metadata
@@ -3674,13 +3674,13 @@ index 12baecde..489d3b77 100644
 if self.vllm_config.kv_transfer_config is None:
 return False
 +
-+        if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
++        if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
 +            return False
 prefill_meta = model_input.attn_metadata.prefill_metadata
 diff --git a/vllm/worker/worker.py b/vllm/worker/worker.py
-index 582aa460..e4ed902e 100644
+index 582aa460..36a21d10 100644
 --- a/vllm/worker/worker.py
 +++ b/vllm/worker/worker.py
 @@ -2,7 +2,7 @@
@@ -3696,7 +3696,7 @@ index 582aa460..e4ed902e 100644
 from vllm.worker.pooling_model_runner import PoolingModelRunner
 from vllm.worker.worker_base import (LocalOrDistributedWorkerBase, WorkerBase,
 WorkerInput)
-+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
++from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
 +
 logger = init_logger(__name__)
@@ -3710,7 +3710,7 @@ index 582aa460..e4ed902e 100644
 +        # TODO ptarasiewicz nixl can also support DRAM
 +        assert self.device_config.device_type == "cuda", "Currently only CUDA is supported for Nixl connector"
 +
-+        self.nixl_connector = DynemoNixlConnector(self.vllm_config, engine_id, self.local_rank)  # TODO ptarasiewicz: rank or local_rank?
++        self.nixl_connector = DynamoNixlConnector(self.vllm_config, engine_id, self.local_rank)  # TODO ptarasiewicz: rank or local_rank?
 +        assert len(self.cache_engine) == 1, "Only one cache engine is supported for now"
 +        self.nixl_connector.register_kv_caches(self.cache_engine[0].gpu_cache)
 +        return self.nixl_connector.agent_name
@@ -3766,7 +3766,7 @@ index 582aa460..e4ed902e 100644
 @torch.inference_mode()
 diff --git a/vllm/worker/worker_base.py b/vllm/worker/worker_base.py
-index 819b81fb..8dfdadde 100644
+index 819b81fb..ff43dadc 100644
 --- a/vllm/worker/worker_base.py
 +++ b/vllm/worker/worker_base.py
 @@ -9,6 +9,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple, Type, Union
@@ -3781,7 +3781,7 @@ index 819b81fb..8dfdadde 100644
 from vllm.worker.model_runner_base import (BroadcastableModelInput,
 ModelRunnerBase,
 ModelRunnerInputBase)
-+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
++from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
 logger = init_logger(__name__)
@@ -3789,7 +3789,7 @@ index 819b81fb..8dfdadde 100644
 from vllm.platforms import current_platform
 self.current_platform = current_platform
-+        self.nixl_connector: Optional[DynemoNixlConnector] = None
++        self.nixl_connector: Optional[DynamoNixlConnector] = None
 +
 @abstractmethod
 def init_device(self) -> None:
...
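The `llm_engine.py` hunk above seeds a per-request notification counter with `defaultdict(lambda: -tensor_parallel_size)`, so the counter reaches zero exactly when every tensor-parallel rank has reported in. A standalone sketch of that counting trick (helper name is hypothetical):

```python
from collections import defaultdict


def make_notif_counter(tp_size: int):
    # Each request implicitly starts at -tp_size; one increment per
    # rank notification, so 0 means "all ranks have reported".
    return defaultdict(lambda: -tp_size)


counter = make_notif_counter(4)
for _ in range(4):  # one notification from each of 4 TP ranks
    counter["req-0"] += 1
print(counter["req-0"])  # 0
```

Starting at a negative count avoids tracking `tp_size` separately per request: any key can be incremented without prior setup, and a simple `== 0` check signals completion.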
@@ -15,7 +15,7 @@
 apiVersion: v2
 appVersion: 1.0.0
-description: Distributed Neural Models (dynemo) Component
+description: Distributed Neural Models (dynamo) Component
 icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png
-name: dynemo_component
+name: dynamo_component
 version: 1.0.0
@@ -15,7 +15,7 @@
 # Annotation Groups
 {{- define "nvidia.annotations.default" }}
-dynemo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
+dynamo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
 {{- with .Values.kubernetes }}
 {{- with .annotations }}
 {{ toYaml . }}
@@ -54,7 +54,7 @@ app.kubernetes.io/instance: {{ .Release.Name }}
 {{- end }}
 {{- define "nvidia.label.appManagedBy" }}
-{{- $service_name := "dynemo" }}
+{{- $service_name := "dynamo" }}
 {{- with .Release.Service }}
 {{- $service_name = . }}
 {{- end }}
@@ -66,7 +66,7 @@ app.kubernetes.io/name: {{ required "Property '.component.name' is required." .V
 {{- end }}
 {{- define "nvidia.label.appPartOf" }}
-{{- $part_of := "dynemo" }}
+{{- $part_of := "dynamo" }}
 {{- with .Values.kubernetes }}
 {{- with .partOf }}
 {{- $part_of = . }}
...