OpenDAS / dynamo · Commit 602352ce

chore: rename dynamo (#44)

Authored Mar 08, 2025 by Neelay Shah; committed via GitHub Mar 08, 2025.
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Parent: ecf53ce2
431 changed files in total; this page shows 20 changed files with 166 additions and 168 deletions (+166 / -168):

* .dockerignore (+1 / -1)
* .github/workflows/copyright-checks.yml (+1 / -1)
* .github/workflows/pre-merge-rust.yml (+1 / -1)
* ATTRIBUTIONS.md (+1 / -1)
* CODEOWNERS (+1 / -1)
* README.md (+12 / -12)
* codespell.txt (+1 / -2)
* components/metrics/Cargo.lock (+5 / -5)
* components/metrics/Cargo.toml (+2 / -2)
* components/metrics/README.md (+5 / -5)
* components/metrics/src/bin/mock_worker.rs (+3 / -3)
* components/metrics/src/lib.rs (+4 / -4)
* components/metrics/src/main.rs (+4 / -4)
* container/Dockerfile (+21 / -21)
* container/Dockerfile.vllm (+22 / -22)
* container/Dockerfile.vllm_nixl (+21 / -22)
* container/deps/clone_tensorrtllm.sh (+4 / -4)
* container/deps/vllm/vllm_v0.7.2-dynamo-kv-disagg-patch.patch (+52 / -52)
* deploy/Kubernetes/common/chart/Chart.yaml (+2 / -2)
* deploy/Kubernetes/common/chart/templates/_helpers.tpl (+3 / -3)
.dockerignore

```diff
@@ -19,7 +19,7 @@
 **/*.plan
 **/.cache/*
 **/*onnx*
-# Engine must be allowed because code contains dynemo_engine.py
+# Engine must be allowed because code contains dynamo_engine.py
 **/*tensorrtllm_engines*
 **/*tensorrtllm_models*
 **/*tensorrtllm_checkpoints*
```
.github/workflows/copyright-checks.yml

```diff
@@ -23,4 +23,4 @@ jobs:
       env:
         NVBUILD_VERBOSITY: DETAILED
       timeout-minutes: 2
-      working-directory: /workspace
\ No newline at end of file
+      working-directory: /workspace
```
.github/workflows/pre-merge-rust.yml

```diff
@@ -40,7 +40,7 @@ jobs:
   pre-merge-rust:
     runs-on: ubuntu-latest
     strategy:
-      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynemo-run', 'components/metrics', 'examples/rust'] }
+      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynamo-run', 'components/metrics', 'examples/rust'] }
     permissions:
       contents: read
     steps:
```
ATTRIBUTIONS.md

```diff
@@ -17,7 +17,7 @@ limitations under the License.
 # Open Source License Attribution

-Dynemo uses Open Source components. You can find the details of these open-source projects along with license information below.
+Dynamo uses Open Source components. You can find the details of these open-source projects along with license information below.
 We are grateful to the developers for their contributions to open source and acknowledge these below.

 ## nats-py - [Apache License 2.0](https://github.com/nats-io/nats.py/blob/main/LICENSE)
```
CODEOWNERS

```diff
-# CODEOWNERS file for Dynemo
+# CODEOWNERS file for Dynamo
 #
 # For more information about CODEOWNERS files, see:
 # https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
```
README.md

````diff
@@ -15,17 +15,17 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Dynemo
+# Dynamo

 <h4>
 A Datacenter Scale Distributed Inference Serving Framework
 </h4>

 [![License badge]](https://opensource.org/licenses/Apache-2.0)
 [![Release badge]](https://github.com/dynemo-ai/dynemo/releases/latest)
 [![Release badge]](https://github.com/dynemo-ai/dynemo/releases/latest)

-Dynemo is a flexible, component based, data center scale
+Dynamo is a flexible, component based, data center scale
 inference serving framework designed to leverage the strengths of the
-standalone Dynemo Inference Server while expanding its capabilities
+standalone Dynamo Inference Server while expanding its capabilities
 to meet the demands of complex use cases including those of Generative
 AI. It is designed to enable developers to implement and customize
 routing, load balancing, scaling and workflow definitions at the data
@@ -36,17 +36,17 @@ center scale without sacrificing performance or ease of use.
 > rapid-prototyping stage and we are actively looking for feedback and
 > collaborators.

-## Building Dynemo
+## Building Dynamo

 ### Requirements

-Dynemo development and examples are container based.
+Dynamo development and examples are container based.

 * [Docker](https://docs.docker.com/get-started/get-docker/)
 * [buildx](https://github.com/docker/buildx)

 ### Development

-You can build the Dynemo container using the build scripts
+You can build the Dynamo container using the build scripts
 in `container/` (or directly with `docker build`).

 We provide 3 types of builds:
@@ -62,9 +62,9 @@ For example, if you want to build a container for the `STANDARD` backends you ca
 Please see the instructions in the corresponding example for specific build instructions.

-## Running Dynemo for Local Testing and Development
+## Running Dynamo for Local Testing and Development

-You can run the Dynemo container using the run scripts in
+You can run the Dynamo container using the run scripts in
 `container/` (or directly with `docker run`).

 The run script offers a few common workflows:
@@ -72,7 +72,7 @@ The run script offers a few common workflows:
 1. Running a command in a container and exiting.

 ```
-./container/run.sh -- python3 -c "import dynemo.runtime; help(dynemo.runtime)"
+./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
 ```

 2. Starting an interactive shell.
@@ -95,7 +95,7 @@ deployment instructions.
 ## Rust Based Runtime

-Dynemo has a new rust based distributed runtime with
+Dynamo has a new rust based distributed runtime with
 implementation under development. The rust based runtime enables
 serving arbitrary python code as well as native rust. Please note the
 APIs are subject to change.
@@ -114,7 +114,7 @@ bindings.
 An intermediate example expanding further on the concepts introduced
 in the Hello World example. In this example, we demonstrate
 [Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
-application of the components defined in Dynemo.
+application of the components defined in Dynamo.

 # Disclaimers
````
codespell.txt

```diff
-dynamo->dynemo
-dynmo->dynemo
+dynmo->dynamo
```
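codespell reads `misspelling->correction` pairs like the ones above; after this change the dictionary maps the typo `dynmo` to `dynamo` instead of `dynemo`. An illustrative sketch of how such a mapping flags words (not codespell's actual implementation):

```python
# Minimal sketch of a codespell-style dictionary check (illustrative only;
# the real codespell tool has a far richer dictionary format and CLI).
DICTIONARY = {"dynmo": "dynamo"}  # the entry from codespell.txt after this commit

def check_line(line: str) -> list[tuple[str, str]]:
    """Return (misspelling, suggestion) pairs found in a line of text."""
    findings = []
    for word in line.replace("_", " ").split():
        key = word.strip(".,;:'\"()").lower()
        if key in DICTIONARY:
            findings.append((key, DICTIONARY[key]))
    return findings

print(check_line("launch the dynmo runtime"))  # -> [('dynmo', 'dynamo')]
```

Note that the old entry `dynamo->dynemo` had to be deleted, not just rewritten: after the rename, `dynamo` is the correct spelling, so keeping it in the dictionary would flag every valid occurrence.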
components/metrics/Cargo.lock

```diff
@@ -1005,7 +1005,7 @@ dependencies = [
 ]

 [[package]]
-name = "dynemo-llm"
+name = "dynamo-llm"
 version = "0.2.1"
 dependencies = [
  "anyhow",
@@ -1020,7 +1020,7 @@ dependencies = [
  "chrono",
  "cmake",
  "derive_builder",
- "dynemo-runtime",
+ "dynamo-runtime",
  "either",
  "erased-serde",
  "futures",
@@ -1054,7 +1054,7 @@ dependencies = [
 ]

 [[package]]
-name = "dynemo-runtime"
+name = "dynamo-runtime"
 version = "0.2.1"
 dependencies = [
  "anyhow",
@@ -2202,8 +2202,8 @@ dependencies = [
  "async-nats",
  "axum 0.6.20",
  "clap",
- "dynemo-llm",
- "dynemo-runtime",
+ "dynamo-llm",
+ "dynamo-runtime",
  "futures",
  "opentelemetry",
  "opentelemetry-prometheus",
```
components/metrics/Cargo.toml

```diff
@@ -22,8 +22,8 @@ license = "Apache-2.0"
 [dependencies]
 # local
-dynemo-runtime = { path = "../../lib/runtime" }
-dynemo-llm = { path = "../../lib/llm" }
+dynamo-runtime = { path = "../../lib/runtime" }
+dynamo-llm = { path = "../../lib/llm" }

 # workspace - todo
```
components/metrics/README.md

````diff
@@ -12,16 +12,16 @@ This will:
 For example:
 ```bash
 # For more details, try DYN_LOG=debug
-DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend --endpoint generate
-# 2025-02-26T18:45:05.467026Z  INFO metrics: Creating unique instance of Metrics at dynemo/components/metrics/instance
-# 2025-02-26T18:45:05.472146Z  INFO metrics: Scraping service dynemo_backend_720278f8 and filtering on subject dynemo_backend_720278f8.generate
+DYN_LOG=info cargo run --bin metrics -- --namespace dynamo --component backend --endpoint generate
+# 2025-02-26T18:45:05.467026Z  INFO metrics: Creating unique instance of Metrics at dynamo/components/metrics/instance
+# 2025-02-26T18:45:05.472146Z  INFO metrics: Scraping service dynamo_backend_720278f8 and filtering on subject dynamo_backend_720278f8.generate
 # ...
 ```

 With no matching endpoints running to collect stats from, you should see warnings in the logs:
 ```bash
-2025-02-26T18:45:06.474161Z  WARN metrics: No endpoints found matching subject dynemo_backend_720278f8.generate
+2025-02-26T18:45:06.474161Z  WARN metrics: No endpoints found matching subject dynamo_backend_720278f8.generate
 ```

 After a matching endpoint gets started, you should see the warnings stop
@@ -30,7 +30,7 @@ when the endpoint gets automatically discovered.
 When stats are found from target endpoints, the metrics component will
 aggregate them and publish them to a prometheus server running on
 `localhost:9091/metrics` by default:
 ```
-2025-02-28T04:05:58.077901Z  INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
+2025-02-28T04:05:58.077901Z  INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
 ```

 To see the metrics being published in prometheus format, you can run:
````
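The metrics README quoted above says the component publishes Prometheus-format text on `localhost:9091/metrics` by default. A minimal sketch of parsing that text exposition format (the metric name and label below are hypothetical, not taken from the component):

```python
# Minimal parser for the Prometheus text exposition format (illustrative;
# real deployments scrape this with a Prometheus server, not by hand).
def parse_exposition(text: str) -> dict[str, float]:
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

# Hypothetical sample resembling what a KV-block gauge might look like.
sample = """# HELP llm_kv_blocks_active Active KV blocks
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{worker="7587884888253033398"} 77
"""
print(parse_exposition(sample))
```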
components/metrics/src/bin/mock_worker.rs

```diff
@@ -14,10 +14,10 @@
 // limitations under the License.

 use async_nats::service::endpoint::Stats;
-use dynemo_llm::kv_router::{
+use dynamo_llm::kv_router::{
     protocols::ForwardPassMetrics, scheduler::KVHitRateEvent, KV_HIT_RATE_SUBJECT,
 };
-use dynemo_runtime::{
+use dynamo_runtime::{
     component::Namespace,
     logging,
     pipeline::{
@@ -123,7 +123,7 @@ fn mock_stats_handler(_stats: Stats) -> serde_json::Value {
 }

 async fn backend(runtime: DistributedRuntime) -> Result<()> {
-    let namespace = runtime.namespace("dynemo")?;
+    let namespace = runtime.namespace("dynamo")?;

     // Spawn background task for publishing KV hit rate events
     let namespace_clone = namespace.clone();
```
components/metrics/src/lib.rs

```diff
@@ -20,11 +20,11 @@ use prometheus::{register_counter_vec, register_gauge_vec};
 use serde::{Deserialize, Serialize};
 use std::net::SocketAddr;

-use dynemo_llm::kv_router::protocols::ForwardPassMetrics;
-use dynemo_llm::kv_router::scheduler::Endpoint;
-use dynemo_llm::kv_router::scoring::ProcessedEndpoints;
-use dynemo_runtime::{distributed::Component, service::EndpointInfo, utils::Duration, Result};
+use dynamo_llm::kv_router::protocols::ForwardPassMetrics;
+use dynamo_llm::kv_router::scheduler::Endpoint;
+use dynamo_llm::kv_router::scoring::ProcessedEndpoints;
+use dynamo_runtime::{distributed::Component, service::EndpointInfo, utils::Duration, Result};

 /// Configuration for LLM worker load capacity metrics
 #[derive(Debug, Clone, Serialize, Deserialize)]
```
components/metrics/src/main.rs

```diff
@@ -27,9 +27,9 @@
 //! - ISL Blocks: Cumulative count of total blocks in all KV hit rate events
 //! - Overlap Blocks: Cumulative count of blocks that were already in the KV cache
 use clap::Parser;
-use dynemo_llm::kv_router::scheduler::KVHitRateEvent;
-use dynemo_llm::kv_router::KV_HIT_RATE_SUBJECT;
-use dynemo_runtime::{
+use dynamo_llm::kv_router::scheduler::KVHitRateEvent;
+use dynamo_llm::kv_router::KV_HIT_RATE_SUBJECT;
+use dynamo_runtime::{
     error, logging,
     traits::events::{EventPublisher, EventSubscriber},
     utils::{Duration, Instant},
@@ -57,7 +57,7 @@ struct Args {
     endpoint: String,

     /// Namespace to operate in
-    #[arg(long, env = "DYN_NAMESPACE", default_value = "dynemo")]
+    #[arg(long, env = "DYN_NAMESPACE", default_value = "dynamo")]
     namespace: String,

     /// Polling interval in seconds (minimum 1 second)
```
container/Dockerfile

```diff
@@ -16,7 +16,7 @@
 ARG BASE_IMAGE="nvcr.io/nvidia/tritonserver"
 ARG BASE_IMAGE_TAG="25.01-py3"

-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynemo
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo

 # TODO: non root user by default
@@ -34,7 +34,7 @@ RUN rustup toolchain install 1.85.0-x86_64-unknown-linux-gnu
 # Install OpenAI-compatible frontend and its dependencies from triton server
 # repository. These are used to have a consistent interface, schema, and FastAPI
-# app between Triton Core and Dynemo implementations.
+# app between Triton Core and Dynamo implementations.
 ARG OPENAI_SERVER_TAG="r25.01"
 RUN mkdir -p /opt/tritonserver/python && \
     cd /opt/tritonserver/python && \
@@ -78,7 +78,7 @@ ARG TENSORRTLLM_SKIP_CLONE=
 ENV FRAMEWORK=${FRAMEWORK}
 RUN --mount=type=bind,source=./container/deps/requirements.tensorrtllm.txt,target=/tmp/requirements.txt \
     --mount=type=bind,source=./container/deps/clone_tensorrtllm.sh,target=/tmp/clone_tensorrtllm.sh \
-    if [[ "$FRAMEWORK" == "TENSORRTLLM" ]]; then pip install --timeout=2000 -r /tmp/requirements.txt; if [ ${TENSORRTLLM_SKIP_CLONE} -ne 1 ]; then /tmp/clone_tensorrtllm.sh --tensorrtllm-backend-repo-tag ${TENSORRTLLM_BACKEND_REPO_TAG} --tensorrtllm-backend-rebuild ${TENSORRTLLM_BACKEND_REBUILD} --dynemo-llm-path /opt/dynemo/llm_binding; fi; fi
+    if [[ "$FRAMEWORK" == "TENSORRTLLM" ]]; then pip install --timeout=2000 -r /tmp/requirements.txt; if [ ${TENSORRTLLM_SKIP_CLONE} -ne 1 ]; then /tmp/clone_tensorrtllm.sh --tensorrtllm-backend-repo-tag ${TENSORRTLLM_BACKEND_REPO_TAG} --tensorrtllm-backend-rebuild ${TENSORRTLLM_BACKEND_REBUILD} --dynamo-llm-path /opt/dynamo/llm_binding; fi; fi

 RUN --mount=type=bind,source=./container/deps/requirements.standard.txt,target=/tmp/requirements.txt \
@@ -106,7 +106,7 @@ ENV VLLM_GENERATE_WORKERS=${VLLM_FRAMEWORK:+1}
 ENV VLLM_BASELINE_TP_SIZE=${VLLM_FRAMEWORK:+1}
 ENV VLLM_CONTEXT_TP_SIZE=${VLLM_FRAMEWORK:+1}
 ENV VLLM_GENERATE_TP_SIZE=${VLLM_FRAMEWORK:+1}
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 ENV PYTHONUNBUFFERED=1

 # Install NATS - pointing toward NATS github instead of binaries.nats.dev due to server instability
@@ -154,7 +154,7 @@ RUN cd examples/rust && \
     cp target/release/http /usr/local/bin/ && \
     cp target/release/llmctl /usr/local/bin/

-COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
+COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk

 # Generate C bindings. Note that this is required for TRTLLM backend re-build
@@ -162,30 +162,30 @@ COPY lib/bindings /workspace/lib/bindings
 RUN cd lib/bindings/c/ && \
     cargo build --release --locked && cargo doc --no-deps

-# Install uv, create virtualenv for general use, and build dynemo wheel
+# Install uv, create virtualenv for general use, and build dynamo wheel
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-RUN mkdir /opt/dynemo && \
-    uv venv /opt/dynemo/venv --python 3.12 && \
-    source /opt/dynemo/venv/bin/activate && \
+RUN mkdir /opt/dynamo && \
+    uv venv /opt/dynamo/venv --python 3.12 && \
+    source /opt/dynamo/venv/bin/activate && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo*cp312*.whl && \
-    cd /workspace/deploy/dynemo/sdk && \
+    uv pip install /workspace/dist/dynamo*cp312*.whl && \
+    cd /workspace/deploy/dynamo/sdk && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo_sdk*any.whl
+    uv pip install /workspace/dist/dynamo_sdk*any.whl

 # Package the bindings
-RUN mkdir -p /opt/dynemo/bindings/wheels && \
-    mkdir /opt/dynemo/bindings/lib && \
-    cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
-    cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
-    cp -r lib/bindings/c/include /opt/dynemo/bindings/.
+RUN mkdir -p /opt/dynamo/bindings/wheels && \
+    mkdir /opt/dynamo/bindings/lib && \
+    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
+    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
+    cp -r lib/bindings/c/include /opt/dynamo/bindings/.

-# Install dynemo.runtime and dynemo.llm wheels globally in container for tests that
+# Install dynamo.runtime and dynamo.llm wheels globally in container for tests that
 # currently run without virtual environment activated.
 # TODO: In future, we may use a virtualenv for everything and remove this.
-RUN cd /opt/dynemo/bindings/wheels && \
-    pip install dynemo*cp312*.whl && \
-    pip install /workspace/dist/dynemo_sdk*any.whl
+RUN cd /opt/dynamo/bindings/wheels && \
+    pip install dynamo*cp312*.whl && \
+    pip install /workspace/dist/dynamo_sdk*any.whl

 # Copy everything in after ginstall steps to avoid re-running build/install
 # commands on unrelated changes in other dirs.
```
container/Dockerfile.vllm

```diff
@@ -24,17 +24,17 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
 # Install uv and create virtualenv
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-RUN mkdir /opt/dynemo && \
-    uv venv /opt/dynemo/venv --python 3.12
+RUN mkdir /opt/dynamo && \
+    uv venv /opt/dynamo/venv --python 3.12

 # Activate virtual environment
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"

 # Install patched vllm - keep this early in Dockerfile to avoid
 # rebuilds from unrelated source code changes
 ARG VLLM_REF="v0.7.2"
-ARG VLLM_PATCH="vllm_${VLLM_REF}-dynemo-kv-disagg-patch.patch"
+ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
 RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
     bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
@@ -92,7 +92,7 @@ RUN cd examples/rust && \
     cp target/release/http /usr/local/bin/ && \
     cp target/release/llmctl /usr/local/bin/

-# TODO: Build dynemo-run
+# TODO: Build dynamo-run
 # COPY applications/...

 # Generate C bindings for kv cache routing in vLLM
@@ -100,29 +100,29 @@ COPY lib/bindings /workspace/lib/bindings
 RUN cd lib/bindings/c && \
     cargo build --release --locked && cargo doc --no-deps

-COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
+COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk

-# Build dynemo wheel
-RUN source /opt/dynemo/venv/bin/activate && \
+# Build dynamo wheel
+RUN source /opt/dynamo/venv/bin/activate && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo*cp312*.whl && \
-    cd /workspace/deploy/dynemo/sdk && \
+    uv pip install /workspace/dist/dynamo*cp312*.whl && \
+    cd /workspace/deploy/dynamo/sdk && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo_sdk*any.whl
+    uv pip install /workspace/dist/dynamo_sdk*any.whl

 # Package the bindings
-RUN mkdir -p /opt/dynemo/bindings/wheels && \
-    mkdir /opt/dynemo/bindings/lib && \
-    cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
-    cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
-    cp -r lib/bindings/c/include /opt/dynemo/bindings/.
+RUN mkdir -p /opt/dynamo/bindings/wheels && \
+    mkdir /opt/dynamo/bindings/lib && \
+    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
+    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
+    cp -r lib/bindings/c/include /opt/dynamo/bindings/.

-# Tell vllm to use the Dynemo LLM C API for KV Cache Routing
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"

 # FIXME: Copy more specific folders in for dev/debug after directory restructure
 COPY . /workspace

-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
@@ -140,10 +140,10 @@ RUN apt update -y && \
     echo "set -g mouse on" >> /root/.tmux.conf

 # Set environment variables
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
 ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"

 # Copy binaries
 COPY --from=dev /usr/local/bin/http /usr/local/bin/http
@@ -170,7 +170,7 @@ COPY examples/python_rs/llm/vllm /workspace/examples/python_rs/llm/vllm
 WORKDIR /workspace

-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
```
container/Dockerfile.vllm_nixl

```diff
@@ -151,11 +151,11 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
 # Install uv and create virtualenv
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-RUN mkdir /opt/dynemo && \
-    uv venv /opt/dynemo/venv --python 3.12
+RUN mkdir /opt/dynamo && \
+    uv venv /opt/dynamo/venv --python 3.12

 # Activate virtual environment
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"

 # Common dependencies
@@ -165,7 +165,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
 # Install patched vllm - keep this early in Dockerfile to avoid
 # rebuilds from unrelated source code changes
 ARG VLLM_REF="v0.7.2"
-ARG VLLM_PATCH="vllm_${VLLM_REF}-dynemo-kv-disagg-patch.patch"
+ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
 RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
     bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
@@ -230,30 +230,29 @@ COPY lib/bindings /workspace/lib/bindings
 RUN cd lib/bindings/c && \
     cargo build --release --locked && cargo doc --no-deps

-COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
+COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk

-# Build dynemo wheel
-RUN source /opt/dynemo/venv/bin/activate && \
+# Build dynamo wheel
+RUN source /opt/dynamo/venv/bin/activate && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo*cp312*.whl && \
-    cd /workspace/deploy/dynemo/sdk && \
+    uv pip install /workspace/dist/dynamo*cp312*.whl && \
+    cd /workspace/deploy/dynamo/sdk && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo_sdk*any.whl
+    uv pip install /workspace/dist/dynamo_sdk*any.whl

 # Package the bindings
-RUN mkdir -p /opt/dynemo/bindings/wheels && \
-    mkdir /opt/dynemo/bindings/lib && \
-    cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
-    cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
-    cp -r lib/bindings/c/include /opt/dynemo/bindings/.
+RUN mkdir -p /opt/dynamo/bindings/wheels && \
+    mkdir /opt/dynamo/bindings/lib && \
+    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
+    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
+    cp -r lib/bindings/c/include /opt/dynamo/bindings/.

-# Tell vllm to use the Dynemo LLM C API for KV Cache Routing
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"

 # FIXME: Copy more specific folders in for dev/debug after directory restructure
 COPY . /workspace

-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
@@ -271,10 +270,10 @@ RUN apt update -y && \
     echo "set -g mouse on" >> /root/.tmux.conf

 # Set environment variables
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
 ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"

 # Copy binaries
 COPY --from=dev /usr/local/bin/http /usr/local/bin/http
@@ -301,7 +300,7 @@ COPY examples/python_rs/llm/vllm_nixl /workspace/examples/python_rs/llm/vllm_nix
 WORKDIR /workspace

-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
```
container/deps/clone_tensorrtllm.sh

```diff
@@ -16,7 +16,7 @@
 TENSORRTLLM_BACKEND_REPO_TAG=
 TENSORRTLLM_BACKEND_REBUILD=
-DYNEMO_LLM_PATH=
+DYNAMO_LLM_PATH=
 GIT_TOKEN=
 GIT_REPO=
@@ -43,9 +43,9 @@ get_options() {
                 missing_requirement $1
             fi
             ;;
-        --dynemo-llm-path)
+        --dynamo-llm-path)
             if [ "$2" ]; then
-                DYNEMO_LLM_PATH=$2
+                DYNAMO_LLM_PATH=$2
                 shift
             else
                 missing_requirement $1
@@ -147,7 +147,7 @@ if [ ! -z ${TENSORRTLLM_BACKEND_REBUILD} ]; then
     # Build the backend
     (cd inflight_batcher_llm/src \
-        && cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNEMO_LLM_PATH=$DYNEMO_LLM_PATH .. \
+        && cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNAMO_LLM_PATH=$DYNAMO_LLM_PATH .. \
         && make install \
         && cp libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/ \
         && cp trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/ \
```
container/deps/vllm/vllm_v0.7.2-dyn
e
mo-kv-disagg-patch.patch
→
container/deps/vllm/vllm_v0.7.2-dyn
a
mo-kv-disagg-patch.patch
View file @
602352ce
diff --git a/vllm/config.py b/vllm/config.py
index 9ba49757..
3ec4bbab
100644
index 9ba49757..
5e1cf249
100644
--- a/vllm/config.py
+++ b/vllm/config.py
@@ -2620,6 +2620,9 @@
class KVTransferConfig(BaseModel):
...
...
@@ -41,7 +41,7 @@ index 9ba49757..3ec4bbab 100644
f"and `kv_both`")
- if self.kv_connector is not None and self.kv_role is None:
+ if self.kv_connector is not None and self.kv_connector != "Dyn
e
moNixlConnector" and self.kv_role is None:
+ if self.kv_connector is not None and self.kv_connector != "Dyn
a
moNixlConnector" and self.kv_role is None:
raise ValueError("Please specify kv_disagg_role when kv_connector "
"is set, supported roles are `kv_producer`, "
"`kv_consumer`, and `kv_both`")
...
...
@@ -54,7 +54,7 @@ index 9ba49757..3ec4bbab 100644
def need_kv_parallel_group(self) -> bool:
# for those database-based connector, vLLM does not need to create
# parallel group, and in that case the kv parallel size will be 1.
+ if self.kv_connector == "Dyn
e
moNixlConnector":
+ if self.kv_connector == "Dyn
a
moNixlConnector":
+ return False
return self.kv_connector is not None and self.kv_parallel_size > 1
...
...
@@ -271,7 +271,7 @@ index c5b3b04f..c72001f7 100644
self.block_tables: Dict[SeqId, BlockTable] = {}
diff --git a/vllm/core/event_manager.py b/vllm/core/event_manager.py
new file mode 100644
index 00000000..
8699ca06
index 00000000..
d3706700
--- /dev/null
+++ b/vllm/core/event_manager.py
@@ -0,0 +1,102 @@
...
...
@@ -287,7 +287,7 @@ index 00000000..8699ca06
+logger = logging.getLogger(__name__)
+
+
+class Dyn
e
moResult:
+class Dyn
a
moResult:
+ OK = 0
+ ERR = 1
+
...
...
@@ -300,12 +300,12 @@ index 00000000..8699ca06
+
+ try:
+ self.lib = ctypes.CDLL(lib_path)
+ self.lib.dyn
e
mo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
+ self.lib.dyn
e
mo_llm_init.restype = c_uint32
+ self.lib.dyn
a
mo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
+ self.lib.dyn
a
mo_llm_init.restype = c_uint32
+
+ result = self.lib.dyn
e
mo_llm_init(namespace.encode(),
+ result = self.lib.dyn
a
mo_llm_init(namespace.encode(),
+ component.encode(), worker_id)
+ if result == Dyn
e
moResult.OK:
+ if result == Dyn
a
moResult.OK:
+ logger.info(
+ "KVCacheEventManager initialized successfully. Ready to publish KV Cache Events"
+ )
...
...
@@ -316,7 +316,7 @@ index 00000000..8699ca06
+ print(f"Failed to load {lib_path}")
+ raise e
+
+ self.lib.dyn
e
mo_kv_event_publish_stored.argtypes = [
+ self.lib.dyn
a
mo_kv_event_publish_stored.argtypes = [
+ ctypes.c_uint64, # event_id
+ ctypes.POINTER(ctypes.c_uint32), # token_ids
+ ctypes.POINTER(ctypes.c_size_t), # num_block_tokens
...
...
@@ -325,14 +325,14 @@ index 00000000..8699ca06
+ ctypes.POINTER(ctypes.c_uint64), # parent_hash
+ ctypes.c_uint64, # lora_id
+ ]
+ self.lib.dyn
e
mo_kv_event_publish_stored.restype = ctypes.c_uint32 # dyn
e
mo_llm_result_t
+ self.lib.dyn
a
mo_kv_event_publish_stored.restype = ctypes.c_uint32 # dyn
a
mo_llm_result_t
+
+ self.lib.dyn
e
mo_kv_event_publish_removed.argtypes = [
+ self.lib.dyn
a
mo_kv_event_publish_removed.argtypes = [
+ ctypes.c_uint64, # event_id
+ ctypes.POINTER(ctypes.c_uint64), # block_ids
+ ctypes.c_size_t, # num_blocks
+ ]
+ self.lib.dyn
e
mo_kv_event_publish_removed.restype = ctypes.c_uint32 # dyn
e
mo_llm_result_t
+ self.lib.dyn
a
mo_kv_event_publish_removed.restype = ctypes.c_uint32 # dyn
a
mo_llm_result_t
+
+ self.event_id_counter = 0
+
...
...
@@ -346,7 +346,7 @@ index 00000000..8699ca06
+ if parent is not None else None)
+
+ # Publish the event
+ result = self.lib.dyn
e
mo_kv_event_publish_stored(
+ result = self.lib.dyn
a
mo_kv_event_publish_stored(
+ self.event_id_counter, # uint64_t event_id
+ token_ids_arr, # const uint32_t *token_ids
+ num_block_tokens, # const uintptr_t *num_block_tokens
...
...
@@ -356,7 +356,7 @@ index 00000000..8699ca06
+ 0, # uint64_t lora_id
+ )
+
+ if result == Dyn
e
moResult.OK:
+ if result == Dyn
a
moResult.OK:
+ logger.debug(f"Store - Published KV Event: {block.content_hash}")
+ else:
+ logger.debug(
...
...
@@ -365,13 +365,13 @@ index 00000000..8699ca06
+ self.event_id_counter += 1
+
+ def enqueue_removed_event(self, block_hash: PrefixHash):
+ result = self.lib.dynemo_kv_event_publish_removed(
+ result = self.lib.dynamo_kv_event_publish_removed(
+ self.event_id_counter,
+ (ctypes.c_uint64 * 1)(block_hash),
+ 1,
+ )
+
+ if result == DynemoResult.OK:
+ if result == DynamoResult.OK:
+ logger.debug(f"Remove - Published KV Event: {block_hash}")
+ else:
+ logger.debug(f"Remove - Failed to Publish KV Event: {block_hash}")
...
...
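The event-manager code above pairs a monotonically increasing `event_id_counter` with a C result code checked against an `OK` constant. A hedged stand-alone sketch of that pattern, where `OK` and the injected publish callable are stand-ins for the real dynamo bindings, not the patch's actual API:

```python
OK = 0  # stand-in for DynamoResult.OK

class EventPublisher:
    def __init__(self, publish_fn):
        # publish_fn stands in for e.g. lib.dynamo_kv_event_publish_removed
        self._publish = publish_fn
        self.event_id_counter = 0  # monotonically increasing event id

    def enqueue_removed_event(self, block_hash):
        result = self._publish(self.event_id_counter, block_hash)
        self.event_id_counter += 1
        return result == OK  # caller logs success or failure

pub = EventPublisher(lambda event_id, block_hash: OK)
print(pub.enqueue_removed_event(0x1234))  # True
```

Injecting the publish callable keeps the counter/result-code bookkeeping testable without loading the shared library.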
@@ -764,7 +764,7 @@ index 00000000..9b938039
\ No newline at end of file
diff --git a/vllm/distributed/device_communicators/nixl.py b/vllm/distributed/device_communicators/nixl.py
new file mode 100644
index 00000000..523d58d4
index 00000000..87020367
--- /dev/null
+++ b/vllm/distributed/device_communicators/nixl.py
@@ -0,0 +1,405 @@
...
...
@@ -799,7 +799,7 @@ index 00000000..523d58d4
+ num_blocks: int
+
+
+class DynemoNixlConnector:
+class DynamoNixlConnector:
+ def __init__(self, vllm_config: VllmConfig, engine_id: str, rank: int):
+ self.vllm_config = vllm_config
+ if NixlWrapper is None:
...
...
@@ -1173,11 +1173,11 @@ index 00000000..523d58d4
+ else:
+ self._transfers[req_id] = running_reqs
+ return done_req_ids
diff --git a/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
diff --git a/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
new file mode 100644
index 00000000..2319867a
index 00000000..7b3344f8
--- /dev/null
+++ b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
@@ -0,0 +1,350 @@
+# SPDX-License-Identifier: Apache-2.0
+"""
...
...
@@ -1209,7 +1209,7 @@ index 00000000..2319867a
+logger = init_logger(__name__)
+
+
+class DynemoConnector(KVConnectorBase):
+class DynamoConnector(KVConnectorBase):
+
+ def __init__(
+ self,
...
...
@@ -1223,16 +1223,16 @@ index 00000000..2319867a
+ self.tp_size = config.parallel_config.tensor_parallel_size
+ self.rank = rank
+
+ if self.config.kv_connector != "DynemoNcclConnector":
+ raise NotImplementedError("Only DynemoNcclConnector is supported by the DynemoConnector class")
+ if self.config.kv_connector != "DynamoNcclConnector":
+ raise NotImplementedError("Only DynamoNcclConnector is supported by the DynamoConnector class")
+
+ from vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe import (
+ PyNcclPipe)
+ from vllm.distributed.kv_transfer.kv_pipe.dynemo_nccl_pipe import (
+ DynemoNcclDataPlane)
+ from vllm.distributed.kv_transfer.kv_pipe.dynamo_nccl_pipe import (
+ DynamoNcclDataPlane)
+
+ logger.info(
+ "Initializing DynemoNcclConnector under kv_transfer_config %s",
+ "Initializing DynamoNcclConnector under kv_transfer_config %s",
+ self.config)
+
+ self.lookup_buffer_size = self.config.kv_buffer_size
...
...
@@ -1264,7 +1264,7 @@ index 00000000..2319867a
+ port_offset=port_offset_base,
+ )
+
+ self.data_plane = DynemoNcclDataPlane(
+ self.data_plane = DynamoNcclDataPlane(
+ data_pipe=self.data_pipe,
+ port=self._get_data_plane_port(self.global_kv_rank),
+ )
...
...
@@ -1530,7 +1530,7 @@ index 00000000..2319867a
+ self.config.kv_consumers_pipeline_parallel_size = kv_config_enhanced["kv_consumers_pipeline_parallel_size"]
+ self.config.kv_producers_parallel_size = kv_config_enhanced["kv_producers_parallel_size"]
diff --git a/vllm/distributed/kv_transfer/kv_connector/factory.py b/vllm/distributed/kv_transfer/kv_connector/factory.py
index fe480533..f4775663 100644
index fe480533..c82fda80 100644
--- a/vllm/distributed/kv_transfer/kv_connector/factory.py
+++ b/vllm/distributed/kv_transfer/kv_connector/factory.py
@@ -27,13 +27,13 @@
class KVConnectorFactory:
...
...
@@ -1555,11 +1555,11 @@ index fe480533..f4775663 100644
"SimpleConnector")
+
+KVConnectorFactory.register_connector(
+ "DynemoNcclConnector",
+ "vllm.distributed.kv_transfer.kv_connector.dynemo_connector",
+ "DynemoConnector")
+ "DynamoNcclConnector",
+ "vllm.distributed.kv_transfer.kv_connector.dynamo_connector",
+ "DynamoConnector")
diff --git a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
index 2033e976..e0537903 100644
index 2033e976..ddebb68e 100644
--- a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
@@ -8,13 +8,15 @@
MooncakePipe.
...
...
@@ -1886,7 +1886,7 @@ index 2033e976..e0537903 100644
+ world_group.broadcast_object(kv_config_enhanced)
+
+ else:
+ raise NotImplementedError("MooncakeConnector is not supported in Dynemo patch")
+ raise NotImplementedError("MooncakeConnector is not supported in Dynamo patch")
+ else:
+ kv_config_enhanced = world_group.broadcast_object()
+ logger.info("kv_config_enhanced: %s", kv_config_enhanced)
...
...
@@ -2175,11 +2175,11 @@ index 40589fb3..da2829cf 100644
"""Receive a tensor (can be None) from the pipeline.
Returns:
diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
new file mode 100644
index 00000000..58d0d28c
index 00000000..3ee0fa78
--- /dev/null
+++ b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
+++ b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
@@ -0,0 +1,124 @@
+import logging
+import threading
...
...
@@ -2195,7 +2195,7 @@ index 00000000..58d0d28c
+logger = logging.getLogger(__name__)
+
+
+class DynemoNcclDataPlane:
+class DynamoNcclDataPlane:
+ def __init__(
+ self,
+ data_pipe: PyNcclPipe,
...
...
@@ -2531,7 +2531,7 @@ index 321902d1..b8937ef8 100644
def ensure_model_parallel_initialized(
diff --git a/vllm/engine/llm_engine.py b/vllm/engine/llm_engine.py
index d82d9ad9..cc02b029 100644
index d82d9ad9..53cace75 100644
--- a/vllm/engine/llm_engine.py
+++ b/vllm/engine/llm_engine.py
@@ -2,13 +2,17 @@
...
...
@@ -2614,7 +2614,7 @@ index d82d9ad9..cc02b029 100644
+ self.engine_id = str(uuid.uuid4())
+ self._nixl_agents_names: Optional[List[str]] = None
+ if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ self._nixl_agents_names = self._initialize_nixl()
+
+ self._request_notif_counter = defaultdict(lambda: -self.parallel_config.tensor_parallel_size)
...
...
@@ -2946,7 +2946,7 @@ index 3cf1850e..6b90ece7 100644
+ kv_active_blocks: int
+ kv_total_blocks: int
diff --git a/vllm/engine/multiprocessing/client.py b/vllm/engine/multiprocessing/client.py
index 85b5f31e..3f8b8fad 100644
index 85b5f31e..da207947 100644
--- a/vllm/engine/multiprocessing/client.py
+++ b/vllm/engine/multiprocessing/client.py
@@ -8,6 +8,7 @@
from typing import (Any, AsyncGenerator, Dict, Iterator, List, Mapping,
...
...
@@ -3028,7 +3028,7 @@ index 85b5f31e..3f8b8fad 100644
+
+ @property
+ def using_nixl_connector(self) -> bool:
+ return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector"
+ return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector"
+
@staticmethod
def is_unsupported_config(engine_args: AsyncEngineArgs):
...
...
@@ -3656,7 +3656,7 @@ index 534b9e60..18675d2f 100644
@property
def is_first_multi_step(self) -> bool:
diff --git a/vllm/worker/model_runner.py b/vllm/worker/model_runner.py
index 12baecde..489d3b77 100644
index 12baecde..a3f2c464 100644
--- a/vllm/worker/model_runner.py
+++ b/vllm/worker/model_runner.py
@@ -1824,6 +1824,9 @@
class ModelRunner(GPUModelRunnerBase[ModelInputForGPUWithSamplingMetadata]):
...
...
@@ -3664,7 +3664,7 @@ index 12baecde..489d3b77 100644
if self.vllm_config.kv_transfer_config is None:
return False
+
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ return False
prefill_meta = model_input.attn_metadata.prefill_metadata
...
...
@@ -3674,13 +3674,13 @@ index 12baecde..489d3b77 100644
if self.vllm_config.kv_transfer_config is None:
return False
+
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ return False
prefill_meta = model_input.attn_metadata.prefill_metadata
diff --git a/vllm/worker/worker.py b/vllm/worker/worker.py
index 582aa460..e4ed902e 100644
index 582aa460..36a21d10 100644
--- a/vllm/worker/worker.py
+++ b/vllm/worker/worker.py
@@ -2,7 +2,7 @@
...
...
@@ -3696,7 +3696,7 @@ index 582aa460..e4ed902e 100644
from vllm.worker.pooling_model_runner import PoolingModelRunner
from vllm.worker.worker_base import (LocalOrDistributedWorkerBase, WorkerBase,
WorkerInput)
+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
+from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
+
logger = init_logger(__name__)
...
...
@@ -3710,7 +3710,7 @@ index 582aa460..e4ed902e 100644
+ # TODO ptarasiewicz nixl can also support DRAM
+ assert self.device_config.device_type == "cuda", "Currently only CUDA is supported for Nixl connector"
+
+ self.nixl_connector = DynemoNixlConnector(self.vllm_config, engine_id, self.local_rank) # TODO ptarasiewicz: rank or local_rank?
+ self.nixl_connector = DynamoNixlConnector(self.vllm_config, engine_id, self.local_rank) # TODO ptarasiewicz: rank or local_rank?
+ assert len(self.cache_engine) == 1, "Only one cache engine is supported for now"
+ self.nixl_connector.register_kv_caches(self.cache_engine[0].gpu_cache)
+ return self.nixl_connector.agent_name
...
...
@@ -3766,7 +3766,7 @@ index 582aa460..e4ed902e 100644
@torch.inference_mode()
diff --git a/vllm/worker/worker_base.py b/vllm/worker/worker_base.py
index 819b81fb..8dfdadde 100644
index 819b81fb..ff43dadc 100644
--- a/vllm/worker/worker_base.py
+++ b/vllm/worker/worker_base.py
@@ -9,6 +9,7 @@
from typing import Any, Dict, List, Optional, Set, Tuple, Type, Union
...
...
@@ -3781,7 +3781,7 @@ index 819b81fb..8dfdadde 100644
from vllm.worker.model_runner_base import (BroadcastableModelInput,
ModelRunnerBase,
ModelRunnerInputBase)
+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
+from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
logger = init_logger(__name__)
...
...
@@ -3789,7 +3789,7 @@ index 819b81fb..8dfdadde 100644
from vllm.platforms import current_platform
self.current_platform = current_platform
+ self.nixl_connector: Optional[DynemoNixlConnector] = None
+ self.nixl_connector: Optional[DynamoNixlConnector] = None
+
@abstractmethod
def init_device(self) -> None:
...
...
deploy/Kubernetes/common/chart/Chart.yaml
View file @ 602352ce
...
@@ -15,7 +15,7 @@
apiVersion: v2
appVersion: 1.0.0
description: Distributed Neural Models (dynemo) Component
description: Distributed Neural Models (dynamo) Component
icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png
name: dynemo_component
name: dynamo_component
version: 1.0.0
deploy/Kubernetes/common/chart/templates/_helpers.tpl
View file @ 602352ce
...
...
@@ -15,7 +15,7 @@
# Annotation Groups
{{- define "nvidia.annotations.default" }}
dynemo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
dynamo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
{{- with .Values.kubernetes }}
{{- with .annotations }}
{{ toYaml . }}
...
...
@@ -54,7 +54,7 @@ app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{- define "nvidia.label.appManagedBy" }}
{{- $service_name := "dynemo" }}
{{- $service_name := "dynamo" }}
{{- with .Release.Service }}
{{- $service_name = . }}
{{- end }}
...
...
@@ -66,7 +66,7 @@ app.kubernetes.io/name: {{ required "Property '.component.name' is required." .V
{{- end }}
{{- define "nvidia.label.appPartOf" }}
{{- $part_of := "dynemo" }}
{{- $part_of := "dynamo" }}
{{- with .Values.kubernetes }}
{{- with .partOf }}
{{- $part_of = . }}
...
...