OpenDAS / dynamo / Commits / 602352ce

chore: rename dynamo (#44)

Authored Mar 08, 2025 by Neelay Shah, committed via GitHub on Mar 08, 2025
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Parent: ecf53ce2
Changes: 431

Showing 20 changed files with 166 additions and 168 deletions (+166, -168)
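The diffs below are almost entirely a mechanical `dynemo` → `dynamo` substitution across source files, Dockerfiles, and docs. A rename of this shape is usually scripted; a minimal sketch on a throwaway sample file (hypothetical paths, not the authors' actual command, and in a real repo you would review each hit rather than blindly substituting):

```shell
# Create a sample file containing the old name, then apply the same
# substitution this commit performs across the repository.
printf 'use dynemo_runtime::logging;\n' > /tmp/sample.rs
# Repo-wide equivalent would be: grep -rl 'dynemo' . | xargs sed -i 's/dynemo/dynamo/g'
sed -i 's/dynemo/dynamo/g' /tmp/sample.rs
cat /tmp/sample.rs
# -> use dynamo_runtime::logging;
```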
.dockerignore (+1, -1)
.github/workflows/copyright-checks.yml (+1, -1)
.github/workflows/pre-merge-rust.yml (+1, -1)
ATTRIBUTIONS.md (+1, -1)
CODEOWNERS (+1, -1)
README.md (+12, -12)
codespell.txt (+1, -2)
components/metrics/Cargo.lock (+5, -5)
components/metrics/Cargo.toml (+2, -2)
components/metrics/README.md (+5, -5)
components/metrics/src/bin/mock_worker.rs (+3, -3)
components/metrics/src/lib.rs (+4, -4)
components/metrics/src/main.rs (+4, -4)
container/Dockerfile (+21, -21)
container/Dockerfile.vllm (+22, -22)
container/Dockerfile.vllm_nixl (+21, -22)
container/deps/clone_tensorrtllm.sh (+4, -4)
container/deps/vllm/vllm_v0.7.2-dynamo-kv-disagg-patch.patch (+52, -52)
deploy/Kubernetes/common/chart/Chart.yaml (+2, -2)
deploy/Kubernetes/common/chart/templates/_helpers.tpl (+3, -3)
.dockerignore

```diff
@@ -19,7 +19,7 @@
 **/*.plan
 **/.cache/*
 **/*onnx*
-# Engine must be allowed because code contains dynemo_engine.py
+# Engine must be allowed because code contains dynamo_engine.py
 **/*tensorrtllm_engines*
 **/*tensorrtllm_models*
 **/*tensorrtllm_checkpoints*
```
.github/workflows/copyright-checks.yml
.github/workflows/pre-merge-rust.yml

```diff
@@ -40,7 +40,7 @@ jobs:
   pre-merge-rust:
     runs-on: ubuntu-latest
     strategy:
-      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynemo-run', 'components/metrics', 'examples/rust'] }
+      matrix: { dir: ['lib/runtime', 'lib/llm', 'lib/bindings/c', 'lib/bindings/python', 'launch/dynamo-run', 'components/metrics', 'examples/rust'] }
     permissions:
       contents: read
     steps:
```
ATTRIBUTIONS.md

```diff
@@ -17,7 +17,7 @@ limitations under the License.
 # Open Source License Attribution
 
-Dynemo uses Open Source components. You can find the details of these open-source projects along with license information below.
+Dynamo uses Open Source components. You can find the details of these open-source projects along with license information below.
 We are grateful to the developers for their contributions to open source and acknowledge these below.
 
 ## nats-py - [Apache License 2.0](https://github.com/nats-io/nats.py/blob/main/LICENSE)
```
CODEOWNERS

```diff
-# CODEOWNERS file for Dynemo
+# CODEOWNERS file for Dynamo
 #
 # For more information about CODEOWNERS files, see:
 # https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners
```
README.md

````diff
@@ -15,17 +15,17 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
-# Dynemo
+# Dynamo
 <h4>A Datacenter Scale Distributed Inference Serving Framework</h4>
 [...](https://opensource.org/licenses/Apache-2.0)
 [...](https://github.com/dynemo-ai/dynemo/releases/latest)
-Dynemo is a flexible, component based, data center scale
+Dynamo is a flexible, component based, data center scale
 inference serving framework designed to leverage the strengths of the
-standalone Dynemo Inference Server while expanding its capabilities
+standalone Dynamo Inference Server while expanding its capabilities
 to meet the demands of complex use cases including those of Generative
 AI. It is designed to enable developers to implement and customize
 routing, load balancing, scaling and workflow definitions at the data
@@ -36,17 +36,17 @@ center scale without sacrificing performance or ease of use.
 > rapid-prototyping stage and we are actively looking for feedback and
 > collaborators.
-## Building Dynemo
+## Building Dynamo
 ### Requirements
-Dynemo development and examples are container based.
+Dynamo development and examples are container based.
 * [Docker](https://docs.docker.com/get-started/get-docker/)
 * [buildx](https://github.com/docker/buildx)
 ### Development
-You can build the Dynemo container using the build scripts
+You can build the Dynamo container using the build scripts
 in `container/` (or directly with `docker build`).
 We provide 3 types of builds:
@@ -62,9 +62,9 @@ For example, if you want to build a container for the `STANDARD` backends you ca
 Please see the instructions in the corresponding example for specific build instructions.
-## Running Dynemo for Local Testing and Development
+## Running Dynamo for Local Testing and Development
-You can run the Dynemo container using the run scripts in
+You can run the Dynamo container using the run scripts in
 `container/` (or directly with `docker run`).
 The run script offers a few common workflows:
@@ -72,7 +72,7 @@ The run script offers a few common workflows:
 1. Running a command in a container and exiting.
 ```
-./container/run.sh -- python3 -c "import dynemo.runtime; help(dynemo.runtime)"
+./container/run.sh -- python3 -c "import dynamo.runtime; help(dynamo.runtime)"
 ```
 2. Starting an interactive shell.
@@ -95,7 +95,7 @@ deployment instructions.
 ## Rust Based Runtime
-Dynemo has a new rust based distributed runtime with
+Dynamo has a new rust based distributed runtime with
 implementation under development. The rust based runtime enables
 serving arbitrary python code as well as native rust. Please note the
 APIs are subject to change.
@@ -114,7 +114,7 @@ bindings.
 An intermediate example expanding further on the concepts introduced
 in the Hello World example. In this example, we demonstrate
 [Disaggregated Serving](https://arxiv.org/abs/2401.09670) as an
-application of the components defined in Dynemo.
+application of the components defined in Dynamo.
 # Disclaimers
````
codespell.txt

```diff
-dynamo->dynemo
-dynmo->dynemo
+dynmo->dynamo
```
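For context on the codespell.txt change: codespell custom dictionaries are plain `misspelling->correction` lines, so after the rename the entries correcting toward "dynemo" have to flip to correct toward "dynamo". A tiny sketch of how such an entry maps a misspelling to its fix (codespell applies these internally; the sed here is only an illustration):

```shell
# A dictionary entry is "misspelling->correction"; split it and apply it.
entry='dynmo->dynamo'
wrong=${entry%%->*}    # -> dynmo
right=${entry#*->}     # -> dynamo
fixed=$(echo "start the dynmo worker" | sed "s/${wrong}/${right}/g")
echo "$fixed"
# -> start the dynamo worker
```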
components/metrics/Cargo.lock

```diff
@@ -1005,7 +1005,7 @@ dependencies = [
 ]
 [[package]]
-name = "dynemo-llm"
+name = "dynamo-llm"
 version = "0.2.1"
 dependencies = [
  "anyhow",
@@ -1020,7 +1020,7 @@ dependencies = [
  "chrono",
  "cmake",
  "derive_builder",
- "dynemo-runtime",
+ "dynamo-runtime",
  "either",
  "erased-serde",
  "futures",
@@ -1054,7 +1054,7 @@ dependencies = [
 ]
 [[package]]
-name = "dynemo-runtime"
+name = "dynamo-runtime"
 version = "0.2.1"
 dependencies = [
  "anyhow",
@@ -2202,8 +2202,8 @@ dependencies = [
  "async-nats",
  "axum 0.6.20",
  "clap",
- "dynemo-llm",
- "dynemo-runtime",
+ "dynamo-llm",
+ "dynamo-runtime",
  "futures",
  "opentelemetry",
  "opentelemetry-prometheus",
```
components/metrics/Cargo.toml

```diff
@@ -22,8 +22,8 @@ license = "Apache-2.0"
 [dependencies]
 # local
-dynemo-runtime = { path = "../../lib/runtime" }
-dynemo-llm = { path = "../../lib/llm" }
+dynamo-runtime = { path = "../../lib/runtime" }
+dynamo-llm = { path = "../../lib/llm" }
 # workspace - todo
```
components/metrics/README.md

````diff
@@ -12,16 +12,16 @@ This will:
 For example:
 ```bash
 # For more details, try DYN_LOG=debug
-DYN_LOG=info cargo run --bin metrics -- --namespace dynemo --component backend --endpoint generate
+DYN_LOG=info cargo run --bin metrics -- --namespace dynamo --component backend --endpoint generate
-# 2025-02-26T18:45:05.467026Z INFO metrics: Creating unique instance of Metrics at dynemo/components/metrics/instance
-# 2025-02-26T18:45:05.472146Z INFO metrics: Scraping service dynemo_backend_720278f8 and filtering on subject dynemo_backend_720278f8.generate
+# 2025-02-26T18:45:05.467026Z INFO metrics: Creating unique instance of Metrics at dynamo/components/metrics/instance
+# 2025-02-26T18:45:05.472146Z INFO metrics: Scraping service dynamo_backend_720278f8 and filtering on subject dynamo_backend_720278f8.generate
 # ...
 ```
 With no matching endpoints running to collect stats from, you should see warnings in the logs:
 ```bash
-2025-02-26T18:45:06.474161Z WARN metrics: No endpoints found matching subject dynemo_backend_720278f8.generate
+2025-02-26T18:45:06.474161Z WARN metrics: No endpoints found matching subject dynamo_backend_720278f8.generate
 ```
 After a matching endpoint gets started, you should see the warnings stop
@@ -30,7 +30,7 @@ when the endpoint gets automatically discovered.
 When stats are found from target endpoints, the metrics component will
 aggregate them and publish them to a prometheus server running on `localhost:9091/metrics` by default:
 ```
-2025-02-28T04:05:58.077901Z INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynemo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
+2025-02-28T04:05:58.077901Z INFO metrics: Aggregated metrics: ProcessedEndpoints { endpoints: [Endpoint { name: "worker-7587884888253033398", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb6", data: ForwardPassMetrics { request_active_slots: 58, request_total_slots: 100, kv_active_blocks: 77, kv_total_blocks: 100 } }, Endpoint { name: "worker-7587884888253033401", subject: "dynamo_backend_720278f8.generate-694d951a80e06bb9", data: ForwardPassMetrics { request_active_slots: 71, request_total_slots: 100, kv_active_blocks: 29, kv_total_blocks: 100 } }], worker_ids: [7587884888253033398, 7587884888253033401], load_avg: 53.0, load_std: 24.0 }
 ```
 To see the metrics being published in prometheus format, you can run:
````
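The metrics README above says aggregated stats (e.g. `kv_active_blocks`) are published to a Prometheus server on `localhost:9091/metrics`. Prometheus exposes metrics in a line-oriented text format, `name{labels} value`, which is easy to inspect with standard tools. A hedged sketch against a hand-written sample scrape (the metric and label names here are illustrative, not necessarily the component's actual names):

```shell
# Write a sample Prometheus exposition and pull out one gauge value.
cat > /tmp/scrape.txt <<'EOF'
# HELP kv_active_blocks Active KV cache blocks
# TYPE kv_active_blocks gauge
kv_active_blocks{worker="worker-7587884888253033398"} 77
EOF
awk '$1 ~ /^kv_active_blocks\{/ { print $2 }' /tmp/scrape.txt
# -> 77
```

Against a live instance, the same inspection would be roughly `curl -s localhost:9091/metrics | grep kv`.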
components/metrics/src/bin/mock_worker.rs

```diff
@@ -14,10 +14,10 @@
 // limitations under the License.
 use async_nats::service::endpoint::Stats;
-use dynemo_llm::kv_router::{
+use dynamo_llm::kv_router::{
     protocols::ForwardPassMetrics, scheduler::KVHitRateEvent, KV_HIT_RATE_SUBJECT,
 };
-use dynemo_runtime::{
+use dynamo_runtime::{
     component::Namespace,
     logging,
     pipeline::{
@@ -123,7 +123,7 @@ fn mock_stats_handler(_stats: Stats) -> serde_json::Value {
 }
 async fn backend(runtime: DistributedRuntime) -> Result<()> {
-    let namespace = runtime.namespace("dynemo")?;
+    let namespace = runtime.namespace("dynamo")?;
     // Spawn background task for publishing KV hit rate events
     let namespace_clone = namespace.clone();
```
components/metrics/src/lib.rs

```diff
@@ -20,11 +20,11 @@ use prometheus::{register_counter_vec, register_gauge_vec};
 use serde::{Deserialize, Serialize};
 use std::net::SocketAddr;
-use dynemo_llm::kv_router::protocols::ForwardPassMetrics;
-use dynemo_llm::kv_router::scheduler::Endpoint;
-use dynemo_llm::kv_router::scoring::ProcessedEndpoints;
-use dynemo_runtime::{distributed::Component, service::EndpointInfo, utils::Duration, Result};
+use dynamo_llm::kv_router::protocols::ForwardPassMetrics;
+use dynamo_llm::kv_router::scheduler::Endpoint;
+use dynamo_llm::kv_router::scoring::ProcessedEndpoints;
+use dynamo_runtime::{distributed::Component, service::EndpointInfo, utils::Duration, Result};
 /// Configuration for LLM worker load capacity metrics
 #[derive(Debug, Clone, Serialize, Deserialize)]
```
components/metrics/src/main.rs

```diff
@@ -27,9 +27,9 @@
 //! - ISL Blocks: Cumulative count of total blocks in all KV hit rate events
 //! - Overlap Blocks: Cumulative count of blocks that were already in the KV cache
 use clap::Parser;
-use dynemo_llm::kv_router::scheduler::KVHitRateEvent;
-use dynemo_llm::kv_router::KV_HIT_RATE_SUBJECT;
-use dynemo_runtime::{
+use dynamo_llm::kv_router::scheduler::KVHitRateEvent;
+use dynamo_llm::kv_router::KV_HIT_RATE_SUBJECT;
+use dynamo_runtime::{
     error, logging,
     traits::events::{EventPublisher, EventSubscriber},
     utils::{Duration, Instant},
@@ -57,7 +57,7 @@ struct Args {
     endpoint: String,
     /// Namespace to operate in
-    #[arg(long, env = "DYN_NAMESPACE", default_value = "dynemo")]
+    #[arg(long, env = "DYN_NAMESPACE", default_value = "dynamo")]
     namespace: String,
     /// Polling interval in seconds (minimum 1 second)
```
container/Dockerfile

```diff
@@ -16,7 +16,7 @@
 ARG BASE_IMAGE="nvcr.io/nvidia/tritonserver"
 ARG BASE_IMAGE_TAG="25.01-py3"
-FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynemo
+FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS dynamo
 # TODO: non root user by default
@@ -34,7 +34,7 @@ RUN rustup toolchain install 1.85.0-x86_64-unknown-linux-gnu
 # Install OpenAI-compatible frontend and its dependencies from triton server
 # repository. These are used to have a consistent interface, schema, and FastAPI
-# app between Triton Core and Dynemo implementations.
+# app between Triton Core and Dynamo implementations.
 ARG OPENAI_SERVER_TAG="r25.01"
 RUN mkdir -p /opt/tritonserver/python && \
     cd /opt/tritonserver/python && \
@@ -78,7 +78,7 @@ ARG TENSORRTLLM_SKIP_CLONE=
 ENV FRAMEWORK=${FRAMEWORK}
 RUN --mount=type=bind,source=./container/deps/requirements.tensorrtllm.txt,target=/tmp/requirements.txt \
     --mount=type=bind,source=./container/deps/clone_tensorrtllm.sh,target=/tmp/clone_tensorrtllm.sh \
-    if [[ "$FRAMEWORK" == "TENSORRTLLM" ]]; then pip install --timeout=2000 -r /tmp/requirements.txt; if [ ${TENSORRTLLM_SKIP_CLONE} -ne 1 ]; then /tmp/clone_tensorrtllm.sh --tensorrtllm-backend-repo-tag ${TENSORRTLLM_BACKEND_REPO_TAG} --tensorrtllm-backend-rebuild ${TENSORRTLLM_BACKEND_REBUILD} --dynemo-llm-path /opt/dynemo/llm_binding; fi; fi
+    if [[ "$FRAMEWORK" == "TENSORRTLLM" ]]; then pip install --timeout=2000 -r /tmp/requirements.txt; if [ ${TENSORRTLLM_SKIP_CLONE} -ne 1 ]; then /tmp/clone_tensorrtllm.sh --tensorrtllm-backend-repo-tag ${TENSORRTLLM_BACKEND_REPO_TAG} --tensorrtllm-backend-rebuild ${TENSORRTLLM_BACKEND_REBUILD} --dynamo-llm-path /opt/dynamo/llm_binding; fi; fi
 RUN --mount=type=bind,source=./container/deps/requirements.standard.txt,target=/tmp/requirements.txt \
@@ -106,7 +106,7 @@ ENV VLLM_GENERATE_WORKERS=${VLLM_FRAMEWORK:+1}
 ENV VLLM_BASELINE_TP_SIZE=${VLLM_FRAMEWORK:+1}
 ENV VLLM_CONTEXT_TP_SIZE=${VLLM_FRAMEWORK:+1}
 ENV VLLM_GENERATE_TP_SIZE=${VLLM_FRAMEWORK:+1}
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 ENV PYTHONUNBUFFERED=1
 # Install NATS - pointing toward NATS github instead of binaries.nats.dev due to server instability
@@ -154,7 +154,7 @@ RUN cd examples/rust && \
     cp target/release/http /usr/local/bin/ && \
     cp target/release/llmctl /usr/local/bin/
-COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
+COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
 # Generate C bindings. Note that this is required for TRTLLM backend re-build
@@ -162,30 +162,30 @@ COPY lib/bindings /workspace/lib/bindings
 RUN cd lib/bindings/c/ && \
     cargo build --release --locked && cargo doc --no-deps
-# Install uv, create virtualenv for general use, and build dynemo wheel
+# Install uv, create virtualenv for general use, and build dynamo wheel
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-RUN mkdir /opt/dynemo && \
-    uv venv /opt/dynemo/venv --python 3.12 && \
-    source /opt/dynemo/venv/bin/activate && \
+RUN mkdir /opt/dynamo && \
+    uv venv /opt/dynamo/venv --python 3.12 && \
+    source /opt/dynamo/venv/bin/activate && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo*cp312*.whl && \
-    cd /workspace/deploy/dynemo/sdk && \
+    uv pip install /workspace/dist/dynamo*cp312*.whl && \
+    cd /workspace/deploy/dynamo/sdk && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo_sdk*any.whl
+    uv pip install /workspace/dist/dynamo_sdk*any.whl
 # Package the bindings
-RUN mkdir -p /opt/dynemo/bindings/wheels && \
-    mkdir /opt/dynemo/bindings/lib && \
-    cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
-    cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
-    cp -r lib/bindings/c/include /opt/dynemo/bindings/.
+RUN mkdir -p /opt/dynamo/bindings/wheels && \
+    mkdir /opt/dynamo/bindings/lib && \
+    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
+    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
+    cp -r lib/bindings/c/include /opt/dynamo/bindings/.
-# Install dynemo.runtime and dynemo.llm wheels globally in container for tests that
+# Install dynamo.runtime and dynamo.llm wheels globally in container for tests that
 # currently run without virtual environment activated.
 # TODO: In future, we may use a virtualenv for everything and remove this.
-RUN cd /opt/dynemo/bindings/wheels && \
-    pip install dynemo*cp312*.whl && \
-    pip install /workspace/dist/dynemo_sdk*any.whl
+RUN cd /opt/dynamo/bindings/wheels && \
+    pip install dynamo*cp312*.whl && \
+    pip install /workspace/dist/dynamo_sdk*any.whl
 # Copy everything in after ginstall steps to avoid re-running build/install
 # commands on unrelated changes in other dirs.
```
container/Dockerfile.vllm

```diff
@@ -24,17 +24,17 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
 # Install uv and create virtualenv
 COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
-RUN mkdir /opt/dynemo && \
-    uv venv /opt/dynemo/venv --python 3.12
+RUN mkdir /opt/dynamo && \
+    uv venv /opt/dynamo/venv --python 3.12
 # Activate virtual environment
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
 # Install patched vllm - keep this early in Dockerfile to avoid
 # rebuilds from unrelated source code changes
 ARG VLLM_REF="v0.7.2"
-ARG VLLM_PATCH="vllm_${VLLM_REF}-dynemo-kv-disagg-patch.patch"
+ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
 RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
     bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
@@ -92,7 +92,7 @@ RUN cd examples/rust && \
     cp target/release/http /usr/local/bin/ && \
     cp target/release/llmctl /usr/local/bin/
-# TODO: Build dynemo-run
+# TODO: Build dynamo-run
 # COPY applications/...
 # Generate C bindings for kv cache routing in vLLM
@@ -100,29 +100,29 @@ COPY lib/bindings /workspace/lib/bindings
 RUN cd lib/bindings/c && \
     cargo build --release --locked && cargo doc --no-deps
-COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
+COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
-# Build dynemo wheel
-RUN source /opt/dynemo/venv/bin/activate && \
+# Build dynamo wheel
+RUN source /opt/dynamo/venv/bin/activate && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo*cp312*.whl && \
-    cd /workspace/deploy/dynemo/sdk && \
+    uv pip install /workspace/dist/dynamo*cp312*.whl && \
+    cd /workspace/deploy/dynamo/sdk && \
     uv build --wheel --out-dir /workspace/dist && \
-    uv pip install /workspace/dist/dynemo_sdk*any.whl
+    uv pip install /workspace/dist/dynamo_sdk*any.whl
 # Package the bindings
-RUN mkdir -p /opt/dynemo/bindings/wheels && \
-    mkdir /opt/dynemo/bindings/lib && \
-    cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
-    cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
-    cp -r lib/bindings/c/include /opt/dynemo/bindings/.
+RUN mkdir -p /opt/dynamo/bindings/wheels && \
+    mkdir /opt/dynamo/bindings/lib && \
+    cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
+    cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
+    cp -r lib/bindings/c/include /opt/dynamo/bindings/.
-# Tell vllm to use the Dynemo LLM C API for KV Cache Routing
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 # FIXME: Copy more specific folders in for dev/debug after directory restructure
 COPY . /workspace
-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
@@ -140,10 +140,10 @@ RUN apt update -y && \
     echo "set -g mouse on" >> /root/.tmux.conf
 # Set environment variables
-ENV VIRTUAL_ENV=/opt/dynemo/venv
+ENV VIRTUAL_ENV=/opt/dynamo/venv
 ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
 ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
-ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
+ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
 # Copy binaries
 COPY --from=dev /usr/local/bin/http /usr/local/bin/http
@@ -170,7 +170,7 @@ COPY examples/python_rs/llm/vllm /workspace/examples/python_rs/llm/vllm
 WORKDIR /workspace
-# FIXME: May want a modification with dynemo-distributed banner on entry
+# FIXME: May want a modification with dynamo banner on entry
 ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
 CMD []
```
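A note on the `ENV VIRTUAL_ENV` / `ENV PATH` pair that recurs in these Dockerfiles: `source .../activate` only lasts for the single `RUN` layer it runs in, so setting the same variables with `ENV` is what makes the venv the default interpreter for all later layers and at runtime. A shell sketch of what that pair does (path taken from the Dockerfile; computed into a scratch variable rather than mutating `PATH`):

```shell
# Reproduce what "activate" exports: prepend the venv's bin dir to PATH.
VIRTUAL_ENV=/opt/dynamo/venv
NEW_PATH="${VIRTUAL_ENV}/bin:${PATH}"
# The first PATH entry now resolves python/pip lookups to the venv.
echo "${NEW_PATH%%:*}"
# -> /opt/dynamo/venv/bin
```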
container/Dockerfile.vllm_nixl
View file @
602352ce
...
@@ -151,11 +151,11 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
...
@@ -151,11 +151,11 @@ ENV PATH=/usr/local/bin/etcd/:$PATH
# Install uv and create virtualenv
# Install uv and create virtualenv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
RUN mkdir /opt/dyn
e
mo && \
RUN mkdir /opt/dyn
a
mo && \
uv venv /opt/dyn
e
mo/venv --python 3.12
uv venv /opt/dyn
a
mo/venv --python 3.12
# Activate virtual environment
# Activate virtual environment
ENV VIRTUAL_ENV=/opt/dyn
e
mo/venv
ENV VIRTUAL_ENV=/opt/dyn
a
mo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
# Common dependencies
# Common dependencies
...
@@ -165,7 +165,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
...
@@ -165,7 +165,7 @@ RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requi
# Install patched vllm - keep this early in Dockerfile to avoid
# Install patched vllm - keep this early in Dockerfile to avoid
# rebuilds from unrelated source code changes
# rebuilds from unrelated source code changes
ARG VLLM_REF="v0.7.2"
ARG VLLM_REF="v0.7.2"
ARG VLLM_PATCH="vllm_${VLLM_REF}-dynemo-kv-disagg-patch.patch"
ARG VLLM_PATCH="vllm_${VLLM_REF}-dynamo-kv-disagg-patch.patch"
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
RUN --mount=type=bind,source=./container/deps/,target=/tmp/deps \
bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
bash /tmp/deps/vllm/install.sh --patch /tmp/deps/vllm/${VLLM_PATCH} --ref ${VLLM_REF} --install-cmd "uv pip install --editable" --use-precompiled --installation-dir /opt/vllm
...
@@ -230,30 +230,29 @@ COPY lib/bindings /workspace/lib/bindings
...
@@ -230,30 +230,29 @@ COPY lib/bindings /workspace/lib/bindings
RUN cd lib/bindings/c && \
RUN cd lib/bindings/c && \
cargo build --release --locked && cargo doc --no-deps
cargo build --release --locked && cargo doc --no-deps
COPY deploy/dynemo/sdk /workspace/deploy/dynemo/sdk
COPY deploy/dynamo/sdk /workspace/deploy/dynamo/sdk
# Build dynemo wheel
# Build dynamo wheel
RUN source /opt/dynemo/venv/bin/activate && \
RUN source /opt/dynamo/venv/bin/activate && \
uv build --wheel --out-dir /workspace/dist && \
uv build --wheel --out-dir /workspace/dist && \
uv pip install /workspace/dist/dynemo*cp312*.whl && \
uv pip install /workspace/dist/dynamo*cp312*.whl && \
cd /workspace/deploy/dynemo/sdk && \
cd /workspace/deploy/dynamo/sdk && \
uv build --wheel --out-dir /workspace/dist && \
uv build --wheel --out-dir /workspace/dist && \
uv pip install /workspace/dist/dynemo_sdk*any.whl
uv pip install /workspace/dist/dynamo_sdk*any.whl
# Package the bindings
# Package the bindings
RUN mkdir -p /opt/dynemo/bindings/wheels && \
RUN mkdir -p /opt/dynamo/bindings/wheels && \
mkdir /opt/dynemo/bindings/lib && \
mkdir /opt/dynamo/bindings/lib && \
cp dist/dynemo*cp312*.whl /opt/dynemo/bindings/wheels/. && \
cp dist/dynamo*cp312*.whl /opt/dynamo/bindings/wheels/. && \
cp lib/bindings/c/target/release/libdynemo_llm_capi.so /opt/dynemo/bindings/lib/. && \
cp lib/bindings/c/target/release/libdynamo_llm_capi.so /opt/dynamo/bindings/lib/. && \
cp -r lib/bindings/c/include /opt/dynemo/bindings/.
cp -r lib/bindings/c/include /opt/dynamo/bindings/.
# Tell vllm to use the Dynemo LLM C API for KV Cache Routing
# Tell vllm to use the Dynamo LLM C API for KV Cache Routing
ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
# FIXME: Copy more specific folders in for dev/debug after directory restructure
# FIXME: Copy more specific folders in for dev/debug after directory restructure
COPY . /workspace
COPY . /workspace
# FIXME: May want a modification with dynemo-distributed banner on entry
# FIXME: May want a modification with dynamo banner on entry
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
CMD []
...
@@ -271,10 +270,10 @@ RUN apt update -y && \
...
@@ -271,10 +270,10 @@ RUN apt update -y && \
echo "set -g mouse on" >> /root/.tmux.conf
echo "set -g mouse on" >> /root/.tmux.conf
# Set environment variables
# Set environment variables
ENV VIRTUAL_ENV=/opt/dynemo/venv
ENV VIRTUAL_ENV=/opt/dynamo/venv
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
ENV RAPIDS_LIBUCX_PREFER_SYSTEM_LIBRARY=true
ENV VLLM_KV_CAPI_PATH="/opt/dynemo/bindings/lib/libdynemo_llm_capi.so"
ENV VLLM_KV_CAPI_PATH="/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
# Copy binaries
# Copy binaries
COPY --from=dev /usr/local/bin/http /usr/local/bin/http
COPY --from=dev /usr/local/bin/http /usr/local/bin/http
...
@@ -301,7 +300,7 @@ COPY examples/python_rs/llm/vllm_nixl /workspace/examples/python_rs/llm/vllm_nix
...
@@ -301,7 +300,7 @@ COPY examples/python_rs/llm/vllm_nixl /workspace/examples/python_rs/llm/vllm_nix
WORKDIR /workspace
WORKDIR /workspace
# FIXME: May want a modification with dynemo-distributed banner on entry
# FIXME: May want a modification with dynamo banner on entry
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
ENTRYPOINT ["/opt/nvidia/nvidia_entrypoint.sh"]
CMD []
CMD []
container/deps/clone_tensorrtllm.sh
...
@@ -16,7 +16,7 @@
...
@@ -16,7 +16,7 @@
TENSORRTLLM_BACKEND_REPO_TAG=
TENSORRTLLM_BACKEND_REPO_TAG=
TENSORRTLLM_BACKEND_REBUILD=
TENSORRTLLM_BACKEND_REBUILD=
DYNEMO_LLM_PATH=
DYNAMO_LLM_PATH=
GIT_TOKEN=
GIT_TOKEN=
GIT_REPO=
GIT_REPO=
...
@@ -43,9 +43,9 @@ get_options() {
...
@@ -43,9 +43,9 @@ get_options() {
missing_requirement $1
missing_requirement $1
fi
fi
;;
;;
--dynemo-llm-path)
--dynamo-llm-path)
if [ "$2" ]; then
if [ "$2" ]; then
DYNEMO_LLM_PATH=$2
DYNAMO_LLM_PATH=$2
shift
shift
else
else
missing_requirement $1
missing_requirement $1
...
@@ -147,7 +147,7 @@ if [ ! -z ${TENSORRTLLM_BACKEND_REBUILD} ]; then
...
@@ -147,7 +147,7 @@ if [ ! -z ${TENSORRTLLM_BACKEND_REBUILD} ]; then
# Build the backend
# Build the backend
(cd inflight_batcher_llm/src \
(cd inflight_batcher_llm/src \
&& cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNEMO_LLM_PATH=$DYNEMO_LLM_PATH .. \
&& cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DUSE_CXX11_ABI=1 -DDYNAMO_LLM_PATH=$DYNAMO_LLM_PATH .. \
&& make install \
&& make install \
&& cp libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/ \
&& cp libtriton_tensorrtllm.so /opt/tritonserver/backends/tensorrtllm/ \
&& cp trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/ \
&& cp trtllmExecutorWorker /opt/tritonserver/backends/tensorrtllm/ \
...
...
container/deps/vllm/vllm_v0.7.2-dynemo-kv-disagg-patch.patch → container/deps/vllm/vllm_v0.7.2-dynamo-kv-disagg-patch.patch
diff --git a/vllm/config.py b/vllm/config.py
diff --git a/vllm/config.py b/vllm/config.py
index 9ba49757..3ec4bbab 100644
index 9ba49757..5e1cf249 100644
--- a/vllm/config.py
--- a/vllm/config.py
+++ b/vllm/config.py
+++ b/vllm/config.py
@@ -2620,6 +2620,9 @@ class KVTransferConfig(BaseModel):
@@ -2620,6 +2620,9 @@ class KVTransferConfig(BaseModel):
...
@@ -41,7 +41,7 @@ index 9ba49757..3ec4bbab 100644
...
@@ -41,7 +41,7 @@ index 9ba49757..3ec4bbab 100644
f"and `kv_both`")
f"and `kv_both`")
- if self.kv_connector is not None and self.kv_role is None:
- if self.kv_connector is not None and self.kv_role is None:
+ if self.kv_connector is not None and self.kv_connector != "DynemoNixlConnector" and self.kv_role is None:
+ if self.kv_connector is not None and self.kv_connector != "DynamoNixlConnector" and self.kv_role is None:
raise ValueError("Please specify kv_disagg_role when kv_connector "
raise ValueError("Please specify kv_disagg_role when kv_connector "
"is set, supported roles are `kv_producer`, "
"is set, supported roles are `kv_producer`, "
"`kv_consumer`, and `kv_both`")
"`kv_consumer`, and `kv_both`")
...
@@ -54,7 +54,7 @@ index 9ba49757..3ec4bbab 100644
...
@@ -54,7 +54,7 @@ index 9ba49757..3ec4bbab 100644
def need_kv_parallel_group(self) -> bool:
def need_kv_parallel_group(self) -> bool:
# for those database-based connector, vLLM does not need to create
# for those database-based connector, vLLM does not need to create
# parallel group, and in that case the kv parallel size will be 1.
# parallel group, and in that case the kv parallel size will be 1.
+ if self.kv_connector == "DynemoNixlConnector":
+ if self.kv_connector == "DynamoNixlConnector":
+ return False
+ return False
return self.kv_connector is not None and self.kv_parallel_size > 1
return self.kv_connector is not None and self.kv_parallel_size > 1
...
@@ -271,7 +271,7 @@ index c5b3b04f..c72001f7 100644
...
@@ -271,7 +271,7 @@ index c5b3b04f..c72001f7 100644
self.block_tables: Dict[SeqId, BlockTable] = {}
self.block_tables: Dict[SeqId, BlockTable] = {}
diff --git a/vllm/core/event_manager.py b/vllm/core/event_manager.py
diff --git a/vllm/core/event_manager.py b/vllm/core/event_manager.py
new file mode 100644
new file mode 100644
index 00000000..8699ca06
index 00000000..d3706700
--- /dev/null
--- /dev/null
+++ b/vllm/core/event_manager.py
+++ b/vllm/core/event_manager.py
@@ -0,0 +1,102 @@
@@ -0,0 +1,102 @@
...
@@ -287,7 +287,7 @@ index 00000000..8699ca06
...
@@ -287,7 +287,7 @@ index 00000000..8699ca06
+logger = logging.getLogger(__name__)
+logger = logging.getLogger(__name__)
+
+
+
+
+class DynemoResult:
+class DynamoResult:
+ OK = 0
+ OK = 0
+ ERR = 1
+ ERR = 1
+
+
...
@@ -300,12 +300,12 @@ index 00000000..8699ca06
...
@@ -300,12 +300,12 @@ index 00000000..8699ca06
+
+
+ try:
+ try:
+ self.lib = ctypes.CDLL(lib_path)
+ self.lib = ctypes.CDLL(lib_path)
+ self.lib.dynemo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
+ self.lib.dynamo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
+ self.lib.dynemo_llm_init.restype = c_uint32
+ self.lib.dynamo_llm_init.restype = c_uint32
+
+
+ result = self.lib.dynemo_llm_init(namespace.encode(),
+ result = self.lib.dynamo_llm_init(namespace.encode(),
+ component.encode(), worker_id)
+ component.encode(), worker_id)
+ if result == DynemoResult.OK:
+ if result == DynamoResult.OK:
+ logger.info(
+ logger.info(
+ "KVCacheEventManager initialized successfully. Ready to publish KV Cache Events"
+ "KVCacheEventManager initialized successfully. Ready to publish KV Cache Events"
+ )
+ )
...
@@ -316,7 +316,7 @@ index 00000000..8699ca06
...
@@ -316,7 +316,7 @@ index 00000000..8699ca06
+ print(f"Failed to load {lib_path}")
+ print(f"Failed to load {lib_path}")
+ raise e
+ raise e
+
+
+ self.lib.dynemo_kv_event_publish_stored.argtypes = [
+ self.lib.dynamo_kv_event_publish_stored.argtypes = [
+ ctypes.c_uint64, # event_id
+ ctypes.c_uint64, # event_id
+ ctypes.POINTER(ctypes.c_uint32), # token_ids
+ ctypes.POINTER(ctypes.c_uint32), # token_ids
+ ctypes.POINTER(ctypes.c_size_t), # num_block_tokens
+ ctypes.POINTER(ctypes.c_size_t), # num_block_tokens
...
@@ -325,14 +325,14 @@ index 00000000..8699ca06
...
@@ -325,14 +325,14 @@ index 00000000..8699ca06
+ ctypes.POINTER(ctypes.c_uint64), # parent_hash
+ ctypes.POINTER(ctypes.c_uint64), # parent_hash
+ ctypes.c_uint64, # lora_id
+ ctypes.c_uint64, # lora_id
+ ]
+ ]
+ self.lib.dynemo_kv_event_publish_stored.restype = ctypes.c_uint32 # dynemo_llm_result_t
+ self.lib.dynamo_kv_event_publish_stored.restype = ctypes.c_uint32 # dynamo_llm_result_t
+
+
+ self.lib.dynemo_kv_event_publish_removed.argtypes = [
+ self.lib.dynamo_kv_event_publish_removed.argtypes = [
+ ctypes.c_uint64, # event_id
+ ctypes.c_uint64, # event_id
+ ctypes.POINTER(ctypes.c_uint64), # block_ids
+ ctypes.POINTER(ctypes.c_uint64), # block_ids
+ ctypes.c_size_t, # num_blocks
+ ctypes.c_size_t, # num_blocks
+ ]
+ ]
+ self.lib.dynemo_kv_event_publish_removed.restype = ctypes.c_uint32 # dynemo_llm_result_t
+ self.lib.dynamo_kv_event_publish_removed.restype = ctypes.c_uint32 # dynamo_llm_result_t
+
+
+ self.event_id_counter = 0
+ self.event_id_counter = 0
+
+
...
@@ -346,7 +346,7 @@ index 00000000..8699ca06
...
@@ -346,7 +346,7 @@ index 00000000..8699ca06
+ if parent is not None else None)
+ if parent is not None else None)
+
+
+ # Publish the event
+ # Publish the event
+ result = self.lib.dynemo_kv_event_publish_stored(
+ result = self.lib.dynamo_kv_event_publish_stored(
+ self.event_id_counter, # uint64_t event_id
+ self.event_id_counter, # uint64_t event_id
+ token_ids_arr, # const uint32_t *token_ids
+ token_ids_arr, # const uint32_t *token_ids
+ num_block_tokens, # const uintptr_t *num_block_tokens
+ num_block_tokens, # const uintptr_t *num_block_tokens
...
@@ -356,7 +356,7 @@ index 00000000..8699ca06
...
@@ -356,7 +356,7 @@ index 00000000..8699ca06
+ 0, # uint64_t lora_id
+ 0, # uint64_t lora_id
+ )
+ )
+
+
+ if result == DynemoResult.OK:
+ if result == DynamoResult.OK:
+ logger.debug(f"Store - Published KV Event: {block.content_hash}")
+ logger.debug(f"Store - Published KV Event: {block.content_hash}")
+ else:
+ else:
+ logger.debug(
+ logger.debug(
...
@@ -365,13 +365,13 @@ index 00000000..8699ca06
...
@@ -365,13 +365,13 @@ index 00000000..8699ca06
+ self.event_id_counter += 1
+ self.event_id_counter += 1
+
+
+ def enqueue_removed_event(self, block_hash: PrefixHash):
+ def enqueue_removed_event(self, block_hash: PrefixHash):
+ result = self.lib.dynemo_kv_event_publish_removed(
+ result = self.lib.dynamo_kv_event_publish_removed(
+ self.event_id_counter,
+ self.event_id_counter,
+ (ctypes.c_uint64 * 1)(block_hash),
+ (ctypes.c_uint64 * 1)(block_hash),
+ 1,
+ 1,
+ )
+ )
+
+
+ if result == DynemoResult.OK:
+ if result == DynamoResult.OK:
+ logger.debug(f"Remove - Published KV Event: {block_hash}")
+ logger.debug(f"Remove - Published KV Event: {block_hash}")
+ else:
+ else:
+ logger.debug(f"Remove - Failed to Publish KV Event: {block_hash}")
+ logger.debug(f"Remove - Failed to Publish KV Event: {block_hash}")
...
@@ -764,7 +764,7 @@ index 00000000..9b938039
...
@@ -764,7 +764,7 @@ index 00000000..9b938039
\ No newline at end of file
\ No newline at end of file
diff --git a/vllm/distributed/device_communicators/nixl.py b/vllm/distributed/device_communicators/nixl.py
diff --git a/vllm/distributed/device_communicators/nixl.py b/vllm/distributed/device_communicators/nixl.py
new file mode 100644
new file mode 100644
index 00000000..523d58d4
index 00000000..87020367
--- /dev/null
--- /dev/null
+++ b/vllm/distributed/device_communicators/nixl.py
+++ b/vllm/distributed/device_communicators/nixl.py
@@ -0,0 +1,405 @@
@@ -0,0 +1,405 @@
...
@@ -799,7 +799,7 @@ index 00000000..523d58d4
...
@@ -799,7 +799,7 @@ index 00000000..523d58d4
+ num_blocks: int
+ num_blocks: int
+
+
+
+
+class DynemoNixlConnector:
+class DynamoNixlConnector:
+ def __init__(self, vllm_config: VllmConfig, engine_id: str, rank: int):
+ def __init__(self, vllm_config: VllmConfig, engine_id: str, rank: int):
+ self.vllm_config = vllm_config
+ self.vllm_config = vllm_config
+ if NixlWrapper is None:
+ if NixlWrapper is None:
...
@@ -1173,11 +1173,11 @@ index 00000000..523d58d4
...
@@ -1173,11 +1173,11 @@ index 00000000..523d58d4
+ else:
+ else:
+ self._transfers[req_id] = running_reqs
+ self._transfers[req_id] = running_reqs
+ return done_req_ids
+ return done_req_ids
diff --git a/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
diff --git a/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
new file mode 100644
new file mode 100644
index 00000000..2319867a
index 00000000..7b3344f8
--- /dev/null
--- /dev/null
+++ b/vllm/distributed/kv_transfer/kv_connector/dynemo_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/dynamo_connector.py
@@ -0,0 +1,350 @@
@@ -0,0 +1,350 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-License-Identifier: Apache-2.0
+"""
+"""
...
@@ -1209,7 +1209,7 @@ index 00000000..2319867a
...
@@ -1209,7 +1209,7 @@ index 00000000..2319867a
+logger = init_logger(__name__)
+logger = init_logger(__name__)
+
+
+
+
+class DynemoConnector(KVConnectorBase):
+class DynamoConnector(KVConnectorBase):
+
+
+ def __init__(
+ def __init__(
+ self,
+ self,
...
@@ -1223,16 +1223,16 @@ index 00000000..2319867a
...
@@ -1223,16 +1223,16 @@ index 00000000..2319867a
+ self.tp_size = config.parallel_config.tensor_parallel_size
+ self.tp_size = config.parallel_config.tensor_parallel_size
+ self.rank = rank
+ self.rank = rank
+
+
+ if self.config.kv_connector != "DynemoNcclConnector":
+ if self.config.kv_connector != "DynamoNcclConnector":
+ raise NotImplementedError("Only DynemoNcclConnector is supported by the DynemoConnector class")
+ raise NotImplementedError("Only DynamoNcclConnector is supported by the DynamoConnector class")
+
+
+ from vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe import (
+ from vllm.distributed.kv_transfer.kv_pipe.pynccl_pipe import (
+ PyNcclPipe)
+ PyNcclPipe)
+ from vllm.distributed.kv_transfer.kv_pipe.dynemo_nccl_pipe import (
+ from vllm.distributed.kv_transfer.kv_pipe.dynamo_nccl_pipe import (
+ DynemoNcclDataPlane)
+ DynamoNcclDataPlane)
+
+
+ logger.info(
+ logger.info(
+ "Initializing DynemoNcclConnector under kv_transfer_config %s",
+ "Initializing DynamoNcclConnector under kv_transfer_config %s",
+ self.config)
+ self.config)
+
+
+ self.lookup_buffer_size = self.config.kv_buffer_size
+ self.lookup_buffer_size = self.config.kv_buffer_size
...
@@ -1264,7 +1264,7 @@ index 00000000..2319867a
...
@@ -1264,7 +1264,7 @@ index 00000000..2319867a
+ port_offset=port_offset_base,
+ port_offset=port_offset_base,
+ )
+ )
+
+
+ self.data_plane = DynemoNcclDataPlane(
+ self.data_plane = DynamoNcclDataPlane(
+ data_pipe=self.data_pipe,
+ data_pipe=self.data_pipe,
+ port=self._get_data_plane_port(self.global_kv_rank),
+ port=self._get_data_plane_port(self.global_kv_rank),
+ )
+ )
...
@@ -1530,7 +1530,7 @@ index 00000000..2319867a
...
@@ -1530,7 +1530,7 @@ index 00000000..2319867a
+ self.config.kv_consumers_pipeline_parallel_size = kv_config_enhanced["kv_consumers_pipeline_parallel_size"]
+ self.config.kv_consumers_pipeline_parallel_size = kv_config_enhanced["kv_consumers_pipeline_parallel_size"]
+ self.config.kv_producers_parallel_size = kv_config_enhanced["kv_producers_parallel_size"]
+ self.config.kv_producers_parallel_size = kv_config_enhanced["kv_producers_parallel_size"]
diff --git a/vllm/distributed/kv_transfer/kv_connector/factory.py b/vllm/distributed/kv_transfer/kv_connector/factory.py
diff --git a/vllm/distributed/kv_transfer/kv_connector/factory.py b/vllm/distributed/kv_transfer/kv_connector/factory.py
index fe480533..f4775663 100644
index fe480533..c82fda80 100644
--- a/vllm/distributed/kv_transfer/kv_connector/factory.py
--- a/vllm/distributed/kv_transfer/kv_connector/factory.py
+++ b/vllm/distributed/kv_transfer/kv_connector/factory.py
+++ b/vllm/distributed/kv_transfer/kv_connector/factory.py
@@ -27,13 +27,13 @@ class KVConnectorFactory:
@@ -27,13 +27,13 @@ class KVConnectorFactory:
...
@@ -1555,11 +1555,11 @@ index fe480533..f4775663 100644
...
@@ -1555,11 +1555,11 @@ index fe480533..f4775663 100644
"SimpleConnector")
"SimpleConnector")
+
+
+KVConnectorFactory.register_connector(
+KVConnectorFactory.register_connector(
+ "DynemoNcclConnector",
+ "DynamoNcclConnector",
+ "vllm.distributed.kv_transfer.kv_connector.dynemo_connector",
+ "vllm.distributed.kv_transfer.kv_connector.dynamo_connector",
+ "DynemoConnector")
+ "DynamoConnector")
diff --git a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
diff --git a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
index 2033e976..e0537903 100644
index 2033e976..ddebb68e 100644
--- a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
--- a/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
+++ b/vllm/distributed/kv_transfer/kv_connector/simple_connector.py
@@ -8,13 +8,15 @@ MooncakePipe.
@@ -8,13 +8,15 @@ MooncakePipe.
...
@@ -1886,7 +1886,7 @@ index 2033e976..e0537903 100644
...
@@ -1886,7 +1886,7 @@ index 2033e976..e0537903 100644
+ world_group.broadcast_object(kv_config_enhanced)
+ world_group.broadcast_object(kv_config_enhanced)
+
+
+ else:
+ else:
+ raise NotImplementedError("MooncakeConnector is not supported in Dynemo patch")
+ raise NotImplementedError("MooncakeConnector is not supported in Dynamo patch")
+ else:
+ else:
+ kv_config_enhanced = world_group.broadcast_object()
+ kv_config_enhanced = world_group.broadcast_object()
+ logger.info("kv_config_enhanced: %s", kv_config_enhanced)
+ logger.info("kv_config_enhanced: %s", kv_config_enhanced)
...
@@ -2175,11 +2175,11 @@ index 40589fb3..da2829cf 100644
...
@@ -2175,11 +2175,11 @@ index 40589fb3..da2829cf 100644
"""Receive a tensor (can be None) from the pipeline.
"""Receive a tensor (can be None) from the pipeline.
Returns:
Returns:
diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
diff --git a/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
new file mode 100644
new file mode 100644
index 00000000..58d0d28c
index 00000000..3ee0fa78
--- /dev/null
--- /dev/null
+++ b/vllm/distributed/kv_transfer/kv_pipe/dynemo_nccl_pipe.py
+++ b/vllm/distributed/kv_transfer/kv_pipe/dynamo_nccl_pipe.py
@@ -0,0 +1,124 @@
@@ -0,0 +1,124 @@
+import logging
+import logging
+import threading
+import threading
...
@@ -2195,7 +2195,7 @@ index 00000000..58d0d28c
...
@@ -2195,7 +2195,7 @@ index 00000000..58d0d28c
+logger = logging.getLogger(__name__)
+logger = logging.getLogger(__name__)
+
+
+
+
+class DynemoNcclDataPlane:
+class DynamoNcclDataPlane:
+ def __init__(
+ def __init__(
+ self,
+ self,
+ data_pipe: PyNcclPipe,
+ data_pipe: PyNcclPipe,
...
@@ -2531,7 +2531,7 @@ index 321902d1..b8937ef8 100644
...
@@ -2531,7 +2531,7 @@ index 321902d1..b8937ef8 100644
def ensure_model_parallel_initialized(
def ensure_model_parallel_initialized(
diff --git a/vllm/engine/llm_engine.py b/vllm/engine/llm_engine.py
diff --git a/vllm/engine/llm_engine.py b/vllm/engine/llm_engine.py
index d82d9ad9..cc02b029 100644
index d82d9ad9..53cace75 100644
--- a/vllm/engine/llm_engine.py
--- a/vllm/engine/llm_engine.py
+++ b/vllm/engine/llm_engine.py
+++ b/vllm/engine/llm_engine.py
@@ -2,13 +2,17 @@
@@ -2,13 +2,17 @@
...
@@ -2614,7 +2614,7 @@ index d82d9ad9..cc02b029 100644
...
@@ -2614,7 +2614,7 @@ index d82d9ad9..cc02b029 100644
+ self.engine_id = str(uuid.uuid4())
+ self.engine_id = str(uuid.uuid4())
+ self._nixl_agents_names: Optional[List[str]] = None
+ self._nixl_agents_names: Optional[List[str]] = None
+ if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ self._nixl_agents_names = self._initialize_nixl()
+ self._nixl_agents_names = self._initialize_nixl()
+
+
+ self._request_notif_counter = defaultdict(lambda: -self.parallel_config.tensor_parallel_size)
+ self._request_notif_counter = defaultdict(lambda: -self.parallel_config.tensor_parallel_size)
...
@@ -2946,7 +2946,7 @@ index 3cf1850e..6b90ece7 100644
...
@@ -2946,7 +2946,7 @@ index 3cf1850e..6b90ece7 100644
+ kv_active_blocks: int
+ kv_active_blocks: int
+ kv_total_blocks: int
+ kv_total_blocks: int
diff --git a/vllm/engine/multiprocessing/client.py b/vllm/engine/multiprocessing/client.py
diff --git a/vllm/engine/multiprocessing/client.py b/vllm/engine/multiprocessing/client.py
index 85b5f31e..3f8b8fad 100644
index 85b5f31e..da207947 100644
--- a/vllm/engine/multiprocessing/client.py
--- a/vllm/engine/multiprocessing/client.py
+++ b/vllm/engine/multiprocessing/client.py
+++ b/vllm/engine/multiprocessing/client.py
@@ -8,6 +8,7 @@ from typing import (Any, AsyncGenerator, Dict, Iterator, List, Mapping,
@@ -8,6 +8,7 @@ from typing import (Any, AsyncGenerator, Dict, Iterator, List, Mapping,
...
@@ -3028,7 +3028,7 @@ index 85b5f31e..3f8b8fad 100644
...
@@ -3028,7 +3028,7 @@ index 85b5f31e..3f8b8fad 100644
+
+
+ @property
+ @property
+ def using_nixl_connector(self) -> bool:
+ def using_nixl_connector(self) -> bool:
+ return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector"
+ return self.vllm_config.kv_transfer_config is not None and self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector"
+
+
@staticmethod
@staticmethod
def is_unsupported_config(engine_args: AsyncEngineArgs):
def is_unsupported_config(engine_args: AsyncEngineArgs):
...
@@ -3656,7 +3656,7 @@ index 534b9e60..18675d2f 100644
...
@@ -3656,7 +3656,7 @@ index 534b9e60..18675d2f 100644
@property
@property
def is_first_multi_step(self) -> bool:
def is_first_multi_step(self) -> bool:
diff --git a/vllm/worker/model_runner.py b/vllm/worker/model_runner.py
diff --git a/vllm/worker/model_runner.py b/vllm/worker/model_runner.py
index 12baecde..489d3b77 100644
index 12baecde..a3f2c464 100644
--- a/vllm/worker/model_runner.py
--- a/vllm/worker/model_runner.py
+++ b/vllm/worker/model_runner.py
+++ b/vllm/worker/model_runner.py
@@ -1824,6 +1824,9 @@ class ModelRunner(GPUModelRunnerBase[ModelInputForGPUWithSamplingMetadata]):
@@ -1824,6 +1824,9 @@ class ModelRunner(GPUModelRunnerBase[ModelInputForGPUWithSamplingMetadata]):
...
@@ -3664,7 +3664,7 @@ index 12baecde..489d3b77 100644
...
@@ -3664,7 +3664,7 @@ index 12baecde..489d3b77 100644
if self.vllm_config.kv_transfer_config is None:
if self.vllm_config.kv_transfer_config is None:
return False
return False
+
+
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ return False
+ return False
prefill_meta = model_input.attn_metadata.prefill_metadata
prefill_meta = model_input.attn_metadata.prefill_metadata
...
@@ -3674,13 +3674,13 @@ index 12baecde..489d3b77 100644
...
@@ -3674,13 +3674,13 @@ index 12baecde..489d3b77 100644
if self.vllm_config.kv_transfer_config is None:
if self.vllm_config.kv_transfer_config is None:
return False
return False
+
+
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynemoNixlConnector":
+ if self.vllm_config.kv_transfer_config.kv_connector == "DynamoNixlConnector":
+ return False
+ return False
prefill_meta = model_input.attn_metadata.prefill_metadata
prefill_meta = model_input.attn_metadata.prefill_metadata
diff --git a/vllm/worker/worker.py b/vllm/worker/worker.py
diff --git a/vllm/worker/worker.py b/vllm/worker/worker.py
index 582aa460..e4ed902e 100644
index 582aa460..36a21d10 100644
--- a/vllm/worker/worker.py
--- a/vllm/worker/worker.py
+++ b/vllm/worker/worker.py
+++ b/vllm/worker/worker.py
@@ -2,7 +2,7 @@
@@ -2,7 +2,7 @@
...
@@ -3696,7 +3696,7 @@ index 582aa460..e4ed902e 100644
...
@@ -3696,7 +3696,7 @@ index 582aa460..e4ed902e 100644
from vllm.worker.pooling_model_runner import PoolingModelRunner
from vllm.worker.pooling_model_runner import PoolingModelRunner
from vllm.worker.worker_base import (LocalOrDistributedWorkerBase, WorkerBase,
from vllm.worker.worker_base import (LocalOrDistributedWorkerBase, WorkerBase,
WorkerInput)
WorkerInput)
+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
+from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
+
+
logger = init_logger(__name__)
logger = init_logger(__name__)
...
@@ -3710,7 +3710,7 @@ index 582aa460..e4ed902e 100644
...
@@ -3710,7 +3710,7 @@ index 582aa460..e4ed902e 100644
+ # TODO ptarasiewicz nixl can also support DRAM
+ # TODO ptarasiewicz nixl can also support DRAM
+ assert self.device_config.device_type == "cuda", "Currently only CUDA is supported for Nixl connector"
+ assert self.device_config.device_type == "cuda", "Currently only CUDA is supported for Nixl connector"
+
+
+ self.nixl_connector = DynemoNixlConnector(self.vllm_config, engine_id, self.local_rank) # TODO ptarasiewicz: rank or local_rank?
+ self.nixl_connector = DynamoNixlConnector(self.vllm_config, engine_id, self.local_rank) # TODO ptarasiewicz: rank or local_rank?
+ assert len(self.cache_engine) == 1, "Only one cache engine is supported for now"
+ assert len(self.cache_engine) == 1, "Only one cache engine is supported for now"
+ self.nixl_connector.register_kv_caches(self.cache_engine[0].gpu_cache)
+ self.nixl_connector.register_kv_caches(self.cache_engine[0].gpu_cache)
+ return self.nixl_connector.agent_name
+ return self.nixl_connector.agent_name
...
@@ -3766,7 +3766,7 @@ index 582aa460..e4ed902e 100644
...
@@ -3766,7 +3766,7 @@ index 582aa460..e4ed902e 100644
@torch.inference_mode()
@torch.inference_mode()
diff --git a/vllm/worker/worker_base.py b/vllm/worker/worker_base.py
diff --git a/vllm/worker/worker_base.py b/vllm/worker/worker_base.py
index 819b81fb..8dfdadde 100644
index 819b81fb..ff43dadc 100644
--- a/vllm/worker/worker_base.py
--- a/vllm/worker/worker_base.py
+++ b/vllm/worker/worker_base.py
+++ b/vllm/worker/worker_base.py
@@ -9,6 +9,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple, Type, Union
@@ -9,6 +9,7 @@ from typing import Any, Dict, List, Optional, Set, Tuple, Type, Union
...
@@ -3781,7 +3781,7 @@ index 819b81fb..8dfdadde 100644
...
@@ -3781,7 +3781,7 @@ index 819b81fb..8dfdadde 100644
from vllm.worker.model_runner_base import (BroadcastableModelInput,
from vllm.worker.model_runner_base import (BroadcastableModelInput,
ModelRunnerBase,
ModelRunnerBase,
ModelRunnerInputBase)
ModelRunnerInputBase)
+from vllm.distributed.device_communicators.nixl import DynemoNixlConnector
+from vllm.distributed.device_communicators.nixl import DynamoNixlConnector
logger = init_logger(__name__)
logger = init_logger(__name__)
...
@@ -3789,7 +3789,7 @@ index 819b81fb..8dfdadde 100644
...
@@ -3789,7 +3789,7 @@ index 819b81fb..8dfdadde 100644
from vllm.platforms import current_platform
from vllm.platforms import current_platform
self.current_platform = current_platform
self.current_platform = current_platform
+ self.nixl_connector: Optional[DynemoNixlConnector] = None
+ self.nixl_connector: Optional[DynamoNixlConnector] = None
+
+
@abstractmethod
@abstractmethod
def init_device(self) -> None:
def init_device(self) -> None:
...
...
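The patched `KVCacheEventManager` above binds the renamed C API through `ctypes`: it loads the shared library from `VLLM_KV_CAPI_PATH`, declares `argtypes`/`restype` for each `dynamo_*` entry point, and checks the returned status code. A minimal sketch of that loading pattern follows; the default path mirrors what the Dockerfiles export via `VLLM_KV_CAPI_PATH`, the `b"dynamo"`/`b"vllm"` arguments are placeholders, and the existence guard keeps the sketch harmless on machines without the library.

```python
import ctypes
import os
from ctypes import c_char_p, c_int64, c_uint32

# Path the images above export via VLLM_KV_CAPI_PATH (assumed default here).
LIB_PATH = os.environ.get(
    "VLLM_KV_CAPI_PATH", "/opt/dynamo/bindings/lib/libdynamo_llm_capi.so"
)


def load_kv_capi(lib_path: str) -> ctypes.CDLL:
    """Load the Dynamo LLM C API and declare the init signature the patch uses."""
    lib = ctypes.CDLL(lib_path)
    lib.dynamo_llm_init.argtypes = [c_char_p, c_char_p, c_int64]
    lib.dynamo_llm_init.restype = c_uint32
    return lib


# Only attempt the load where the library is actually installed.
if os.path.exists(LIB_PATH):
    lib = load_kv_capi(LIB_PATH)
    # Placeholder namespace/component/worker_id; 0 == OK per DynamoResult.
    status = lib.dynamo_llm_init(b"dynamo", b"vllm", 0)
```

Declaring `argtypes` and `restype` up front, as the patch does for every `dynamo_kv_event_publish_*` function, lets `ctypes` catch argument-count mistakes instead of silently corrupting the call.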
deploy/Kubernetes/common/chart/Chart.yaml
...
@@ -15,7 +15,7 @@
...
@@ -15,7 +15,7 @@
apiVersion: v2
apiVersion: v2
appVersion: 1.0.0
appVersion: 1.0.0
description: Distributed Neural Models (dynemo) Component
description: Distributed Neural Models (dynamo) Component
icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png
icon: https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png
name: dynemo_component
name: dynamo_component
version: 1.0.0
version: 1.0.0
deploy/Kubernetes/common/chart/templates/_helpers.tpl
...
@@ -15,7 +15,7 @@
...
@@ -15,7 +15,7 @@
# Annotation Groups
# Annotation Groups
{{- define "nvidia.annotations.default" }}
{{- define "nvidia.annotations.default" }}
dynemo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
dynamo: "{{ .Release.Name }}.{{ .Chart.AppVersion | default "0.0" }}"
{{- with .Values.kubernetes }}
{{- with .Values.kubernetes }}
{{- with .annotations }}
{{- with .annotations }}
{{ toYaml . }}
{{ toYaml . }}
...
@@ -54,7 +54,7 @@ app.kubernetes.io/instance: {{ .Release.Name }}
...
@@ -54,7 +54,7 @@ app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{- end }}
{{- define "nvidia.label.appManagedBy" }}
{{- define "nvidia.label.appManagedBy" }}
{{- $service_name := "dynemo" }}
{{- $service_name := "dynamo" }}
{{- with .Release.Service }}
{{- with .Release.Service }}
{{- $service_name = . }}
{{- $service_name = . }}
{{- end }}
{{- end }}
...
@@ -66,7 +66,7 @@ app.kubernetes.io/name: {{ required "Property '.component.name' is required." .V
...
@@ -66,7 +66,7 @@ app.kubernetes.io/name: {{ required "Property '.component.name' is required." .V
{{- end }}
{{- end }}
{{- define "nvidia.label.appPartOf" }}
{{- define "nvidia.label.appPartOf" }}
{{- $part_of := "dynemo" }}
{{- $part_of := "dynamo" }}
{{- with .Values.kubernetes }}
{{- with .Values.kubernetes }}
{{- with .partOf }}
{{- with .partOf }}
{{- $part_of = . }}
{{- $part_of = . }}
...
...
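The renamed helpers above (`nvidia.label.appManagedBy`, `nvidia.label.appPartOf`) all follow the same default-with-override shape: start from the `"dynamo"` literal, then let a `{{- with }}` block replace it when the chart value is set. A small Python sketch of that logic, with a hypothetical `app_part_of` helper that is not part of the repo:

```python
def app_part_of(values: dict) -> str:
    """Mirror the `nvidia.label.appPartOf` template: default to "dynamo",
    overridden when .Values.kubernetes.partOf is set and non-empty."""
    part_of = "dynamo"
    kubernetes = values.get("kubernetes") or {}
    if kubernetes.get("partOf"):
        part_of = kubernetes["partOf"]
    return part_of


print(app_part_of({}))                                    # falls back to the default
print(app_part_of({"kubernetes": {"partOf": "my-app"}}))  # uses the override
```

Because `{{- with }}` skips its body for empty values, the Helm template and this sketch both treat an empty `partOf` the same as an absent one.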