Commit 4498a77d (unverified), authored by Keiven C, committed by GitHub

fix: move docker-compose.yml to deploy/, and update frontend port (#2121)


Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent 4dc529a1
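
This commit moves the Compose file and touches many documents, so a scan for leftover references to the old path can help confirm that none were missed. The sketch below is not part of the commit. The function name `find_stale_references` and the list of file suffixes are assumptions made for illustration. It matches only the old Compose path, since a bare `8000` is too ambiguous to search for reliably.

```python
from pathlib import Path

# Old Compose path that this commit replaces with deploy/docker-compose.yml.
OLD_COMPOSE_PATH = "deploy/metrics/docker-compose.yml"


def find_stale_references(root, needle=OLD_COMPOSE_PATH,
                          suffixes=(".md", ".yml", ".yaml", ".sh")):
    """Return (relative_path, line_number, line) for each line under root that contains needle."""
    hits = []
    root = Path(root)
    for path in sorted(root.rglob("*")):
        # Only look at regular files with one of the (assumed) text suffixes.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(root).as_posix(), number, line.strip()))
    return hits
```

Calling `find_stale_references(".")` from the repository root returns an empty list once every reference has been updated.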
@@ -123,7 +123,7 @@ python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 #### Send a Request
 ```bash
-curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
+curl localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
 "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
 "messages": [
 {
......
@@ -56,10 +56,10 @@ Below we provide a guide that lets you run all of our the common deployment patt
 ### Start NATS and ETCD in the background
-Start using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start using [Docker Compose](../../../deploy/docker-compose.yml)
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 ### Build container
......
@@ -64,9 +64,9 @@ Note: TensorRT-LLM disaggregation does not support conditional disaggregation ye
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml)
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 ### Build docker
......
@@ -15,10 +15,10 @@ See [deployment architectures](../llm/README.md#deployment-architectures) to lea
 ### Prerequisites
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml):
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml):
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 ### Build and Run docker
......
@@ -22,7 +22,7 @@ Start the required services on your head node. These endpoints must be accessibl
 ```bash
 # On head node (node-1)
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 Default ports:
......
@@ -94,7 +94,7 @@ To visualize the metrics being exposed on the Prometheus endpoint,
 see the Prometheus and Grafana configurations in
 [deploy/metrics](../../deploy/metrics):
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d
+docker compose -f deploy/docker-compose.yml --profile metrics up -d
 ```
 ## Metrics Collection Modes
......
@@ -92,7 +92,7 @@ services:
     image: prom/prometheus:v3.4.1
     container_name: prometheus
     volumes:
-      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - ./metrics/prometheus.yml:/etc/prometheus/prometheus.yml
     command:
       - '--config.file=/etc/prometheus/prometheus.yml'
       - '--storage.tsdb.path=/prometheus'
@@ -123,8 +123,8 @@ services:
     image: grafana/grafana-enterprise:12.0.1
     container_name: grafana
     volumes:
-      - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
-      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
+      - ./metrics/grafana_dashboards:/etc/grafana/provisioning/dashboards
+      - ./metrics/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
     environment:
       - GF_SERVER_HTTP_PORT=3001
       # do not make it admin/admin, because you will be prompted to change the password every time
......
@@ -18,7 +18,7 @@ graph TD
 PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
 PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
 PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
-PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000]
+PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
 GRAFANA -->|:9090/query API| PROMETHEUS
 end
 ```
@@ -34,9 +34,9 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
 2. Start Dynamo dependencies. Assume you're at the root dynamo path:
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d # Minimum components for Dynamo: etcd/nats/dcgm-exporter
+docker compose -f deploy/docker-compose.yml up -d # Minimum components for Dynamo: etcd/nats/dcgm-exporter
 # or
-docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d # In addition to the above, start Prometheus & Grafana
+docker compose -f deploy/docker-compose.yml --profile metrics up -d # In addition to the above, start Prometheus & Grafana
 ```
 To target specific GPU(s), export the variable below before running Docker Compose:
......
@@ -35,7 +35,7 @@ scrape_configs:
   # This is a demo service that needs to be launched manually. See components/metrics/README.md
   # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
-  - job_name: 'llm-demo'
+  - job_name: 'dynamo-backend'
     scrape_interval: 10s
     static_configs:
       - targets: ['host.docker.internal:8000'] # on the "monitoring" network
......
@@ -97,7 +97,7 @@ You can run this pipeline locally by spinning up ETCD and NATS and then running
 ```bash
 # Spin up ETCD and NATS
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 then
@@ -110,7 +110,7 @@ dynamo serve pipeline:Frontend
 Once it's up and running, you can make a request to the pipeline using
 ```bash
-curl -X POST http://localhost:8000/generate \
+curl -X POST http://localhost:8080/generate \
 -H "Content-Type: application/json" \
 -d '{"text": "federer"}'
 ```
......
@@ -23,7 +23,7 @@ This diagram shows the NVIDIA Dynamo disaggregated inference system as implement
 The primary user journey through the system:
 1. **Discovery (S1)**: Client discovers the service endpoint
-2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8000)
+2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8080)
 3. **Validate (S3)**: Frontend forwards request to Processor for validation and routing
 4. **Route (S3)**: Processor routes the validated request to appropriate Decode Worker
@@ -84,7 +84,7 @@ graph TD
 %% Top Layer - Client & Frontend
 Client["<b>HTTP Client</b>"]
 S1[["<b>1 DISCOVERY</b>"]]
-Frontend["<b>Frontend</b><br/><i>OpenAI Compatible Server<br/>Port 8000</i>"]
+Frontend["<b>Frontend</b><br/><i>OpenAI Compatible Server<br/>Port 8080</i>"]
 S2[["<b>2 REQUEST</b>"]]
 %% Processing Layer
......
@@ -67,7 +67,7 @@ Look for one that ends in `-frontend` and use it for port forward.
 ```bash
 SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
-kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE}
 ```
 Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
......
@@ -88,7 +88,7 @@ Here's a template structure based on the examples:
 Consult the corresponding sh file. Each of the python commands to launch a component will go into your yaml spec under the
 `extraPodSpec: -> mainContainer: -> args:`
-The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]"
+The front end is launched with "python3 -m dynamo.frontend [--http-port 8080] [--router-mode kv]"
 Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags `command.
 If you are a Dynamo contributor the [dynamo run guide](../dynamo_run.md) for details on how to run this command.
......
@@ -46,7 +46,7 @@ genai-perf profile \
   --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
   -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
   --endpoint-type chat \
-  --url http://localhost:8000 \
+  --url http://localhost:8080 \
   --streaming \
   --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
 ```
@@ -76,7 +76,7 @@ In this example, we use a fixed 2p2d engine as baseline. Planner provides a `--n
 # TODO
 # in terminal 2
-genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8000 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
+genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8080 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
 ```
 ## Results
......
@@ -44,11 +44,11 @@ cargo test
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in [deploy/metrics/docker-compose.yml](../../deploy/metrics/docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).
 ```
 # At the root of the repository:
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 This will deploy a [NATS.io](https://nats.io/) server and an [etcd](https://etcd.io/)
......
@@ -61,7 +61,7 @@ The example demonstrates:
 # clone the dynamo repository if necessary
 # git clone https://github.com/ai-dynamo/dynamo.git
 cd dynamo
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 ### Running the Example
......
@@ -18,7 +18,7 @@ cargo build
 ### Run Server
 ```bash
-export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8000
+export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081
 cargo run --bin system_server
 ```
@@ -31,7 +31,7 @@ Note: Running the client will increment `service_requests_total`.
 ### View Metrics
 ```bash
-curl http://localhost:8000/metrics
+curl http://localhost:8081/metrics
 ```
 Example output:
@@ -66,7 +66,7 @@ uptime_seconds{namespace="http_server"} 725.997013676
 |----------|-------------|---------|
 | `DYN_LOG` | Enable logging | `0` |
 | `DYN_SYSTEM_ENABLED` | Enable system metrics | `false` |
-| `DYN_SYSTEM_PORT` | HTTP server port | `8000` |
+| `DYN_SYSTEM_PORT` | HTTP server port | `8081` |
 ## Metrics
......
@@ -44,7 +44,7 @@ cargo test
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](../../../docker-compose.yml).
+defined in the project's root [docker-compose.yml](../../../../../deploy/docker-compose.yml).
 ```
 docker-compose up -d
......