Commit b0ceb4d3 (unverified), authored by Keiven C, committed by GitHub

fix: use port 9401 inside and outside of container (#1838)


Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
parent 7dea77c3
...@@ -13,19 +13,17 @@ Default Service Relationship Diagram: ...@@ -13,19 +13,17 @@ Default Service Relationship Diagram:
```mermaid ```mermaid
graph TD graph TD
BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001] BROWSER[Browser] -->|:3001| GRAFANA[Grafana :3001]
BROWSER[Browser] -->|:3001| DCGM_EXPORTER2["external dcgm_exporter 0.0.0.0:9400"]
subgraph DockerComposeNetwork [Network inside Docker Compose] subgraph DockerComposeNetwork [Network inside Docker Compose]
NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222] NATS_PROM_EXP[nats-prom-exp :7777 /metrics] -->|:8222/varz| NATS_SERVER[nats-server :4222, :6222, :8222]
PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380] PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
PROMETHEUS -->|:9400/metrics| DCGM_EXPORTER[dcgm-exporter :9400] PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000] PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000]
GRAFANA -->|:9090/query API| PROMETHEUS GRAFANA -->|:9090/query API| PROMETHEUS
end end
BROWSER -->|:9401/metrics| DCGM_EXPORTER
``` ```
The dcgm-exporter within the Docker Compose network is configured to bind to port 9400 internally, but it is exposed externally on port 9401. This setup helps prevent conflicts with other dcgm-exporters that might be running concurrently, such as in distributed environments like SLURM. The dcgm-exporter service in the Docker Compose network is configured to use port 9401 instead of the default port 9400. This adjustment is made to avoid port conflicts with other dcgm-exporter instances that may be running simultaneously. Such a configuration is typical in distributed systems like SLURM.
As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers with `--framework VLLM_V1` or `--framework TENSORRTLLM`. As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers with `--framework VLLM_V1` or `--framework TENSORRTLLM`.
......
...@@ -63,11 +63,12 @@ services: ...@@ -63,11 +63,12 @@ services:
dcgm-exporter: dcgm-exporter:
image: nvidia/dcgm-exporter:4.2.3-4.1.3-ubi9 image: nvidia/dcgm-exporter:4.2.3-4.1.3-ubi9
ports: ports:
# Remap from 9400 to 9401 (public port) to avoid conflict with an existing dcgm-exporter # Expose dcgm-exporter on port 9401 both inside and outside the container
# on dlcluster. To access dcgm: # to avoid conflicts with other dcgm-exporter instances in distributed environments.
# Outside the container: curl http://localhost:9401/metrics # To access DCGM metrics:
# Inside the container (container-to-container): curl http://dcgm-exporter:9400/metrics # Outside the container: curl http://localhost:9401/metrics (or the host IP)
- 9401:9400 # Inside the container (container-to-container): curl http://dcgm-exporter:9401/metrics
- 9401:9401
cap_add: cap_add:
- SYS_ADMIN - SYS_ADMIN
deploy: deploy:
...@@ -80,6 +81,7 @@ services: ...@@ -80,6 +81,7 @@ services:
environment: environment:
# dcgm uses NVIDIA_VISIBLE_DEVICES variable but normally it is CUDA_VISIBLE_DEVICES # dcgm uses NVIDIA_VISIBLE_DEVICES variable but normally it is CUDA_VISIBLE_DEVICES
- NVIDIA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-all} - NVIDIA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-all}
- DCGM_EXPORTER_LISTEN=:9401
runtime: nvidia # Specify the NVIDIA runtime runtime: nvidia # Specify the NVIDIA runtime
networks: networks:
- monitoring - monitoring
......
...@@ -31,7 +31,7 @@ scrape_configs: ...@@ -31,7 +31,7 @@ scrape_configs:
- job_name: 'dcgm-exporter' - job_name: 'dcgm-exporter'
scrape_interval: 5s scrape_interval: 5s
static_configs: static_configs:
- targets: ['dcgm-exporter:9400'] # on the "monitoring" network - targets: ['dcgm-exporter:9401'] # on the "monitoring" network
# This is a demo service that needs to be launched manually. See components/metrics/README.md # This is a demo service that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
......
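A quick way to check the change is to confirm that the exporter answers on port 9401 from wherever you are calling it. The sketch below is only an illustration, not part of the commit: the helper names (`metrics_url`, `metric_names`, `fetch_metric_names`) are hypothetical, and it assumes the endpoint returns the usual Prometheus text exposition format.

```python
from urllib.request import urlopen

# Port taken from the diff above: the same value inside and outside the container.
DCGM_PORT = 9401


def metrics_url(host: str, port: int = DCGM_PORT) -> str:
    """Build the metrics URL for a host (hypothetical helper)."""
    return f"http://{host}:{port}/metrics"


def metric_names(exposition_text: str) -> set[str]:
    """Collect metric names from Prometheus text-format output.

    Comment lines (# HELP, # TYPE) and blank lines are skipped; the name is
    whatever precedes the first '{' or whitespace on a sample line.
    """
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        names.add(line.split("{", 1)[0].split()[0])
    return names


def fetch_metric_names(host: str = "localhost", timeout: float = 5.0) -> set[str]:
    """Fetch /metrics from the given host and return the metric names found."""
    with urlopen(metrics_url(host), timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # From the host use "localhost"; from another container on the
    # "monitoring" network use the service name "dcgm-exporter".
    for name in sorted(fetch_metric_names("localhost")):
        print(name)
```

Because the port is now identical on both sides, the only thing that differs between the host and container-to-container cases is the hostname passed in.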