Unverified Commit a0f845e3 authored by Keiven C's avatar Keiven C Committed by GitHub
Browse files

feat: add network isolation modes to container/run.sh script (#3237)


Signed-off-by: default avatarKeiven Chang <keivenchang@users.noreply.github.com>
parent 5bb74904
...@@ -167,36 +167,112 @@ The `run.sh` script launches Docker containers with the appropriate configuratio ...@@ -167,36 +167,112 @@ The `run.sh` script launches Docker containers with the appropriate configuratio
- **GPU Management**: Automatic GPU detection and allocation - **GPU Management**: Automatic GPU detection and allocation
- **Volume Mounting**: Workspace and HuggingFace cache mounting - **Volume Mounting**: Workspace and HuggingFace cache mounting
- **User Management**: Root or user-based container execution - **User Management**: Root or user-based container execution
- **Network Configuration**: Host networking for service communication - **Network Configuration**: Configurable networking modes (host, bridge, none, container sharing)
- **Resource Limits**: Memory, file descriptors, and IPC configuration - **Resource Limits**: Memory, file descriptors, and IPC configuration
**Common Usage Examples:** **Common Usage Examples:**
```bash ```bash
# Basic container launch (inference/production) # Basic container launch (inference/production, runs as root user)
./run.sh --image dynamo:latest-vllm ./run.sh --image dynamo:latest-vllm -v $HOME/.cache:/home/ubuntu/.cache
# Mount workspace for development (use local-dev image for local user permissions) # Mount workspace for development (use local-dev image for local host user permissions)
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace ./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
# Use specific image and framework for development # Use specific image and framework for development
./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace ./run.sh --image v0.1.0.dev.08cc44965-vllm-local-dev --framework vllm --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
# Interactive development shell with workspace mounted # Interactive development shell with workspace mounted
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it -- bash ./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it -- bash
# Development with custom environment variables # Development with custom environment variables
./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace ./run.sh --image dynamo:latest-vllm-local-dev -e CUDA_VISIBLE_DEVICES=0,1 --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
# Production inference without GPU access
./run.sh --image dynamo:latest-vllm --gpus none
# Dry run to see docker command # Dry run to see docker command
./run.sh --dry-run ./run.sh --dry-run
# Development with custom volume mounts # Development with custom volume mounts
./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace ./run.sh --image dynamo:latest-vllm-local-dev -v /host/path:/container/path --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
```
### Network Configuration Options
The `run.sh` script supports different networking modes via the `--network` flag (defaults to `host`):
#### Host Networking (Default)
```bash
# Same examples with local host user permissions
./run.sh --image dynamo:latest-vllm-local-dev --network host -v $HOME/.cache:/home/ubuntu/.cache
./run.sh --image dynamo:latest-vllm-local-dev -v $HOME/.cache:/home/ubuntu/.cache
```
**Use cases:**
- High-performance ML inference (default for GPU workloads)
- Services that need direct host port access
- Maximum network performance with minimal overhead
- Sharing services with the host machine (NATS, etcd, etc.)
**⚠️ Port Sharing Limitation:** Host networking shares all ports with the host machine, which means you can only run **one instance** of services like NATS (port 4222) or etcd (port 2379) across all containers and the host.
#### Bridge Networking (Isolated)
```bash
# CI/testing with isolated bridge networking and host cache sharing
./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache
```
**Use cases:**
- Secure isolation from host network
- CI/CD pipelines requiring complete isolation
- When you need absolute control of ports
- Exposing specific services to host while maintaining isolation
**Note:** For port sharing with the host, use the `--port` or `-p` option with format `host_port:container_port` (e.g., `--port 8000:8000` or `-p 9081:8081`) to expose specific container ports to the host.
#### No Networking ⚠️ **LIMITED FUNCTIONALITY**
```bash
# Complete network isolation - no external connectivity
./run.sh --image dynamo:latest-vllm --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
# Same with local user permissions
./run.sh --image dynamo:latest-vllm-local-dev --network none --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache
``` ```
**⚠️ WARNING: `--network none` severely limits Dynamo functionality:**
- **No model downloads** - HuggingFace models cannot be downloaded
- **No API access** - Cannot reach external APIs or services
- **No distributed inference** - Multi-node setups won't work
- **No monitoring/logging** - External monitoring systems unreachable
- **Limited debugging** - Cannot access external debugging tools
**Very limited use cases:**
- Pre-downloaded models with purely local processing
- Air-gapped security environments (models must be pre-staged)
#### Container Network Sharing
Use `--network container:name` to share the network namespace with another container.
**Use cases:**
- Sidecar patterns (logging, monitoring, caching)
- Service mesh architectures
- Sharing network namespaces between related containers
See Docker documentation for `--network container:name` usage.
#### Custom Networks
Use custom Docker networks for multi-container applications. Create with `docker network create` and specify with `--network network-name`.
**Use cases:**
- Multi-container applications
- Service discovery by container name
See Docker documentation for custom network creation and management.
#### Network Mode Comparison
| Mode | Performance | Security | Use Case | Dynamo Compatibility | Port Sharing | Port Publishing |
|------|-------------|----------|----------|---------------------|---------------|-----------------|
| `host` | Highest | Lower | ML/GPU workloads, high-performance services | ✅ Full | ⚠️ **Shared with host** (one NATS/etcd only) | ❌ Not needed |
| `bridge` | Good | Higher | General web services, controlled port exposure | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
| `none` | N/A | Highest | Air-gapped environments only | ⚠️ **Very Limited** | ✅ No network | ❌ No network |
| `container:name` | Good | Medium | Sidecar patterns, shared network stacks | ✅ Full | ⚠️ Shared with target container | ❌ Use target's ports |
| Custom networks | Good | Medium | Multi-container applications | ✅ Full | ✅ Isolated ports | ✅ `-p host:container` |
## Workflow Examples ## Workflow Examples
...@@ -206,14 +282,14 @@ The `run.sh` script launches Docker containers with the appropriate configuratio ...@@ -206,14 +282,14 @@ The `run.sh` script launches Docker containers with the appropriate configuratio
./build.sh --framework vllm --target local-dev ./build.sh --framework vllm --target local-dev
# 2. Run development container using the local-dev image # 2. Run development container using the local-dev image
./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -it ./run.sh --image dynamo:latest-vllm-local-dev --mount-workspace -v $HOME/.cache:/home/ubuntu/.cache -it
# 3. Inside container, run inference (requires both frontend and backend) # 3. Inside container, run inference (requires both frontend and backend)
# Start frontend # Start frontend
python -m dynamo.frontend & python -m dynamo.frontend &
# Start backend (vLLM example) # Start backend (vLLM example)
python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.50 & python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 &
``` ```
### Production Workflow ### Production Workflow
...@@ -221,15 +297,36 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.50 & ...@@ -221,15 +297,36 @@ python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.50 &
# 1. Build production image # 1. Build production image
./build.sh --framework vllm --release-build ./build.sh --framework vllm --release-build
# 2. Run production container # 2. Run production container (runs as root)
./run.sh --image dynamo:latest-vllm-local-dev --gpus all ./run.sh --image dynamo:latest-vllm --gpus all
``` ```
### Testing Workflow ### CI/CD Workflow
```bash ```bash
# 1. Build with no cache for clean build # 1. Build image for CI
./build.sh --framework vllm --no-cache ./build.sh --framework vllm --no-cache
# 2. Test container functionality (--image defaults to dynamo:latest-vllm) # 2. Run tests with network isolation for reproducible results
./run.sh --mount-workspace -it -- python -m pytest tests/ ./run.sh --image dynamo:latest-vllm --mount-workspace --network bridge -v $HOME/.cache:/home/ubuntu/.cache -- python -m pytest tests/
# 3. Inside the container with bridge networking, start services
# Note: Services are only accessible from the same container - no port conflicts with host
nats-server -js &
etcd --listen-client-urls http://0.0.0.0:2379 --advertise-client-urls http://0.0.0.0:2379 --data-dir /tmp/etcd &
python -m dynamo.frontend &
# 4. Start worker backend (choose one framework):
# vLLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.vllm --model Qwen/Qwen3-0.6B --gpu-memory-utilization 0.20 --enforce-eager --no-enable-prefix-caching --max-num-seqs 64 &
# SGLang
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.sglang --model Qwen/Qwen3-0.6B --mem-fraction-static 0.20 --max-running-requests 64 &
# TensorRT-LLM
DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081 python -m dynamo.trtllm --model Qwen/Qwen3-0.6B --free-gpu-memory-fraction 0.20 --max-num-tokens 8192 --max-batch-size 64 &
``` ```
**Framework-Specific GPU Memory Arguments:**
- **vLLM**: `--gpu-memory-utilization 0.20` (use 20% GPU memory), `--enforce-eager` (disable CUDA graphs), `--no-enable-prefix-caching` (save memory), `--max-num-seqs 64` (max concurrent sequences)
- **SGLang**: `--mem-fraction-static 0.20` (20% GPU memory for static allocation), `--max-running-requests 64` (max concurrent requests)
- **TensorRT-LLM**: `--free-gpu-memory-fraction 0.20` (reserve 20% GPU memory), `--max-num-tokens 8192` (max tokens in batch), `--max-batch-size 64` (max batch size)
...@@ -36,6 +36,7 @@ DEFAULT_HF_CACHE=${SOURCE_DIR}/.cache/huggingface ...@@ -36,6 +36,7 @@ DEFAULT_HF_CACHE=${SOURCE_DIR}/.cache/huggingface
GPUS="all" GPUS="all"
PRIVILEGED= PRIVILEGED=
VOLUME_MOUNTS= VOLUME_MOUNTS=
PORT_MAPPINGS=
MOUNT_WORKSPACE= MOUNT_WORKSPACE=
ENVIRONMENT_VARIABLES= ENVIRONMENT_VARIABLES=
REMAINING_ARGS= REMAINING_ARGS=
...@@ -43,6 +44,7 @@ INTERACTIVE= ...@@ -43,6 +44,7 @@ INTERACTIVE=
USE_NIXL_GDS= USE_NIXL_GDS=
RUNTIME=nvidia RUNTIME=nvidia
WORKDIR=/workspace WORKDIR=/workspace
NETWORK=host
get_options() { get_options() {
while :; do while :; do
...@@ -148,6 +150,14 @@ get_options() { ...@@ -148,6 +150,14 @@ get_options() {
missing_requirement "$1" missing_requirement "$1"
fi fi
;; ;;
-p|--port)
if [ "$2" ]; then
PORT_MAPPINGS+=" -p $2 "
shift
else
missing_requirement "$1"
fi
;;
-e) -e)
if [ "$2" ]; then if [ "$2" ]; then
ENVIRONMENT_VARIABLES+=" -e $2 " ENVIRONMENT_VARIABLES+=" -e $2 "
...@@ -165,6 +175,14 @@ get_options() { ...@@ -165,6 +175,14 @@ get_options() {
--use-nixl-gds) --use-nixl-gds)
USE_NIXL_GDS=TRUE USE_NIXL_GDS=TRUE
;; ;;
--network)
if [ "$2" ]; then
NETWORK=$2
shift
else
missing_requirement "$1"
fi
;;
--dry-run) --dry-run)
RUN_PREFIX="echo" RUN_PREFIX="echo"
echo "" echo ""
...@@ -304,7 +322,12 @@ show_help() { ...@@ -304,7 +322,12 @@ show_help() {
echo " [--hf-cache directory to volume mount as the hf cache, default is NONE unless mounting workspace]" echo " [--hf-cache directory to volume mount as the hf cache, default is NONE unless mounting workspace]"
echo " [--gpus gpus to enable, default is 'all', 'none' disables gpu support]" echo " [--gpus gpus to enable, default is 'all', 'none' disables gpu support]"
echo " [--use-nixl-gds add volume mounts and capabilities needed for NVIDIA GPUDirect Storage]" echo " [--use-nixl-gds add volume mounts and capabilities needed for NVIDIA GPUDirect Storage]"
echo " [--network network mode for container, default is 'host']"
echo " Options: 'host' (default), 'bridge', 'none', 'container:name'"
echo " Examples: --network bridge (isolated), --network none (no network - WARNING: breaks most functionality)"
echo " --network container:redis (share network with 'redis' container)"
echo " [-v add volume mount]" echo " [-v add volume mount]"
echo " [-p|--port add port mapping (host_port:container_port)]"
echo " [-e add environment variable]" echo " [-e add environment variable]"
echo " [--mount-workspace set up for local development]" echo " [--mount-workspace set up for local development]"
echo " [-- stop processing and pass remaining args as command to docker run]" echo " [-- stop processing and pass remaining args as command to docker run]"
...@@ -335,7 +358,7 @@ ${RUN_PREFIX} docker run \ ...@@ -335,7 +358,7 @@ ${RUN_PREFIX} docker run \
${GPU_STRING} \ ${GPU_STRING} \
${INTERACTIVE} \ ${INTERACTIVE} \
${RM_STRING} \ ${RM_STRING} \
--network host \ --network "$NETWORK" \
${RUNTIME:+--runtime "$RUNTIME"} \ ${RUNTIME:+--runtime "$RUNTIME"} \
--shm-size=10G \ --shm-size=10G \
--ulimit memlock=-1 \ --ulimit memlock=-1 \
...@@ -343,6 +366,7 @@ ${RUN_PREFIX} docker run \ ...@@ -343,6 +366,7 @@ ${RUN_PREFIX} docker run \
--ulimit nofile=65536:65536 \ --ulimit nofile=65536:65536 \
${ENVIRONMENT_VARIABLES} \ ${ENVIRONMENT_VARIABLES} \
${VOLUME_MOUNTS} \ ${VOLUME_MOUNTS} \
${PORT_MAPPINGS} \
-w "$WORKDIR" \ -w "$WORKDIR" \
--cap-add CAP_SYS_PTRACE \ --cap-add CAP_SYS_PTRACE \
${NIXL_GDS_CAPS} \ ${NIXL_GDS_CAPS} \
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment