Commit 7702f49f authored by Julien Mancuso, committed by GitHub

fix: fix permission issues with volume mounts and trtllm multinode (#4341)


Signed-off-by: Julien Mancuso <jmancuso@nvidia.com>
parent 2405bac7
...@@ -4,6 +4,8 @@ The Dynamo operator automatically applies default values to various fields when ...@@ -4,6 +4,8 @@ The Dynamo operator automatically applies default values to various fields when
- **Health Probes**: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
- **Security Context**: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
- **Shared Memory**: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
- **Environment Variables**: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
...@@ -21,6 +23,50 @@ All components receive the following pod-level defaults unless overridden: ...@@ -21,6 +23,50 @@ All components receive the following pod-level defaults unless overridden:
- **`terminationGracePeriodSeconds`**: `60` seconds
- **`restartPolicy`**: `Always`
## Security Context
The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:
- **`fsGroup`**: `1000` - Sets the group ownership of mounted volumes and any files created in those volumes
This default lets non-root containers write to mounted volumes (such as model caches or persistent storage) without permission errors, provided the volume type supports `fsGroup`-based ownership changes. The `fsGroup` setting is particularly important for:
- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments
### Overriding Security Context
To override the default security context, specify your own `securityContext` in the `extraPodSpec` of your component:
```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```
**Important**: When you provide *any* `securityContext` object in `extraPodSpec`, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting `runAsNonRoot` or setting it to `false`).
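
The all-or-nothing rule can be sketched in a few lines of Go. This is a simplified illustration, not the operator's code: `podSecurityContext` is a stand-in for `corev1.PodSecurityContext`, and `resolveSecurityContext` and `defaultFSGroup` are hypothetical names. The operator's actual logic lives in `GenerateBasePodSpec`.

```go
package main

import "fmt"

// podSecurityContext is a simplified stand-in for corev1.PodSecurityContext.
type podSecurityContext struct {
	FSGroup      *int64
	RunAsUser    *int64
	RunAsNonRoot *bool
}

// defaultFSGroup mirrors the documented default of 1000 (hypothetical name).
const defaultFSGroup int64 = 1000

// resolveSecurityContext applies the default only when the user supplied
// no security context at all. Any user-provided object is returned as-is.
func resolveSecurityContext(user *podSecurityContext) *podSecurityContext {
	if user != nil {
		return user // full user control, even if most fields are unset
	}
	g := defaultFSGroup
	return &podSecurityContext{FSGroup: &g}
}

func main() {
	// No user context: the default fsGroup is injected.
	fmt.Println(*resolveSecurityContext(nil).FSGroup)

	// Partial user context: fsGroup stays unset, no default is merged in.
	uid := int64(2000)
	got := resolveSecurityContext(&podSecurityContext{RunAsUser: &uid})
	fmt.Println(got.FSGroup == nil)
}
```

The consequence worth noting is the second case: setting only `runAsUser` means the pod gets no `fsGroup` at all, so add `fsGroup` explicitly if mounted volumes still need group write access.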
### OpenShift and Security Context Constraints
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift's admission controllers to assign them dynamically:
```yaml
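services:
  YourWorker:
    extraPodSpec:
      # Use an explicit empty object: a key followed only by comments parses
      # as null in YAML, which the operator would most likely treat as
      # "not provided" and still inject fsGroup: 1000.
      securityContext: {}
      # With fsGroup omitted, OpenShift can assign it based on the SCC
      # and inject the appropriate UID range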
```
Alternatively, if you want to keep the default `fsGroup: 1000` behavior and are certain your cluster allows it, you don't need to specify anything - the operator defaults will work.
## Shared Memory Configuration
Shared memory is enabled by default for all components:
...@@ -215,7 +261,7 @@ Default container ports are configured based on component type: ...@@ -215,7 +261,7 @@ Default container ports are configured based on component type:
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
- **Health Probes & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, environment variables, shared memory, and pod configurations - **Health Probes, Security Context & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- **Component-Specific Defaults**:
  - [`internal/dynamo/component_frontend.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/component_frontend.go)
  - [`internal/dynamo/component_worker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/component_worker.go)
...@@ -231,5 +277,6 @@ For users who want to understand the implementation details or contribute to the ...@@ -231,5 +277,6 @@ For users who want to understand the implementation details or contribute to the
- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide *any* `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator
...@@ -24,6 +24,11 @@ const ( ...@@ -24,6 +24,11 @@ const (
MpiRunSshPort = 2222 MpiRunSshPort = 2222
// Default security context values
// These provide secure defaults for running containers as non-root
// Users can override these via extraPodSpec.securityContext in their DynamoGraphDeployment
DefaultSecurityContextFSGroup = 1000
EnvDynamoServicePort = "DYNAMO_PORT" EnvDynamoServicePort = "DYNAMO_PORT"
KubeLabelDynamoSelector = "nvidia.com/selector" KubeLabelDynamoSelector = "nvidia.com/selector"
......
...@@ -802,6 +802,9 @@ func TestDynamoComponentDeploymentReconciler_generateLeaderWorkerSet(t *testing. ...@@ -802,6 +802,9 @@ func TestDynamoComponentDeploymentReconciler_generateLeaderWorkerSet(t *testing.
Spec: corev1.PodSpec{ Spec: corev1.PodSpec{
SchedulerName: "volcano", SchedulerName: "volcano",
TerminationGracePeriodSeconds: ptr.To(int64(10)), TerminationGracePeriodSeconds: ptr.To(int64(10)),
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
...@@ -927,6 +930,9 @@ func TestDynamoComponentDeploymentReconciler_generateLeaderWorkerSet(t *testing. ...@@ -927,6 +930,9 @@ func TestDynamoComponentDeploymentReconciler_generateLeaderWorkerSet(t *testing.
Spec: corev1.PodSpec{ Spec: corev1.PodSpec{
TerminationGracePeriodSeconds: ptr.To(int64(10)), TerminationGracePeriodSeconds: ptr.To(int64(10)),
SchedulerName: "volcano", SchedulerName: "volcano",
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
......
...@@ -181,7 +181,6 @@ func (b *TRTLLMBackend) setupWorkerContainer(container *corev1.Container) { ...@@ -181,7 +181,6 @@ func (b *TRTLLMBackend) setupWorkerContainer(container *corev1.Container) {
"ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N ''", "ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N ''",
// Create SSH daemon config to use custom host keys location and non-privileged port // Create SSH daemon config to use custom host keys location and non-privileged port
fmt.Sprintf("printf 'Port %d\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config", commonconsts.MpiRunSshPort), fmt.Sprintf("printf 'Port %d\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config", commonconsts.MpiRunSshPort),
"mkdir -p /run/sshd",
"/usr/sbin/sshd -D -f ~/.ssh/sshd_config", "/usr/sbin/sshd -D -f ~/.ssh/sshd_config",
} }
......
...@@ -81,7 +81,7 @@ func TestTRTLLMBackend_UpdateContainer(t *testing.T) { ...@@ -81,7 +81,7 @@ func TestTRTLLMBackend_UpdateContainer(t *testing.T) {
{Name: mpiRunSecretName, MountPath: "/ssh-pk", ReadOnly: true}, {Name: mpiRunSecretName, MountPath: "/ssh-pk", ReadOnly: true},
}, },
expectedCommand: []string{"/bin/sh", "-c"}, expectedCommand: []string{"/bin/sh", "-c"},
expectedArgs: []string{"mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && mkdir -p /run/sshd && /usr/sbin/sshd -D -f ~/.ssh/sshd_config"}, expectedArgs: []string{"mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile 
~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && /usr/sbin/sshd -D -f ~/.ssh/sshd_config"},
expectedEnv: []corev1.EnvVar{ expectedEnv: []corev1.EnvVar{
{Name: "OMPI_MCA_orte_keep_fqdn_hostnames", Value: "1"}, {Name: "OMPI_MCA_orte_keep_fqdn_hostnames", Value: "1"},
}, },
...@@ -730,13 +730,13 @@ func TestTRTLLMBackend_setupWorkerContainer(t *testing.T) { ...@@ -730,13 +730,13 @@ func TestTRTLLMBackend_setupWorkerContainer(t *testing.T) {
name: "Worker setup with initial args", name: "Worker setup with initial args",
initialArgs: []string{"some", "args"}, initialArgs: []string{"some", "args"},
initialCommand: []string{}, initialCommand: []string{},
expected: "mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && mkdir -p /run/sshd && /usr/sbin/sshd -D -f ~/.ssh/sshd_config", expected: "mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && 
/usr/sbin/sshd -D -f ~/.ssh/sshd_config",
}, },
{ {
name: "Worker setup with initial command", name: "Worker setup with initial command",
initialArgs: []string{}, initialArgs: []string{},
initialCommand: []string{"original", "command"}, initialCommand: []string{"original", "command"},
expected: "mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && mkdir -p /run/sshd && /usr/sbin/sshd -D -f ~/.ssh/sshd_config", expected: "mkdir -p ~/.ssh ~/.ssh/host_keys ~/.ssh/run && ls -la /ssh-pk/ && cp /ssh-pk/private.key ~/.ssh/id_rsa && cp /ssh-pk/private.key.pub ~/.ssh/id_rsa.pub && cp /ssh-pk/private.key.pub ~/.ssh/authorized_keys && chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys && chmod 644 ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys && printf 'Host *\\nIdentityFile ~/.ssh/id_rsa\\nStrictHostKeyChecking no\\nPort 2222\\n' > ~/.ssh/config && ssh-keygen -t rsa -f ~/.ssh/host_keys/ssh_host_rsa_key -N '' && ssh-keygen -t ecdsa -f ~/.ssh/host_keys/ssh_host_ecdsa_key -N '' && ssh-keygen -t ed25519 -f ~/.ssh/host_keys/ssh_host_ed25519_key -N '' && printf 'Port 2222\\nHostKey ~/.ssh/host_keys/ssh_host_rsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ecdsa_key\\nHostKey ~/.ssh/host_keys/ssh_host_ed25519_key\\nPidFile ~/.ssh/run/sshd.pid\\nPermitRootLogin yes\\nPasswordAuthentication no\\nPubkeyAuthentication yes\\nAuthorizedKeysFile ~/.ssh/authorized_keys\\n' > ~/.ssh/sshd_config && 
/usr/sbin/sshd -D -f ~/.ssh/sshd_config",
}, },
} }
......
...@@ -697,6 +697,25 @@ func addStandardEnvVars(container *corev1.Container, controllerConfig controller ...@@ -697,6 +697,25 @@ func addStandardEnvVars(container *corev1.Container, controllerConfig controller
container.Env = MergeEnvs(standardEnvVars, container.Env) container.Env = MergeEnvs(standardEnvVars, container.Env)
} }
// applyDefaultSecurityContext sets secure defaults for pod security context.
// Currently only sets fsGroup to solve volume permission issues.
// Does NOT set runAsUser/runAsGroup/runAsNonRoot to maintain backward compatibility
// with images that may expect to run as root.
// User-provided security context values (via extraPodSpec) will override these defaults.
func applyDefaultSecurityContext(podSpec *corev1.PodSpec) {
// Initialize SecurityContext if not present
if podSpec.SecurityContext == nil {
podSpec.SecurityContext = &corev1.PodSecurityContext{}
}
// Only set fsGroup by default
// This fixes volume permission issues without forcing a specific UID/GID
// which maintains compatibility with both root and non-root images
if podSpec.SecurityContext.FSGroup == nil {
podSpec.SecurityContext.FSGroup = ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup))
}
}
// GenerateBasePodSpec creates a basic PodSpec with common logic shared between controller and grove // GenerateBasePodSpec creates a basic PodSpec with common logic shared between controller and grove
// Includes standard environment variables (DYNAMO_PORT, NATS_SERVER, ETCD_ENDPOINTS) // Includes standard environment variables (DYNAMO_PORT, NATS_SERVER, ETCD_ENDPOINTS)
// Deployment-specific environment merging should be handled by the caller // Deployment-specific environment merging should be handled by the caller
...@@ -856,6 +875,11 @@ func GenerateBasePodSpec( ...@@ -856,6 +875,11 @@ func GenerateBasePodSpec(
return nil, fmt.Errorf("failed to get base podspec: %w", err) return nil, fmt.Errorf("failed to get base podspec: %w", err)
} }
// Check if user provided their own security context before merging
userProvidedSecurityContext := component.ExtraPodSpec != nil &&
component.ExtraPodSpec.PodSpec != nil &&
component.ExtraPodSpec.PodSpec.SecurityContext != nil
if component.ExtraPodSpec != nil && component.ExtraPodSpec.PodSpec != nil { if component.ExtraPodSpec != nil && component.ExtraPodSpec.PodSpec != nil {
// merge extraPodSpec PodSpec with base podspec // merge extraPodSpec PodSpec with base podspec
err := mergo.Merge(&podSpec, component.ExtraPodSpec.PodSpec.DeepCopy(), mergo.WithOverride) err := mergo.Merge(&podSpec, component.ExtraPodSpec.PodSpec.DeepCopy(), mergo.WithOverride)
...@@ -864,6 +888,13 @@ func GenerateBasePodSpec( ...@@ -864,6 +888,13 @@ func GenerateBasePodSpec(
} }
} }
// Apply default security context ONLY if user didn't provide any security context
// If user provides ANY securityContext (even partial), they get full control with no defaults injected
// This allows users to intentionally set fields to nil (e.g., to run as root)
if !userProvidedSecurityContext {
applyDefaultSecurityContext(&podSpec)
}
if controllerConfig.IsK8sDiscoveryEnabled(component.Annotations) { if controllerConfig.IsK8sDiscoveryEnabled(component.Annotations) {
podSpec.ServiceAccountName = discovery.GetK8sDiscoveryServiceAccountName(parentGraphDeploymentName) podSpec.ServiceAccountName = discovery.GetK8sDiscoveryServiceAccountName(parentGraphDeploymentName)
} }
......
...@@ -1337,6 +1337,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -1337,6 +1337,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
}, },
}, },
TerminationGracePeriodSeconds: ptr.To(int64(10)), TerminationGracePeriodSeconds: ptr.To(int64(10)),
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
ImagePullSecrets: []corev1.LocalObjectReference{ ImagePullSecrets: []corev1.LocalObjectReference{
{ {
Name: "frontend-secret", Name: "frontend-secret",
...@@ -1512,7 +1515,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -1512,7 +1515,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
}, },
ServiceAccountName: commonconsts.PlannerServiceAccountName, ServiceAccountName: commonconsts.PlannerServiceAccountName,
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
RestartPolicy: corev1.RestartPolicyAlways, SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
RestartPolicy: corev1.RestartPolicyAlways,
Containers: []corev1.Container{ Containers: []corev1.Container{
{ {
...@@ -1896,6 +1902,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -1896,6 +1902,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
PodSpec: corev1.PodSpec{ PodSpec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyAlways, RestartPolicy: corev1.RestartPolicyAlways,
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
...@@ -2069,6 +2078,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -2069,6 +2078,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
PodSpec: corev1.PodSpec{ PodSpec: corev1.PodSpec{
RestartPolicy: corev1.RestartPolicyAlways, RestartPolicy: corev1.RestartPolicyAlways,
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
...@@ -2214,7 +2226,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -2214,7 +2226,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
}, },
}, },
TerminationGracePeriodSeconds: ptr.To(int64(10)), TerminationGracePeriodSeconds: ptr.To(int64(10)),
RestartPolicy: corev1.RestartPolicyAlways, SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
RestartPolicy: corev1.RestartPolicyAlways,
Containers: []corev1.Container{ Containers: []corev1.Container{
{ {
Name: commonconsts.MainContainerName, Name: commonconsts.MainContainerName,
...@@ -2357,7 +2372,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -2357,7 +2372,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
PodSpec: corev1.PodSpec{ PodSpec: corev1.PodSpec{
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
ServiceAccountName: commonconsts.PlannerServiceAccountName, ServiceAccountName: commonconsts.PlannerServiceAccountName,
RestartPolicy: corev1.RestartPolicyAlways, SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
RestartPolicy: corev1.RestartPolicyAlways,
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "dynamo-pvc", Name: "dynamo-pvc",
...@@ -2791,7 +2809,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -2791,7 +2809,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
}, },
}, },
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
RestartPolicy: corev1.RestartPolicyAlways, SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
RestartPolicy: corev1.RestartPolicyAlways,
Containers: []corev1.Container{ Containers: []corev1.Container{
{ {
Name: commonconsts.MainContainerName, Name: commonconsts.MainContainerName,
...@@ -2940,6 +2961,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -2940,6 +2961,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
// StartsAfter: []string{"worker-ldr"}, // StartsAfter: []string{"worker-ldr"},
PodSpec: corev1.PodSpec{ PodSpec: corev1.PodSpec{
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
...@@ -3086,7 +3110,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -3086,7 +3110,10 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
}, },
}, },
TerminationGracePeriodSeconds: ptr.To(int64(10)), TerminationGracePeriodSeconds: ptr.To(int64(10)),
RestartPolicy: corev1.RestartPolicyAlways, SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
RestartPolicy: corev1.RestartPolicyAlways,
Containers: []corev1.Container{ Containers: []corev1.Container{
{ {
Name: commonconsts.MainContainerName, Name: commonconsts.MainContainerName,
...@@ -3229,6 +3256,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) { ...@@ -3229,6 +3256,9 @@ func TestGenerateGrovePodCliqueSet(t *testing.T) {
PodSpec: corev1.PodSpec{ PodSpec: corev1.PodSpec{
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
ServiceAccountName: commonconsts.PlannerServiceAccountName, ServiceAccountName: commonconsts.PlannerServiceAccountName,
SecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "dynamo-pvc", Name: "dynamo-pvc",
...@@ -5017,6 +5047,10 @@ func TestGenerateBasePodSpec_Worker(t *testing.T) { ...@@ -5017,6 +5047,10 @@ func TestGenerateBasePodSpec_Worker(t *testing.T) {
}, },
RestartPolicy: corev1.RestartPolicyAlways, RestartPolicy: corev1.RestartPolicyAlways,
TerminationGracePeriodSeconds: ptr.To(int64(60)), TerminationGracePeriodSeconds: ptr.To(int64(60)),
SecurityContext: &corev1.PodSecurityContext{
// Only fsGroup is injected by default for volume permissions
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
Volumes: []corev1.Volume{ Volumes: []corev1.Volume{
{ {
Name: "shared-memory", Name: "shared-memory",
...@@ -5637,3 +5671,152 @@ func TestGenerateBasePodSpec_UseAsCompilationCache_BackendSupport(t *testing.T) ...@@ -5637,3 +5671,152 @@ func TestGenerateBasePodSpec_UseAsCompilationCache_BackendSupport(t *testing.T)
}) })
} }
} }
func TestGenerateBasePodSpec_SecurityContext(t *testing.T) {
secretsRetriever := &mockSecretsRetriever{}
controllerConfig := controller_common.Config{}
tests := []struct {
name string
component *v1alpha1.DynamoComponentDeploymentSharedSpec
expectedSecurityContext *corev1.PodSecurityContext
description string
}{
{
name: "no security context provided - should apply fsGroup default only",
component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
ComponentType: commonconsts.ComponentTypeFrontend,
},
expectedSecurityContext: &corev1.PodSecurityContext{
FSGroup: ptr.To(int64(commonconsts.DefaultSecurityContextFSGroup)),
},
description: "Operator should only inject fsGroup for volume permissions, not UID/GID (backward compatible)",
},
{
name: "full security context override - should use user values",
component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
ComponentType: commonconsts.ComponentTypeFrontend,
ExtraPodSpec: &common.ExtraPodSpec{
PodSpec: &corev1.PodSpec{
SecurityContext: &corev1.PodSecurityContext{
RunAsNonRoot: ptr.To(true),
RunAsUser: ptr.To(int64(5000)),
RunAsGroup: ptr.To(int64(5000)),
FSGroup: ptr.To(int64(5000)),
},
},
},
},
expectedSecurityContext: &corev1.PodSecurityContext{
RunAsNonRoot: ptr.To(true),
RunAsUser: ptr.To(int64(5000)),
RunAsGroup: ptr.To(int64(5000)),
FSGroup: ptr.To(int64(5000)),
},
description: "User-provided security context should completely override defaults",
},
{
name: "partial security context override - user gets full control",
component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
ComponentType: commonconsts.ComponentTypeFrontend,
ExtraPodSpec: &common.ExtraPodSpec{
PodSpec: &corev1.PodSpec{
						SecurityContext: &corev1.PodSecurityContext{
							RunAsUser:  ptr.To(int64(2000)),
							RunAsGroup: ptr.To(int64(3000)),
						},
					},
				},
			},
			expectedSecurityContext: &corev1.PodSecurityContext{
				RunAsUser:  ptr.To(int64(2000)),
				RunAsGroup: ptr.To(int64(3000)),
			},
			description: "Partial user override gets full control - no defaults injected",
		},
		{
			name: "only fsGroup override - user gets full control",
			component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
				ComponentType: commonconsts.ComponentTypeFrontend,
				ExtraPodSpec: &common.ExtraPodSpec{
					PodSpec: &corev1.PodSpec{
						SecurityContext: &corev1.PodSecurityContext{
							FSGroup: ptr.To(int64(7000)),
						},
					},
				},
			},
			expectedSecurityContext: &corev1.PodSecurityContext{
				FSGroup: ptr.To(int64(7000)),
			},
			description: "Only fsGroup override - user gets full control, no defaults injected",
		},
		{
			name: "fsGroup 2000 example - exactly what user requested",
			component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
				ComponentType: commonconsts.ComponentTypeFrontend,
				ExtraPodSpec: &common.ExtraPodSpec{
					PodSpec: &corev1.PodSpec{
						SecurityContext: &corev1.PodSecurityContext{
							FSGroup: ptr.To(int64(2000)),
						},
					},
				},
			},
			expectedSecurityContext: &corev1.PodSecurityContext{
				FSGroup: ptr.To(int64(2000)),
			},
			description: "User sets fsGroup:2000, gets ONLY that - critical for allowing root users",
		},
		{
			name: "OpenShift-style namespace range - should use user values",
			component: &v1alpha1.DynamoComponentDeploymentSharedSpec{
				ComponentType: commonconsts.ComponentTypeFrontend,
				ExtraPodSpec: &common.ExtraPodSpec{
					PodSpec: &corev1.PodSpec{
						SecurityContext: &corev1.PodSecurityContext{
							RunAsNonRoot: ptr.To(true),
							RunAsUser:    ptr.To(int64(1000700001)),
							RunAsGroup:   ptr.To(int64(1000700001)),
							FSGroup:      ptr.To(int64(1000700001)),
						},
					},
				},
			},
			expectedSecurityContext: &corev1.PodSecurityContext{
				RunAsNonRoot: ptr.To(true),
				RunAsUser:    ptr.To(int64(1000700001)),
				RunAsGroup:   ptr.To(int64(1000700001)),
				FSGroup:      ptr.To(int64(1000700001)),
			},
			description: "OpenShift namespace UID/GID ranges should be respected",
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			podSpec, err := GenerateBasePodSpec(
				tt.component,
				BackendFrameworkNoop,
				secretsRetriever,
				"test-deployment",
				"default",
				RoleMain,
				1,
				controllerConfig,
				commonconsts.MultinodeDeploymentTypeGrove,
				"test-service",
			)
			if err != nil {
				t.Errorf("GenerateBasePodSpec() unexpected error: %v", err)
				return
			}
			// Compare the entire SecurityContext using cmp.Diff
			if diff := cmp.Diff(tt.expectedSecurityContext, podSpec.SecurityContext); diff != "" {
				t.Errorf("GenerateBasePodSpec() SecurityContext mismatch (-want +got):\n%s\nDescription: %s", diff, tt.description)
			}
		})
	}
}
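The cases above all check one rule: a user-supplied security context is used verbatim, and the default applies only when none is supplied. The sketch below illustrates that rule in isolation. It is not the operator's code: `PodSecurityContext` is a reduced stand-in for `corev1.PodSecurityContext`, and `resolveSecurityContext` and `defaultFSGroup` are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// PodSecurityContext is a reduced stand-in for corev1.PodSecurityContext,
// keeping only the fields exercised by the test cases (assumption for this sketch).
type PodSecurityContext struct {
	RunAsNonRoot *bool
	RunAsUser    *int64
	RunAsGroup   *int64
	FSGroup      *int64
}

// defaultFSGroup mirrors the documented default of fsGroup: 1000.
const defaultFSGroup int64 = 1000

// resolveSecurityContext is a hypothetical helper: any non-nil user context
// is returned unchanged (no merging), otherwise the default is returned.
func resolveSecurityContext(user *PodSecurityContext) *PodSecurityContext {
	if user != nil {
		return user
	}
	fsGroup := defaultFSGroup
	return &PodSecurityContext{FSGroup: &fsGroup}
}

func main() {
	// No user-supplied context: the default fsGroup applies.
	def := resolveSecurityContext(nil)
	fmt.Println("default fsGroup:", *def.FSGroup)

	// Partial override: only runAsUser is set, so fsGroup stays unset.
	uid := int64(2000)
	got := resolveSecurityContext(&PodSecurityContext{RunAsUser: &uid})
	fmt.Println("runAsUser:", *got.RunAsUser, "fsGroup unset:", got.FSGroup == nil)
}
```

Returning the user's object unchanged, rather than merging defaults into it, is what lets a partial override such as `RunAsUser` alone leave `FSGroup` unset, matching the expectations in the table.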
@@ -599,6 +599,8 @@ The Dynamo operator automatically applies default values to various fields when
- **Health Probes**: Startup, liveness, and readiness probes are configured differently for frontend, worker, and planner components. For example, worker components receive a startup probe with a 2-hour timeout (720 failures × 10 seconds) to accommodate long model loading times.
- **Security Context**: All components receive `fsGroup: 1000` by default to ensure proper file permissions for mounted volumes. This can be overridden via the `extraPodSpec.securityContext` field.
- **Shared Memory**: All components receive an 8Gi shared memory volume mounted at `/dev/shm` by default (can be disabled or resized via the `sharedMemory` field).
- **Environment Variables**: Components automatically receive environment variables like `DYN_NAMESPACE`, `DYN_PARENT_DGD_K8S_NAME`, `DYNAMO_PORT`, and backend-specific variables.
@@ -616,6 +618,50 @@ All components receive the following pod-level defaults unless overridden:
- **`terminationGracePeriodSeconds`**: `60` seconds
- **`restartPolicy`**: `Always`
## Security Context

The operator automatically applies default security context settings to all components to ensure proper file permissions, particularly for mounted volumes:

- **`fsGroup`**: `1000` - Sets the group ownership of mounted volumes and any files created in those volumes

This default ensures that non-root containers can write to mounted volumes (like model caches or persistent storage) without permission issues. The `fsGroup` setting is particularly important for:

- Model downloads and caching
- Compilation cache directories
- Persistent volume claims (PVCs)
- SSH key generation in multinode deployments

### Overriding Security Context

To override the default security context, specify your own `securityContext` in the `extraPodSpec` of your component:
```yaml
services:
  YourWorker:
    extraPodSpec:
      securityContext:
        fsGroup: 2000  # Custom group ID
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
```
**Important**: When you provide *any* `securityContext` object in `extraPodSpec`, the operator will not inject any defaults. This gives you complete control over the security context, including the ability to run as root (by omitting `runAsNonRoot` or setting it to `false`).
### OpenShift and Security Context Constraints
In OpenShift environments with Security Context Constraints (SCCs), you may need to omit explicit UID/GID values to allow OpenShift's admission controllers to assign them dynamically:
```yaml
services:
  YourWorker:
    extraPodSpec:
      # Empty object (a bare `securityContext:` key would parse as null):
      # fsGroup is omitted so OpenShift can assign it based on the SCC
      # and inject the appropriate UID range.
      securityContext: {}
```
Alternatively, if you want to keep the default `fsGroup: 1000` behavior and are certain your cluster allows it, you don't need to specify anything - the operator defaults will work.
## Shared Memory Configuration
Shared memory is enabled by default for all components:
@@ -810,7 +856,7 @@ Default container ports are configured based on component type:
For users who want to understand the implementation details or contribute to the operator, the default values described in this document are set in the following source files:
-- **Health Probes & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, environment variables, shared memory, and pod configurations
+- **Health Probes, Security Context & Pod Specifications**: [`internal/dynamo/graph.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/graph.go) - Contains the main logic for applying default probes, security context, environment variables, shared memory, and pod configurations
- **Component-Specific Defaults**:
- [`internal/dynamo/component_frontend.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/component_frontend.go)
- [`internal/dynamo/component_worker.go`](https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/internal/dynamo/component_worker.go)
@@ -826,5 +872,6 @@ For users who want to understand the implementation details or contribute to the
- All these defaults can be overridden by explicitly specifying values in your DynamoComponentDeployment or DynamoGraphDeployment resources
- User-specified probes (via `livenessProbe`, `readinessProbe`, or `startupProbe` fields) take precedence over operator defaults
- For security context, if you provide *any* `securityContext` in `extraPodSpec`, no defaults will be injected, giving you full control
- For multinode deployments, some defaults are modified or removed as described above to accommodate distributed execution patterns
- The `extraPodSpec.mainContainer` field can be used to override probe configurations set by the operator