Unverified commit 12e144ae authored by atchernych, committed by GitHub

feat: mesh support fixes [DEP-854] (#8270)


Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
parent 5671d8e3
@@ -150,6 +150,11 @@ The chart includes built-in validation to prevent all operator conflicts:
| dynamo-operator.dynamo.istio.gateway | string | `nil` | Istio gateway name for routing |
| dynamo-operator.dynamo.ingressHostSuffix | string | `""` | Host suffix for generated ingress hostnames |
| dynamo-operator.dynamo.virtualServiceSupportsHTTPS | bool | `false` | Whether VirtualServices should support HTTPS routing |
| dynamo-operator.dynamo.serviceMesh.enabled | bool | `false` | Whether to enable service mesh resource generation for EPP |
| dynamo-operator.dynamo.serviceMesh.provider | string | `"istio"` | Service mesh provider. Supported: "istio" |
| dynamo-operator.dynamo.serviceMesh.istio | object | `{"insecureSkipVerify":true,"tlsMode":"SIMPLE"}` | Istio-specific settings (only used when provider is "istio") |
| dynamo-operator.dynamo.serviceMesh.istio.tlsMode | string | `"SIMPLE"` | TLS mode for DestinationRules: "SIMPLE", "DISABLE", "ISTIO_MUTUAL", "MUTUAL" |
| dynamo-operator.dynamo.serviceMesh.istio.insecureSkipVerify | bool | `true` | Skip TLS certificate verification (for self-signed EPP certs) |
| dynamo-operator.dynamo.metrics.prometheusEndpoint | string | `""` | Endpoint that services can use to retrieve metrics. If set, dynamo operator will automatically inject the PROMETHEUS_ENDPOINT environment variable into services it manages. Users can override the value of the PROMETHEUS_ENDPOINT environment variable by modifying the corresponding deployment's environment variables |
| dynamo-operator.dynamo.mpiRun.secretName | string | `"mpi-run-ssh-secret"` | Name of the secret containing the SSH key for MPI Run |
| dynamo-operator.webhook.certificateSecret.name | string | `"webhook-server-cert"` | Name of the Kubernetes secret containing webhook TLS certificates. The secret must contain three keys: tls.crt (server certificate), tls.key (server private key), and ca.crt (Certificate Authority certificate). |
......
@@ -316,10 +316,11 @@ rules:
- patch
- update
- watch
{{- if .Values.istioVirtualServiceEnabled }} {{- if or .Values.istioVirtualServiceEnabled (and (hasKey .Values.dynamo "serviceMesh") .Values.dynamo.serviceMesh .Values.dynamo.serviceMesh.enabled) }}
- apiGroups:
- networking.istio.io
resources:
- destinationrules
- virtualservices
verbs:
- create
......
@@ -118,6 +118,16 @@ data:
hostSuffix: {{ $ingressHostSuffix | quote }}
{{- end }}
{{- end }}
{{- if and (hasKey .Values.dynamo "serviceMesh") .Values.dynamo.serviceMesh .Values.dynamo.serviceMesh.enabled }}
serviceMesh:
provider: {{ .Values.dynamo.serviceMesh.provider | default "istio" | quote }}
{{- if eq (.Values.dynamo.serviceMesh.provider | default "istio") "istio" }}
istio:
{{- $istio := .Values.dynamo.serviceMesh.istio | default dict }}
tlsMode: {{ $istio.tlsMode | default "SIMPLE" | quote }}
insecureSkipVerify: {{ kindIs "bool" $istio.insecureSkipVerify | ternary $istio.insecureSkipVerify true }}
{{- end }}
{{- end }}
{{- if not .Values.namespaceRestriction.enabled }}
rbac:
plannerClusterRoleName: {{ include "dynamo-operator.fullname" . }}-planner
......
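The `insecureSkipVerify` line in this template needs care. Sprig's `default` treats `false` as an empty value, so a line such as `insecureSkipVerify: {{ $istio.insecureSkipVerify | default true }}` renders `true` even when a user explicitly sets `insecureSkipVerify: false`, whereas a type check like `kindIs "bool"` combined with `ternary` falls back only when the value is unset. The Go sketch below approximates both behaviors; `isEmpty`, `sprigDefault`, and `boolOrDefault` are hypothetical stand-ins written for illustration, not Sprig's actual implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// isEmpty approximates Sprig's notion of "empty" for scalar values:
// nil, false, 0 and "" all count as empty.
func isEmpty(v interface{}) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Bool:
		return !rv.Bool()
	case reflect.String:
		return rv.Len() == 0
	case reflect.Int, reflect.Int64:
		return rv.Int() == 0
	case reflect.Float64:
		return rv.Float() == 0
	}
	return false
}

// sprigDefault mimics `value | default fallback`: the fallback wins whenever
// value is empty, which includes an explicit false.
func sprigDefault(fallback, value interface{}) interface{} {
	if isEmpty(value) {
		return fallback
	}
	return value
}

// boolOrDefault mimics `kindIs "bool" value | ternary value fallback`:
// only a missing (nil) or non-bool value falls back; false is preserved.
func boolOrDefault(value interface{}, fallback bool) bool {
	if b, ok := value.(bool); ok {
		return b
	}
	return fallback
}

func main() {
	// A user sets serviceMesh.istio.insecureSkipVerify: false.
	fmt.Println(sprigDefault(true, false))  // true: the explicit false is lost
	fmt.Println(boolOrDefault(false, true)) // false: the explicit false is kept
	fmt.Println(boolOrDefault(nil, true))   // true: an unset value falls back
}
```

The same pitfall applies to any boolean chart value whose default is `true`.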
@@ -167,6 +167,21 @@ dynamo-operator:
# -- Whether VirtualServices should support HTTPS routing
virtualServiceSupportsHTTPS: false
# Service mesh integration for EPP components.
# When enabled, the operator generates mesh-specific resources (e.g., Istio
# DestinationRules) so sidecar proxies connect correctly to EPP.
serviceMesh:
# -- Whether to enable service mesh resource generation for EPP
enabled: false
# -- Service mesh provider. Supported: "istio"
provider: "istio"
# -- Istio-specific settings (only used when provider is "istio")
istio:
# -- TLS mode for DestinationRules: "SIMPLE", "DISABLE", "ISTIO_MUTUAL", "MUTUAL"
tlsMode: "SIMPLE"
# -- Skip TLS certificate verification (for self-signed EPP certs)
insecureSkipVerify: true
# Metrics configuration
metrics:
# -- Endpoint that services can use to retrieve metrics. If set, dynamo operator will automatically inject the PROMETHEUS_ENDPOINT environment variable into services it manages. Users can override the value of the PROMETHEUS_ENDPOINT environment variable by modifying the corresponding deployment's environment variables
......
@@ -85,6 +85,19 @@ func SetDefaultsOperatorConfiguration(obj *OperatorConfiguration) {
obj.GPU.DiscoveryEnabled = ptr.To(true)
}
// ServiceMesh defaults
if ServiceMeshProvider(obj.ServiceMesh.Provider) == ServiceMeshProviderIstio && obj.ServiceMesh.Istio == nil {
obj.ServiceMesh.Istio = &IstioMeshConfiguration{}
}
if obj.ServiceMesh.Istio != nil {
if obj.ServiceMesh.Istio.TLSMode == "" {
obj.ServiceMesh.Istio.TLSMode = "SIMPLE"
}
if obj.ServiceMesh.Istio.InsecureSkipVerify == nil {
obj.ServiceMesh.Istio.InsecureSkipVerify = ptr.To(true)
}
}
// Logging defaults
if obj.Logging.Level == "" {
obj.Logging.Level = "info"
......
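Because `InsecureSkipVerify` is a `*bool`, the defaulting step can tell "unset" (`nil`) apart from an explicit `false`, and only fills in `true` for the former. The sketch below uses simplified local copies of the config types (not the operator's real package) and a hypothetical `applyServiceMeshDefaults` helper that repeats the ServiceMesh defaulting steps from this hunk, so the three cases can be checked in isolation.

```go
package main

import "fmt"

// Simplified local stand-ins for the operator's config types.
type ServiceMeshProvider string

const ServiceMeshProviderIstio ServiceMeshProvider = "istio"

type IstioMeshConfiguration struct {
	TLSMode            string
	InsecureSkipVerify *bool // nil means "not set by the user"
}

type ServiceMeshConfiguration struct {
	Provider string
	Istio    *IstioMeshConfiguration
}

// IsEnabled reports whether a supported provider is configured.
func (s *ServiceMeshConfiguration) IsEnabled() bool {
	return ServiceMeshProvider(s.Provider) == ServiceMeshProviderIstio
}

// applyServiceMeshDefaults repeats the ServiceMesh defaulting logic.
func applyServiceMeshDefaults(s *ServiceMeshConfiguration) {
	if ServiceMeshProvider(s.Provider) == ServiceMeshProviderIstio && s.Istio == nil {
		s.Istio = &IstioMeshConfiguration{}
	}
	if s.Istio != nil {
		if s.Istio.TLSMode == "" {
			s.Istio.TLSMode = "SIMPLE"
		}
		if s.Istio.InsecureSkipVerify == nil {
			t := true
			s.Istio.InsecureSkipVerify = &t
		}
	}
}

func main() {
	f := false
	explicit := ServiceMeshConfiguration{Provider: "istio", Istio: &IstioMeshConfiguration{InsecureSkipVerify: &f}}
	applyServiceMeshDefaults(&explicit)
	fmt.Println(explicit.Istio.TLSMode, *explicit.Istio.InsecureSkipVerify) // SIMPLE false

	bare := ServiceMeshConfiguration{Provider: "istio"}
	applyServiceMeshDefaults(&bare)
	fmt.Println(bare.Istio.TLSMode, *bare.Istio.InsecureSkipVerify) // SIMPLE true

	off := ServiceMeshConfiguration{}
	applyServiceMeshDefaults(&off)
	fmt.Println(off.IsEnabled(), off.Istio == nil) // false true
}
```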
@@ -56,6 +56,10 @@ type OperatorConfiguration struct {
// Ingress configuration
Ingress IngressConfiguration `json:"ingress"`
// ServiceMesh configures automatic generation of service-mesh resources
// (e.g., Istio DestinationRules) for EPP components.
ServiceMesh ServiceMeshConfiguration `json:"serviceMesh"`
// RBAC configuration for cross-namespace resource management (cluster-wide mode)
RBAC RBACConfiguration `json:"rbac"`
@@ -244,6 +248,41 @@ func (i *IngressConfiguration) UseVirtualService() bool {
return i.VirtualServiceGateway != ""
}
// ServiceMeshProvider enumerates the supported service mesh implementations.
type ServiceMeshProvider string
const (
// ServiceMeshProviderIstio selects Istio as the service mesh.
ServiceMeshProviderIstio ServiceMeshProvider = "istio"
)
// ServiceMeshConfiguration holds service mesh integration settings.
// The operator uses this to generate mesh-specific resources (e.g., Istio
// DestinationRules) for EPP components so that sidecar proxies connect
// correctly without double-TLS issues.
type ServiceMeshConfiguration struct {
// Provider selects the service mesh implementation. Supported: "istio", "".
// Empty string disables service mesh resource generation.
Provider string `json:"provider"`
// Istio holds Istio-specific settings. Only used when Provider is "istio".
Istio *IstioMeshConfiguration `json:"istio,omitempty"`
}
// IsEnabled returns true if a supported service mesh provider is configured.
func (s *ServiceMeshConfiguration) IsEnabled() bool {
return ServiceMeshProvider(s.Provider) == ServiceMeshProviderIstio
}
// IstioMeshConfiguration holds Istio-specific mesh settings.
type IstioMeshConfiguration struct {
// TLSMode is the Istio TLS mode for DestinationRules (e.g., "DISABLE", "SIMPLE", "ISTIO_MUTUAL").
// Defaults to "SIMPLE".
TLSMode string `json:"tlsMode"`
// InsecureSkipVerify skips TLS certificate verification in DestinationRules.
// Defaults to true (matching upstream GAIE behavior with self-signed certs).
InsecureSkipVerify *bool `json:"insecureSkipVerify,omitempty"`
}
// RBACConfiguration holds RBAC settings for cluster-wide mode.
type RBACConfiguration struct {
// PlannerClusterRoleName is the ClusterRole for planner
......
@@ -226,6 +226,26 @@ func (in *IngressConfiguration) DeepCopy() *IngressConfiguration {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *IstioMeshConfiguration) DeepCopyInto(out *IstioMeshConfiguration) {
*out = *in
if in.InsecureSkipVerify != nil {
in, out := &in.InsecureSkipVerify, &out.InsecureSkipVerify
*out = new(bool)
**out = **in
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new IstioMeshConfiguration.
func (in *IstioMeshConfiguration) DeepCopy() *IstioMeshConfiguration {
if in == nil {
return nil
}
out := new(IstioMeshConfiguration)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *KaiSchedulerConfiguration) DeepCopyInto(out *KaiSchedulerConfiguration) {
*out = *in
@@ -376,6 +396,7 @@ func (in *OperatorConfiguration) DeepCopyInto(out *OperatorConfiguration) {
in.DRA.DeepCopyInto(&out.DRA)
out.Infrastructure = in.Infrastructure
out.Ingress = in.Ingress
in.ServiceMesh.DeepCopyInto(&out.ServiceMesh)
out.RBAC = in.RBAC
out.MPI = in.MPI
out.Checkpoint = in.Checkpoint
@@ -484,6 +505,26 @@ func (in *ServerConfiguration) DeepCopy() *ServerConfiguration {
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *ServiceMeshConfiguration) DeepCopyInto(out *ServiceMeshConfiguration) {
*out = *in
if in.Istio != nil {
in, out := &in.Istio, &out.Istio
*out = new(IstioMeshConfiguration)
(*in).DeepCopyInto(*out)
}
}
// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new ServiceMeshConfiguration.
func (in *ServiceMeshConfiguration) DeepCopy() *ServiceMeshConfiguration {
if in == nil {
return nil
}
out := new(ServiceMeshConfiguration)
in.DeepCopyInto(out)
return out
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *WebhookServer) DeepCopyInto(out *WebhookServer) {
*out = *in
......
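`OperatorConfiguration.DeepCopyInto` copies `Ingress` with a plain assignment but calls `in.ServiceMesh.DeepCopyInto(&out.ServiceMesh)`, because `ServiceMeshConfiguration` holds pointers (`Istio`, and inside it `InsecureSkipVerify`). The sketch below, using simplified local types and a hypothetical lowercase `deepCopyInto` that follows the shape of the generated methods, shows that a plain struct assignment shares those pointers while the deep copy does not.

```go
package main

import "fmt"

type IstioMeshConfiguration struct {
	TLSMode            string
	InsecureSkipVerify *bool
}

type ServiceMeshConfiguration struct {
	Provider string
	Istio    *IstioMeshConfiguration
}

// deepCopyInto copies the struct, then allocates fresh values behind
// every non-nil pointer, like the generated DeepCopyInto methods.
func (in *ServiceMeshConfiguration) deepCopyInto(out *ServiceMeshConfiguration) {
	*out = *in
	if in.Istio != nil {
		istio := *in.Istio
		if in.Istio.InsecureSkipVerify != nil {
			v := *in.Istio.InsecureSkipVerify
			istio.InsecureSkipVerify = &v
		}
		out.Istio = &istio
	}
}

func main() {
	skip := true
	orig := ServiceMeshConfiguration{Provider: "istio", Istio: &IstioMeshConfiguration{TLSMode: "SIMPLE", InsecureSkipVerify: &skip}}

	shallow := orig // plain assignment: shares the *IstioMeshConfiguration
	var deep ServiceMeshConfiguration
	orig.deepCopyInto(&deep)

	// Mutate the original after copying.
	orig.Istio.TLSMode = "DISABLE"
	*orig.Istio.InsecureSkipVerify = false

	fmt.Println(shallow.Istio.TLSMode, *shallow.Istio.InsecureSkipVerify) // DISABLE false
	fmt.Println(deep.Istio.TLSMode, *deep.Istio.InsecureSkipVerify)       // SIMPLE true
}
```

This matters in a controller because cached config objects are often copied before being mutated; a shallow copy would leak those mutations back into the shared original.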
@@ -463,12 +463,16 @@ func main() {
runtimeConfig.DRAEnabled = false
}
setupLog.Info("Detecting Istio availability...")
runtimeConfig.IstioAvailable = commonController.DetectIstioAvailability(mainCtx, mgr)
setupLog.Info("Detected orchestrators availability",
"grove", runtimeConfig.GroveEnabled,
"lws", runtimeConfig.LWSEnabled,
"volcano", volcanoDetected,
"kai-scheduler", runtimeConfig.KaiSchedulerEnabled,
"dra", runtimeConfig.DRAEnabled,
"istio", runtimeConfig.IstioAvailable,
)
dockerSecretRetriever := secrets.NewDockerSecretIndexer(mgr.GetClient())
......
@@ -186,6 +186,7 @@ rules:
- apiGroups:
- networking.istio.io
resources:
- destinationrules
- virtualservices
verbs:
- create
......
@@ -94,6 +94,7 @@ type DynamoGraphDeploymentReconciler struct {
// +kubebuilder:rbac:groups=grove.io,resources=clustertopologies,verbs=get;list;watch
// +kubebuilder:rbac:groups=scheduling.run.ai,resources=queues,verbs=get;list
// +kubebuilder:rbac:groups=inference.networking.k8s.io,resources=inferencepools,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=networking.istio.io,resources=destinationrules,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=resource.k8s.io,resources=resourceclaimtemplates,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=resource.k8s.io,resources=deviceclasses,verbs=get;list;watch
// +kubebuilder:rbac:groups=core,resources=pods,verbs=get;list;watch
@@ -1626,6 +1627,27 @@ func (r *DynamoGraphDeploymentReconciler) reconcileEPPResources(ctx context.Cont
return fmt.Errorf("failed to sync EPP InferencePool: %w", err)
}
// 3. Reconcile service mesh resources (e.g., Istio DestinationRule).
// Only attempt DestinationRule reconciliation when the Istio CRDs are
// present on the cluster; otherwise the API call would fail on every
// reconcile for Istio-less clusters.
if r.RuntimeConfig.IstioAvailable {
meshEnabled := r.Config.ServiceMesh.IsEnabled()
destinationRule := dynamo.GenerateEPPDestinationRule(eppServiceName, dgd.Namespace, r.Config.ServiceMesh)
_, _, err = commoncontroller.SyncResource(ctx, r, dgd, func(ctx context.Context) (*networkingv1beta1.DestinationRule, bool, error) {
return destinationRule, !meshEnabled, nil
})
if err != nil {
logger.Error(err, "Failed to sync EPP DestinationRule")
return fmt.Errorf("failed to sync EPP DestinationRule: %w", err)
}
if meshEnabled {
logger.Info("Synced EPP DestinationRule", "name", eppServiceName)
}
} else if r.Config.ServiceMesh.IsEnabled() {
logger.Error(nil, "Service mesh is enabled but networking.istio.io CRDs are not installed; skipping DestinationRule reconciliation")
}
logger.Info("Successfully reconciled EPP resources", "poolName", inferencePool.GetName())
return nil
}
@@ -1679,6 +1701,14 @@ func (r *DynamoGraphDeploymentReconciler) SetupWithManager(mgr ctrl.Manager) err
GenericFunc: func(ge event.GenericEvent) bool { return true },
})).
WithEventFilter(commoncontroller.EphemeralDeploymentEventFilter(r.Config, r.RuntimeConfig))
if r.RuntimeConfig.IstioAvailable {
ctrlBuilder = ctrlBuilder.Owns(&networkingv1beta1.DestinationRule{}, builder.WithPredicates(predicate.Funcs{
CreateFunc: func(ce event.CreateEvent) bool { return false },
DeleteFunc: func(de event.DeleteEvent) bool { return true },
UpdateFunc: func(de event.UpdateEvent) bool { return true },
GenericFunc: func(ge event.GenericEvent) bool { return false },
}))
}
if r.RuntimeConfig.GroveEnabled {
ctrlBuilder = ctrlBuilder.Owns(&grovev1alpha1.PodCliqueSet{}, builder.WithPredicates(predicate.Funcs{
// ignore creation cause we don't want to be called again after we create the pod gang set
......
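The DestinationRule step in `reconcileEPPResources` is gated twice: first on the startup-time `IstioAvailable` flag, then on `ServiceMesh.IsEnabled()`. The Go sketch below enumerates the four combinations. `destinationRuleAction` is a hypothetical helper, and it assumes that the boolean handed back to `SyncResource` (`!meshEnabled`) means "delete instead of apply", which this diff does not show directly.

```go
package main

import "fmt"

// destinationRuleAction mirrors the branching added to reconcileEPPResources.
// Assumption: a true second return value to SyncResource requests deletion.
func destinationRuleAction(istioAvailable, meshEnabled bool) string {
	switch {
	case istioAvailable && meshEnabled:
		return "apply" // create or update the EPP DestinationRule
	case istioAvailable:
		return "delete" // remove a rule left over from an earlier config
	case meshEnabled:
		return "skip, log error" // mesh requested but Istio CRDs missing
	default:
		return "skip" // no Istio and no mesh: no networking.istio.io calls
	}
}

func main() {
	for _, istio := range []bool{true, false} {
		for _, mesh := range []bool{true, false} {
			fmt.Printf("istioAvailable=%-5v meshEnabled=%-5v -> %s\n",
				istio, mesh, destinationRuleAction(istio, mesh))
		}
	}
}
```

Keying the first branch on `IstioAvailable` keeps Istio-less clusters from issuing failing API calls on every reconcile, which is the same reason `Owns(&networkingv1beta1.DestinationRule{})` is only registered when the CRDs are present.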
@@ -68,6 +68,13 @@ func DetectDRAAvailability(ctx context.Context, mgr ctrl.Manager) bool {
return detectAPIGroupAvailability(ctx, mgr, "resource.k8s.io")
}
// DetectIstioAvailability checks if Istio is available by checking if the
// networking.istio.io API group is registered. Used to guard DestinationRule
// reconciliation so the operator doesn't error on clusters without Istio CRDs.
func DetectIstioAvailability(ctx context.Context, mgr ctrl.Manager) bool {
return detectAPIGroupAvailability(ctx, mgr, "networking.istio.io")
}
// detectAPIGroupAvailability checks if a specific API group is registered in the cluster
func detectAPIGroupAvailability(ctx context.Context, mgr ctrl.Manager, groupName string) bool {
logger := log.FromContext(ctx)
......
@@ -28,6 +28,10 @@ type RuntimeConfig struct {
KaiSchedulerEnabled bool
// DRAEnabled indicates whether Dynamic Resource Allocation (resource.k8s.io) is available
DRAEnabled bool
// IstioAvailable indicates whether the networking.istio.io CRDs are installed.
// When false the operator skips DestinationRule reconciliation to avoid errors
// on clusters without Istio.
IstioAvailable bool
// ExcludedNamespaces for cluster-wide mode namespace filtering
ExcludedNamespaces ExcludedNamespacesInterface
}
@@ -26,12 +26,6 @@ import (
"sort"
"strings"
istioNetworking "istio.io/api/networking/v1beta1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/utils/ptr"
configv1alpha1 "github.com/ai-dynamo/dynamo/deploy/operator/api/config/v1alpha1"
"github.com/ai-dynamo/dynamo/deploy/operator/api/v1alpha1"
"github.com/ai-dynamo/dynamo/deploy/operator/internal/checkpoint"
@@ -42,9 +36,14 @@ import (
gms "github.com/ai-dynamo/dynamo/deploy/operator/internal/gms"
grovev1alpha1 "github.com/ai-dynamo/grove/operator/api/core/v1alpha1"
"github.com/imdario/mergo"
"google.golang.org/protobuf/types/known/wrapperspb"
istioNetworking "istio.io/api/networking/v1beta1"
networkingv1beta1 "istio.io/client-go/pkg/apis/networking/v1beta1"
corev1 "k8s.io/api/core/v1"
networkingv1 "k8s.io/api/networking/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/intstr"
"k8s.io/utils/ptr"
ctrlclient "sigs.k8s.io/controller-runtime/pkg/client"
)
@@ -811,6 +810,52 @@ func GenerateComponentVirtualService(ctx context.Context, componentName, compone
return vs
}
// GenerateEPPDestinationRule builds an Istio DestinationRule for an EPP service.
// This tells the mesh sidecar how to connect to the EPP's gRPC endpoint,
// avoiding double-TLS issues when the EPP serves TLS (SecureServing=true).
func GenerateEPPDestinationRule(serviceName, namespace string, meshConfig configv1alpha1.ServiceMeshConfiguration) *networkingv1beta1.DestinationRule {
// Normalize the service name the same way GenerateComponentService does
// so the DestinationRule host matches the actual Service DNS name.
normalizedName := strings.ReplaceAll(serviceName, ".", "-")
dr := &networkingv1beta1.DestinationRule{
ObjectMeta: metav1.ObjectMeta{
Name: normalizedName,
Namespace: namespace,
},
}
if !meshConfig.IsEnabled() || meshConfig.Istio == nil {
return dr
}
tlsMode := istioNetworking.ClientTLSSettings_SIMPLE
switch meshConfig.Istio.TLSMode {
case "DISABLE":
tlsMode = istioNetworking.ClientTLSSettings_DISABLE
case "ISTIO_MUTUAL":
tlsMode = istioNetworking.ClientTLSSettings_ISTIO_MUTUAL
case "MUTUAL":
tlsMode = istioNetworking.ClientTLSSettings_MUTUAL
}
skipVerify := true
if meshConfig.Istio.InsecureSkipVerify != nil {
skipVerify = *meshConfig.Istio.InsecureSkipVerify
}
dr.Spec = istioNetworking.DestinationRule{
Host: fmt.Sprintf("%s.%s.svc.cluster.local", normalizedName, namespace),
TrafficPolicy: &istioNetworking.TrafficPolicy{
Tls: &istioNetworking.ClientTLSSettings{
Mode: tlsMode,
InsecureSkipVerify: wrapperspb.Bool(skipVerify),
},
},
}
return dr
}
func GenerateDefaultIngressSpec(dynamoDeployment *v1alpha1.DynamoGraphDeployment, ingressConfig configv1alpha1.IngressConfiguration) v1alpha1.IngressSpec {
res := v1alpha1.IngressSpec{
Enabled: ingressConfig.VirtualServiceGateway != "" || ingressConfig.ControllerClassName != "",
......
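`GenerateEPPDestinationRule` contains two pure computations worth checking on their own: the host name (dots in the service name become dashes, then the cluster-local FQDN is formed) and the mapping from the configured `TLSMode` string to an Istio TLS mode. The sketch below reproduces both with plain strings instead of the `istio.io/api` enum so it runs without Istio dependencies; `eppDestinationRuleHost`, `eppTLSMode`, and the sample service name `llama.epp` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// eppDestinationRuleHost normalizes the service name the same way the
// generator does and builds the cluster-local FQDN used as the rule host.
func eppDestinationRuleHost(serviceName, namespace string) string {
	normalized := strings.ReplaceAll(serviceName, ".", "-")
	return fmt.Sprintf("%s.%s.svc.cluster.local", normalized, namespace)
}

// eppTLSMode passes through the recognized non-default modes; anything
// else (including "" and lowercase spellings) falls back to SIMPLE.
func eppTLSMode(configured string) string {
	switch configured {
	case "DISABLE", "ISTIO_MUTUAL", "MUTUAL":
		return configured
	default:
		return "SIMPLE"
	}
}

func main() {
	fmt.Println(eppDestinationRuleHost("llama.epp", "inference")) // llama-epp.inference.svc.cluster.local
	fmt.Println(eppTLSMode("ISTIO_MUTUAL"))                          // ISTIO_MUTUAL
	fmt.Println(eppTLSMode("istio_mutual"))                          // SIMPLE (match is case-sensitive)
}
```

Two consequences follow from the original code: a mistyped `tlsMode` silently becomes `SIMPLE` rather than producing an error, and the host hard-codes the `cluster.local` DNS domain, so clusters configured with a different domain would need the host adjusted.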
@@ -2008,6 +2008,23 @@ _Appears in:_
| `hostSuffix` _string_ | HostSuffix is the suffix for ingress hostnames | | |
#### IstioMeshConfiguration
IstioMeshConfiguration holds Istio-specific mesh settings.
_Appears in:_
- [ServiceMeshConfiguration](#servicemeshconfiguration)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `tlsMode` _string_ | TLSMode is the Istio TLS mode for DestinationRules (e.g., "DISABLE", "SIMPLE", "ISTIO_MUTUAL").<br />Defaults to "SIMPLE". | | |
| `insecureSkipVerify` _boolean_ | InsecureSkipVerify skips TLS certificate verification in DestinationRules.<br />Defaults to true (matching upstream GAIE behavior with self-signed certs). | | |
#### KaiSchedulerConfiguration
@@ -2168,6 +2185,7 @@ OperatorConfiguration is the Schema for the operator configuration.
| `dra` _[DRAConfiguration](#draconfiguration)_ | DRA (Dynamic Resource Allocation) settings with optional override | | |
| `infrastructure` _[InfrastructureConfiguration](#infrastructureconfiguration)_ | Service mesh and infrastructure addresses | | |
| `ingress` _[IngressConfiguration](#ingressconfiguration)_ | Ingress configuration | | |
| `serviceMesh` _[ServiceMeshConfiguration](#servicemeshconfiguration)_ | ServiceMesh configures automatic generation of service-mesh resources<br />(e.g., Istio DestinationRules) for EPP components. | | |
| `rbac` _[RBACConfiguration](#rbacconfiguration)_ | RBAC configuration for cross-namespace resource management (cluster-wide mode) | | |
| `mpi` _[MPIConfiguration](#mpiconfiguration)_ | MPI SSH secret configuration | | |
| `checkpoint` _[CheckpointConfiguration](#checkpointconfiguration)_ | Checkpoint/restore configuration | | |
@@ -2266,6 +2284,28 @@ _Appears in:_
| `webhook` _[WebhookServer](#webhookserver)_ | Webhook server configuration | \{ certDir:/tmp/k8s-webhook-server/serving-certs host:0.0.0.0 port:9443 \} | |
#### ServiceMeshConfiguration
ServiceMeshConfiguration holds service mesh integration settings.
The operator uses this to generate mesh-specific resources (e.g., Istio
DestinationRules) for EPP components so that sidecar proxies connect
correctly without double-TLS issues.
_Appears in:_
- [OperatorConfiguration](#operatorconfiguration)
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `provider` _string_ | Provider selects the service mesh implementation. Supported: "istio", "".<br />Empty string disables service mesh resource generation. | | |
| `istio` _[IstioMeshConfiguration](#istiomeshconfiguration)_ | Istio holds Istio-specific settings. Only used when Provider is "istio". | | |
#### WebhookServer
......
@@ -269,6 +269,58 @@ To disable the EPP from listening for KV events (e.g., when prefix caching is of
Stand-Alone installation only:
- Overwrite the `DYN_NAMESPACE` env var if needed to match your model's dynamo namespace.
**Service Mesh Integration (Istio)**
When the EPP runs under a service mesh such as Istio, the client-side sidecar proxy can add its own TLS on top of the TLS the EPP already serves (SecureServing=true). This double TLS causes connection failures. To avoid it, tell the mesh how to connect to the EPP service with an Istio `DestinationRule`.
The Dynamo operator can generate this DestinationRule automatically. Enable it by setting the `dynamo-operator.dynamo.serviceMesh` values when installing or upgrading the Dynamo platform Helm chart:
```bash
helm install dynamo deploy/helm/charts/platform \
  --set dynamo-operator.dynamo.serviceMesh.enabled=true
```
Or equivalently in a custom values file:
```yaml
dynamo-operator:
  dynamo:
    serviceMesh:
      enabled: true
      provider: "istio"
      istio:
        tlsMode: "SIMPLE"
        insecureSkipVerify: true
```
**Helm Parameters**
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `dynamo-operator.dynamo.serviceMesh.enabled` | bool | `false` | Enable automatic DestinationRule generation for EPP services. |
| `dynamo-operator.dynamo.serviceMesh.provider` | string | `"istio"` | Service mesh provider. Only `"istio"` is supported. |
| `dynamo-operator.dynamo.serviceMesh.istio.tlsMode` | string | `"SIMPLE"` | TLS mode for the DestinationRule. Supported values: `DISABLE`, `SIMPLE`, `MUTUAL`, `ISTIO_MUTUAL`. Unrecognized values fall back to `SIMPLE`. |
| `dynamo-operator.dynamo.serviceMesh.istio.insecureSkipVerify` | bool | `true` | Skip TLS certificate verification. Keep the default `true` when the EPP serves a self-signed certificate. |
> [!NOTE]
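> The Istio CRDs (`networking.istio.io`) must be installed on the cluster before enabling this feature. The operator checks for them only once, at startup: if the CRDs are not present, DestinationRule reconciliation is skipped (and an error is logged on each reconcile) even when `serviceMesh.enabled` is `true`. If you install Istio after the operator is running, restart the operator so it detects the CRDs.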
When enabled, the operator produces a `DestinationRule` for each EPP service equivalent to:
```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: <epp-service-name>
  namespace: <namespace>
spec:
  host: <epp-service-name>.<namespace>.svc.cluster.local
  trafficPolicy:
    tls:
      mode: SIMPLE
      insecureSkipVerify: true
```
If the EPP is **not** managed by the Dynamo operator with `serviceMesh.enabled` set (for example, in a Stand-Alone installation), create this `DestinationRule` manually for each EPP service. Without it, Istio's default mTLS policy conflicts with the EPP's gRPC TLS endpoint.
### 6. Verify Installation ###
Check that all resources are properly deployed:
......