Commit ec50db06
Authored Sep 26, 2025 by Tzu-Ling Kan; committed by GitHub on Sep 26, 2025
feat: Add sglang k8 FT tests (#3227)
Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>
Parent: d4f0d2bc
Showing 4 changed files with 305 additions and 130 deletions (+305 -130)
tests/fault_tolerance/deploy/README.md  +27 -22
tests/fault_tolerance/deploy/scenarios.py  +173 -76
tests/fault_tolerance/deploy/test_deployment.py  +12 -1
tests/utils/managed_deployment.py  +93 -31
tests/fault_tolerance/deploy/README.md
@@ -71,15 +71,16 @@ The test suite is organized around three core components: **Deployments**, **Cli
 Deployments represent specific graphs that are deployed using the Dynamo Kubernetes Platform.
-The following deployment configurations are defined in `scenarios.py`:
+Below are some representative examples of the generated scenarios:
-| Deployment Name | Description |
-|-------------------------|-----------------------------------------------------------------------------|
-| `agg-tp-1-dp-1` | Aggregated worker with 1 replica for each service (frontend, decode). |
-| `agg-tp-1-dp-2` | Aggregated worker with 2 replicas for each service (frontend, decode). |
-| `disagg-tp-1-dp-1` | Disaggregated deployment with 1 replica for each service (frontend, decode, prefill). |
-| `disagg-tp-1-dp-2` | Disaggregated deployment with 2 replicas for each service (frontend, decode, prefill). |
+| Example Scenario Name | Backend | Type | TP | DP | Description |
+|-----------------------------------------------|---------|--------|----|----|---------------------------------------------------------|
+| `vllm-agg-tp-1-dp-1` | vllm | agg | 1 | 1 | Basic aggregated worker. |
+| `vllm-agg-tp-1-dp-2` | vllm | agg | 1 | 2 | Aggregated worker with Data Parallelism. |
+| `sglang-agg-tp-4-dp-1` | sglang | agg | 4 | 1 | Aggregated SGLang worker with Tensor Parallelism. |
+| `sglang-disagg-prefill-tp-2-decode-tp-2-dp-1` | sglang | disagg | 2 | 1 | Disaggregated SGLang workers with Tensor Parallelism. |
+The full test matrix is generated from these parameters, creating comprehensive test coverage across all configurations.
 #### Client Load
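The naming pattern behind the example scenario names in the new README table can be sketched as a small standalone function. This is a minimal sketch, not the code from the commit: `build_deployment_name` is a hypothetical helper name, and the logic is modeled on the name construction added to `scenarios.py` in this commit (backend, then deployment type, then the tensor-parallel part, then the data-parallel part).

```python
def build_deployment_name(backend: str, deploy_type: str, tp: int, dp: int) -> str:
    """Assemble a deployment name such as 'vllm-agg-tp-1-dp-1'.

    Hypothetical helper for illustration only.
    """
    parts = [backend, deploy_type]
    if deploy_type == "agg":
        # Aggregated deployments carry a single TP value.
        parts.append(f"tp-{tp}")
    elif deploy_type == "disagg":
        # Disaggregated deployments name prefill and decode TP separately,
        # even though this commit sets both to the same value.
        parts.append(f"prefill-tp-{tp}-decode-tp-{tp}")
    parts.append(f"dp-{dp}")
    return "-".join(parts)


print(build_deployment_name("vllm", "agg", 1, 2))
print(build_deployment_name("sglang", "disagg", 2, 1))
```

A full scenario name then appends the failure name, for example `sglang-agg-tp-2-dp-1-decode_worker`.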
@@ -95,26 +96,30 @@ sending signals to specified processes.
 The following failure types are defined in `scenarios.py`:
-| Failure Name | Description | Injection Method |
-|--------------------------|-----------------------------------------------------------------------------|--------------------------------------------|
-| `none` | No failure injection. | N/A |
-| `frontend` | Terminate frontend process/pod. | `SIGINT` signal to `dynamo.frontend`. |
-| `frontend_pod` | Delete frontend pod. | Kubernetes API pod deletion. |
-| `decode_worker` | Terminate decode worker process/pod. | `SIGINT` signal to `dynamo.vllm` |
-| `decode_worker_pod` | Delete decode worker pod. | Kubernetes API pod deletion. |
-| `prefill_worker` | Terminate prefill worker process/pod. | `SIGINT` signal to `dynamo.vllm` |
-| `prefill_worker_pod` | Delete prefill worker pod. | Kubernetes API pod deletion. |
-| `vllm_decode_engine_core` | Terminate VLLM decode engine core process. | `SIGKILL` signal to `VLLM::EngineCore` |
-| `vllm_prefill_engine_core` | Terminate VLLM prefill engine core process. | `SIGKILL` signal to `VLLM::EngineCore` |
+| Failure Name | Description | Injection Method | Applicable Backends |
+|-------------------------------|----------------------------------------------------|-------------------------------|---------------------|
+| `none` | No failure injection (baseline). | N/A | All |
+| `frontend` | Terminate frontend process. | `SIGINT` to `dynamo.frontend` | All |
+| `frontend_pod` | Delete frontend pod. | Kubernetes API pod deletion | All |
+| `decode_worker` | Terminate decode worker process. | `SIGKILL` to `dynamo.<backend>` | All |
+| `decode_worker_pod` | Delete decode worker pod. | Kubernetes API pod deletion | All |
+| `prefill_worker` | Terminate prefill worker process. | `SIGKILL` to `dynamo.<backend>` | All |
+| `prefill_worker_pod` | Delete prefill worker pod. | Kubernetes API pod deletion | All |
+| `vllm_decode_engine_core` | Terminate VLLM decode engine core process. | `SIGKILL` to `VLLM::EngineCore` | vllm only |
+| `vllm_prefill_engine_core` | Terminate VLLM prefill engine core process. | `SIGKILL` to `VLLM::EngineCore` | vllm only |
+| `sglang_decode_scheduler` | Terminate SGLang decode scheduler process. | `SIGKILL` to `sglang::scheduler` | sglang only |
+| `sglang_decode_detokenizer` | Terminate SGLang decode detokenizer process. | `SIGKILL` to `sglang::detokenizer` | sglang only |
+| `sglang_prefill_scheduler` | Terminate SGLang prefill scheduler process. | `SIGKILL` to `sglang::scheduler` | sglang only |
+| `sglang_prefill_detokenizer` | Terminate SGLang prefill detokenizer process. | `SIGKILL` to `sglang::detokenizer` | sglang only |
 #### Example Scenario Breakdown
-**Scenario**: `agg-tp-2-dp-1-decode_worker`
+**Scenario**: `sglang-agg-tp-2-dp-1-decode_worker`
-- **Deployment**: Aggregation with 1 decoder worker replica (`agg-tp-2-dp-1`).
+- **Backend**: `sglang`
+- **Deployment**: Aggregation with 1 decoder worker replica, using 2 GPUs for tensor parallelism (`agg-tp-2-dp-1`).
 - **Client Load**: 10 clients, 100 requests each, max request rate 1/sec.
-- **Failure**: Terminates 1 decoder worker process 10 seconds into the test.
+- **Failure**: Terminates 1 decoder worker process 30 seconds into the test.
 #### Example Scenario Execution:
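The "Applicable Backends" column in the new failure table, combined with the rule in `scenarios.py` that skips prefill failures for aggregated deployments, determines which failures are paired with a given deployment. The sketch below illustrates that selection under stated assumptions: `COMMON_FAILURES`, `BACKEND_FAILURES`, and `applicable_failures` are hypothetical names, and plain strings stand in for the `Failure` objects used by the test suite.

```python
# Failure names taken from the README table; the grouping is illustrative.
COMMON_FAILURES = [
    "none",
    "frontend",
    "frontend_pod",
    "decode_worker",
    "decode_worker_pod",
    "prefill_worker",
    "prefill_worker_pod",
]
BACKEND_FAILURES = {
    "vllm": ["vllm_decode_engine_core", "vllm_prefill_engine_core"],
    "sglang": [
        "sglang_decode_scheduler",
        "sglang_decode_detokenizer",
        "sglang_prefill_scheduler",
        "sglang_prefill_detokenizer",
    ],
}


def applicable_failures(backend: str, deployment_name: str) -> list[str]:
    """Return failure names that would be paired with this deployment."""
    if backend not in BACKEND_FAILURES:
        raise ValueError(f"Unsupported backend: {backend}")
    names = COMMON_FAILURES + BACKEND_FAILURES[backend]
    # Aggregated deployments have no prefill worker, so prefill failures are skipped.
    return [
        name
        for name in names
        if not ("prefill" in name and "disagg" not in deployment_name)
    ]


print(applicable_failures("sglang", "sglang-agg-tp-2-dp-1"))
```

The substring check on `"disagg"` mirrors the condition in the diff, so it depends on deployment names following the naming pattern shown in the README.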
tests/fault_tolerance/deploy/scenarios.py
@@ -18,6 +18,17 @@ from typing import Optional
 from tests.utils.managed_deployment import DeploymentSpec
+WORKER_MAP = {
+    "vllm": {
+        "decode": "VllmDecodeWorker",
+        "prefill": "VllmPrefillWorker",
+    },
+    "sglang": {
+        "decode": "decode",
+        "prefill": "prefill",
+    },
+}
 @dataclass
 class Load:
@@ -45,66 +56,104 @@ class Scenario:
     load: Load
     failures: list[Failure]
     model: Optional[str] = None
+    backend: str = "vllm"  # Backend type for tracking
+# Helper functions to create deployment specs
+def _create_deployment_spec(backend, deploy_type, yaml_path):
+    """Create a deployment spec with backend information."""
+    return {"spec": DeploymentSpec(yaml_path), "backend": backend}
+def _set_replicas(deployment_spec, backend, deploy_type, replicas):
+    """Set replicas for all components in a deployment based on backend type."""
+    spec = deployment_spec["spec"]
+    # Frontend is common for all backends
+    spec["Frontend"].replicas = replicas
+    if backend in WORKER_MAP:
+        # always scale decode
+        spec[WORKER_MAP[backend]["decode"]].replicas = replicas
+        # scale prefill only for disagg
+        if deploy_type == "disagg":
+            spec[WORKER_MAP[backend]["prefill"]].replicas = replicas
+def _set_tensor_parallel(deployment_spec, backend, deploy_type, tp_size):
+    """Set tensor parallel size for worker components."""
+    spec = deployment_spec["spec"]
+    if backend in WORKER_MAP:
+        decode_worker = WORKER_MAP[backend]["decode"]
+        prefill_worker = WORKER_MAP[backend]["prefill"]
+        if deploy_type == "agg":
+            if hasattr(spec, "set_tensor_parallel"):
+                spec.set_tensor_parallel(tp_size, [decode_worker])
+            else:
+                spec[decode_worker].tensor_parallel_size = tp_size
+        elif deploy_type == "disagg":
+            spec[prefill_worker].tensor_parallel_size = tp_size
+            spec[decode_worker].tensor_parallel_size = tp_size
+def _create_deployments_for_backend(backend):
+    """Create all deployment specifications for a given backend."""
+    deployments = {}
+    # Define the yaml files for agg and disagg deployments
+    yaml_files = {
+        "agg": f"components/backends/{backend}/deploy/agg.yaml",
+        "disagg": f"components/backends/{backend}/deploy/disagg.yaml",
+    }
+    # Define the different configurations to test
+    configurations = [
+        {"tp": 1, "dp": 1},
+        {"tp": 1, "dp": 2},
+        {"tp": 2, "dp": 1},
+        {"tp": 4, "dp": 1},
+    ]
+    for deploy_type in ["agg", "disagg"]:
+        for config in configurations:
+            tp_size = config["tp"]
+            dp_replicas = config["dp"]
+            # Skip creating disagg scenarios for TP > 1 if DP is also > 1 (uncommon case)
+            if deploy_type == "disagg" and tp_size > 1 and dp_replicas > 1:
+                continue
+            # Construct the scenario name
+            name_parts = [backend, deploy_type]
-# Each Deployment Spec contains
-# the dynamo deployment configuration
+            if deploy_type == "agg":
+                name_parts.append(f"tp-{tp_size}")
+            elif deploy_type == "disagg":
+                name_parts.append(f"prefill-tp-{tp_size}-decode-tp-{tp_size}")
+            name_parts.append(f"dp-{dp_replicas}")
+            scenario_name = "-".join(name_parts)
+            # Create and configure the deployment
+            deployment = _create_deployment_spec(
+                backend, deploy_type, yaml_files[deploy_type]
+            )
+            if tp_size > 1:
+                _set_tensor_parallel(deployment, backend, deploy_type, tp_size)
+            if dp_replicas > 1:
+                _set_replicas(deployment, backend, deploy_type, dp_replicas)
+            deployments[scenario_name] = deployment
+    return deployments
-deployment_specs = {
-    "agg-tp-1-dp-1": (
-        DeploymentSpec("/workspace/components/backends/vllm/deploy/agg.yaml")
-    ),
-    "disagg-tp-1-dp-1": (
-        DeploymentSpec("/workspace/components/backends/vllm/deploy/disagg.yaml")
-    ),
-}
-# TP-2 scenarios
-deployment_specs["agg-tp-2-dp-1"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/agg.yaml"
-)
-deployment_specs["agg-tp-2-dp-1"].set_tensor_parallel(2, ["VllmDecodeWorker"])
-deployment_specs["disagg-prefill-tp-2-decode-tp-2-dp-1"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/disagg.yaml"
-)
-deployment_specs["disagg-prefill-tp-2-decode-tp-2-dp-1"]["VllmPrefillWorker"].tensor_parallel_size = 2
-deployment_specs["disagg-prefill-tp-2-decode-tp-2-dp-1"]["VllmDecodeWorker"].tensor_parallel_size = 2
-# TP-4 scenarios
-deployment_specs["agg-tp-4-dp-1"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/agg.yaml"
-)
-deployment_specs["agg-tp-4-dp-1"].set_tensor_parallel(4, ["VllmDecodeWorker"])
-deployment_specs["disagg-prefill-tp-4-decode-tp-4-dp-1"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/disagg.yaml"
-)
-deployment_specs["disagg-prefill-tp-4-decode-tp-4-dp-1"]["VllmPrefillWorker"].tensor_parallel_size = 4
-deployment_specs["disagg-prefill-tp-4-decode-tp-4-dp-1"]["VllmDecodeWorker"].tensor_parallel_size = 4
-# Derivative Specs With Incremented Replicats
-deployment_specs["agg-tp-1-dp-2"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/agg.yaml"
-)
-deployment_specs["agg-tp-1-dp-2"]["Frontend"].replicas = 2
-deployment_specs["agg-tp-1-dp-2"]["VllmDecodeWorker"].replicas = 2
-deployment_specs["disagg-tp-1-dp-2"] = DeploymentSpec(
-    "/workspace/components/backends/vllm/deploy/disagg.yaml"
-)
-deployment_specs["disagg-tp-1-dp-2"]["Frontend"].replicas = 2
-deployment_specs["disagg-tp-1-dp-2"]["VllmDecodeWorker"].replicas = 2
-deployment_specs["disagg-tp-1-dp-2"]["VllmPrefillWorker"].replicas = 2
+# Create all deployment specifications
+deployment_specs = {}
+deployment_specs.update(_create_deployments_for_backend("vllm"))
+deployment_specs.update(_create_deployments_for_backend("sglang"))
 # Each failure scenaro contains a list of failure injections
@@ -114,25 +163,49 @@ deployment_specs["disagg-tp-1-dp-2"]["VllmPrefillWorker"].replicas = 2
 #
 # Example:
 #
-#   "prefill_worker": [[30, [("dynamo_prefillworker", 1)]]],
+#   "prefill_worker": [Failure(30, "VllmPrefillWorker", "dynamo.vllm", "SIGKILL")],
 #
 # terminates 1 prefill worker after 30 seconds
-failures = {
-    "frontend": [Failure(30, "Frontend", "dynamo.frontend")],
-    "frontend_pod": [Failure(30, "Frontend", "delete_pod")],
-    "decode_worker": [Failure(30, "VllmDecodeWorker", "dynamo.vllm", "SIGKILL")],
-    "decode_worker_pod": [Failure(30, "VllmDecodeWorker", "delete_pod")],
-    "prefill_worker": [Failure(30, "VllmPrefillWorker", "dynamo.vllm", "SIGKILL")],
-    "prefill_worker_pod": [Failure(30, "VllmPrefillWorker", "delete_pod")],
-    "vllm_decode_engine_core": [
-        Failure(30, "VllmDecodeWorker", "VLLM::EngineCore", "SIGKILL")
-    ],
-    "vllm_prefill_engine_core": [
-        Failure(30, "VllmPrefillWorker", "VLLM::EngineCore", "SIGKILL")
-    ],
-    "none": [],
-}
+def _create_backend_failures(backend):
+    """Generate backend-specific failure scenarios."""
+    workers = WORKER_MAP[backend]
+    decode_worker = workers["decode"]
+    prefill_worker = workers["prefill"]
+    process_name = f"dynamo.{backend}"
+    failures = {
+        "frontend": [Failure(30, "Frontend", "dynamo.frontend")],
+        "frontend_pod": [Failure(30, "Frontend", "delete_pod")],
+        "decode_worker": [Failure(30, decode_worker, process_name, "SIGKILL")],
+        "decode_worker_pod": [Failure(30, decode_worker, "delete_pod")],
+        "prefill_worker": [Failure(30, prefill_worker, process_name, "SIGKILL")],
+        "prefill_worker_pod": [Failure(30, prefill_worker, "delete_pod")],
+        "none": [],
+    }
+    if backend == "vllm":
+        failures["vllm_decode_engine_core"] = [
+            Failure(30, decode_worker, "VLLM::EngineCore", "SIGKILL")
+        ]
+        failures["vllm_prefill_engine_core"] = [
+            Failure(30, prefill_worker, "VLLM::EngineCore", "SIGKILL")
+        ]
+    elif backend == "sglang":
+        failures["sglang_decode_scheduler"] = [
+            Failure(30, decode_worker, "sglang::scheduler", "SIGKILL")
+        ]
+        failures["sglang_decode_detokenizer"] = [
+            Failure(30, decode_worker, "sglang::detokenizer", "SIGKILL")
+        ]
+        failures["sglang_prefill_scheduler"] = [
+            Failure(30, prefill_worker, "sglang::scheduler", "SIGKILL")
+        ]
+        failures["sglang_prefill_detokenizer"] = [
+            Failure(30, prefill_worker, "sglang::detokenizer", "SIGKILL")
+        ]
+    return failures
 load = Load()
@@ -144,10 +217,34 @@ model = None
 scenarios = {}
-for deployment_name, deployment_spec in deployment_specs.items():
-    for failure_name, failure in failures.items():
+# Map of backend to failure definitions
+backend_failure_map = {
+    "vllm": _create_backend_failures("vllm"),
+    "sglang": _create_backend_failures("sglang"),
+}
+for deployment_name, deployment_info in deployment_specs.items():
+    backend = deployment_info["backend"]
+    # Validate backend
+    if backend not in backend_failure_map:
+        raise ValueError(
+            f"Unsupported backend: {backend}. Supported backends are: {list(backend_failure_map.keys())}"
+        )
+    # Get the appropriate failure set for this backend
+    failure_set = backend_failure_map[backend]
+    for failure_name, failure in failure_set.items():
+        # Skip prefill failures for aggregated deployments
         if "prefill" in failure_name and "disagg" not in deployment_name:
             continue
-        scenarios[f"{deployment_name}-{failure_name}"] = Scenario(
-            deployment=deployment_spec, load=load, failures=failure, model=model
-        )
+        scenario_name = f"{deployment_name}-{failure_name}"
+        scenarios[scenario_name] = Scenario(
+            deployment=deployment_info["spec"],
+            load=load,
+            failures=failure,
+            model=model,
+            backend=backend,
+        )
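The `WORKER_MAP` lookup is what lets the same helper code address differently named components per backend. The sketch below isolates the replica-scaling rule from `_set_replicas` so it can be run without a `DeploymentSpec`. It is an illustration under stated assumptions: `components_to_scale` is a hypothetical function that returns component names instead of mutating a spec, and the map contents are copied from the diff above.

```python
# Copied from the diff: component names differ between backends.
WORKER_MAP = {
    "vllm": {"decode": "VllmDecodeWorker", "prefill": "VllmPrefillWorker"},
    "sglang": {"decode": "decode", "prefill": "prefill"},
}


def components_to_scale(backend: str, deploy_type: str) -> list[str]:
    """List the components whose replica count would be changed."""
    names = ["Frontend"]  # The frontend is scaled for every backend.
    if backend in WORKER_MAP:
        names.append(WORKER_MAP[backend]["decode"])  # Decode is always scaled.
        if deploy_type == "disagg":
            # Prefill workers only exist in disaggregated deployments.
            names.append(WORKER_MAP[backend]["prefill"])
    return names


print(components_to_scale("vllm", "agg"))
print(components_to_scale("sglang", "disagg"))
```

One consequence visible in this form: for a backend missing from `WORKER_MAP`, only the frontend is scaled and no error is raised at this point; the unsupported-backend `ValueError` in the diff is raised later, when scenarios are assembled.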
tests/fault_tolerance/deploy/test_deployment.py
@@ -147,7 +147,18 @@ async def test_fault_scenario(
         scenario.deployment.set_model(scenario.model)
         model = scenario.model
     else:
-        model = scenario.deployment["VllmDecodeWorker"].model
+        # Get model from the appropriate worker based on backend
+        try:
+            if scenario.backend == "vllm":
+                model = scenario.deployment["VllmDecodeWorker"].model
+            elif scenario.backend == "sglang":
+                model = scenario.deployment["decode"].model
+            else:
+                model = None
+        except (KeyError, AttributeError):
+            model = None
+        # Fallback to default if still None
+        model = model or "Qwen/Qwen3-0.6B"
     scenario.deployment.set_logging(True, "info")
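The model-selection order in this hunk (explicit scenario model, then the decode worker's configured model, then a default) can be expressed as a small pure function. This is a sketch under stated assumptions: `resolve_model`, `DECODE_COMPONENT`, and the `component_models` dictionary are hypothetical stand-ins for the scenario and deployment objects; only the component names and the default model string come from the diff.

```python
from typing import Optional

DEFAULT_MODEL = "Qwen/Qwen3-0.6B"  # Fallback value used in the diff.
DECODE_COMPONENT = {"vllm": "VllmDecodeWorker", "sglang": "decode"}


def resolve_model(
    scenario_model: Optional[str],
    backend: str,
    component_models: dict[str, Optional[str]],
) -> str:
    """Pick the model name a test run would use.

    component_models maps a component name to its configured model and
    stands in for indexing into the deployment spec.
    """
    if scenario_model:
        return scenario_model
    component = DECODE_COMPONENT.get(backend)
    # A missing backend or component yields None, like the except branch in the diff.
    model = component_models.get(component) if component else None
    return model or DEFAULT_MODEL


print(resolve_model(None, "sglang", {"decode": "some/model"}))
print(resolve_model(None, "sglang", {}))
```

Using `dict.get` here replaces the `try`/`except (KeyError, AttributeError)` in the original; the observable result is the same for the cases shown.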
tests/utils/managed_deployment.py
@@ -7,8 +7,8 @@ import os
 import re
 import shlex
 import time
-from dataclasses import dataclass
+from dataclasses import dataclass, field
-from typing import Any, Optional
+from typing import Any, List, Optional
 import kr8s
 import kubernetes
@@ -59,7 +59,7 @@ class ServiceSpec:
     @property
     def model(self) -> Optional[str]:
-        """Model being served by this service"""
+        """Model being served by this service (checks both --model and --model-path)"""
         try:
             args_list = self._spec["extraPodSpec"]["mainContainer"]["args"]
         except KeyError:
@@ -67,7 +67,7 @@ class ServiceSpec:
         args_str = " ".join(args_list)
         parts = shlex.split(args_str)
         for i, part in enumerate(parts):
-            if part == "--model":
+            if part in ["--model", "--model-path"]:
                 return parts[i + 1] if i + 1 < len(parts) else None
         return None
@@ -82,9 +82,10 @@ class ServiceSpec:
         args_str = " ".join(args_list)
         parts = shlex.split(args_str)
+        # Try to update --model first, then --model-path
         model_index = None
         for i, part in enumerate(parts):
-            if part == "--model":
+            if part in ["--model", "--model-path"]:
                 model_index = i
                 break
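The argument scan in the `model` property can be exercised in isolation. The sketch below is a standalone version under stated assumptions: `find_model_arg` is a hypothetical free function, and the argument lists in the usage lines are made-up examples rather than the contents of any deployment YAML.

```python
import shlex
from typing import Optional


def find_model_arg(args_list: list[str]) -> Optional[str]:
    """Return the value following the first --model or --model-path flag."""
    # Join and re-split so that entries holding several shell words are tokenized.
    parts = shlex.split(" ".join(args_list))
    for i, part in enumerate(parts):
        if part in ("--model", "--model-path"):
            # Guard against a trailing flag with no value.
            return parts[i + 1] if i + 1 < len(parts) else None
    return None


print(find_model_arg(["python3 -m dynamo.sglang --model-path Qwen/Qwen3-0.6B --tp 2"]))
print(find_model_arg(["--model"]))
```

The scan stops at whichever of the two flags appears first in the argument list, so no precedence is given to `--model` over `--model-path`.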
@@ -360,6 +361,7 @@ class ManagedDeployment:
     _port_forward: Optional[Any] = None
     _deployment_name: Optional[str] = None
     _apps_v1: Optional[Any] = None
+    _active_port_forwards: List[Any] = field(default_factory=list)
     def __post_init__(self):
         self._deployment_name = self.deployment_spec.name
@@ -673,40 +675,100 @@ class ManagedDeployment:
             raise
     def port_forward(self, pod, remote_port, max_connection_attempts=3):
-        """Attempt to connect to a pod and return the port-forward object on success."""
+        """Attempt to connect to a pod and return the port-forward object on success.
+        Note: Port forwards run in background threads. When pods are terminated,
+        the async cleanup may fail, which is expected and can be safely ignored.
+        """
+        try:
+            # Create port forward - this runs in a background thread
+            # Use 127.0.0.1 (localhost) instead of 0.0.0.0 to prevent port conflicts
-        port_forward = pod.portforward(
-            remote_port=remote_port,
-            local_port=0,
-            address="0.0.0.0",
-        )
-        port_forward.start()
+            port_forward = pod.portforward(
+                remote_port=remote_port,
+                local_port=0,  # Auto-assign an available port
+                address="127.0.0.1",  # Use localhost for better isolation and conflict prevention
+            )
+            port_forward.start()
-        for _ in range(max_connection_attempts):
-            if port_forward.local_port == 0:
-                time.sleep(1)
-                continue
+            # Try to connect with exponential backoff
+            backoff_delay = 0.5  # Start with 500ms
+            for attempt in range(max_connection_attempts):
+                time.sleep(backoff_delay)
+                backoff_delay = min(backoff_delay * 1.5, 5.0)  # Double delay, max 5 seconds
+                # Check if port is assigned
+                if port_forward.local_port == 0:
+                    self._logger.debug(
+                        f"Port not yet assigned for pod {pod.name} (attempt {attempt + 1}/{max_connection_attempts})"
+                    )
+                    continue
+                # Try to connect to the port forwarded service
-            test_url = f"http://localhost:{port_forward.local_port}/"
-            try:
-                # Send HEAD request to test connection
-                response = requests.head(test_url, timeout=5)
-                if response.status_code in (200, 404):  # 404 is acceptable
-                    return port_forward
-            except (requests.ConnectionError, requests.Timeout) as e:
-                self._logger.warning(f"Connection test failed for pod {pod.name}: {e}")
+                test_url = f"http://localhost:{port_forward.local_port}/"
+                try:
+                    # Send HEAD request to test connection
+                    response = requests.head(test_url, timeout=5)
+                    if response.status_code in (200, 404):  # 404 is acceptable
+                        self._active_port_forwards.append(port_forward)
+                        return port_forward
+                except (requests.ConnectionError, requests.Timeout) as e:
+                    self._logger.warning(
+                        f"Connection test failed for pod {pod.name} (attempt {attempt + 1}/{max_connection_attempts}): {e}"
+                    )
-            # Retry port-forward
-            port_forward.stop()
-            port_forward.start()
-            time.sleep(1)
+                # Restart port-forward for next attempt (except on last attempt)
+                if attempt == max_connection_attempts - 1:
+                    continue
+                try:
+                    port_forward.stop()
+                    port_forward.start()
+                except Exception as e:
+                    self._logger.debug(
+                        f"Error restarting port forward for pod {pod.name}: {e}"
+                    )
+                    break
-        # All attempts failed
-        port_forward.stop()
-        return None
+            # All attempts failed
+            self._logger.warning(
+                f"Port forward failed after {max_connection_attempts} attempts for pod {pod.name}"
+            )
+            try:
+                port_forward.stop()
+            except Exception:
+                pass  # Ignore errors during cleanup
+            return None
+        except Exception as e:
+            self._logger.warning(
+                f"Failed to create port forward for pod {pod.name}: {e}"
+            )
+            return None
     async def _cleanup(self):
         try:
+            # Collect logs/metrics first; any PFs opened here will be tracked and stopped below.
             self._get_service_logs()
+            self._logger.info(
+                f"Cleaning up {len(self._active_port_forwards)} active port forwards"
+            )
+            for port_forward in self._active_port_forwards:
+                try:
+                    port_forward.stop()
+                except RuntimeError as e:
+                    # Expected error when pod is terminated:
+                    # "anext(): asynchronous generator is already running"
+                    if "anext()" in str(e) or "already running" in str(e):
+                        self._logger.debug(f"Port forward cleanup: {e}")
+                    else:
+                        self._logger.warning(
+                            f"Unexpected error stopping port forward: {e}"
+                        )
+                except Exception as e:
+                    self._logger.debug(f"Error stopping port forward: {e}")
+            self._active_port_forwards.clear()
         finally:
             await self._delete_deployment()
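The retry timing in the new `port_forward` loop can be checked without a cluster by computing the sequence of sleeps it would perform. The sketch below is illustrative only: `backoff_delays` is a hypothetical helper, and the defaults (0.5 s initial delay, factor 1.5, 5.0 s cap) are the constants visible in the diff.

```python
def backoff_delays(
    attempts: int, initial: float = 0.5, factor: float = 1.5, cap: float = 5.0
) -> list[float]:
    """Return the delay slept before each attempt, in seconds."""
    delay = initial
    delays = []
    for _ in range(attempts):
        delays.append(delay)  # The loop sleeps first, then grows the delay.
        delay = min(delay * factor, cap)
    return delays


print(backoff_delays(3))
print(sum(backoff_delays(3)))
```

With the default `max_connection_attempts=3`, the waits are 0.5, 0.75, and 1.125 seconds. The in-code comment in the diff says "Double delay", but the multiplier is 1.5, so the delay grows by half on each attempt until it reaches the 5-second cap.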