After deployment, validate the predictions against actual performance using [AIPerf](https://github.com/ai-dynamo/aiperf).
<Tip>
Run AIPerf **inside the cluster**, for example as a Kubernetes Job, so that network latency does not skew the measurements.
</Tip>
#### Deriving AIPerf Parameters from AIC Output
AIC automatically generates AIPerf scripts alongside the Dynamo configs and stores them in the results folder when `--save-dir ...` is specified. For Kubernetes deployments, run the benchmarks with `k8s_bench.yaml`; for bare-metal systems, use the `bench_run.sh` script. Both run AIPerf across a concurrency list: the default set (`1 2 8 16 32 64 128`) plus `BenchConfig.estimated_concurrency` and values within ±5% of it. You can customize this list as needed.
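To make the shape of that concurrency list concrete, here is a minimal sketch of how such a sweep could be assembled. The function name `build_concurrency_list` is hypothetical and not part of AIC. The rounding (floor for -5%, ceiling for +5%) is an assumption, so the generated `bench_run.sh` may produce slightly different neighbours.

```bash
#!/usr/bin/env bash
# Sketch only: merge the default concurrency set with an estimated
# concurrency and its +/-5% neighbours.
# Assumption: floor for -5% and ceiling for +5%; AIC's own rounding may differ.

build_concurrency_list() {
  local est="$1"
  local low=$(( est * 95 / 100 ))            # floor(est * 0.95)
  local high=$(( (est * 105 + 99) / 100 ))   # ceil(est * 1.05)
  if (( low < 1 )); then
    low=1                                    # concurrency must be at least 1
  fi
  # Sort numerically, drop duplicates, and join the values with spaces.
  printf '%s\n' 1 2 8 16 32 64 128 "$low" "$est" "$high" \
    | sort -n -u | paste -sd' ' -
}

build_concurrency_list 50
```

With an estimated concurrency of 50, this prints `1 2 8 16 32 47 50 53 64 128`. An estimate that already appears in the default set, such as 16, is listed only once.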
To benchmark an AIC-recommended configuration with AIPerf, you need to translate the AIC parameters into AIPerf profiling arguments (we are working to automate this).
By default, AIPerf results are saved to `/tmp/bench_artifacts` inside the container. If a PVC name is specified with `--generator-set K8sConfig.k8s_pvc_name=$YOUR_PVC`, the result artifacts are saved to the PVC volume mount instead.
> **Critical**: Disaggregated deployments **require RDMA** for KV cache transfer. Without RDMA, performance degrades by **40x** (TTFT increases from 355ms to 10+ seconds). See the Disaggregated Deployment section below.
### Backends and Versions
| Backend | Versions | Status |
|---------|----------|--------|
| TensorRT-LLM | 1.0.0rc3, 1.2.0rc5 | Production |
| vLLM | 0.12.0 | Production |
| SGLang | 0.5.6.post2 | Production |
For a comprehensive breakdown of which model/system/backend/version combinations are supported in both aggregated and disaggregated modes, refer to the [**support matrix CSV**](https://github.com/ai-dynamo/aiconfigurator/blob/main/src/aiconfigurator/systems/support_matrix.csv). This file is automatically generated and tested to ensure accuracy across all supported configurations.
### Systems
| GPU System | SGLang | TensorRT-LLM | vLLM |
|------------|--------|--------------|------|
| H200 SXM | Yes | Yes | Yes |
| H100 SXM | Yes | Yes | Yes |
| A100 SXM | -- | Yes | Yes |
| B200 SXM | Yes | Yes | -- |
| GB200 SXM | -- | Yes | -- |
### Models
You can also check whether a system/framework version is supported with the `aiconfigurator cli support` command. For example:
```bash
aiconfigurator cli support --model Qwen/Qwen3-32B-FP8 --system h100_sxm --backend-version 1.2.0rc5