Unverified commit b2c59aa4, authored by Yan Ru Pei, committed by GitHub

feat(replay): add shared loadgen workload paths [DYN-2510] (#7593)


Signed-off-by: PeaBrane <yanrpei@gmail.com>
parent 2b36b175
......@@ -7,8 +7,8 @@ subtitle: Replay Mooncake-style traces through the mocker in offline or online m
This guide covers trace replay support for Mooncake-style JSONL traces via `python -m dynamo.replay`,
which prints an AIPerf-style summary table, writes the full replay report JSON to disk, and exposes
`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and synthetic replay inputs
directly.
`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, closed-loop concurrency, and
synthetic workload inputs directly.
Unlike normal `dynamo.mocker` usage, offline replay does not launch workers, register endpoints, or
require NATS, etcd, or a frontend. Online replay does exercise the live mock-worker runtime path.
......@@ -47,6 +47,24 @@ python -m dynamo.replay \
--report-json /tmp/replay-report.json
```
Run synthetic workload replay when you want shared-prefix or multi-turn structure without a trace
file:
```bash
python -m dynamo.replay \
--input-tokens 5000 \
--output-tokens 500 \
--request-count 200 \
--turns-per-session 3 \
--shared-prefix-ratio 0.5 \
--num-prefix-groups 8 \
--inter-turn-delay-ms 250 \
--replay-mode offline \
--replay-concurrency 32 \
--extra-engine-args '{"block_size":512}' \
--report-json /tmp/replay-report.json
```
`python -m dynamo.replay` prints an AIPerf-style summary table to stdout and writes the full replay
report JSON to disk.
......@@ -65,12 +83,29 @@ Example:
{"timestamp": 0, "input_length": 6755, "output_length": 500, "hash_ids": [0, 1, 2, 3]}
```
The mocker synthesizes token blocks from `hash_ids` using the configured `--block-size`, so the
Replay also supports multi-turn sessions. Use the same `session_id` on all turns in a session. The
first turn uses `timestamp` or `created_time`; later turns may use either:
- `delay` or `delay_ms` directly
- or an absolute later `timestamp`, in which case replay infers the inter-turn delay from the
previous turn timestamp
Example:
```json
{"session_id":"session-a","timestamp":1000,"input_length":2048,"output_length":128,"hash_ids":[1,2,3,4]}
{"session_id":"session-a","delay":250,"input_length":2560,"output_length":128,"hash_ids":[1,2,3,4,5]}
{"session_id":"session-b","timestamp":1010,"input_length":1024,"output_length":64,"hash_ids":[9,10]}
{"session_id":"session-b","delay_ms":50,"input_length":1536,"output_length":64,"hash_ids":[9,10,11]}
```
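The delay rules above can be sketched in a few lines of Python. This is an illustrative reading of the documented behavior, not the replay implementation: the helper name `resolve_turn_offsets` is hypothetical, `delay` and `delay_ms` are both assumed to be in milliseconds, and advancing the previous timestamp by an explicit delay is an assumption made only so that mixed sessions can be handled.

```python
def resolve_turn_offsets(turns):
    """Return (first_arrival_ms, delays_ms) for the records of one session.

    Hypothetical helper: it only mirrors the rules described in the text.
    """
    first = turns[0]
    # The first turn uses `timestamp` or `created_time`.
    first_arrival = first.get("timestamp", first.get("created_time"))
    if first_arrival is None:
        raise ValueError("first turn needs `timestamp` or `created_time`")

    delays = [0.0]  # the first turn has no inter-turn delay
    prev_timestamp = first_arrival
    for turn in turns[1:]:
        if "delay" in turn:
            delay = float(turn["delay"])
        elif "delay_ms" in turn:
            delay = float(turn["delay_ms"])
        elif "timestamp" in turn:
            # Absolute later timestamp: infer the delay from the previous turn.
            delay = float(turn["timestamp"] - prev_timestamp)
        else:
            raise ValueError("later turns need `delay`, `delay_ms`, or `timestamp`")
        if delay < 0:
            raise ValueError("later turns must not precede the previous turn")
        delays.append(delay)
        # Assumption: an explicit delay advances the reference timestamp.
        prev_timestamp = turn.get("timestamp", prev_timestamp + delay)
    return first_arrival, delays
```

Applied to the `session-a` records in the example, this yields a first arrival of `1000` and delays of `[0.0, 250.0]`.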
The mocker synthesizes token blocks from `hash_ids` using the configured mocker `block_size`, so the
replay block size must match the block size used when the trace was generated. Public Mooncake
traces are commonly block-level hashes at `512` tokens per hash ID, so replaying them with the
default mocker `block_size=64` will fail once `input_length > len(hash_ids) * 64`. For
`engine_type=sglang`, replay still uses canonical `block_size` internally; `sglang.page_size` is
accepted as a compatibility alias and is normalized into `block_size` before replay starts.
default mocker `block_size=64` will fail once `input_length > len(hash_ids) * 64`. Set that
through `--extra-engine-args '{"block_size":512}'`. For `engine_type=sglang`, replay still uses
canonical `block_size` internally; `sglang.page_size` is accepted as a compatibility alias and is
normalized into `block_size` before replay starts.
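As a rough pre-flight check, the failure condition above can be tested against a trace before replaying it. The sketch below is not part of the replay CLI; `check_block_size` is a hypothetical helper that only applies the stated rule `input_length > len(hash_ids) * block_size`.

```python
def check_block_size(record, block_size):
    """Raise if `hash_ids` cannot cover `input_length` at this block size.

    Hypothetical helper that applies the rule from the text to one record.
    """
    capacity = len(record["hash_ids"]) * block_size
    if record["input_length"] > capacity:
        raise ValueError(
            f"input_length={record['input_length']} exceeds "
            f"{len(record['hash_ids'])} hash IDs * block_size={block_size} = {capacity}"
        )
    return capacity


record = {"input_length": 2048, "output_length": 128, "hash_ids": [1, 2, 3, 4]}
check_block_size(record, 512)  # 4 * 512 = 2048 tokens, enough for this record
# check_block_size(record, 64) would raise: 4 * 64 = 256 < 2048
```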
## Replay Surfaces
......@@ -85,10 +120,19 @@ The dedicated replay CLI exposes:
- `--replay-concurrency`
- `--arrival-interval-ms`
- `--arrival-speedup-ratio`
- `--turns-per-session`
- `--shared-prefix-ratio`
- `--num-prefix-groups`
- `--inter-turn-delay-ms`
- `--extra-engine-args` (JSON string)
- `--router-config` (JSON string)
- `--report-json`
Defaults:
- `--replay-mode offline`
- `--router-mode round_robin`
Example:
```bash
......@@ -115,9 +159,10 @@ SGLang replay uses the same CLI surface. A minimal extra-engine-args file can us
}
```
Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Unspecified fields
fall back to the same defaults used by `MockEngineArgs::default()` and
`KvRouterConfig::default()`.
Both `--extra-engine-args` and `--router-config` accept partial JSON objects. Engine settings such
as `block_size`, `engine_type`, `dp_size`, `speedup_ratio`, and `decode_speedup_ratio` belong in
`--extra-engine-args`, not as top-level replay CLI flags. Unspecified fields fall back to the same
defaults used by `MockEngineArgs::default()` and `KvRouterConfig::default()`.
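The partial-object behavior can be pictured as a dictionary merge. The sketch below is only an illustration: `merge_engine_args` is a hypothetical name, and the default values are placeholders except `block_size=64`, which this guide states. The real defaults come from `MockEngineArgs::default()`.

```python
import json

# Placeholder defaults for illustration. Only block_size=64 is taken from
# this guide; the other values are assumptions, not the real defaults.
ASSUMED_ENGINE_DEFAULTS = {
    "block_size": 64,
    "engine_type": "vllm",
    "dp_size": 1,
    "speedup_ratio": 1.0,
}


def merge_engine_args(extra_engine_args_json, defaults=None):
    """Overlay a partial JSON object on top of a full set of defaults."""
    base = dict(ASSUMED_ENGINE_DEFAULTS if defaults is None else defaults)
    overrides = json.loads(extra_engine_args_json)
    if not isinstance(overrides, dict):
        raise ValueError("--extra-engine-args must be a JSON object")
    base.update(overrides)  # fields that are not specified keep their defaults
    return base


args = merge_engine_args('{"block_size":512}')
# args["block_size"] is now 512; every other field keeps its placeholder default
```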
### Synthetic Replay
......@@ -137,6 +182,19 @@ python -m dynamo.replay \
This is useful for parameter sweeps where Mooncake-style prefix structure is not required.
When `--turns-per-session > 1`, `--request-count` is interpreted as the number of sessions rather
than the total number of emitted turns. The total completed request count becomes:
- `request_count * turns_per_session`
Synthetic workload options:
- `--turns-per-session`: number of turns in each synthetic session
- `--shared-prefix-ratio`: fraction of prompt blocks shared inside a prefix group
- `--num-prefix-groups`: number of shared-prefix groups; `0` disables grouping
- `--inter-turn-delay-ms`: constant delay applied after each completed turn before the next turn in
the same session becomes eligible
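To make the request-count rule concrete, here is a small sketch of the arithmetic. The function name is hypothetical and nothing here calls into `dynamo.replay`.

```python
def total_completed_requests(request_count, turns_per_session=1):
    """Number of completed requests for a synthetic workload.

    With more than one turn per session, `request_count` counts sessions,
    so the total is request_count * turns_per_session.
    """
    if request_count < 0 or turns_per_session < 1:
        raise ValueError("request_count must be >= 0 and turns_per_session >= 1")
    return request_count * turns_per_session


# The synthetic workload example: 200 sessions with 3 turns each.
total = total_completed_requests(200, 3)  # 600
```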
## Modes
### Fixed-Schedule Replay
......@@ -155,8 +213,8 @@ This is the right mode when you want deterministic replay of the original arriva
### Closed-Loop Concurrency Replay
Use `--replay-concurrency` to ignore trace arrival timing and keep a fixed number of requests in
flight:
Use `--replay-concurrency` to ignore first-turn trace arrival timing and keep a fixed number of
requests in flight:
```bash
python -m dynamo.replay /path/to/mooncake_trace.jsonl \
......@@ -167,6 +225,13 @@ python -m dynamo.replay /path/to/mooncake_trace.jsonl \
This mode is useful when you want to compare scheduler behavior under a fixed offered concurrency rather than the original trace schedule.
For multi-turn sessions, concurrency mode still enforces session order and inter-turn delays:
- first-turn timestamps are ignored
- turn `n+1` is not eligible until turn `n` completes
- `delay` / `delay_ms` / synthetic `--inter-turn-delay-ms` are still applied after completion
- TTFT is measured from actual dispatch under the cap, not from the ignored trace timestamp
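The rules above can be illustrated with a toy discrete-event model. This is a sketch under simplifying assumptions, not the replay runtime: every request takes the same `service_ms`, all names are hypothetical, and time is simulated rather than measured.

```python
import heapq


def simulate_closed_loop(turns_per_session, concurrency, service_ms, inter_turn_delay_ms):
    """Return (session_id, turn_index, dispatch_ms, complete_ms) in completion order.

    `turns_per_session` maps a session ID to its number of turns.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # First-turn timestamps are ignored, so every first turn is eligible at t=0.
    ready = [(0.0, tag, sid, 0) for tag, sid in enumerate(turns_per_session)]
    heapq.heapify(ready)
    running = []  # min-heap of (complete_ms, tag, session_id, turn, dispatch_ms)
    next_tag = len(ready)
    now = 0.0
    log = []
    while ready or running:
        # Dispatch eligible turns while there is room under the in-flight cap.
        while ready and len(running) < concurrency and ready[0][0] <= now:
            _, tag, sid, turn = heapq.heappop(ready)
            heapq.heappush(running, (now + service_ms, tag, sid, turn, now))
        # Jump to the next event: a completion, or a turn becoming eligible.
        candidates = []
        if running:
            candidates.append(running[0][0])
        if ready and len(running) < concurrency:
            candidates.append(ready[0][0])
        now = max(now, min(candidates))
        # Retire finished turns; turn n+1 becomes eligible only after turn n
        # completes and the inter-turn delay has elapsed.
        while running and running[0][0] <= now:
            done_ms, _, sid, turn, dispatch_ms = heapq.heappop(running)
            log.append((sid, turn, dispatch_ms, done_ms))
            if turn + 1 < turns_per_session[sid]:
                heapq.heappush(ready, (done_ms + inter_turn_delay_ms, next_tag, sid, turn + 1))
                next_tag += 1
    return log
```

In this model a TTFT-style latency would be measured from `dispatch_ms`, the moment the request was admitted under the cap. With `{"a": 2, "b": 1}`, a cap of 1, 10 ms of service time, and a 5 ms inter-turn delay, the second turn of `a` is eligible at 15 ms but is not dispatched until 20 ms, because `b` is holding the only slot.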
### Online Replay
Online replay launches the mock workers and replays the trace against the live runtime path. This
......@@ -256,14 +321,15 @@ If `--report-json` is not provided, `python -m dynamo.replay` writes a timestamp
Shared replay constraints:
- aggregated mode
- `--engine-type vllm|sglang`
- `--data-parallel-size 1`
- `extra_engine_args.engine_type` must be `vllm` or `sglang`
- `extra_engine_args.dp_size` must be `1`
Additional offline constraints:
- offline `kv_router` requires `num_workers > 1`
- public single-worker offline replay still uses the legacy single-worker runtime for `vllm`
while `sglang` goes through the shared multi-worker replay runtime even when `num_workers=1`
- single-worker offline replay is still a dedicated fast path for `vllm`, but it now supports both
flat request replay and workload-driven multi-turn replay
- `sglang` still goes through the shared multi-worker replay runtime even when `num_workers=1`
Additional online constraints:
......@@ -276,9 +342,12 @@ If you violate those constraints, replay fails immediately with a validation err
- `python -m dynamo.replay` requires exactly one of:
either a trace file, or all of `--input-tokens`, `--output-tokens`, and `--request-count`
- `--replay-concurrency` works with both trace replay and synthetic replay
- `--speedup-ratio` still affects simulated timing
- mocker compute-speed knobs such as `speedup_ratio` still affect simulated timing when passed via
`--extra-engine-args`
- `--arrival-speedup-ratio` affects trace timestamps, not worker compute speed
- `--arrival-interval-ms` only applies to synthetic replay
- `--turns-per-session`, `--shared-prefix-ratio`, `--num-prefix-groups`, and
`--inter-turn-delay-ms` only apply to synthetic replay
- `--extra-engine-args` and `--router-config` are JSON strings on the standalone replay CLI
- offline replay does not need planner runtime setup, router registration, or external event transport
- the replay block size should match the trace block size, because token synthesis expands `hash_ids`
......
......@@ -125,8 +125,11 @@ python -m dynamo.mocker \
## Trace Replay
The mocker supports replaying Mooncake-style traces through the dedicated replay CLI, which exposes
`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, and the synthetic replay path
directly:
`offline|online`, `round_robin|kv_router`, `arrival_speedup_ratio`, closed-loop concurrency
admission, and synthetic workload generation directly:
The replay CLI defaults to `--replay-mode offline` and `--router-mode round_robin`. Engine settings
such as `block_size`, `engine_type`, and compute speedups still belong in `--extra-engine-args`.
```bash
python -m dynamo.replay /path/to/mooncake_trace.jsonl \
......@@ -154,9 +157,40 @@ python -m dynamo.replay \
--report-json /tmp/replay-report.json
```
Synthetic replay also supports workload-style generation for shared-prefix and multi-turn tests:
```bash
python -m dynamo.replay \
--input-tokens 5000 \
--output-tokens 500 \
--request-count 200 \
--turns-per-session 3 \
--shared-prefix-ratio 0.5 \
--num-prefix-groups 8 \
--inter-turn-delay-ms 250 \
--replay-mode offline \
--replay-concurrency 32 \
--extra-engine-args '{"block_size":512}' \
--report-json /tmp/replay-report.json
```
For trace files, replay also understands multi-turn sessions when records share `session_id`. The
first turn uses `timestamp`/`created_time`; later turns can use `delay` or `delay_ms`:
```json
{"session_id":"session-a","timestamp":1000,"input_length":2048,"output_length":128,"hash_ids":[1,2,3,4]}
{"session_id":"session-a","delay":250,"input_length":2560,"output_length":128,"hash_ids":[1,2,3,4,5]}
```
The standalone replay CLI prints an AIPerf-style summary table to stdout and writes the full replay
report JSON to disk.
Timing semantics:
- trace mode honors first-turn timestamps and inter-turn delays
- concurrency mode ignores first-turn timestamps but still enforces inter-turn delays
- in concurrency mode, TTFT is measured from actual dispatch under the in-flight cap
For full usage, constraints, and benchmarking guidance, see [Mocker Trace Replay](../benchmarks/mocker-trace-replay.md).
Replay supports aggregated `vllm` and `sglang` engine configs. Internally replay uses canonical
......
......@@ -40,11 +40,11 @@ reqwest = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tokio = { workspace = true }
dynamo-mocker = { workspace = true }
[dev-dependencies]
async-trait = { workspace = true }
dynamo-kv-router = { workspace = true, features = ["bench"] }
dynamo-mocker = { workspace = true }
dynamo-tokens = { workspace = true }
minstant = "0.1.7"
plotters = { version = "0.3", default-features = false, features = ["svg_backend", "line_series", "point_series", "full_palette"] }
......
......@@ -9,16 +9,11 @@ use clap::Parser;
use common::NoopSequencePublisher;
use dynamo_kv_router::protocols::WorkerWithDpRank;
use dynamo_kv_router::{ActiveSequencesMultiWorker, OverlapScores, SequenceRequest};
use dynamo_mocker::common::protocols::{DirectRequest, KvEventPublishers, OutputSignal};
use dynamo_mocker::scheduler::Scheduler;
use dynamo_mocker::scheduler::SchedulerHandle;
use dynamo_mocker::loadgen::Trace;
use dynamo_tokens::SequenceHash;
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::mpsc;
use tokio::task::JoinHandle;
use tokio::time::{Duration, Instant};
use uuid::Uuid;
#[derive(Parser, Debug)]
#[clap(
......@@ -76,69 +71,46 @@ struct SequenceTrace {
/// completed=true → Free
/// 4. Collect timestamps for later replay
async fn generate_sequence_events(
traces: &[Vec<MooncakeRequest>],
traces: &[Trace],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<SequenceTrace>>> {
println!("Generating sequence events...");
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?;
let scaled_traces: Vec<_> = traces
.iter()
.map(|worker_trace| scale_mooncake_trace(worker_trace, trace_simulation_duration_ms))
.collect();
let progress = make_progress_bar(Some(traces.iter().map(|w| w.len() as u64).sum::<u64>()));
let mut tasks: Vec<JoinHandle<anyhow::Result<Vec<SequenceTrace>>>> = Vec::new();
for worker_trace in scaled_traces {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
let (output_tx, mut output_rx) = mpsc::unbounded_channel::<OutputSignal>();
// No KvCacheEventSink — we only need output signals
let scheduler = Scheduler::new(
sched_args,
0,
Some(output_tx),
KvEventPublishers::default(),
None,
);
let artifacts = generate_replay_artifacts(
traces,
num_gpu_blocks,
block_size,
trace_simulation_duration_ms,
)
.await?;
let mut all_traces = Vec::with_capacity(artifacts.len());
// Pre-compute metadata for each request before submission
let mut metadata: HashMap<Uuid, RequestMetadata> = HashMap::new();
for req in &worker_trace {
let block_hashes: Vec<SequenceHash> = req
.hash_ids
for artifact in artifacts {
let metadata = artifact
.requests
.iter()
.map(|&id| local_block_hash_from_id(id, block_size).0)
.collect();
let isl = req.hash_ids.len() * block_size as usize;
metadata.insert(
req.uuid,
.map(|request| {
(
request.uuid,
RequestMetadata {
block_hashes,
isl,
output_length: req.output_length,
block_hashes: request.replay_hashes.sequence_hashes.clone(),
isl: request.input_length,
output_length: request.output_length as u64,
},
);
}
)
})
.collect::<HashMap<_, _>>();
// Spawn drain task that converts OutputSignals → SequenceTrace entries
let drain_handle: JoinHandle<Vec<SequenceTrace>> = tokio::spawn(async move {
let mut entries = Vec::new();
let mut seen: HashMap<Uuid, bool> = HashMap::new();
let mut seen = HashMap::new();
while let Some(signal) = output_rx.recv().await {
for timed_signal in artifact.output_signals {
let signal = timed_signal.signal;
let request_id = signal.uuid.to_string();
if let std::collections::hash_map::Entry::Vacant(e) = seen.entry(signal.uuid) {
e.insert(false);
if let std::collections::hash_map::Entry::Vacant(entry) = seen.entry(signal.uuid) {
entry.insert(());
if let Some(meta) = metadata.get(&signal.uuid) {
entries.push(SequenceTrace {
entry: SequenceTraceEntry::Add {
......@@ -147,92 +119,26 @@ async fn generate_sequence_events(
isl: meta.isl,
output_length: meta.output_length,
},
timestamp_us: 0, // rescaled later
timestamp_us: timed_signal.timestamp_us,
});
entries.push(SequenceTrace {
entry: SequenceTraceEntry::PrefillComplete {
request_id: request_id.clone(),
},
timestamp_us: 0,
timestamp_us: timed_signal.timestamp_us,
});
}
}
if signal.completed {
seen.insert(signal.uuid, true);
entries.push(SequenceTrace {
entry: SequenceTraceEntry::Free { request_id },
timestamp_us: 0,
});
}
}
entries
});
// Submit requests at scaled timing
let mut i = 0;
let mut target = Instant::now();
let start = target;
while i < worker_trace.len() {
let prev_i = i;
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
});
i += 1;
while i < worker_trace.len()
&& worker_trace[i].timestamp == worker_trace[i - 1].timestamp
{
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
timestamp_us: timed_signal.timestamp_us,
});
i += 1;
}
if i < worker_trace.len() {
target += Duration::from_millis(
worker_trace[i].timestamp - worker_trace[i - 1].timestamp,
);
}
tokio::time::sleep_until(target).await;
progress.inc((i - prev_i) as u64);
}
// Drop scheduler → CancelGuard fires → background task exits →
// output_tx dropped → drain task sees None
drop(scheduler);
let mut entries = drain_handle.await?;
// Assign monotonically increasing timestamps based on entry order
let total_us = (Instant::now() - start).as_micros() as u64;
let num_entries = entries.len() as u64;
for (idx, entry) in entries.iter_mut().enumerate() {
entry.timestamp_us = if num_entries > 1 {
idx as u64 * total_us / (num_entries - 1)
} else {
0
};
}
Ok(entries)
}));
}
let mut all_traces = Vec::new();
for task in tasks {
all_traces.push(task.await??);
all_traces.push(entries);
}
let total_adds = all_traces
......@@ -503,30 +409,44 @@ async fn run_tests() -> anyhow::Result<()> {
));
{
let mut f = File::create(&path)?;
for (i, (hash_ids, output_length)) in
[(&[0u64, 1, 2] as &[u64], 10u64), (&[0, 1, 3, 4], 10)]
.iter()
.enumerate()
{
writeln!(
f,
"{}",
serde_json::json!({
"timestamp": i as u64,
"hash_ids": hash_ids,
"output_length": output_length,
"session_id": "session-a",
"timestamp": 0,
"input_length": 4,
"hash_ids": [0u64, 1, 2, 3],
"output_length": 10u64,
})
)?;
writeln!(
f,
"{}",
serde_json::json!({
"session_id": "session-a",
"delay": 5.0,
"input_length": 4,
"hash_ids": [4u64, 5, 6, 7],
"output_length": 10u64,
})
)?;
}
}
let traces = process_mooncake_trace(path.to_str().unwrap(), 1, 1, 2, 42)?;
let traces = process_mooncake_trace(path.to_str().unwrap(), 512, 1, 1, 1, 42)?;
std::fs::remove_file(&path).ok();
println!(
"Loaded {} workers, {} total requests",
traces.len(),
traces.iter().map(|t| t.len()).sum::<usize>()
traces
.iter()
.map(|trace| trace
.sessions
.iter()
.map(|session| session.turns.len())
.sum::<usize>())
.sum::<usize>()
);
let seq_traces = generate_sequence_events(&traces, 1048576, 512, 100).await?;
......@@ -545,6 +465,29 @@ async fn run_tests() -> anyhow::Result<()> {
assert!(total_adds > 0, "expected at least one Add event");
assert!(total_frees > 0, "expected at least one Free event");
assert_eq!(total_adds, total_frees, "adds and frees should match");
for trace in &seq_traces {
assert!(
trace
.windows(2)
.all(|window| window[1].timestamp_us >= window[0].timestamp_us)
);
}
let first_free_us = seq_traces[0]
.iter()
.find_map(|entry| match entry.entry {
SequenceTraceEntry::Free { .. } => Some(entry.timestamp_us),
_ => None,
})
.unwrap();
let second_add_us = seq_traces[0]
.iter()
.filter_map(|entry| match entry.entry {
SequenceTraceEntry::Add { .. } => Some(entry.timestamp_us),
_ => None,
})
.nth(1)
.unwrap();
assert!(second_add_us >= first_free_us);
println!("All tests passed.");
Ok(())
......@@ -567,6 +510,7 @@ async fn main() -> anyhow::Result<()> {
};
let traces = process_mooncake_trace(
path,
args.common.block_size,
args.common.trace_length_factor,
args.common.trace_duplication_factor,
args.common.num_unique_inference_workers,
......
......@@ -12,7 +12,11 @@ use dynamo_kv_router::protocols::{
};
pub use dynamo_kv_router::test_utils::{NoopSequencePublisher, SimpleWorkerConfig};
use dynamo_mocker::common::protocols::{
DirectRequest, KvCacheEventSink, KvEventPublishers, MockEngineArgs,
DirectRequest, KvCacheEventSink, KvEventPublishers, MockEngineArgs, OutputSignal,
};
use dynamo_mocker::loadgen::{
ArrivalSpec, DelaySpec, LengthSpec, ReplayRequestHashes, RouterSequence, SequenceHashMode,
SessionPartitionSpec, SyntheticTraceSpec, Trace,
};
use dynamo_mocker::scheduler::Scheduler;
use dynamo_mocker::scheduler::SchedulerHandle;
......@@ -24,6 +28,7 @@ use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::sync::{Arc, Mutex};
use tokio::sync::mpsc;
use tokio::task::JoinHandle;
use tokio::time::Instant;
use uuid::Uuid;
......@@ -101,6 +106,8 @@ pub struct MooncakeRequest {
#[serde(default = "Uuid::new_v4")]
pub uuid: uuid::Uuid,
pub timestamp: u64,
#[serde(default)]
pub input_length: usize,
pub hash_ids: Vec<u64>,
pub output_length: u64,
}
......@@ -133,6 +140,35 @@ impl KvCacheEventSink for EventCollector {
}
}
#[derive(Clone)]
pub struct TimedReplayRequest {
pub uuid: Uuid,
pub timestamp_us: u64,
pub scheduled_ready_at_ms: f64,
pub input_length: usize,
pub output_length: usize,
pub replay_hashes: ReplayRequestHashes,
}
#[derive(Clone)]
pub struct TimedOutputSignal {
pub signal: OutputSignal,
pub timestamp_us: u64,
}
#[derive(Clone)]
pub struct TimedKvEvent {
pub event: KvCacheEvent,
pub timestamp_us: u64,
}
#[derive(Clone)]
pub struct WorkerReplayArtifacts {
pub requests: Vec<TimedReplayRequest>,
pub output_signals: Vec<TimedOutputSignal>,
pub kv_events: Vec<TimedKvEvent>,
}
/// Load the mooncake trace from disk into a flat list of requests.
pub fn load_mooncake_trace(path: &str) -> anyhow::Result<Vec<MooncakeRequest>> {
let file = File::open(path)?;
......@@ -257,11 +293,15 @@ pub fn duplicate_traces(requests: Vec<MooncakeRequest>, factor: usize) -> Vec<Mo
/// Expand a request's block-level hash_ids into per-token IDs by repeating each
/// hash_id `block_size` times.
pub fn tokens_from_request(request: &MooncakeRequest, block_size: u32) -> Vec<u32> {
request
let mut tokens = request
.hash_ids
.iter()
.flat_map(|id| (0..block_size).map(|_| *id as u32))
.collect()
.collect::<Vec<_>>();
if request.input_length > 0 && request.input_length < tokens.len() {
tokens.truncate(request.input_length);
}
tokens
}
/// Compute the LocalBlockHash for a block-level hash_id the same way the mock
......@@ -304,15 +344,19 @@ pub struct BenchmarkResults {
/// Load, transform, and partition the mooncake trace into per-worker request lists.
pub fn process_mooncake_trace(
path: &str,
block_size: u32,
trace_length_factor: usize,
trace_duplication_factor: usize,
num_workers: usize,
seed: u64,
) -> anyhow::Result<Vec<Vec<MooncakeRequest>>> {
let requests = load_mooncake_trace(path)?;
let requests = expand_trace_lengths(requests, trace_length_factor);
let requests = duplicate_traces(requests, trace_duplication_factor);
Ok(partition_trace(requests, num_workers, seed))
) -> anyhow::Result<Vec<Trace>> {
let trace = Trace::from_mooncake(std::path::Path::new(path), block_size as usize)?
.expand_hash_prefix_depth(trace_length_factor)
.duplicate_hash_space(trace_duplication_factor);
Ok(trace.partition_by_session(SessionPartitionSpec::Random {
num_partitions: num_workers,
seed,
}))
}
/// Build default MockEngineArgs suitable for event generation.
......@@ -330,98 +374,155 @@ pub fn default_mock_engine_args(
.build()?)
}
/// Replay each worker's request trace through a mock engine in real-time to
/// produce the KV cache events (store/remove/clear) that the engine would emit.
///
/// Returns one event list per worker, each entry paired with the wall-clock
/// instant it was produced.
pub async fn generate_kv_events(
traces: &[Vec<MooncakeRequest>],
num_gpu_blocks: usize,
block_size: u32,
async fn replay_worker_trace(
trace: Trace,
sched_args: MockEngineArgs,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<(KvCacheEvent, Instant)>>> {
println!("Generating events...");
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?;
let scaled_traces = traces
progress: ProgressBar,
) -> anyhow::Result<WorkerReplayArtifacts> {
let total_turns = trace
.sessions
.iter()
.map(|worker_trace| scale_mooncake_trace(worker_trace, trace_simulation_duration_ms));
let progress = make_progress_bar(Some(
traces.iter().map(|worker| worker.len() as u64).sum::<u64>(),
));
let mut tasks: Vec<JoinHandle<Vec<(KvCacheEvent, Instant)>>> = Vec::new();
for worker_trace in scaled_traces {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
.map(|session| session.turns.len())
.sum::<usize>();
let mut driver = trace
.rescale_ready_span(trace_simulation_duration_ms)?
.into_trace_driver()?;
let collector = EventCollector::new();
let (output_tx, mut output_rx) = mpsc::unbounded_channel::<OutputSignal>();
let scheduler = Scheduler::new(
sched_args,
0,
None,
Some(output_tx),
KvEventPublishers::new(Some(collector.clone()), None),
None,
);
let start = Instant::now();
let mut requests = Vec::with_capacity(total_turns);
let mut output_signals = Vec::new();
let mut completed_turns = 0usize;
while completed_turns < total_turns {
let now_ms = start.elapsed().as_secs_f64() * 1000.0;
for ready_turn in driver.pop_ready(now_ms, usize::MAX) {
let replay_hashes = ready_turn.replay_hashes.ok_or_else(|| {
anyhow::anyhow!("bench replay requires synthesized request hashes")
})?;
requests.push(TimedReplayRequest {
uuid: ready_turn.request_uuid,
timestamp_us: start.elapsed().as_micros() as u64,
scheduled_ready_at_ms: ready_turn.scheduled_ready_at_ms,
input_length: ready_turn.request.tokens.len(),
output_length: ready_turn.request.max_output_tokens,
replay_hashes,
});
scheduler.receive(ready_turn.request);
progress.inc(1);
}
let mut i = 0;
let mut target = Instant::now();
if completed_turns >= total_turns {
break;
}
while i < worker_trace.len() {
let prev_i = i;
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
match driver.next_ready_time_ms() {
Some(next_ready_ms) => {
let deadline = start + Duration::from_secs_f64((next_ready_ms.max(0.0)) / 1000.0);
tokio::select! {
maybe_signal = output_rx.recv() => {
let Some(signal) = maybe_signal else {
anyhow::bail!("scheduler ended before workload replay drained");
};
output_signals.push(TimedOutputSignal {
signal: signal.clone(),
timestamp_us: start.elapsed().as_micros() as u64,
});
i += 1;
while i < worker_trace.len()
&& worker_trace[i].timestamp == worker_trace[i - 1].timestamp
{
scheduler.receive(DirectRequest {
tokens: tokens_from_request(&worker_trace[i], block_size),
max_output_tokens: worker_trace[i].output_length as usize,
uuid: Some(worker_trace[i].uuid),
dp_rank: 0,
arrival_timestamp_ms: None,
if signal.completed {
completed_turns += 1;
driver.on_complete(signal.uuid, start.elapsed().as_secs_f64() * 1000.0)?;
}
}
_ = tokio::time::sleep_until(deadline) => {}
}
}
None => {
let Some(signal) = output_rx.recv().await else {
anyhow::bail!("scheduler ended before workload replay drained");
};
output_signals.push(TimedOutputSignal {
signal: signal.clone(),
timestamp_us: start.elapsed().as_micros() as u64,
});
i += 1;
if signal.completed {
completed_turns += 1;
driver.on_complete(signal.uuid, start.elapsed().as_secs_f64() * 1000.0)?;
}
if i < worker_trace.len() {
target += Duration::from_millis(
worker_trace[i].timestamp - worker_trace[i - 1].timestamp,
);
}
tokio::time::sleep_until(target).await;
progress.inc((i - prev_i) as u64);
}
}
drop(scheduler);
Ok(WorkerReplayArtifacts {
requests,
output_signals,
kv_events: collector
.get_events()
.into_iter()
.map(|(event, timestamp)| TimedKvEvent {
event,
timestamp_us: timestamp.saturating_duration_since(start).as_micros() as u64,
})
.collect(),
})
}
pub async fn generate_replay_artifacts(
traces: &[Trace],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<WorkerReplayArtifacts>> {
println!("Generating events...");
let sched_args = default_mock_engine_args(num_gpu_blocks, block_size as usize)?;
let progress = make_progress_bar(Some(
traces
.iter()
.map(|trace| {
trace
.sessions
.iter()
.map(|session| session.turns.len() as u64)
.sum::<u64>()
})
.sum::<u64>(),
));
collector.get_events()
let mut tasks: Vec<JoinHandle<anyhow::Result<WorkerReplayArtifacts>>> = Vec::new();
for trace in traces.iter().cloned() {
let sched_args = sched_args.clone();
let progress = progress.clone();
tasks.push(tokio::spawn(async move {
replay_worker_trace(trace, sched_args, trace_simulation_duration_ms, progress).await
}));
}
let mut events = Vec::new();
let mut artifacts = Vec::new();
for task in tasks {
events.push(task.await?);
artifacts.push(task.await??);
}
for worker_events in &events {
for worker_events in artifacts.iter().map(|artifact| &artifact.kv_events) {
for i in 1..worker_events.len() {
assert!(worker_events[i].1 >= worker_events[i - 1].1);
assert!(worker_events[i].timestamp_us >= worker_events[i - 1].timestamp_us);
}
}
println!(
"Generated {} events. Processing...",
events.iter().map(|e| e.len()).sum::<usize>()
artifacts
.iter()
.map(|artifact| artifact.kv_events.len())
.sum::<usize>()
);
if progress.elapsed() > Duration::from_millis(trace_simulation_duration_ms * 11 / 10) {
......@@ -432,8 +533,11 @@ pub async fn generate_kv_events(
let mut num_stored_events = 0;
let mut num_removed_events = 0;
for event in events.iter().flatten() {
match event.0.data {
for event in artifacts
.iter()
.flat_map(|artifact| artifact.kv_events.iter())
{
match event.event.data {
KvCacheEventData::Stored(_) => num_stored_events += 1,
KvCacheEventData::Removed(_) => num_removed_events += 1,
_ => (),
......@@ -443,7 +547,25 @@ pub async fn generate_kv_events(
println!("Store events: {}", num_stored_events);
println!("Remove events: {}", num_removed_events);
Ok(events)
Ok(artifacts)
}
pub async fn generate_kv_events(
traces: &[Trace],
num_gpu_blocks: usize,
block_size: u32,
trace_simulation_duration_ms: u64,
) -> anyhow::Result<Vec<Vec<TimedKvEvent>>> {
Ok(generate_replay_artifacts(
traces,
num_gpu_blocks,
block_size,
trace_simulation_duration_ms,
)
.await?
.into_iter()
.map(|artifact| artifact.kv_events)
.collect())
}
pub fn plot_sweep(
......@@ -591,6 +713,16 @@ pub struct SequenceData {
pub external_hashes: Vec<ExternalSequenceBlockHash>,
}
impl From<RouterSequence> for SequenceData {
fn from(sequence: RouterSequence) -> Self {
Self {
worker_id: sequence.worker_id,
local_hashes: sequence.local_hashes,
external_hashes: sequence.external_hashes,
}
}
}
impl SequenceData {
/// Create a new sequence with synthetic hashes based on sequence ID.
pub fn new(seq_id: u64, worker_id: WorkerId, depth: usize) -> Self {
......@@ -673,58 +805,46 @@ pub fn generate_sequences(
seed: u64,
use_cumulative_hash: bool,
) -> Vec<SequenceData> {
let mut sequences = Vec::with_capacity(num_sequences);
let prefix_length = (depth as f64 * prefix_ratio).round() as usize;
let mut rng: StdRng = StdRng::seed_from_u64(seed);
for seq_id in 0..num_sequences {
let seq_id_u64 = seq_id as u64;
let worker_id = (seq_id % num_workers) as WorkerId;
let group_id = if num_prefix_groups > 0 && prefix_length > 0 {
Some(rng.random_range(0..num_prefix_groups) as u64)
let trace = Trace::synthetic(SyntheticTraceSpec {
block_size: 1,
num_sessions: num_sequences,
turns_per_session: 1,
input_tokens: LengthSpec {
mean: depth,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: 1,
stddev: 0.0,
},
shared_prefix_ratio: prefix_ratio,
num_prefix_groups,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: DelaySpec::None,
seed,
})
.expect("sequence generation spec must be valid");
let hash_mode = if use_cumulative_hash {
SequenceHashMode::Cumulative
} else {
None
SequenceHashMode::Raw
};
let local_hashes: Vec<LocalBlockHash> = (0..depth)
.map(|block_idx| {
let block_idx_u64 = block_idx as u64;
if let Some(gid) = group_id
&& block_idx < prefix_length
{
return LocalBlockHash(0xDEAD_BEEF_0000_0000 | (gid << 32) | block_idx_u64);
}
LocalBlockHash((seq_id_u64 << 32) | block_idx_u64)
trace
.partition_by_session(SessionPartitionSpec::RoundRobin {
num_partitions: num_workers,
})
.collect();
if use_cumulative_hash {
sequences.push(SequenceData::from_local_hashes(worker_id, local_hashes));
} else {
let external_hashes: Vec<ExternalSequenceBlockHash> = (0..depth)
.map(|block_idx| {
let block_idx_u64 = block_idx as u64;
if let Some(gid) = group_id
&& block_idx < prefix_length
{
return ExternalSequenceBlockHash(
0xDEAD_BEEF_0000_0000 | (gid << 32) | block_idx_u64,
);
}
ExternalSequenceBlockHash((seq_id_u64 << 32) | block_idx_u64)
.into_iter()
.enumerate()
.flat_map(|(worker_idx, partition)| {
partition
.to_router_sequences(worker_idx as WorkerId, hash_mode)
.expect("synthetic trace conversion must succeed")
.into_iter()
.map(SequenceData::from)
.collect::<Vec<_>>()
})
.collect();
sequences.push(SequenceData {
worker_id,
local_hashes,
external_hashes,
});
}
}
sequences
.collect()
}
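The removed code assigned each sequence to a worker with `seq_id % num_workers`, and the replacement delegates this to `SessionPartitionSpec::RoundRobin`. A minimal sketch of that assignment, assuming round-robin here means "item `i` goes to partition `i % num_partitions`" (the actual `loadgen` implementation is not shown in this diff, and `partition_round_robin` is a hypothetical name):

```rust
/// Hypothetical helper: distribute items across `num_partitions` buckets in
/// round-robin order, so item `i` lands in bucket `i % num_partitions`.
fn partition_round_robin<T>(items: Vec<T>, num_partitions: usize) -> Vec<Vec<T>> {
    assert!(num_partitions > 0, "num_partitions must be at least 1");
    // One empty bucket per partition (worker).
    let mut partitions: Vec<Vec<T>> = (0..num_partitions).map(|_| Vec::new()).collect();
    for (index, item) in items.into_iter().enumerate() {
        partitions[index % num_partitions].push(item);
    }
    partitions
}
```

With five sessions and two workers, sessions 0, 2, 4 go to the first worker and sessions 1, 3 to the second, which matches the old per-sequence modulo assignment.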
/// Compute median of durations.
......@@ -736,3 +856,60 @@ pub fn median(durations: &[Duration]) -> Duration {
sorted.sort();
sorted[sorted.len() / 2]
}
#[cfg(test)]
mod tests {
use super::*;
fn multiturn_trace() -> Trace {
Trace {
block_size: 2,
sessions: vec![dynamo_mocker::loadgen::SessionTrace {
session_id: "session-a".to_string(),
first_arrival_timestamp_ms: Some(0.0),
turns: vec![
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
},
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![3, 4],
delay_after_previous_ms: 5.0,
},
],
}],
}
}
#[tokio::test]
async fn test_replay_worker_trace_releases_follow_up_turn_after_completion_delay() {
let artifacts = replay_worker_trace(
multiturn_trace(),
default_mock_engine_args(1024, 2).unwrap(),
5,
make_progress_bar(Some(2)),
)
.await
.unwrap();
assert_eq!(artifacts.requests.len(), 2);
let first_uuid = artifacts.requests[0].uuid;
let first_completion_ms = artifacts
.output_signals
.iter()
.find(|signal| signal.signal.uuid == first_uuid && signal.signal.completed)
.unwrap()
.timestamp_us as f64
/ 1000.0;
assert!(
artifacts.requests[1].scheduled_ready_at_ms + 0.1 >= first_completion_ms + 5.0,
"expected follow-up turn to wait for completion plus delay, got ready_at={} completion_at={}",
artifacts.requests[1].scheduled_ready_at_ms,
first_completion_ms
);
}
}
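The test above checks that a follow-up turn becomes ready only after the previous turn completes and its `delay_after_previous_ms` has elapsed. The timing rule can be sketched on its own. The `Turn` struct, the fixed `service_time_ms` field, and `schedule_session` are assumptions made for illustration; in the real replay, completion times come from the mock engine.

```rust
/// Hypothetical, simplified turn record. Only `delay_after_previous_ms`
/// mirrors a field from the diff; `service_time_ms` is an assumed fixed
/// latency standing in for the engine's actual completion time.
struct Turn {
    delay_after_previous_ms: f64,
    service_time_ms: f64,
}

/// Returns `(ready_at_ms, completed_at_ms)` for each turn of one session.
/// The first turn is ready at `first_arrival_ms`; every later turn is ready
/// at the previous completion time plus its own delay.
fn schedule_session(first_arrival_ms: f64, turns: &[Turn]) -> Vec<(f64, f64)> {
    let mut schedule = Vec::with_capacity(turns.len());
    let mut previous_completion: Option<f64> = None;
    for turn in turns {
        let ready_at = match previous_completion {
            None => first_arrival_ms,
            Some(done) => done + turn.delay_after_previous_ms,
        };
        let completed_at = ready_at + turn.service_time_ms;
        schedule.push((ready_at, completed_at));
        previous_completion = Some(completed_at);
    }
    schedule
}
```

Under these assumptions, a second turn with a 5 ms delay never becomes ready earlier than 5 ms after the first completion, which is the inequality the test asserts (with a 0.1 ms tolerance).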
......@@ -14,6 +14,7 @@ use dynamo_kv_router::protocols::{KvCacheEvent, KvCacheEventData, RouterEvent};
use dynamo_kv_router::{
ConcurrentRadixTree, ConcurrentRadixTreeCompressed, PositionalIndexer, ThreadPoolIndexer,
};
use dynamo_mocker::loadgen::Trace;
use serde::Serialize;
use std::sync::Arc;
use tokio::time::{Duration, Instant};
......@@ -194,68 +195,33 @@ struct WorkerTrace {
/// Timestamps are rescaled from the original trace / simulation durations
/// into the benchmark duration (microseconds).
fn prepare_worker_traces(
traces: Vec<Vec<MooncakeRequest>>,
events: Vec<Vec<(KvCacheEvent, Instant)>>,
block_size: u32,
artifacts: Vec<WorkerReplayArtifacts>,
benchmark_duration_ms: u64,
trace_simulation_duration_ms: u64,
) -> Vec<Vec<WorkerTrace>> {
assert!(traces.len() == events.len());
let scaled_request_traces: Vec<_> = traces
artifacts
.into_iter()
.map(|trace| {
let Some(first) = trace.first() else {
return Vec::new();
};
let first_ts = first.timestamp;
let trace_duration_ms = trace.last().unwrap().timestamp - first_ts;
trace
.map(|artifact| {
let mut merged = artifact
.requests
.into_iter()
.map(|request| WorkerTrace {
timestamp_us: if trace_duration_ms == 0 {
timestamp_us: request.timestamp_us,
entry: WorkerTraceEntry::Request(request.replay_hashes.local_block_hashes),
})
.chain(artifact.kv_events.into_iter().map(|event| WorkerTrace {
timestamp_us: event.timestamp_us,
entry: WorkerTraceEntry::Event(event.event),
}))
.collect::<Vec<_>>();
merged.sort_by_key(|entry| entry.timestamp_us);
let max_timestamp_us = merged.last().map(|entry| entry.timestamp_us).unwrap_or(0);
for entry in &mut merged {
entry.timestamp_us = if max_timestamp_us == 0 {
0
} else {
(request.timestamp - first_ts) * 1000 * benchmark_duration_ms
/ trace_duration_ms
},
entry: WorkerTraceEntry::Request(
request
.hash_ids
.iter()
.map(|id| local_block_hash_from_id(*id, block_size))
.collect(),
),
})
.collect::<Vec<_>>()
})
.collect();
let scaled_event_traces: Vec<_> = events
.into_iter()
.map(|worker_events| {
let Some(&(_, start_instant)) = worker_events.first() else {
return Vec::new();
entry.timestamp_us * benchmark_duration_ms * 1000 / max_timestamp_us
};
worker_events
.into_iter()
.map(|(event, timestamp)| WorkerTrace {
timestamp_us: (timestamp - start_instant).as_micros() as u64
* benchmark_duration_ms
/ trace_simulation_duration_ms,
entry: WorkerTraceEntry::Event(event),
})
.collect::<Vec<_>>()
})
.collect();
scaled_request_traces
.into_iter()
.zip(scaled_event_traces)
.map(|(request_trace, event_trace)| {
let mut merged: Vec<WorkerTrace> =
request_trace.into_iter().chain(event_trace).collect();
merged.sort_by_key(|entry| entry.timestamp_us);
}
merged
})
.collect()
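The new `prepare_worker_traces` merges requests and KV events, sorts them, and rescales every timestamp so that the largest one lands at the benchmark duration. A standalone sketch of that rescaling on plain `u64` timestamps follows. `rescale_timestamps` is a hypothetical helper, not a function from the benchmark.

```rust
/// Hypothetical helper: rescale microsecond timestamps so the largest one
/// maps to `benchmark_duration_ms` (converted to microseconds), using the
/// same formula as the hunk above.
fn rescale_timestamps(timestamps_us: &mut [u64], benchmark_duration_ms: u64) {
    // Sort first so the last element is the maximum, as the merged trace does.
    timestamps_us.sort_unstable();
    let max_timestamp_us = timestamps_us.last().copied().unwrap_or(0);
    for timestamp in timestamps_us.iter_mut() {
        *timestamp = if max_timestamp_us == 0 {
            // Empty input or all entries at t = 0: avoid dividing by zero.
            0
        } else {
            *timestamp * benchmark_duration_ms * 1000 / max_timestamp_us
        };
    }
}
```

Multiplying before dividing keeps integer precision. For very large timestamps the `u64` product could overflow, and a `u128` intermediate would be one way to guard against that.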
......@@ -276,19 +242,12 @@ struct SweepStepResult {
/// flushed and latency percentiles / throughput stats are printed.
async fn run_benchmark(
indexer: Arc<dyn KvIndexerInterface + Send + Sync>,
traces: Vec<Vec<MooncakeRequest>>,
events: Vec<Vec<(KvCacheEvent, Instant)>>,
artifacts: Vec<WorkerReplayArtifacts>,
args: &Args,
benchmark_duration_ms: u64,
count_events: bool,
) -> anyhow::Result<BenchmarkResults> {
let worker_traces = prepare_worker_traces(
traces,
events,
args.common.block_size,
benchmark_duration_ms,
args.common.trace_simulation_duration_ms,
);
let worker_traces = prepare_worker_traces(artifacts, benchmark_duration_ms);
let worker_traces = worker_traces.into_iter().map(Arc::new).collect::<Vec<_>>();
let progress = make_progress_bar(Some(
......@@ -460,7 +419,7 @@ async fn run_benchmark(
})
}
fn run_tests() -> anyhow::Result<()> {
async fn run_tests() -> anyhow::Result<()> {
use std::collections::HashSet;
use std::fs::File;
use std::io::Write;
......@@ -479,6 +438,7 @@ fn run_tests() -> anyhow::Result<()> {
"{}",
serde_json::json!({
"timestamp": i as u64,
"input_length": hash_ids.len(),
"hash_ids": hash_ids,
"output_length": output_length,
})
......@@ -486,12 +446,13 @@ fn run_tests() -> anyhow::Result<()> {
}
}
let traces = process_mooncake_trace(path.to_str().unwrap(), 2, 2, 2, 42)?;
let traces = process_mooncake_trace(path.to_str().unwrap(), 512, 2, 2, 2, 42)?;
std::fs::remove_file(&path).ok();
let mut all_hashes: Vec<Vec<u64>> = traces
.into_iter()
.flat_map(|w| w.into_iter().map(|r| r.hash_ids))
.flat_map(|worker| worker.sessions.into_iter())
.flat_map(|session| session.turns.into_iter().map(|turn| turn.hash_ids))
.collect();
all_hashes.sort();
......@@ -519,6 +480,43 @@ fn run_tests() -> anyhow::Result<()> {
let set1: HashSet<u64> = copy1.iter().flat_map(|h| h.iter().copied()).collect();
assert!(set0.is_disjoint(&set1), "copies are not hash-disjoint");
let replay_trace = Trace {
block_size: 2,
sessions: vec![dynamo_mocker::loadgen::SessionTrace {
session_id: "session-a".to_string(),
first_arrival_timestamp_ms: Some(0.0),
turns: vec![
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
},
dynamo_mocker::loadgen::TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![3, 4],
delay_after_previous_ms: 5.0,
},
],
}],
};
let artifacts = generate_replay_artifacts(&[replay_trace], 1024, 2, 5).await?;
assert_eq!(artifacts.len(), 1);
assert_eq!(artifacts[0].requests.len(), 2);
let first_uuid = artifacts[0].requests[0].uuid;
let first_completion_ms = artifacts[0]
.output_signals
.iter()
.find(|signal| signal.signal.uuid == first_uuid && signal.signal.completed)
.expect("first request must complete")
.timestamp_us as f64
/ 1000.0;
assert!(
artifacts[0].requests[1].scheduled_ready_at_ms + 0.1 >= first_completion_ms + 5.0,
"expected second request to wait for completion plus delay"
);
println!("All tests passed.");
Ok(())
}
......@@ -528,7 +526,7 @@ async fn main() -> anyhow::Result<()> {
let args = Args::parse();
if args.common.test {
return run_tests();
return run_tests().await;
}
let path = match args.common.mooncake_trace_path.as_deref() {
......@@ -540,12 +538,13 @@ async fn main() -> anyhow::Result<()> {
};
let traces = process_mooncake_trace(
path,
args.common.block_size,
args.common.trace_length_factor,
args.common.trace_duplication_factor,
args.common.num_unique_inference_workers,
args.common.seed,
)?;
let events = generate_kv_events(
let artifacts = generate_replay_artifacts(
&traces,
args.common.num_gpu_blocks,
args.common.block_size,
......@@ -599,15 +598,8 @@ async fn main() -> anyhow::Result<()> {
IndexerArgs::from_name(name, args.common.block_size, args.num_event_workers)?
};
let count_events = IndexerArgs::supports_remove(name);
let result = run_benchmark(
indexer,
traces.clone(),
events.clone(),
&args,
dur_ms,
count_events,
)
.await?;
let result =
run_benchmark(indexer, artifacts.clone(), &args, dur_ms, count_events).await?;
if multi_threaded {
if result.block_throughput >= result.offered_block_throughput * 0.95 {
......@@ -674,8 +666,7 @@ async fn main() -> anyhow::Result<()> {
let count_events = IndexerArgs::supports_remove(name);
run_benchmark(
indexer,
traces.clone(),
events.clone(),
artifacts.clone(),
&args,
args.common.benchmark_duration_ms,
count_events,
......
......@@ -13,6 +13,9 @@
use anyhow::{Context, Result};
use clap::Parser;
use dynamo_bench::common::{ChatMessage, LatencyStats, fetch_model_name};
use dynamo_mocker::loadgen::{
ArrivalSpec, DelaySpec, LengthSpec, SessionTrace, SyntheticTraceSpec, Trace,
};
use futures_util::StreamExt;
use indicatif::{ProgressBar, ProgressStyle};
use rand::rngs::StdRng;
......@@ -283,10 +286,10 @@ async fn run_user(
model: String,
args: Arc<Args>,
user_id: usize,
session: SessionTrace,
progress: ProgressBar,
) -> Vec<TurnResult> {
let mut rng = StdRng::seed_from_u64(args.seed.wrapping_add(user_id as u64));
let mean_delay = args.mean_delay_ms as f64;
let system_prompt = generate_system_prompt(user_id);
let mut messages = vec![ChatMessage {
......@@ -294,11 +297,10 @@ async fn run_user(
content: system_prompt,
}];
let mut results = Vec::with_capacity(args.num_turns);
let mut results = Vec::with_capacity(session.turns.len());
for turn in 0..args.num_turns {
// Generate user prompt
let user_text = generate_lorem(&mut rng, args.num_user_tokens);
for (turn, turn_spec) in session.turns.iter().enumerate() {
let user_text = generate_lorem(&mut rng, turn_spec.input_length);
messages.push(ChatMessage {
role: "user".to_string(),
content: user_text,
......@@ -307,7 +309,7 @@ async fn run_user(
let body = MultiturnRequest {
model: model.clone(),
messages: messages.clone(),
max_completion_tokens: args.max_completion_tokens,
max_completion_tokens: turn_spec.max_output_tokens as u32,
ignore_eos: if args.ignore_eos { Some(true) } else { None },
stream: true,
nvext: if args.speculative_prefill {
......@@ -392,7 +394,7 @@ async fn run_user(
" [user {}][turn {}/{}] ttft={:.1}ms total={:.1}s ok={}",
user_id,
turn + 1,
args.num_turns,
session.turns.len(),
result.ttft_us as f64 / 1000.0,
result.total_latency_us as f64 / 1_000_000.0,
result.success,
......@@ -404,10 +406,13 @@ async fn run_user(
// Inter-turn delay precomputed in the session trace (skip after last turn).
// Delays are exponential with mean `mean_delay_ms` when it is nonzero (DelaySpec::ExponentialMs).
if turn + 1 < args.num_turns {
let u: f64 = rng.random();
let delay_ms = (-mean_delay * u.ln()).max(0.0);
tokio::time::sleep(Duration::from_millis(delay_ms as u64)).await;
if let Some(next_turn) = session.turns.get(turn + 1)
&& next_turn.delay_after_previous_ms > 0.0
{
tokio::time::sleep(Duration::from_secs_f64(
next_turn.delay_after_previous_ms / 1000.0,
))
.await;
}
}
......@@ -569,6 +574,32 @@ async fn main() -> Result<()> {
.build()
.context("Failed to create HTTP client")?;
let workload = Trace::synthetic(SyntheticTraceSpec {
block_size: 1,
num_sessions: args.num_users,
turns_per_session: args.num_turns,
input_tokens: LengthSpec {
mean: args.num_user_tokens,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: args.max_completion_tokens as usize,
stddev: 0.0,
},
shared_prefix_ratio: 0.0,
num_prefix_groups: 0,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: if args.mean_delay_ms == 0 {
DelaySpec::None
} else {
DelaySpec::ExponentialMs {
mean_ms: args.mean_delay_ms as f64,
}
},
seed: args.seed,
})?;
let sessions = workload.sessions;
let args = Arc::new(args);
let chat_url = format!("{}/v1/chat/completions", args.url);
......@@ -592,14 +623,18 @@ async fn main() -> Result<()> {
.progress_chars("#>-"),
);
let handles: Vec<_> = (0..args.num_users)
.map(|user_id| {
let handles: Vec<_> = sessions
.into_iter()
.enumerate()
.map(|(user_id, session)| {
let client = client.clone();
let url = chat_url.clone();
let model = model.clone();
let args = args.clone();
let progress = progress.clone();
tokio::spawn(async move { run_user(client, url, model, args, user_id, progress).await })
tokio::spawn(async move {
run_user(client, url, model, args, user_id, session, progress).await
})
})
.collect();
......
......@@ -10,6 +10,9 @@ use dynamo_mocker::common::protocols::{
PreemptionMode as RsPreemptionMode, ReasoningConfig as RsReasoningConfig,
SglangArgs as RsSglangArgs, WorkerType as RsWorkerType,
};
use dynamo_mocker::loadgen::{
ArrivalSpec, DelaySpec, LengthSpec, SyntheticTraceSpec, Trace as RsTrace,
};
use pyo3::{exceptions::PyException, prelude::*};
use pythonize::pythonize;
use uuid::Uuid;
......@@ -356,7 +359,7 @@ pub fn run_mocker_trace_replay(
}
#[pyfunction]
#[pyo3(signature = (input_tokens, output_tokens, request_count, extra_engine_args=None, router_config=None, num_workers=1, replay_concurrency=None, replay_mode="offline", router_mode="round_robin", arrival_speedup_ratio=1.0, arrival_interval_ms=1.0))]
#[pyo3(signature = (input_tokens, output_tokens, request_count, extra_engine_args=None, router_config=None, num_workers=1, replay_concurrency=None, replay_mode="offline", router_mode="round_robin", arrival_speedup_ratio=1.0, arrival_interval_ms=1.0, turns_per_session=1, shared_prefix_ratio=0.0, num_prefix_groups=0, inter_turn_delay_ms=0.0))]
#[allow(clippy::too_many_arguments)]
pub fn run_mocker_synthetic_trace_replay(
py: Python<'_>,
......@@ -371,6 +374,10 @@ pub fn run_mocker_synthetic_trace_replay(
router_mode: &str,
arrival_speedup_ratio: f64,
arrival_interval_ms: f64,
turns_per_session: usize,
shared_prefix_ratio: f64,
num_prefix_groups: usize,
inter_turn_delay_ms: f64,
) -> PyResult<PyObject> {
let args = load_replay_mocker_args(py, extra_engine_args)?;
let router_config = load_replay_router_config(router_config);
......@@ -378,6 +385,73 @@ pub fn run_mocker_synthetic_trace_replay(
let router_mode = parse_replay_router_mode(router_mode)?;
let report = py.allow_threads(move || {
let replay_concurrency = parse_replay_concurrency(replay_concurrency)?;
let use_workload = turns_per_session > 1
|| shared_prefix_ratio > 0.0
|| num_prefix_groups > 0
|| inter_turn_delay_ms > 0.0;
if use_workload {
let mut trace = build_synthetic_workload(
args.block_size.max(1),
input_tokens,
output_tokens,
request_count,
arrival_interval_ms,
turns_per_session,
shared_prefix_ratio,
num_prefix_groups,
inter_turn_delay_ms,
)?;
if replay_concurrency.is_none() {
trace = trace.speed_up_timing(arrival_speedup_ratio)?;
}
return match (replay_mode.as_str(), replay_concurrency) {
("offline", Some(max_in_flight)) => {
dynamo_mocker::replay::simulate_concurrency_workload_with_router_mode(
args,
router_config.clone(),
trace,
max_in_flight,
num_workers,
router_mode,
)
}
("offline", None) => {
dynamo_mocker::replay::simulate_trace_workload_with_router_mode(
args,
router_config.clone(),
trace,
num_workers,
router_mode,
)
}
("online", Some(max_in_flight)) => {
dynamo_mocker::replay::simulate_concurrency_live_workload_with_router_mode(
args,
router_config.clone(),
trace,
max_in_flight,
num_workers,
router_mode,
)
}
("online", None) => {
dynamo_mocker::replay::simulate_trace_live_workload_with_router_mode(
args,
router_config.clone(),
trace,
num_workers,
router_mode,
)
}
(other, _) => anyhow::bail!(
"replay_mode must be either 'offline' or 'online', got '{}'",
other
),
};
}
let requests = build_synthetic_requests(
input_tokens,
output_tokens,
......@@ -509,6 +583,69 @@ fn parse_replay_concurrency(replay_concurrency: Option<isize>) -> anyhow::Result
}
}
#[allow(clippy::too_many_arguments)]
fn build_synthetic_workload(
block_size: usize,
input_tokens: usize,
output_tokens: usize,
request_count: usize,
arrival_interval_ms: f64,
turns_per_session: usize,
shared_prefix_ratio: f64,
num_prefix_groups: usize,
inter_turn_delay_ms: f64,
) -> anyhow::Result<RsTrace> {
if input_tokens == 0 {
anyhow::bail!("input_tokens must be at least 1");
}
if output_tokens == 0 {
anyhow::bail!("output_tokens must be at least 1");
}
if request_count == 0 {
anyhow::bail!("request_count must be at least 1");
}
if turns_per_session == 0 {
anyhow::bail!("turns_per_session must be at least 1");
}
if !arrival_interval_ms.is_finite() || arrival_interval_ms < 0.0 {
anyhow::bail!("arrival_interval_ms must be a finite non-negative number");
}
if !inter_turn_delay_ms.is_finite() || inter_turn_delay_ms < 0.0 {
anyhow::bail!("inter_turn_delay_ms must be a finite non-negative number");
}
let first_turn_arrivals = if arrival_interval_ms == 0.0 {
ArrivalSpec::Burst
} else {
ArrivalSpec::ConstantQps {
qps: 1000.0 / arrival_interval_ms,
}
};
RsTrace::synthetic(SyntheticTraceSpec {
block_size,
num_sessions: request_count,
turns_per_session,
input_tokens: LengthSpec {
mean: input_tokens,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: output_tokens,
stddev: 0.0,
},
shared_prefix_ratio,
num_prefix_groups,
first_turn_arrivals,
inter_turn_delays: if inter_turn_delay_ms == 0.0 {
DelaySpec::None
} else {
DelaySpec::ConstantMs(inter_turn_delay_ms)
},
seed: 42,
})
}
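`build_synthetic_workload` validates `arrival_interval_ms` and then maps it to an arrival pattern: an interval of zero means all first turns arrive in a burst, and any other value becomes a constant rate of `1000 / arrival_interval_ms` requests per second. A minimal sketch of that mapping, using a hypothetical local `Arrival` enum in place of `ArrivalSpec` and a `String` error in place of `anyhow`:

```rust
/// Hypothetical stand-in for `ArrivalSpec`, limited to the two variants
/// used by the hunk above.
#[derive(Debug, PartialEq)]
enum Arrival {
    Burst,
    ConstantQps { qps: f64 },
}

/// Convert an inter-arrival interval in milliseconds into an arrival pattern.
/// Rejects NaN, infinities, and negative values, as the original checks do.
fn arrival_from_interval_ms(arrival_interval_ms: f64) -> Result<Arrival, String> {
    if !arrival_interval_ms.is_finite() || arrival_interval_ms < 0.0 {
        return Err("arrival_interval_ms must be a finite non-negative number".to_string());
    }
    if arrival_interval_ms == 0.0 {
        // No spacing between arrivals: every session starts at once.
        Ok(Arrival::Burst)
    } else {
        // 1000 ms per second divided by the spacing gives requests per second.
        Ok(Arrival::ConstantQps {
            qps: 1000.0 / arrival_interval_ms,
        })
    }
}
```

For example, the default interval of 1.0 ms corresponds to 1000 requests per second, and 250 ms corresponds to 4 requests per second.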
fn build_synthetic_requests(
input_tokens: usize,
output_tokens: usize,
......
......@@ -1388,6 +1388,10 @@ def run_mocker_synthetic_trace_replay(
router_mode: Literal["round_robin", "kv_router"] = "round_robin",
arrival_speedup_ratio: float = 1.0,
arrival_interval_ms: float = 1.0,
turns_per_session: int = 1,
shared_prefix_ratio: float = 0.0,
num_prefix_groups: int = 0,
inter_turn_delay_ms: float = 0.0,
) -> Dict[str, Any]:
"""Replay a synthetic mocker workload without requiring a trace file."""
...
......
......@@ -43,6 +43,10 @@ def run_synthetic_trace_replay(
router_mode="round_robin",
arrival_speedup_ratio=1.0,
arrival_interval_ms=1.0,
turns_per_session=1,
shared_prefix_ratio=0.0,
num_prefix_groups=0,
inter_turn_delay_ms=0.0,
):
return _run_mocker_synthetic_trace_replay(
input_tokens,
......@@ -56,4 +60,8 @@ def run_synthetic_trace_replay(
router_mode=router_mode,
arrival_speedup_ratio=arrival_speedup_ratio,
arrival_interval_ms=arrival_interval_ms,
turns_per_session=turns_per_session,
shared_prefix_ratio=shared_prefix_ratio,
num_prefix_groups=num_prefix_groups,
inter_turn_delay_ms=inter_turn_delay_ms,
)
......@@ -22,8 +22,16 @@ def main(argv: Sequence[str] | None = None) -> int:
parser.add_argument("--router-config")
parser.add_argument("--input-tokens", type=int)
parser.add_argument("--output-tokens", type=int)
parser.add_argument("--request-count", type=int)
parser.add_argument(
"--request-count",
type=int,
help="number of synthetic requests; when --turns-per-session > 1, this is the number of sessions",
)
parser.add_argument("--arrival-interval-ms", type=float, default=1.0)
parser.add_argument("--turns-per-session", type=int, default=1)
parser.add_argument("--shared-prefix-ratio", type=float, default=0.0)
parser.add_argument("--num-prefix-groups", type=int, default=0)
parser.add_argument("--inter-turn-delay-ms", type=float, default=0.0)
parser.add_argument("--num-workers", type=int, default=1)
parser.add_argument("--replay-concurrency", type=int)
parser.add_argument(
......@@ -45,7 +53,14 @@ def main(argv: Sequence[str] | None = None) -> int:
using_trace_file = args.trace_file is not None
synthetic_args = (args.input_tokens, args.output_tokens, args.request_count)
using_synthetic = any(value is not None for value in synthetic_args)
using_synthetic = any(value is not None for value in synthetic_args) or any(
(
args.turns_per_session != 1,
args.shared_prefix_ratio != 0.0,
args.num_prefix_groups != 0,
args.inter_turn_delay_ms != 0.0,
)
)
if using_trace_file == using_synthetic:
parser.error(
......@@ -91,6 +106,10 @@ def main(argv: Sequence[str] | None = None) -> int:
router_mode=args.router_mode,
arrival_speedup_ratio=args.arrival_speedup_ratio,
arrival_interval_ms=args.arrival_interval_ms,
turns_per_session=args.turns_per_session,
shared_prefix_ratio=args.shared_prefix_ratio,
num_prefix_groups=args.num_prefix_groups,
inter_turn_delay_ms=args.inter_turn_delay_ms,
)
report_path = write_report_json(report, args.report_json)
......
......@@ -110,6 +110,45 @@ def _write_trace_and_args(tmp_path):
return trace_path
def _write_multiturn_trace(tmp_path):
trace_path = tmp_path / "multiturn_trace.jsonl"
records = [
{
"session_id": "session-a",
"timestamp": 1000.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [101],
},
{
"session_id": "session-b",
"timestamp": 1002.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [202],
},
{
"session_id": "session-a",
"delay": 5.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [303],
},
{
"session_id": "session-b",
"delay": 1.0,
"input_length": 64,
"output_length": 2,
"hash_ids": [404],
},
]
trace_path.write_text(
"\n".join(json.dumps(record) for record in records) + "\n",
encoding="utf-8",
)
return trace_path
def _write_cli_smoke_trace(tmp_path):
trace_path = tmp_path / "cli_smoke_trace.jsonl"
records = []
......@@ -283,6 +322,26 @@ def test_run_trace_replay_invariant_counts_match(tmp_path, engine_type, replay_m
assert single[field] == multi_kv_router[field]
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_trace_replay_supports_multiturn_sessions(tmp_path, replay_mode):
trace_path = _write_multiturn_trace(tmp_path)
report = run_trace_replay(
trace_path,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode=replay_mode,
router_mode="kv_router",
)
_assert_basic_report_counts(
report,
num_requests=4,
input_tokens=64,
output_tokens=2,
)
@pytest.mark.parametrize("engine_type", ["vllm", "sglang"])
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
@pytest.mark.parametrize("router_mode", ["round_robin", "kv_router"])
......@@ -358,6 +417,53 @@ def test_run_synthetic_trace_replay_invariant_counts_match(
assert single[field] == multi_kv_router[field]
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_synthetic_trace_replay_supports_multiturn_workloads(tmp_path, replay_mode):
report = run_synthetic_trace_replay(
64,
2,
3,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode=replay_mode,
router_mode="kv_router",
turns_per_session=2,
inter_turn_delay_ms=5.0,
shared_prefix_ratio=0.5,
num_prefix_groups=2,
)
_assert_basic_report_counts(
report,
num_requests=6,
input_tokens=64,
output_tokens=2,
)
@pytest.mark.parametrize(
("input_tokens", "output_tokens", "expected_message"),
[
(0, 2, "input_tokens must be at least 1"),
(2, 0, "output_tokens must be at least 1"),
],
)
def test_run_synthetic_trace_replay_workload_validates_zero_token_lengths(
input_tokens, output_tokens, expected_message
):
with pytest.raises(Exception, match=expected_message):
run_synthetic_trace_replay(
input_tokens,
output_tokens,
2,
extra_engine_args=_vllm_args(),
num_workers=2,
replay_mode="offline",
router_mode="kv_router",
turns_per_session=2,
)
@pytest.mark.parametrize("engine_type", ["vllm", "sglang"])
@pytest.mark.parametrize("replay_mode", ["offline", "online"])
def test_run_synthetic_concurrency_replay_counts_match(
......@@ -551,6 +657,48 @@ def test_replay_cli_prints_table_and_saves_json(tmp_path, monkeypatch, capsys):
assert json.loads(report_path.read_text(encoding="utf-8")) == report
def test_replay_cli_passes_multiturn_workload_kwargs(monkeypatch):
captured = {}
def fake_run(*args, **kwargs):
captured["args"] = args
captured["kwargs"] = kwargs
return {
"completed_requests": 4,
"request_throughput_rps": 1.0,
"output_throughput_tok_s": 1.0,
}
monkeypatch.setattr("dynamo.replay.main.run_synthetic_trace_replay", fake_run)
exit_code = main(
[
"--input-tokens",
"16",
"--output-tokens",
"8",
"--request-count",
"2",
"--turns-per-session",
"2",
"--shared-prefix-ratio",
"0.5",
"--num-prefix-groups",
"3",
"--inter-turn-delay-ms",
"7.0",
]
)
assert exit_code == 0
assert captured["args"] == (16, 8, 2)
assert captured["kwargs"]["turns_per_session"] == 2
assert captured["kwargs"]["shared_prefix_ratio"] == 0.5
assert captured["kwargs"]["num_prefix_groups"] == 3
assert captured["kwargs"]["inter_turn_delay_ms"] == 7.0
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_synthetic_smoke(tmp_path):
report_path = tmp_path / "synthetic_report.json"
......@@ -582,6 +730,45 @@ def test_replay_cli_subprocess_synthetic_smoke(tmp_path):
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_synthetic_multiturn_smoke(tmp_path):
report_path = tmp_path / "synthetic_multiturn_report.json"
completed = _run_replay_cli(
tmp_path,
"--input-tokens",
"64",
"--output-tokens",
"4",
"--request-count",
"3",
"--turns-per-session",
"2",
"--shared-prefix-ratio",
"0.5",
"--num-prefix-groups",
"2",
"--inter-turn-delay-ms",
"5.0",
"--num-workers",
"2",
"--report-json",
str(report_path),
"--extra-engine-args",
'{"block_size":64,"speedup_ratio":1000.0}',
)
report = _assert_replay_cli_outputs(completed, report_path)
_assert_basic_report_counts(
report,
num_requests=6,
input_tokens=64,
output_tokens=4,
)
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_trace_smoke(tmp_path):
trace_path = _write_cli_smoke_trace(tmp_path)
report_path = tmp_path / "trace_report.json"
......@@ -609,3 +796,33 @@ def test_replay_cli_subprocess_trace_smoke(tmp_path):
output_tokens=25,
)
_assert_basic_report_metrics(report)
@pytest.mark.timeout(30)
def test_replay_cli_subprocess_multiturn_trace_smoke(tmp_path):
trace_path = _write_multiturn_trace(tmp_path)
report_path = tmp_path / "multiturn_trace_report.json"
completed = _run_replay_cli(
tmp_path,
str(trace_path),
"--replay-mode",
"online",
"--router-mode",
"kv_router",
"--num-workers",
"2",
"--report-json",
str(report_path),
"--extra-engine-args",
'{"block_size":64,"speedup_ratio":1000.0}',
)
report = _assert_replay_cli_outputs(completed, report_path)
_assert_basic_report_counts(
report,
num_requests=4,
input_tokens=64,
output_tokens=2,
)
_assert_basic_report_metrics(report)
......@@ -86,6 +86,9 @@ pub enum SequenceError {
#[error("Failed to publish event: {0}")]
PublishFailed(#[from] anyhow::Error),
#[error("Synchronous mutation requires replica_sync=false")]
SyncMutationRequiresNoReplicaSync,
}
/// Bundled parameters for adding a request to the sequence tracker.
......@@ -364,7 +367,14 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
}
}
pub async fn add_request(&self, req: SequenceRequest) -> Result<(), SequenceError> {
fn ensure_sync_mutation_allowed(&self) -> Result<(), SequenceError> {
if self.replica_sync {
return Err(SequenceError::SyncMutationRequiresNoReplicaSync);
}
Ok(())
}
fn add_request_local(&self, req: SequenceRequest) -> Result<(), SequenceError> {
let SequenceRequest {
request_id,
token_sequence,
......@@ -386,22 +396,6 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
});
}
if self.replica_sync {
let event = ActiveSequenceEvent {
request_id: request_id.clone(),
worker,
data: ActiveSequenceEventData::AddRequest {
token_sequence: token_sequence.clone(),
isl,
overlap,
expected_output_tokens,
},
router_id: self.router_id,
lora_name: lora_name.clone(),
};
self.publisher.publish_event(&event).await?;
}
self.request_to_worker.insert(request_id.clone(), worker);
if let Some(lora) = lora_name {
......@@ -434,12 +428,36 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
Ok(())
}
pub async fn add_request(&self, req: SequenceRequest) -> Result<(), SequenceError> {
if self.replica_sync {
let event = ActiveSequenceEvent {
request_id: req.request_id.clone(),
worker: req.worker,
data: ActiveSequenceEventData::AddRequest {
token_sequence: req.token_sequence.clone(),
isl: req.isl,
overlap: req.overlap,
expected_output_tokens: req.expected_output_tokens,
},
router_id: self.router_id,
lora_name: req.lora_name.clone(),
};
self.publisher.publish_event(&event).await?;
}
self.add_request_local(req)
}
pub fn add_request_sync(&self, req: SequenceRequest) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
self.add_request_local(req)
}
/// Apply a mutation to the worker assigned to a request and, when `remove_mapping` is set,
/// clean up the request mappings afterward. Replica-sync publishing is handled by the async caller.
async fn mutate_request_worker(
fn mutate_request_worker_local(
&self,
request_id: &RequestId,
event_data: ActiveSequenceEventData,
mutate_fn: impl FnOnce(&mut ActiveSequences, &RequestId),
remove_mapping: bool,
) -> Result<(), SequenceError> {
......@@ -451,22 +469,6 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
request_id: request_id.clone(),
})?;
if self.replica_sync {
let lora_name = self
.request_to_lora
.get(request_id)
.map(|entry| entry.value().clone());
let event = ActiveSequenceEvent {
request_id: request_id.clone(),
worker,
data: event_data,
router_id: self.router_id,
lora_name,
};
self.publisher.publish_event(&event).await?;
}
{
let table = self.workers.read();
let &idx = table
......@@ -487,6 +489,40 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
Ok(())
}
async fn mutate_request_worker(
&self,
request_id: &RequestId,
event_data: ActiveSequenceEventData,
mutate_fn: impl FnOnce(&mut ActiveSequences, &RequestId),
remove_mapping: bool,
) -> Result<(), SequenceError> {
let worker = self
.request_to_worker
.get(request_id)
.map(|entry| *entry)
.ok_or_else(|| SequenceError::RequestNotFound {
request_id: request_id.clone(),
})?;
if self.replica_sync {
let lora_name = self
.request_to_lora
.get(request_id)
.map(|entry| entry.value().clone());
let event = ActiveSequenceEvent {
request_id: request_id.clone(),
worker,
data: event_data,
router_id: self.router_id,
lora_name,
};
self.publisher.publish_event(&event).await?;
}
self.mutate_request_worker_local(request_id, mutate_fn, remove_mapping)
}
/// Free all blocks associated with a request.
///
/// Note: This operation is idempotent. Calling it multiple times for the same request
......@@ -508,6 +544,21 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
.await
}
pub fn free_sync(&self, request_id: &RequestId) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
if !self.request_to_worker.contains_key(request_id) {
tracing::debug!("Request {request_id} not found, already freed (idempotent)");
return Ok(());
}
self.mutate_request_worker_local(
request_id,
|seqs, rid| {
seqs.free(rid);
},
true,
)
}
/// Mark prefill as completed for a request.
///
/// Note: Calling this multiple times for the same request is allowed and will be a no-op
......@@ -527,6 +578,17 @@ impl<P: SequencePublisher + 'static> ActiveSequencesMultiWorker<P> {
.await
}
pub fn mark_prefill_completed_sync(&self, request_id: &RequestId) -> Result<(), SequenceError> {
self.ensure_sync_mutation_allowed()?;
self.mutate_request_worker_local(
request_id,
|seqs, rid| {
seqs.mark_prefill_completed(rid);
},
false,
)
}
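The `*_sync` methods added in this file share one pattern: refuse to run when `replica_sync` is enabled (because publishing the replica event is async), otherwise apply the local mutation directly. A simplified sketch of that guard follows. The `Tracker` type, its `HashMap` storage, and the `&mut self` receivers are assumptions made for a self-contained example; the real type takes `&self` and keeps richer per-worker state.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum TrackerError {
    SyncMutationRequiresNoReplicaSync,
}

/// Hypothetical, single-threaded stand-in for the multi-worker tracker.
struct Tracker {
    replica_sync: bool,
    request_to_worker: HashMap<String, u64>,
}

impl Tracker {
    fn new(replica_sync: bool) -> Self {
        Self {
            replica_sync,
            request_to_worker: HashMap::new(),
        }
    }

    /// Synchronous mutations cannot publish replica events, so they are
    /// only allowed when replica sync is disabled.
    fn ensure_sync_mutation_allowed(&self) -> Result<(), TrackerError> {
        if self.replica_sync {
            return Err(TrackerError::SyncMutationRequiresNoReplicaSync);
        }
        Ok(())
    }

    fn add_request_sync(&mut self, request_id: &str, worker: u64) -> Result<(), TrackerError> {
        self.ensure_sync_mutation_allowed()?;
        self.request_to_worker.insert(request_id.to_string(), worker);
        Ok(())
    }

    /// Idempotent: freeing an unknown or already-freed request is a no-op.
    fn free_sync(&mut self, request_id: &str) -> Result<(), TrackerError> {
        self.ensure_sync_mutation_allowed()?;
        self.request_to_worker.remove(request_id);
        Ok(())
    }
}
```

Checking the guard before touching any state means a rejected call leaves the tracker unchanged, so a caller can fall back to the async path without cleanup.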
/// Add an output block with optional fractional decay weight.
///
/// This is used during generation to track output blocks as they are created.
......
......@@ -40,6 +40,7 @@ use dynamo_llm::model_card::ModelDeploymentCard;
use dynamo_llm::preprocessor::prompt::{
ChatTemplate, ContextMixins, OAIChatLikeRequest, PromptFormatter,
};
use dynamo_mocker::loadgen::RouterSequence;
/// KV Router event subject suffix (appended to Component.subject())
/// Full subject format: namespace.{namespace}.component.{component}.kv-events
......@@ -532,41 +533,34 @@ impl PrefixData {
}
/// Pre-generated sequence data for benchmarking
#[derive(Clone)]
struct SequenceData {
worker_id: WorkerId,
local_hashes: Vec<LocalBlockHash>,
external_hashes: Vec<ExternalSequenceBlockHash>,
}
type SequenceData = RouterSequence;
impl SequenceData {
/// Create a sequence from the exact request content.
fn from_request_content(
fn sequence_from_request_content(
content: &str,
worker_id: WorkerId,
kv_block_size: u32,
tokenizer: &Tokenizer,
prompt_renderer: Option<&PromptRenderer>,
) -> Result<Self> {
) -> Result<SequenceData> {
let (local_hashes, external_hashes) =
compute_hashes_for_content(content, tokenizer, kv_block_size, prompt_renderer)?;
Ok(Self {
Ok(SequenceData {
worker_id,
local_hashes,
external_hashes,
})
}
}
fn to_router_event(&self, event_id: u64) -> RouterEvent {
fn sequence_to_router_event(sequence: &SequenceData, event_id: u64) -> RouterEvent {
let kv_event = KvCacheEvent {
event_id,
data: KvCacheEventData::Stored(KvCacheStoreData {
parent_hash: None,
blocks: self
blocks: sequence
.local_hashes
.iter()
.zip(self.external_hashes.iter())
.zip(sequence.external_hashes.iter())
.map(|(local, ext)| KvCacheStoredBlockData {
block_hash: *ext,
tokens_hash: *local,
......@@ -576,8 +570,7 @@ impl SequenceData {
}),
dp_rank: 0,
};
RouterEvent::new(self.worker_id, kv_event)
}
RouterEvent::new(sequence.worker_id, kv_event)
}
/// Response from the frontend's /health endpoint
......@@ -692,7 +685,7 @@ fn generate_sequences_for_requests(
num_prefix_prompts,
seed,
);
let seq = SequenceData::from_request_content(
let seq = sequence_from_request_content(
&content,
worker_id,
kv_block_size,
......@@ -749,7 +742,7 @@ async fn build_tree_via_nats(
};
for (event_id, seq) in sequences.iter().enumerate() {
let event = seq.to_router_event(event_id as u64);
let event = sequence_to_router_event(seq, event_id as u64);
let data = encode_event_with_envelope(&event, KV_EVENT_SUBJECT)?;
nats_client
.publish(subject.clone(), data.into())
......@@ -1165,7 +1158,7 @@ async fn publish_events_at_rate(
while start.elapsed() < duration {
let seq = &sequences[(event_id as usize) % sequences.len()];
let event = seq.to_router_event(event_id);
let event = sequence_to_router_event(seq, event_id);
match encode_event_with_envelope(&event, KV_EVENT_SUBJECT) {
Ok(data) => {
......
......@@ -11,5 +11,6 @@ pub mod cache;
pub mod common;
pub mod engine;
pub mod kv_manager;
pub mod loadgen;
pub mod replay;
pub mod scheduler;
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use std::cmp::Ordering;
use std::collections::{BinaryHeap, HashMap};
use anyhow::{Result, anyhow, bail};
use uuid::Uuid;
use super::types::{ReadyTurn, Trace, TurnTrace};
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DriverMode {
Trace,
Concurrency,
}
#[derive(Debug)]
struct SessionRuntime {
session_id: String,
turns: Vec<TurnTrace>,
next_turn_index: usize,
next_ready_at_ms: Option<f64>,
in_flight: Option<Uuid>,
}
#[derive(Debug)]
struct InFlightTurn {
session_index: usize,
turn_index: usize,
}
#[derive(Debug, Clone, Copy)]
struct ReadySession {
ready_at_ms: f64,
session_index: usize,
turn_index: usize,
}
impl PartialEq for ReadySession {
fn eq(&self, other: &Self) -> bool {
self.ready_at_ms.to_bits() == other.ready_at_ms.to_bits()
&& self.session_index == other.session_index
&& self.turn_index == other.turn_index
}
}
impl Eq for ReadySession {}
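// Reversed comparison: `BinaryHeap` is a max-heap, so comparing `other` against `self`
// makes the entry with the earliest `ready_at_ms` pop first.
impl Ord for ReadySession {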
fn cmp(&self, other: &Self) -> Ordering {
other
.ready_at_ms
.total_cmp(&self.ready_at_ms)
.then_with(|| other.session_index.cmp(&self.session_index))
.then_with(|| other.turn_index.cmp(&self.turn_index))
}
}
impl PartialOrd for ReadySession {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}
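The reversed `Ord` above is what turns the standard max-heap into an earliest-first queue. A minimal standalone sketch of the same idea follows; `Ready` and `drain_in_order` are hypothetical names used only for illustration and are not part of this change.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical, trimmed-down stand-in for `ReadySession`.
#[derive(Debug, Clone, Copy)]
struct Ready {
    ready_at_ms: f64,
    session_index: usize,
}

impl PartialEq for Ready {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Ready {}

impl Ord for Ready {
    fn cmp(&self, other: &Self) -> Ordering {
        // `other` comes first: the smallest timestamp becomes the heap maximum.
        other
            .ready_at_ms
            .total_cmp(&self.ready_at_ms)
            .then_with(|| other.session_index.cmp(&self.session_index))
    }
}

impl PartialOrd for Ready {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Push every `(ready_at_ms, session_index)` pair, then pop until empty.
fn drain_in_order(entries: &[(f64, usize)]) -> Vec<(f64, usize)> {
    let mut heap: BinaryHeap<Ready> = entries
        .iter()
        .map(|&(ready_at_ms, session_index)| Ready {
            ready_at_ms,
            session_index,
        })
        .collect();
    let mut out = Vec::with_capacity(heap.len());
    while let Some(ready) = heap.pop() {
        out.push((ready.ready_at_ms, ready.session_index));
    }
    out
}
```

Ties on the timestamp fall back to the lower session index, which keeps the pop order deterministic across runs.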
#[derive(Debug)]
pub struct WorkloadDriver {
mode: DriverMode,
block_size: usize,
sessions: Vec<SessionRuntime>,
in_flight: HashMap<Uuid, InFlightTurn>,
ready_sessions: BinaryHeap<ReadySession>,
}
impl WorkloadDriver {
pub(crate) fn new_trace(trace: Trace) -> Result<Self> {
Self::new(trace, DriverMode::Trace)
}
pub(crate) fn new_concurrency(trace: Trace) -> Result<Self> {
Self::new(trace, DriverMode::Concurrency)
}
fn new(trace: Trace, mode: DriverMode) -> Result<Self> {
let sessions: Vec<SessionRuntime> = trace
.sessions
.into_iter()
.map(|session| SessionRuntime {
session_id: session.session_id,
turns: session.turns,
next_turn_index: 0,
next_ready_at_ms: Some(match mode {
DriverMode::Trace => session.first_arrival_timestamp_ms.unwrap_or(0.0),
DriverMode::Concurrency => 0.0,
}),
in_flight: None,
})
.collect();
let ready_sessions = sessions
.iter()
.enumerate()
.filter_map(|(session_index, session)| {
Some(ReadySession {
ready_at_ms: session.next_ready_at_ms?,
session_index,
turn_index: session.next_turn_index,
})
})
.collect();
Ok(Self {
mode,
block_size: trace.block_size,
sessions,
in_flight: HashMap::new(),
ready_sessions,
})
}
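/// Emit up to `limit` turns whose ready time is at or before `now_ms`, earliest first.
/// Each emitted session stays marked as in flight until `on_complete` is called for it.
pub fn pop_ready(&mut self, now_ms: f64, limit: usize) -> Vec<ReadyTurn> {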
if limit == 0 {
return Vec::new();
}
let mut emitted = Vec::new();
while emitted.len() < limit {
let Some(ready_session) = self.ready_sessions.pop() else {
break;
};
if ready_session.ready_at_ms > now_ms {
self.ready_sessions.push(ready_session);
break;
}
let session_index = ready_session.session_index;
let session = &mut self.sessions[session_index];
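// Skip stale heap entries: the session is busy, or the entry no longer matches
// the session's current turn index or ready timestamp.
if session.in_flight.is_some()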
|| session.next_turn_index != ready_session.turn_index
|| session.next_ready_at_ms != Some(ready_session.ready_at_ms)
{
continue;
}
let turn_index = session.next_turn_index;
let scheduled_ready_at_ms = session
.next_ready_at_ms
.expect("ready session must have a timestamp");
let request_uuid = Uuid::new_v4();
let replay_hashes = session.turns[turn_index]
.to_replay_hashes(self.block_size)
.expect("validated trace should always synthesize replay hashes");
let arrival_timestamp_ms = match self.mode {
DriverMode::Trace => Some(scheduled_ready_at_ms),
DriverMode::Concurrency => None,
};
let request = session.turns[turn_index]
.to_direct_request(self.block_size, request_uuid, arrival_timestamp_ms)
.expect("validated trace should always synthesize into a direct request");
session.in_flight = Some(request_uuid);
session.next_ready_at_ms = None;
self.in_flight.insert(
request_uuid,
InFlightTurn {
session_index,
turn_index,
},
);
emitted.push(ReadyTurn {
request_uuid,
session_id: session.session_id.clone(),
turn_index,
scheduled_ready_at_ms,
replay_hashes: Some(replay_hashes),
request,
});
}
emitted
}
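/// Record completion of an in-flight request. If the session has another turn, it becomes
/// ready at `now_ms` plus that turn's `delay_after_previous_ms`.
pub fn on_complete(&mut self, request_uuid: Uuid, now_ms: f64) -> Result<()> {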
let in_flight = self
.in_flight
.remove(&request_uuid)
.ok_or_else(|| anyhow!("unknown workload request completion for {request_uuid}"))?;
let session = self
.sessions
.get_mut(in_flight.session_index)
.ok_or_else(|| anyhow!("unknown workload session {}", in_flight.session_index))?;
if session.in_flight != Some(request_uuid) {
bail!(
"session {} completion for {} does not match in-flight request {:?}",
session.session_id,
request_uuid,
session.in_flight
);
}
session.in_flight = None;
session.next_turn_index = in_flight.turn_index + 1;
if session.next_turn_index < session.turns.len() {
let ready_at_ms =
now_ms + session.turns[session.next_turn_index].delay_after_previous_ms;
session.next_ready_at_ms = Some(ready_at_ms);
self.ready_sessions.push(ReadySession {
ready_at_ms,
session_index: in_flight.session_index,
turn_index: session.next_turn_index,
});
} else {
session.next_ready_at_ms = None;
}
Ok(())
}
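/// Earliest ready time among waiting turns, after discarding stale heap entries.
/// Returns `None` when no turn is waiting; sessions with a request in flight are not counted.
pub fn next_ready_time_ms(&mut self) -> Option<f64> {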
loop {
let ready_session = *self.ready_sessions.peek()?;
let session = &self.sessions[ready_session.session_index];
if session.in_flight.is_some()
|| session.next_turn_index != ready_session.turn_index
|| session.next_ready_at_ms != Some(ready_session.ready_at_ms)
{
self.ready_sessions.pop();
continue;
}
return Some(ready_session.ready_at_ms);
}
}
/// True once nothing is in flight and every session has emitted all of its turns.
pub fn is_drained(&self) -> bool {
self.in_flight.is_empty()
&& self
.sessions
.iter()
.all(|session| session.next_turn_index >= session.turns.len())
}
}
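The timing rule in `on_complete` (next turn ready at completion time plus `delay_after_previous_ms`) can be traced for a single session with a short sketch. It assumes, purely for illustration, that every turn is dispatched the instant it becomes ready and that every request takes a fixed `service_ms`; `turn_ready_times` is a hypothetical helper, not part of the driver.

```rust
/// Ready time of each turn in one session.
///
/// Assumptions (not taken from the driver): a turn is dispatched as soon as it
/// is ready, and every request completes exactly `service_ms` after dispatch.
fn turn_ready_times(
    first_arrival_ms: f64,
    delays_after_previous_ms: &[f64],
    service_ms: f64,
) -> Vec<f64> {
    let mut ready_times = Vec::with_capacity(delays_after_previous_ms.len());
    let mut previous_completion_ms = 0.0;
    for (turn_index, &delay_ms) in delays_after_previous_ms.iter().enumerate() {
        let ready_at_ms = if turn_index == 0 {
            // The first turn uses the session's arrival time; its delay is ignored.
            first_arrival_ms
        } else {
            // Follow-up turns wait for the previous completion, then the delay.
            previous_completion_ms + delay_ms
        };
        ready_times.push(ready_at_ms);
        previous_completion_ms = ready_at_ms + service_ms;
    }
    ready_times
}
```

With a first arrival at 10 ms, a 10 ms service time, and a 5 ms inter-turn delay, the second turn becomes ready at 25 ms, which is the same schedule the round-trip driver test in this change exercises.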
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
mod driver;
mod trace;
mod types;
pub use driver::WorkloadDriver;
pub use types::{
ArrivalSpec, DelaySpec, LengthSpec, ReadyTurn, ReplayRequestHashes, RouterSequence,
SequenceHashMode, SessionPartitionSpec, SessionTrace, SyntheticTraceSpec, Trace, TurnTrace,
};
#[cfg(test)]
mod tests;
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use dynamo_kv_router::protocols::{compute_block_hash_for_seq, compute_seq_hash_for_block};
use tempfile::NamedTempFile;
use uuid::Uuid;
use super::*;
fn write_trace(lines: &[serde_json::Value]) -> NamedTempFile {
let mut file = NamedTempFile::new().unwrap();
for line in lines {
use std::io::Write;
writeln!(file, "{}", serde_json::to_string(line).unwrap()).unwrap();
}
file
}
#[test]
fn test_from_mooncake_single_turn_preserves_fields() {
let file = write_trace(&[serde_json::json!({
"timestamp": 123.0,
"input_length": 8,
"output_length": 4,
"hash_ids": [7, 8],
})]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 1);
let session = &trace.sessions[0];
assert_eq!(session.first_arrival_timestamp_ms, Some(123.0));
assert_eq!(session.turns.len(), 1);
assert_eq!(session.turns[0].input_length, 8);
assert_eq!(session.turns[0].max_output_tokens, 4);
assert_eq!(session.turns[0].hash_ids, vec![7, 8]);
}
#[test]
fn test_from_mooncake_multi_turn_uses_session_id_and_delay() {
let file = write_trace(&[
serde_json::json!({
"session_id": "a",
"timestamp": 10.0,
"input_length": 4,
"output_length": 1,
"hash_ids": [1],
}),
serde_json::json!({
"session_id": "a",
"delay": 25.0,
"input_length": 8,
"output_length": 2,
"hash_ids": [1, 2],
}),
serde_json::json!({
"session_id": "b",
"timestamp": 20.0,
"input_length": 4,
"output_length": 1,
"hash_ids": [3],
}),
]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 2);
assert_eq!(trace.sessions[0].session_id, "a");
assert_eq!(trace.sessions[0].turns.len(), 2);
assert_eq!(trace.sessions[0].turns[1].delay_after_previous_ms, 25.0);
assert_eq!(trace.sessions[1].session_id, "b");
}
#[test]
fn test_from_mooncake_defaults_missing_input_length_from_hash_capacity() {
let file = write_trace(&[serde_json::json!({
"timestamp": 7.0,
"output_length": 3,
"hash_ids": [5, 6],
})]);
let trace = Trace::from_mooncake(file.path(), 4).unwrap();
assert_eq!(trace.sessions.len(), 1);
assert_eq!(trace.sessions[0].turns[0].input_length, 8);
}
#[test]
fn test_turn_to_direct_request_repeats_hash_ids_by_block_size() {
let turn = TurnTrace {
input_length: 6,
max_output_tokens: 3,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
};
let request = turn
.to_direct_request(4, Uuid::from_u128(1), Some(5.0))
.unwrap();
assert_eq!(request.tokens, vec![1, 1, 1, 1, 2, 2]);
assert_eq!(request.arrival_timestamp_ms, Some(5.0));
}
#[test]
fn test_turn_replay_hashes_match_full_blocks_only() {
let turn = TurnTrace {
input_length: 6,
max_output_tokens: 3,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
};
let request = turn
.to_direct_request(4, Uuid::from_u128(1), Some(5.0))
.unwrap();
let replay_hashes = turn.to_replay_hashes(4).unwrap();
let expected_local = compute_block_hash_for_seq(&request.tokens, 4, None, None);
assert_eq!(replay_hashes.local_block_hashes, expected_local);
assert_eq!(
replay_hashes.sequence_hashes,
compute_seq_hash_for_block(&expected_local)
);
assert_eq!(replay_hashes.local_block_hashes.len(), 1);
}
#[test]
fn test_partition_by_session_round_robin_keeps_sessions_intact() {
let trace = Trace::synthetic(SyntheticTraceSpec {
block_size: 4,
num_sessions: 4,
turns_per_session: 2,
input_tokens: LengthSpec {
mean: 8,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: 2,
stddev: 0.0,
},
shared_prefix_ratio: 0.5,
num_prefix_groups: 2,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: DelaySpec::ConstantMs(5.0),
seed: 7,
})
.unwrap();
let partitions =
trace.partition_by_session(SessionPartitionSpec::RoundRobin { num_partitions: 2 });
assert_eq!(partitions.len(), 2);
assert_eq!(partitions[0].sessions.len(), 2);
assert_eq!(partitions[1].sessions.len(), 2);
assert!(
partitions
.iter()
.flat_map(|partition| partition.sessions.iter())
.all(|session| session.turns.len() == 2)
);
}
#[test]
fn test_synthetic_prefix_groups_share_prefixes_within_group() {
let trace = Trace::synthetic(SyntheticTraceSpec {
block_size: 4,
num_sessions: 6,
turns_per_session: 1,
input_tokens: LengthSpec {
mean: 16,
stddev: 0.0,
},
output_tokens: LengthSpec {
mean: 2,
stddev: 0.0,
},
shared_prefix_ratio: 0.5,
num_prefix_groups: 2,
first_turn_arrivals: ArrivalSpec::Burst,
inter_turn_delays: DelaySpec::None,
seed: 42,
})
.unwrap();
let prefix_len = 2;
let prefixes = trace
.sessions
.iter()
.map(|session| session.turns[0].hash_ids[..prefix_len].to_vec())
.collect::<Vec<_>>();
assert!(prefixes.windows(2).any(|window| window[0] == window[1]));
}
#[test]
fn test_expand_hash_prefix_depth_scales_hashes_and_input_length() {
let trace = Trace {
block_size: 4,
sessions: vec![SessionTrace {
session_id: "session".to_string(),
first_arrival_timestamp_ms: Some(10.0),
turns: vec![TurnTrace {
input_length: 6,
max_output_tokens: 2,
hash_ids: vec![7, 8],
delay_after_previous_ms: 0.0,
}],
}],
}
.expand_hash_prefix_depth(3);
let turn = &trace.sessions[0].turns[0];
assert_eq!(turn.input_length, 18);
assert_eq!(turn.hash_ids, vec![21, 22, 23, 24, 25, 26]);
let request = turn
.to_direct_request(trace.block_size, Uuid::from_u128(2), Some(10.0))
.unwrap();
assert_eq!(request.tokens.len(), 18);
}
#[test]
fn test_rescale_ready_span_scales_session_starts_and_inter_turn_delays() {
let trace = Trace {
block_size: 4,
sessions: vec![
SessionTrace {
session_id: "a".to_string(),
first_arrival_timestamp_ms: Some(10.0),
turns: vec![
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![1],
delay_after_previous_ms: 0.0,
},
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![2],
delay_after_previous_ms: 20.0,
},
],
},
SessionTrace {
session_id: "b".to_string(),
first_arrival_timestamp_ms: Some(30.0),
turns: vec![TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![3],
delay_after_previous_ms: 0.0,
}],
},
],
}
.rescale_ready_span(100)
.unwrap();
assert_eq!(trace.sessions[0].first_arrival_timestamp_ms, Some(0.0));
assert_eq!(trace.sessions[1].first_arrival_timestamp_ms, Some(100.0));
assert_eq!(trace.sessions[0].turns[1].delay_after_previous_ms, 100.0);
}
#[test]
fn test_driver_requires_completion_before_follow_up_turn() {
let trace = Trace {
block_size: 4,
sessions: vec![SessionTrace {
session_id: "s".to_string(),
first_arrival_timestamp_ms: Some(0.0),
turns: vec![
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![1],
delay_after_previous_ms: 0.0,
},
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![2],
delay_after_previous_ms: 10.0,
},
],
}],
};
let mut driver = trace.into_trace_driver().unwrap();
let first = driver.pop_ready(0.0, 1);
assert_eq!(first.len(), 1);
assert!(driver.pop_ready(100.0, 1).is_empty());
driver.on_complete(first[0].request_uuid, 5.0).unwrap();
assert!(driver.pop_ready(14.0, 1).is_empty());
let second = driver.pop_ready(15.0, 1);
assert_eq!(second.len(), 1);
assert_eq!(second[0].turn_index, 1);
}
#[test]
fn test_driver_next_ready_time_tracks_earliest_pending_turn() {
let trace = Trace {
block_size: 4,
sessions: vec![
SessionTrace {
session_id: "a".to_string(),
first_arrival_timestamp_ms: Some(10.0),
turns: vec![
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![1],
delay_after_previous_ms: 0.0,
},
TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![2],
delay_after_previous_ms: 5.0,
},
],
},
SessionTrace {
session_id: "b".to_string(),
first_arrival_timestamp_ms: Some(20.0),
turns: vec![TurnTrace {
input_length: 4,
max_output_tokens: 1,
hash_ids: vec![3],
delay_after_previous_ms: 0.0,
}],
},
],
};
let mut driver = trace.into_trace_driver().unwrap();
assert_eq!(driver.next_ready_time_ms(), Some(10.0));
let first = driver.pop_ready(10.0, 1);
assert_eq!(first.len(), 1);
assert_eq!(driver.next_ready_time_ms(), Some(20.0));
driver.on_complete(first[0].request_uuid, 25.0).unwrap();
assert_eq!(driver.next_ready_time_ms(), Some(20.0));
let second = driver.pop_ready(20.0, 1);
assert_eq!(second.len(), 1);
assert_eq!(driver.next_ready_time_ms(), Some(30.0));
}
#[test]
fn test_trace_driver_round_trips_turn_semantics_into_ready_requests() {
let trace = Trace {
block_size: 2,
sessions: vec![
SessionTrace {
session_id: "session-a".to_string(),
first_arrival_timestamp_ms: Some(10.0),
turns: vec![
TurnTrace {
input_length: 4,
max_output_tokens: 2,
hash_ids: vec![1, 2],
delay_after_previous_ms: 0.0,
},
TurnTrace {
input_length: 2,
max_output_tokens: 3,
hash_ids: vec![3],
delay_after_previous_ms: 5.0,
},
],
},
SessionTrace {
session_id: "session-b".to_string(),
first_arrival_timestamp_ms: Some(12.0),
turns: vec![TurnTrace {
input_length: 2,
max_output_tokens: 1,
hash_ids: vec![4],
delay_after_previous_ms: 0.0,
}],
},
],
};
let expected = trace.clone();
let mut driver = trace.into_trace_driver().unwrap();
assert!(driver.pop_ready(9.0, usize::MAX).is_empty());
let first = driver.pop_ready(10.0, usize::MAX);
assert_eq!(first.len(), 1);
let first = &first[0];
assert_eq!(first.session_id, "session-a");
assert_eq!(first.turn_index, 0);
assert_eq!(first.scheduled_ready_at_ms, 10.0);
assert_eq!(
first.request.tokens.len(),
expected.sessions[0].turns[0].input_length
);
assert_eq!(
first.request.max_output_tokens,
expected.sessions[0].turns[0].max_output_tokens
);
assert_eq!(first.request.arrival_timestamp_ms, Some(10.0));
assert_eq!(
first.replay_hashes.as_ref(),
Some(
&expected.sessions[0].turns[0]
.to_replay_hashes(expected.block_size)
.unwrap()
)
);
let expected_first_request = expected.sessions[0].turns[0]
.to_direct_request(expected.block_size, first.request_uuid, Some(10.0))
.unwrap();
assert_eq!(first.request.tokens, expected_first_request.tokens);
assert_eq!(
first.request.max_output_tokens,
expected_first_request.max_output_tokens
);
assert_eq!(first.request.uuid, expected_first_request.uuid);
assert_eq!(
first.request.arrival_timestamp_ms,
expected_first_request.arrival_timestamp_ms
);
let second = driver.pop_ready(12.0, usize::MAX);
assert_eq!(second.len(), 1);
let second = &second[0];
assert_eq!(second.session_id, "session-b");
assert_eq!(second.turn_index, 0);
assert_eq!(second.scheduled_ready_at_ms, 12.0);
assert_eq!(
second.request.tokens.len(),
expected.sessions[1].turns[0].input_length
);
assert_eq!(
second.request.max_output_tokens,
expected.sessions[1].turns[0].max_output_tokens
);
assert_eq!(second.request.arrival_timestamp_ms, Some(12.0));
assert_eq!(
second.replay_hashes.as_ref(),
Some(
&expected.sessions[1].turns[0]
.to_replay_hashes(expected.block_size)
.unwrap()
)
);
driver.on_complete(first.request_uuid, 20.0).unwrap();
assert!(driver.pop_ready(24.0, usize::MAX).is_empty());
let third = driver.pop_ready(25.0, usize::MAX);
assert_eq!(third.len(), 1);
let third = &third[0];
assert_eq!(third.session_id, "session-a");
assert_eq!(third.turn_index, 1);
assert_eq!(third.scheduled_ready_at_ms, 25.0);
assert_eq!(
third.request.tokens.len(),
expected.sessions[0].turns[1].input_length
);
assert_eq!(
third.request.max_output_tokens,
expected.sessions[0].turns[1].max_output_tokens
);
assert_eq!(third.request.arrival_timestamp_ms, Some(25.0));
assert_eq!(
third.replay_hashes.as_ref(),
Some(
&expected.sessions[0].turns[1]
.to_replay_hashes(expected.block_size)
.unwrap()
)
);
let expected_third_request = expected.sessions[0].turns[1]
.to_direct_request(expected.block_size, third.request_uuid, Some(25.0))
.unwrap();
assert_eq!(third.request.tokens, expected_third_request.tokens);
assert_eq!(
third.request.max_output_tokens,
expected_third_request.max_output_tokens
);
assert_eq!(third.request.uuid, expected_third_request.uuid);
assert_eq!(
third.request.arrival_timestamp_ms,
expected_third_request.arrival_timestamp_ms
);
}
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use std::collections::HashMap;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use anyhow::{Context, Result, anyhow, bail};
use dynamo_kv_router::LocalBlockHash;
use dynamo_kv_router::protocols::{
ExternalSequenceBlockHash, WorkerId, XXH3_SEED, compute_seq_hash_for_block,
};
use dynamo_tokens::compute_hash_v2;
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};
use serde::Deserialize;
use uuid::Uuid;
use super::driver::WorkloadDriver;
use super::types::{
ArrivalSpec, DelaySpec, LengthSpec, ReplayRequestHashes, RouterSequence, SequenceHashMode,
SessionPartitionSpec, SessionTrace, SyntheticTraceSpec, Trace, TurnTrace,
};
use crate::common::protocols::DirectRequest;
#[derive(Debug, Deserialize)]
struct RawMooncakeRecord {
#[serde(default)]
session_id: Option<String>,
#[serde(default)]
timestamp: Option<f64>,
#[serde(default)]
created_time: Option<f64>,
#[serde(default, alias = "input_tokens")]
input_length: Option<usize>,
#[serde(default, alias = "output_tokens")]
output_length: Option<usize>,
#[serde(default)]
hash_ids: Option<Vec<u64>>,
#[serde(default)]
delay: Option<f64>,
#[serde(default)]
delay_ms: Option<f64>,
}
impl TurnTrace {
fn validate_block_size_and_capacity(&self, block_size: usize) -> Result<()> {
if block_size == 0 {
bail!("block_size must be greater than 0");
}
if self.hash_ids.len() * block_size < self.input_length {
bail!(
"input_length {} exceeds synthesized capacity {}",
self.input_length,
self.hash_ids.len() * block_size
);
}
Ok(())
}
pub fn to_direct_request(
&self,
block_size: usize,
request_uuid: Uuid,
arrival_timestamp_ms: Option<f64>,
) -> Result<DirectRequest> {
self.validate_block_size_and_capacity(block_size)?;
let mut tokens = Vec::with_capacity(self.input_length);
for &hash_id in &self.hash_ids {
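// `as u32` keeps only the low 32 bits, so hash ids that differ only in their
// upper 32 bits map to the same token id.
let token_id = hash_id as u32;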
tokens.extend((0..block_size).map(|_| token_id));
if tokens.len() >= self.input_length {
tokens.truncate(self.input_length);
break;
}
}
if tokens.len() != self.input_length {
bail!(
"failed to synthesize {} tokens from {} hash_ids",
self.input_length,
self.hash_ids.len()
);
}
Ok(DirectRequest {
tokens,
max_output_tokens: self.max_output_tokens,
uuid: Some(request_uuid),
dp_rank: 0,
arrival_timestamp_ms,
})
}
/// Hash only the full blocks covered by `input_length`; a trailing partial block is skipped.
pub fn to_replay_hashes(&self, block_size: usize) -> Result<ReplayRequestHashes> {
self.validate_block_size_and_capacity(block_size)?;
let num_full_blocks = self.input_length / block_size;
let local_block_hashes = self
.hash_ids
.iter()
.take(num_full_blocks)
.map(|&hash_id| local_block_hash_from_id(hash_id, block_size))
.collect::<Vec<_>>();
let sequence_hashes = compute_seq_hash_for_block(&local_block_hashes);
Ok(ReplayRequestHashes {
local_block_hashes,
sequence_hashes,
})
}
}
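The token synthesis in `to_direct_request` and the full-block rule in `to_replay_hashes` can be sketched without the surrounding types. The helper names below are hypothetical, and `Option` stands in for the `Result` errors used in the change.

```rust
/// Repeat each hash id `block_size` times (truncated to `u32`), then cut the
/// result to `input_length` tokens. Returns `None` when `block_size` is zero
/// or the hash ids cannot cover `input_length`.
fn synthesize_tokens(hash_ids: &[u64], block_size: usize, input_length: usize) -> Option<Vec<u32>> {
    if block_size == 0 {
        return None;
    }
    let capacity = hash_ids.len().checked_mul(block_size)?;
    if capacity < input_length {
        return None;
    }
    let tokens: Vec<u32> = hash_ids
        .iter()
        .flat_map(|&hash_id| std::iter::repeat(hash_id as u32).take(block_size))
        .take(input_length)
        .collect();
    Some(tokens)
}

/// Number of complete blocks in `input_length`; only these get replay hashes.
fn full_block_count(input_length: usize, block_size: usize) -> Option<usize> {
    input_length.checked_div(block_size)
}
```

For `hash_ids = [1, 2]`, `block_size = 4`, and `input_length = 6`, the tokens are four 1s followed by two 2s, and only one block is complete, so only one replay hash is produced.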
impl Trace {
pub fn from_mooncake(path: &Path, block_size: usize) -> Result<Self> {
if block_size == 0 {
bail!("block_size must be greater than 0");
}
let file = File::open(path)
.with_context(|| format!("failed to open trace file {}", path.display()))?;
let reader = BufReader::new(file);
let mut sessions = Vec::new();
let mut session_indices = HashMap::new();
let mut last_timestamps: Vec<Option<f64>> = Vec::new();
for (line_idx, line) in reader.lines().enumerate() {
let line = line.with_context(|| {
format!(
"failed to read line {} from {}",
line_idx + 1,
path.display()
)
})?;
if line.trim().is_empty() {
continue;
}
let raw: RawMooncakeRecord = serde_json::from_str(&line).with_context(|| {
format!(
"failed to parse line {} from {} as JSON",
line_idx + 1,
path.display()
)
})?;
let session_id = raw
.session_id
.unwrap_or_else(|| format!("request_{}", line_idx + 1));
let hash_ids = raw
.hash_ids
.ok_or_else(|| anyhow!("trace line {} is missing hash_ids", line_idx + 1))?;
let input_length = raw.input_length.unwrap_or(hash_ids.len() * block_size);
let output_length = raw
.output_length
.ok_or_else(|| anyhow!("trace line {} is missing output_length", line_idx + 1))?;
let timestamp_ms = raw.timestamp.or(raw.created_time);
let explicit_delay_ms = raw.delay.or(raw.delay_ms);
let session_index = *session_indices
.entry(session_id.clone())
.or_insert_with(|| {
let idx = sessions.len();
sessions.push(SessionTrace {
session_id: session_id.clone(),
first_arrival_timestamp_ms: timestamp_ms,
turns: Vec::new(),
});
last_timestamps.push(timestamp_ms);
idx
});
let session = sessions
.get_mut(session_index)
.expect("newly inserted session must exist");
let turn_idx = session.turns.len();
let delay_after_previous_ms = if turn_idx == 0 {
let delay = explicit_delay_ms.unwrap_or(0.0);
if delay != 0.0 {
bail!(
"trace line {} sets delay on the first turn of session {}",
line_idx + 1,
session.session_id
);
}
0.0
} else if let Some(delay_ms) = explicit_delay_ms {
delay_ms
} else if let Some(timestamp_ms) = timestamp_ms {
let previous_timestamp_ms = last_timestamps[session_index].ok_or_else(|| {
anyhow!(
"trace line {} for session {} cannot infer delay without a previous timestamp",
line_idx + 1,
session.session_id
)
})?;
timestamp_ms - previous_timestamp_ms
} else {
0.0
};
if !delay_after_previous_ms.is_finite() || delay_after_previous_ms < 0.0 {
bail!(
"trace line {} has invalid delay {}",
line_idx + 1,
delay_after_previous_ms
);
}
if hash_ids.len() * block_size < input_length {
bail!(
"trace line {} input_length {} exceeds synthesized capacity {}",
line_idx + 1,
input_length,
hash_ids.len() * block_size
);
}
session.turns.push(TurnTrace {
input_length,
max_output_tokens: output_length,
hash_ids,
delay_after_previous_ms,
});
if let Some(timestamp_ms) = timestamp_ms {
last_timestamps[session_index] = Some(timestamp_ms);
}
}
if sessions.is_empty() {
bail!("trace file {} did not contain any requests", path.display());
}
Ok(Self {
block_size,
sessions,
})
}
pub fn synthetic(spec: SyntheticTraceSpec) -> Result<Self> {
if spec.block_size == 0 {
bail!("block_size must be greater than 0");
}
if spec.num_sessions == 0 {
bail!("num_sessions must be greater than 0");
}
if spec.turns_per_session == 0 {
bail!("turns_per_session must be greater than 0");
}
if !(0.0..=1.0).contains(&spec.shared_prefix_ratio) {
bail!(
"shared_prefix_ratio must be between 0.0 and 1.0, got {}",
spec.shared_prefix_ratio
);
}
let mut rng = StdRng::seed_from_u64(spec.seed);
let mut sessions = Vec::with_capacity(spec.num_sessions);
let mut first_arrivals = Vec::with_capacity(spec.num_sessions);
let mean_gap_ms = arrival_spec_mean_gap_ms(&spec.first_turn_arrivals)?;
let mut next_arrival_ms = 0.0;
for session_idx in 0..spec.num_sessions {
if session_idx == 0 {
first_arrivals.push(0.0);
continue;
}
next_arrival_ms +=
sample_arrival_gap_ms(&spec.first_turn_arrivals, mean_gap_ms, &mut rng)?;
first_arrivals.push(next_arrival_ms);
}
let mut next_unique_hash = 1_u64;
for (session_idx, first_arrival_timestamp_ms) in first_arrivals.into_iter().enumerate() {
let group_id = if spec.num_prefix_groups > 0 && spec.shared_prefix_ratio > 0.0 {
Some(rng.random_range(0..spec.num_prefix_groups) as u64)
} else {
None
};
let mut turns = Vec::with_capacity(spec.turns_per_session);
for turn_idx in 0..spec.turns_per_session {
let input_length = sample_length(&spec.input_tokens, 1, &mut rng);
let max_output_tokens = sample_length(&spec.output_tokens, 1, &mut rng);
let num_blocks = input_length.div_ceil(spec.block_size);
let prefix_blocks =
((num_blocks as f64) * spec.shared_prefix_ratio).round() as usize;
let prefix_blocks = prefix_blocks.min(num_blocks);
let mut hash_ids = Vec::with_capacity(num_blocks);
for block_idx in 0..prefix_blocks {
if let Some(group_id) = group_id {
hash_ids.push(0xD00D_0000_0000_0000 | (group_id << 32) | block_idx as u64);
}
}
while hash_ids.len() < num_blocks {
hash_ids.push(next_unique_hash);
next_unique_hash = next_unique_hash
.checked_add(1)
.expect("synthetic hash id overflow");
}
turns.push(TurnTrace {
input_length,
max_output_tokens,
hash_ids,
delay_after_previous_ms: if turn_idx == 0 {
0.0
} else {
sample_delay_ms(&spec.inter_turn_delays, &mut rng)?
},
});
}
sessions.push(SessionTrace {
session_id: format!("session_{session_idx}"),
first_arrival_timestamp_ms: Some(first_arrival_timestamp_ms),
turns,
});
}
Ok(Self {
block_size: spec.block_size,
sessions,
})
}
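The per-turn hash layout built by `synthetic` (group-tagged prefix blocks followed by globally unique blocks) can be isolated in a small sketch. `synthetic_hash_ids` and `GROUP_TAG` are hypothetical names; the tag value and bit layout are copied from the loop above, and the overflow check on the unique counter is omitted for brevity.

```rust
/// Tag used for shared-prefix block ids, as in the loop above.
const GROUP_TAG: u64 = 0xD00D_0000_0000_0000;

/// Build the hash ids for one turn: `round(num_blocks * ratio)` prefix blocks
/// keyed by the prefix group, then unique ids drawn from `next_unique_hash`.
fn synthetic_hash_ids(
    input_length: usize,
    block_size: usize,
    shared_prefix_ratio: f64,
    group_id: Option<u64>,
    next_unique_hash: &mut u64,
) -> Vec<u64> {
    assert!(block_size > 0, "block_size must be greater than 0");
    let num_blocks = input_length.div_ceil(block_size);
    let prefix_blocks =
        (((num_blocks as f64) * shared_prefix_ratio).round() as usize).min(num_blocks);
    let mut hash_ids = Vec::with_capacity(num_blocks);
    if let Some(group_id) = group_id {
        for block_idx in 0..prefix_blocks {
            hash_ids.push(GROUP_TAG | (group_id << 32) | block_idx as u64);
        }
    }
    // Without a prefix group, every block falls through to a unique id.
    while hash_ids.len() < num_blocks {
        hash_ids.push(*next_unique_hash);
        *next_unique_hash += 1;
    }
    hash_ids
}
```

Two turns generated with the same group share their leading blocks and diverge afterwards, which is the structure the prefix-group test in this change relies on.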
pub fn validate_for_trace_mode(&self) -> Result<()> {
self.validate(false)
}
pub fn validate_for_concurrency_mode(&self) -> Result<()> {
self.validate(true)
}
pub fn normalize_session_starts(mut self) -> Result<Self> {
let Some(min_timestamp_ms) = self
.sessions
.iter()
.filter_map(|session| session.first_arrival_timestamp_ms)
.min_by(|left, right| left.total_cmp(right))
else {
return Ok(self);
};
for session in &mut self.sessions {
if let Some(timestamp_ms) = session.first_arrival_timestamp_ms.as_mut() {
*timestamp_ms -= min_timestamp_ms;
}
}
Ok(self)
}
pub fn speed_up_timing(mut self, ratio: f64) -> Result<Self> {
if !ratio.is_finite() || ratio <= 0.0 {
bail!("ratio must be a finite positive number, got {ratio}");
}
for session in &mut self.sessions {
if let Some(timestamp_ms) = session.first_arrival_timestamp_ms.as_mut() {
*timestamp_ms /= ratio;
}
for turn in &mut session.turns {
turn.delay_after_previous_ms /= ratio;
}
}
Ok(self)
}
pub fn rescale_session_start_span(mut self, duration_ms: u64) -> Result<Self> {
let Some(min_timestamp_ms) = self
.sessions
.iter()
.filter_map(|session| session.first_arrival_timestamp_ms)
.min_by(|left, right| left.total_cmp(right))
else {
return Ok(self);
};
let Some(max_timestamp_ms) = self
.sessions
.iter()
.filter_map(|session| session.first_arrival_timestamp_ms)
.max_by(|left, right| left.total_cmp(right))
else {
return Ok(self);
};
let target_span_ms = duration_ms as f64;
let source_span_ms = max_timestamp_ms - min_timestamp_ms;
for session in &mut self.sessions {
if let Some(timestamp_ms) = session.first_arrival_timestamp_ms.as_mut() {
*timestamp_ms = if source_span_ms == 0.0 {
0.0
} else {
(*timestamp_ms - min_timestamp_ms) * target_span_ms / source_span_ms
};
}
}
Ok(self)
}
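/// Rescale session starts and inter-turn delays by one ratio so that the span from the
/// earliest session start to the latest "start plus summed delays" offset maps to
/// `duration_ms`. Service time is not part of the span; spans under 1 ms count as 1 ms.
pub fn rescale_ready_span(mut self, duration_ms: u64) -> Result<Self> {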
let Some(min_start_ms) = self
.sessions
.iter()
.map(|session| session.first_arrival_timestamp_ms.unwrap_or(0.0))
.min_by(|left, right| left.total_cmp(right))
else {
return Ok(self);
};
let Some(max_ready_ms) = self
.sessions
.iter()
.map(|session| {
session.first_arrival_timestamp_ms.unwrap_or(0.0)
+ session
.turns
.iter()
.enumerate()
.filter(|(turn_idx, _)| *turn_idx > 0)
.map(|(_, turn)| turn.delay_after_previous_ms)
.sum::<f64>()
})
.max_by(|left, right| left.total_cmp(right))
else {
return Ok(self);
};
let ratio = duration_ms as f64 / (max_ready_ms - min_start_ms).max(1.0);
for session in &mut self.sessions {
if let Some(start_ms) = session.first_arrival_timestamp_ms.as_mut() {
*start_ms = (*start_ms - min_start_ms) * ratio;
}
for (turn_idx, turn) in session.turns.iter_mut().enumerate() {
if turn_idx > 0 {
turn.delay_after_previous_ms *= ratio;
}
}
}
Ok(self)
}
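The ratio used by `rescale_ready_span` is easy to check by hand. The sketch below recomputes it from plain tuples of `(first_arrival_ms, delays)`; `ready_span_ratio` is a hypothetical helper, and it treats a missing start as already resolved to a number.

```rust
/// Ratio applied to starts and delays so the ready span maps to `duration_ms`.
/// Each session is `(first_arrival_ms, delay_after_previous_ms per turn)`;
/// the first delay entry is ignored, as in the method above.
fn ready_span_ratio(sessions: &[(f64, Vec<f64>)], duration_ms: u64) -> Option<f64> {
    let min_start_ms = sessions
        .iter()
        .map(|(start_ms, _)| *start_ms)
        .min_by(|left, right| left.total_cmp(right))?;
    let max_ready_ms = sessions
        .iter()
        .map(|(start_ms, delays)| *start_ms + delays.iter().skip(1).sum::<f64>())
        .max_by(|left, right| left.total_cmp(right))?;
    // Spans shorter than 1 ms are clamped to 1 ms to avoid dividing by zero.
    Some(duration_ms as f64 / (max_ready_ms - min_start_ms).max(1.0))
}
```

For starts at 10 ms and 30 ms with one 20 ms follow-up delay on the first session, the span is 20 ms, so a 100 ms target gives a ratio of 5: the starts become 0 and 100, and the 20 ms delay becomes 100 ms.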
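/// Multiply every `input_length` by `factor` and replace each hash id `h` with the
/// `factor` consecutive ids `h * factor ..= h * factor + factor - 1`.
pub fn expand_hash_prefix_depth(mut self, factor: usize) -> Self {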
if factor <= 1 {
return self;
}
for session in &mut self.sessions {
for turn in &mut session.turns {
turn.input_length = turn
.input_length
.checked_mul(factor)
.expect("input_length expansion overflow");
turn.hash_ids = turn
.hash_ids
.iter()
.flat_map(|&hash_id| {
let base = hash_id
.checked_mul(factor as u64)
.expect("hash prefix expansion overflow");
(0..factor as u64).map(move |offset| base + offset)
})
.collect();
}
}
self
}
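/// Emit `copies` copies of every session. Copy `i` has its hash ids shifted by
/// `i * (max_hash_id + 1)`, so the copies do not share any blocks.
pub fn duplicate_hash_space(mut self, copies: usize) -> Self {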
if copies <= 1 {
return self;
}
let max_hash_id = self
.sessions
.iter()
.flat_map(|session| session.turns.iter())
.flat_map(|turn| turn.hash_ids.iter().copied())
.max()
.unwrap_or(0);
let offset_base = max_hash_id + 1;
let original_sessions = self.sessions.clone();
self.sessions.clear();
for copy_idx in 0..copies {
let offset = offset_base * copy_idx as u64;
for session in &original_sessions {
let mut duplicated = session.clone();
duplicated.session_id = format!("{}:copy_{copy_idx}", session.session_id);
for turn in &mut duplicated.turns {
turn.hash_ids = turn
.hash_ids
.iter()
.map(|&hash_id| {
hash_id
.checked_add(offset)
.expect("hash duplication overflow")
})
.collect();
}
self.sessions.push(duplicated);
}
}
self
}
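/// Splits the sessions into `num_partitions` traces (at least one), assigning each
/// session either by a seeded RNG or round-robin by session index. Every partition
/// keeps the original `block_size`.
pub fn partition_by_session(&self, spec: SessionPartitionSpec) -> Vec<Self> {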
let num_partitions = match spec {
SessionPartitionSpec::Random { num_partitions, .. } => num_partitions,
SessionPartitionSpec::RoundRobin { num_partitions } => num_partitions,
}
.max(1);
let mut partitions = vec![
Self {
block_size: self.block_size,
sessions: Vec::new(),
};
num_partitions
];
let mut rng = match spec {
SessionPartitionSpec::Random { seed, .. } => Some(StdRng::seed_from_u64(seed)),
SessionPartitionSpec::RoundRobin { .. } => None,
};
for (session_idx, session) in self.sessions.iter().cloned().enumerate() {
let partition_idx = match spec {
SessionPartitionSpec::Random { .. } => rng
.as_mut()
.expect("random partitioner must exist")
.random_range(0..num_partitions),
SessionPartitionSpec::RoundRobin { .. } => session_idx % num_partitions,
};
partitions[partition_idx].sessions.push(session);
}
partitions
}
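/// Converts a trace in which every session has exactly one turn into direct requests,
/// giving each request a fresh random UUID. Fails on any session with a different turn count.
pub fn to_single_turn_requests(&self) -> Result<Vec<DirectRequest>> {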
let mut requests = Vec::with_capacity(self.sessions.len());
for session in &self.sessions {
if session.turns.len() != 1 {
bail!(
"to_single_turn_requests requires exactly one turn per session, but session {} has {} turns",
session.session_id,
session.turns.len()
);
}
requests.push(session.turns[0].to_direct_request(
self.block_size,
Uuid::new_v4(),
session.first_arrival_timestamp_ms,
)?);
}
Ok(requests)
}
pub fn to_router_sequences(
&self,
worker_id: WorkerId,
hash_mode: SequenceHashMode,
) -> Result<Vec<RouterSequence>> {
let mut sequences = Vec::new();
for session in &self.sessions {
for turn in &session.turns {
let local_hashes = turn
.hash_ids
.iter()
.map(|&hash_id| local_block_hash_from_id(hash_id, self.block_size))
.collect::<Vec<_>>();
let external_hashes = match hash_mode {
SequenceHashMode::Raw => local_hashes
.iter()
.map(|hash| ExternalSequenceBlockHash(hash.0))
.collect(),
SequenceHashMode::Cumulative => compute_seq_hash_for_block(&local_hashes)
.into_iter()
.map(ExternalSequenceBlockHash)
.collect(),
};
sequences.push(RouterSequence {
worker_id,
local_hashes,
external_hashes,
});
}
}
Ok(sequences)
}
pub fn into_trace_driver(self) -> Result<WorkloadDriver> {
self.validate_for_trace_mode()?;
WorkloadDriver::new_trace(self)
}
pub fn into_concurrency_driver(self) -> Result<WorkloadDriver> {
self.validate_for_concurrency_mode()?;
WorkloadDriver::new_concurrency(self)
}
fn validate(&self, allow_missing_first_timestamp: bool) -> Result<()> {
if self.block_size == 0 {
bail!("block_size must be greater than 0");
}
if self.sessions.is_empty() {
bail!("trace must contain at least one session");
}
for session in &self.sessions {
if session.turns.is_empty() {
bail!(
"session {} must contain at least one turn",
session.session_id
);
}
if !allow_missing_first_timestamp {
let timestamp_ms = session.first_arrival_timestamp_ms.ok_or_else(|| {
anyhow!(
"trace mode requires first_arrival_timestamp_ms for session {}",
session.session_id
)
})?;
if !timestamp_ms.is_finite() || timestamp_ms < 0.0 {
bail!(
"session {} has invalid first_arrival_timestamp_ms {}",
session.session_id,
timestamp_ms
);
}
} else if let Some(timestamp_ms) = session.first_arrival_timestamp_ms
&& (!timestamp_ms.is_finite() || timestamp_ms < 0.0)
{
bail!(
"session {} has invalid first_arrival_timestamp_ms {}",
session.session_id,
timestamp_ms
);
}
for (turn_idx, turn) in session.turns.iter().enumerate() {
if turn.input_length == 0 {
bail!(
"session {} turn {} must have a positive input_length",
session.session_id,
turn_idx
);
}
if turn.hash_ids.is_empty() {
bail!(
"session {} turn {} must contain at least one hash id",
session.session_id,
turn_idx
);
}
if turn.hash_ids.len() * self.block_size < turn.input_length {
bail!(
"session {} turn {} input_length {} exceeds synthesized capacity {}",
session.session_id,
turn_idx,
turn.input_length,
turn.hash_ids.len() * self.block_size
);
}
if !turn.delay_after_previous_ms.is_finite() || turn.delay_after_previous_ms < 0.0 {
bail!(
"session {} turn {} has invalid delay {}",
session.session_id,
turn_idx,
turn.delay_after_previous_ms
);
}
if turn_idx == 0 && turn.delay_after_previous_ms != 0.0 {
bail!(
"session {} first turn must have delay_after_previous_ms == 0.0",
session.session_id
);
}
}
}
Ok(())
}
}
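/// Mean inter-arrival gap in milliseconds implied by the arrival spec:
/// `1000 / qps` for the QPS variants and `0` for `Burst`.
fn arrival_spec_mean_gap_ms(spec: &ArrivalSpec) -> Result<f64> {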
match spec {
ArrivalSpec::Burst => Ok(0.0),
ArrivalSpec::ConstantQps { qps }
| ArrivalSpec::PoissonQps { qps }
| ArrivalSpec::GammaQps { qps, .. } => {
if !qps.is_finite() || *qps <= 0.0 {
bail!("qps must be a finite positive number, got {qps}");
}
Ok(1000.0 / qps)
}
}
}
fn sample_arrival_gap_ms(spec: &ArrivalSpec, mean_gap_ms: f64, rng: &mut StdRng) -> Result<f64> {
match spec {
ArrivalSpec::Burst => Ok(0.0),
ArrivalSpec::ConstantQps { .. } => Ok(mean_gap_ms),
ArrivalSpec::PoissonQps { .. } => Ok(sample_exponential_ms(mean_gap_ms, rng)),
ArrivalSpec::GammaQps { smoothness, .. } => {
if !smoothness.is_finite() || *smoothness <= 0.0 {
bail!("gamma smoothness must be a finite positive number, got {smoothness}");
}
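// shape = smoothness and scale = mean_gap_ms / smoothness, so the mean
// (shape * scale) stays at mean_gap_ms whatever the smoothness.
Ok(sample_gamma_ms(*smoothness, mean_gap_ms / smoothness, rng))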
}
}
}
fn sample_delay_ms(spec: &DelaySpec, rng: &mut StdRng) -> Result<f64> {
match spec {
DelaySpec::None => Ok(0.0),
DelaySpec::ConstantMs(delay_ms) => {
if !delay_ms.is_finite() || *delay_ms < 0.0 {
bail!("delay must be a finite non-negative number, got {delay_ms}");
}
Ok(*delay_ms)
}
DelaySpec::ExponentialMs { mean_ms } => {
if !mean_ms.is_finite() || *mean_ms < 0.0 {
bail!("mean_ms must be a finite non-negative number, got {mean_ms}");
}
Ok(sample_exponential_ms(*mean_ms, rng))
}
}
}
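/// Samples a length from a normal distribution with the spec's mean and stddev
/// (Box-Muller transform), rounded and clamped below at `min_value`.
fn sample_length(spec: &LengthSpec, min_value: usize, rng: &mut StdRng) -> usize {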
if spec.stddev == 0.0 {
return spec.mean.max(min_value);
}
let stddev = spec.stddev.abs();
let u1 = (1.0 - rng.random::<f64>()).clamp(f64::MIN_POSITIVE, 1.0);
let u2 = rng.random::<f64>();
let z0 = (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos();
let sample = spec.mean as f64 + z0 * stddev;
sample.round().max(min_value as f64) as usize
}
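/// Samples an exponential variate with mean `mean_ms` by inverse-transform sampling.
/// The uniform draw is clamped away from zero so that `ln` stays finite.
fn sample_exponential_ms(mean_ms: f64, rng: &mut StdRng) -> f64 {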
if mean_ms == 0.0 {
return 0.0;
}
let u = (1.0 - rng.random::<f64>()).clamp(f64::MIN_POSITIVE, 1.0);
-mean_ms * u.ln()
}
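/// Samples a gamma variate using the Marsaglia-Tsang squeeze method. Shapes below 1
/// are handled by sampling with `shape + 1` and scaling the result by `u^(1 / shape)`.
fn sample_gamma_ms(shape: f64, scale: f64, rng: &mut StdRng) -> f64 {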
if scale == 0.0 {
return 0.0;
}
if shape < 1.0 {
let u = (1.0 - rng.random::<f64>()).clamp(f64::MIN_POSITIVE, 1.0);
return sample_gamma_ms(shape + 1.0, scale, rng) * u.powf(1.0 / shape);
}
let d = shape - 1.0 / 3.0;
let c = (1.0 / (9.0 * d)).sqrt();
loop {
let u1 = (1.0 - rng.random::<f64>()).clamp(f64::MIN_POSITIVE, 1.0);
let u2 = rng.random::<f64>();
let z = (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos();
let v = (1.0 + c * z).powi(3);
if v <= 0.0 {
continue;
}
let u = rng.random::<f64>();
if u < 1.0 - 0.0331 * z.powi(4) {
return d * v * scale;
}
if u.ln() < 0.5 * z * z + d * (1.0 - v + v.ln()) {
return d * v * scale;
}
}
}
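/// Builds a synthetic block of `block_size` identical tokens (the hash id truncated
/// to `u32`) and hashes the block's raw bytes.
fn local_block_hash_from_id(hash_id: u64, block_size: usize) -> LocalBlockHash {
let tokens: Vec<u32> = (0..block_size).map(|_| hash_id as u32).collect();
// SAFETY: `tokens` is a live, fully initialized `Vec<u32>` that outlives `bytes`;
// the byte view covers exactly `size_of_val` bytes of its buffer, and `u8` has no
// alignment requirement.
let bytes = unsafe {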
std::slice::from_raw_parts(
tokens.as_ptr() as *const u8,
std::mem::size_of_val(tokens.as_slice()),
)
};
LocalBlockHash(compute_hash_v2(bytes, XXH3_SEED))
}
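
The two hash-id transforms above, `expand_hash_prefix_depth` and `duplicate_hash_space`, are easier to follow on plain `Vec<u64>` values. The following is a minimal sketch under stated assumptions, not the code from this change: the helper names `expand_hash_ids` and `duplicate_hash_ids` are hypothetical, they work on a single id list rather than a whole `Trace`, and they return `None` on overflow where the original panics via `expect`.

```rust
/// Hypothetical helper: expand each hash id `h` into the `factor` consecutive
/// ids `h * factor .. h * factor + factor`. Returns `None` on u64 overflow.
fn expand_hash_ids(hash_ids: &[u64], factor: u64) -> Option<Vec<u64>> {
    if factor <= 1 {
        // Same early return as the original: nothing to expand.
        return Some(hash_ids.to_vec());
    }
    let mut expanded = Vec::new();
    for &hash_id in hash_ids {
        let base = hash_id.checked_mul(factor)?;
        for offset in 0..factor {
            expanded.push(base.checked_add(offset)?);
        }
    }
    Some(expanded)
}

/// Hypothetical helper: produce `copies` shifted copies of one id list.
/// Copy `i` is shifted by `(max_id + 1) * i`, so the copies occupy disjoint id ranges.
/// Simplification: the maximum is taken over this one list, not over all sessions.
fn duplicate_hash_ids(hash_ids: &[u64], copies: u64) -> Option<Vec<Vec<u64>>> {
    let max_id = hash_ids.iter().copied().max().unwrap_or(0);
    let offset_base = max_id.checked_add(1)?;
    let mut result = Vec::new();
    for copy_idx in 0..copies.max(1) {
        let offset = offset_base.checked_mul(copy_idx)?;
        let mut shifted = Vec::with_capacity(hash_ids.len());
        for &hash_id in hash_ids {
            shifted.push(hash_id.checked_add(offset)?);
        }
        result.push(shifted);
    }
    Some(result)
}
```

Because expansion maps equal ids to equal runs of ids, two turns that shared a prefix of hash ids before expansion still share the corresponding (longer) prefix afterwards. Duplication does the opposite: each copy lands in its own id range, so no prefix is shared across copies.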
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
use dynamo_kv_router::LocalBlockHash;
use dynamo_kv_router::protocols::{ExternalSequenceBlockHash, WorkerId};
use dynamo_tokens::SequenceHash;
use uuid::Uuid;
use crate::common::protocols::DirectRequest;
#[derive(Debug, Clone)]
pub struct Trace {
pub block_size: usize,
pub sessions: Vec<SessionTrace>,
}
#[derive(Debug, Clone)]
pub struct SessionTrace {
pub session_id: String,
pub first_arrival_timestamp_ms: Option<f64>,
pub turns: Vec<TurnTrace>,
}
#[derive(Debug, Clone)]
pub struct TurnTrace {
pub input_length: usize,
pub max_output_tokens: usize,
pub hash_ids: Vec<u64>,
pub delay_after_previous_ms: f64,
}
#[derive(Debug, Clone)]
pub struct LengthSpec {
pub mean: usize,
pub stddev: f64,
}
#[derive(Debug, Clone)]
pub enum ArrivalSpec {
Burst,
ConstantQps { qps: f64 },
PoissonQps { qps: f64 },
GammaQps { qps: f64, smoothness: f64 },
}
#[derive(Debug, Clone)]
pub enum DelaySpec {
None,
ConstantMs(f64),
ExponentialMs { mean_ms: f64 },
}
#[derive(Debug, Clone)]
pub struct SyntheticTraceSpec {
pub block_size: usize,
pub num_sessions: usize,
pub turns_per_session: usize,
pub input_tokens: LengthSpec,
pub output_tokens: LengthSpec,
pub shared_prefix_ratio: f64,
pub num_prefix_groups: usize,
pub first_turn_arrivals: ArrivalSpec,
pub inter_turn_delays: DelaySpec,
pub seed: u64,
}
#[derive(Debug, Clone, Copy)]
pub enum SequenceHashMode {
Raw,
Cumulative,
}
#[derive(Debug, Clone, Copy)]
pub enum SessionPartitionSpec {
Random { num_partitions: usize, seed: u64 },
RoundRobin { num_partitions: usize },
}
#[derive(Debug, Clone)]
pub struct RouterSequence {
pub worker_id: WorkerId,
pub local_hashes: Vec<LocalBlockHash>,
pub external_hashes: Vec<ExternalSequenceBlockHash>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplayRequestHashes {
pub local_block_hashes: Vec<LocalBlockHash>,
pub sequence_hashes: Vec<SequenceHash>,
}
#[derive(Debug, Clone)]
pub struct ReadyTurn {
pub request_uuid: Uuid,
pub session_id: String,
pub turn_index: usize,
pub scheduled_ready_at_ms: f64,
pub replay_hashes: Option<ReplayRequestHashes>,
pub request: DirectRequest,
}
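
The timing fields on `SessionTrace` and `TurnTrace` (`first_arrival_timestamp_ms` and `delay_after_previous_ms`) are what the rescaling step at the top of this listing operates on. The sketch below restates that step on a simplified, hypothetical `Session` type so the arithmetic can be checked in isolation. It assumes that `min_start_ms` is the smallest first-arrival timestamp present, since that part of the original function is not shown here; the function name `rescale_ready_span` is also illustrative.

```rust
/// Simplified stand-in for `SessionTrace` (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Session {
    first_arrival_ms: Option<f64>,
    /// Delay before each turn; index 0 is the first turn and is expected to be 0.0.
    delays_ms: Vec<f64>,
}

/// Rescale session starts and inter-turn delays so that the latest point at which
/// any turn becomes ready lands `duration_ms` after the earliest session start.
fn rescale_ready_span(sessions: &mut [Session], duration_ms: f64) {
    // Assumption: the earliest start is the minimum first-arrival timestamp.
    let Some(min_start_ms) = sessions
        .iter()
        .filter_map(|session| session.first_arrival_ms)
        .min_by(|left, right| left.total_cmp(right))
    else {
        return;
    };
    // Latest ready time: session start plus all delays after the first turn.
    let Some(max_ready_ms) = sessions
        .iter()
        .map(|session| {
            session.first_arrival_ms.unwrap_or(0.0)
                + session.delays_ms.iter().skip(1).sum::<f64>()
        })
        .max_by(|left, right| left.total_cmp(right))
    else {
        return;
    };
    // Spans shorter than 1 ms are treated as 1 ms to avoid dividing by ~0.
    let ratio = duration_ms / (max_ready_ms - min_start_ms).max(1.0);
    for session in sessions.iter_mut() {
        if let Some(start_ms) = session.first_arrival_ms.as_mut() {
            *start_ms = (*start_ms - min_start_ms) * ratio;
        }
        for delay_ms in session.delays_ms.iter_mut().skip(1) {
            *delay_ms *= ratio;
        }
    }
}
```

With two sessions starting at 1000 ms and 2000 ms whose later turns add 500 ms and 2000 ms of delay, the original span is 3000 ms. Rescaling to 1500 ms halves every start offset and delay.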